Apple Silicon ML, without Python.
The only pure-Go binding to Apple's MLX. 24+ model architectures, training and inference, single static binary. Linux CUDA supported for selected workflows.
Pre-1.0. Source private today, available for review on request. Used in production by skiff for local inference.
What it does

- Core MLX runtime — arrays, autograd, compile. Exposed as small Go packages (`mlx`, `mlx/nn`, `mlx/compile`).
- Models and training — 24+ architectures across language, vision, and multimodal. Full training, inference, and LoRA fine-tuning paths.
- Quantization — AWQ, GPTQ, and DWQ. Run quantized 7B models on Apple Silicon at interactive speeds.
- Production serving — HTTP server with graceful shutdown, bounded concurrency, health endpoints, and GPU trace capture.
- cgo-free — Metal access through apple via `purego`. Single static binary. No C toolchain in the build.
mlx-go is the Go-native compute foundation for on-device inference on Apple Silicon. Built on apple's Metal bindings, profiled with gputrace, used by skiff for local inference. The structural claim: if MLX is Apple's standard ML runtime, then the only pure-Go binding to it becomes infrastructure any Go shop can ship without pulling in a Python runtime.
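The cgo-free claim rests on `purego`, a real open-source library (github.com/ebitengine/purego) that opens dynamic libraries and binds their symbols at runtime from pure Go. A minimal sketch of the technique in isolation — the library path and the choice of `getpid` are illustrative, not anything mlx-go does, and running it requires macOS plus the purego module:

```go
package main

import (
	"fmt"

	"github.com/ebitengine/purego"
)

func main() {
	// Open a system library at runtime — no cgo, no C toolchain in the build.
	lib, err := purego.Dlopen("/usr/lib/libSystem.B.dylib", purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(err)
	}

	// Bind a C symbol to a Go function value.
	var getpid func() int32
	purego.RegisterLibFunc(&getpid, lib, "getpid")

	fmt.Println("pid:", getpid())
}
```

The same dlopen-and-register pattern is how a pure-Go binary can reach Metal-backed native code while `CGO_ENABLED=0` builds and cross-compilation keep working.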
Source: private repo, available for review on request.
Docs in progress.
Contact: tmc@tmc.dev