Apple Silicon ML, without Python.
The only pure-Go binding to Apple's MLX. 24+ model architectures, training and inference, single static binary. Linux CUDA supported for selected workflows.
Pre-1.0. Source private today, available for review on request. Used in production by skiff for local inference.
What it does

- Core MLX runtime — arrays, autograd, compile. Exposed as small Go packages (`mlx`, `mlx/nn`, `mlx/compile`).
- Models and training — 24+ architectures across language, vision, and multimodal. Full training, inference, and LoRA fine-tuning paths.
- Quantization — AWQ, GPTQ, and DWQ. Run quantized 7B models on Apple Silicon at interactive speeds.
- Production serving — HTTP server with graceful shutdown, bounded concurrency, health endpoints, and GPU trace capture.
- cgo-free — Metal access through apple via `purego`. Single static binary. No C toolchain in the build.
mlx-go is the Go-native compute foundation for on-device inference on Apple Silicon. Built on apple's Metal bindings, profiled with gputrace, used by skiff for local inference. The structural claim: if MLX is Apple's standard ML runtime, then the only pure-Go binding to it becomes infrastructure any Go shop can ship without pulling in a Python runtime.
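The cgo-free claim rests on `purego`, a real open-source library (github.com/ebitengine/purego) that opens dynamic libraries and binds their symbols at runtime from pure Go. A minimal sketch of the technique in isolation — the library path and the choice of `getpid` are illustrative, not anything mlx-go does, and running it requires macOS plus the purego module:

```go
package main

import (
	"fmt"

	"github.com/ebitengine/purego"
)

func main() {
	// Open a system library at runtime — no cgo, no C toolchain in the build.
	lib, err := purego.Dlopen("/usr/lib/libSystem.B.dylib", purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(err)
	}

	// Bind a C symbol to a Go function value.
	var getpid func() int32
	purego.RegisterLibFunc(&getpid, lib, "getpid")

	fmt.Println("pid:", getpid())
}
```

The same dlopen-and-register pattern is how a pure-Go binary can reach Metal-backed native code while `CGO_ENABLED=0` builds and cross-compilation keep working.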
Source: private repo, available for review on request.
Docs in progress.
Contact: tmc@tmc.dev