Bringing AI to the Edge
We deploy AI models directly on consumer devices, making AI faster, more private, and freely accessible to everyone.
Per-layer weight streaming from NVMe storage lets models that exceed device memory run on iPad and iPhone, cutting peak memory by 88% with verified bandwidth scaling.
A 306-run empirical study of the Qwen3.5-35B-A3B MoE model on an Apple M2 Ultra: speculative decoding (SD) delivers a 1.18–1.30× speedup despite a <4% acceptance rate, because batch verification amortizes memory bandwidth.
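The amortization argument in one back-of-envelope calculation: when decode is memory-bandwidth-bound, one step costs roughly the time to stream the active weights through memory once, and a single batched verification pass over k draft tokens streams the weights once rather than k times. The numbers below (active weight bytes, bandwidth, k) are illustrative assumptions, not figures from the study, and compute cost is ignored as negligible in the bandwidth-bound regime.

```python
def decode_time_s(weight_bytes: float, bandwidth_gbs: float) -> float:
    """One memory-bound decode step ~= time to stream the weights once."""
    return weight_bytes / (bandwidth_gbs * 1e9)

# Assumed, illustrative numbers: ~2 GB of active expert weights per token
# on an M2 Ultra-class memory system (~800 GB/s), verifying k = 4 drafts.
active_bytes = 2e9
bandwidth = 800.0
k = 4

sequential = k * decode_time_s(active_bytes, bandwidth)  # k separate decode steps
batched = decode_time_s(active_bytes, bandwidth)         # one pass verifies all k
amortization = sequential / batched                      # -> k: weights read once, not k times
```

This is why SD can still win with very low acceptance: verifying a batch of draft tokens costs about the same as generating a single token.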
Systematic benchmarking of 7 GGUF quantization levels and speculative decoding for Qwen3.5 across three Apple Silicon machines, establishing Q6_K as the Pareto-optimal quantization and a ≥2.5× draft-to-target speed ratio as the rule for SD viability.
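The viability rule reduces to a one-line check. A hedged sketch, with a hypothetical `sd_viable` helper; the 2.5× threshold is the one established by the benchmark above:

```python
def sd_viable(draft_tps: float, target_tps: float, min_ratio: float = 2.5) -> bool:
    """Rule of thumb from the benchmark: speculative decoding pays off only
    when the draft model decodes at >= min_ratio times the target's speed."""
    return draft_tps / target_tps >= min_ratio
```

For example, a small draft model at 250 tok/s paired with a large target at 80 tok/s clears the bar (about 3.1×), while a draft at 120 tok/s does not.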
Integrating finance, diet, mood, and reading data entirely on consumer Apple Silicon, producing emergent cross-domain insights with zero data leakage.
Fused matrix-vector kernels enabling concurrent ANE batch prefill + GPU decode on Apple Silicon for Qwen3.5 models.
Benchmarking CoreML ANE prefill + MLX GPU decode for Qwen3.5 on Apple Silicon, with four inference strategies compared.
Native Swift implementation of Qwen3 TTS 0.6B for real-time, on-device speech synthesis.
Multi-stage compression pipeline for deploying Gemma 3 4B VLM on consumer hardware.
Exploring memory optimization techniques for the MLX framework on Apple Silicon.
AtomGradient is an independent research institution building the future of on-device AI. We conduct novel research in model compression, hardware-aware inference, and personal data integration — then ship those breakthroughs as free products that run entirely on your devices.
We believe intelligence belongs at the edge. Every model we build runs locally. Every product we ship is free. Every byte of your data stays on your device.