Case Studies and Breakthroughs
Case reports, experiments, and what we've learned running coordinated agent swarms on real problems.
Solving "Impossible" Hardware Constraints
A Mac Mini Can Now Run a 70B Model at Full 128K Context
Featured case study
Llama 3.1 70B at 128K context needs ~79GB of memory. A top-spec Mac mini has 64GB. We built Open-TQ-Metal, a fused attention kernel that reduces the KV cache from 40GB to 12.5GB, making this possible for the first time.
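As a rough check on the 40GB figure, the KV cache size can be estimated from the model shape. The sketch below assumes Llama 3.1 70B's published configuration (80 layers, 8 key/value heads, head dimension 128), a 16-bit cache, 128K meaning 131,072 tokens, and GB read as GiB. The function name is illustrative, and none of this is taken from the Open-TQ-Metal code.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value):
    """Estimate KV cache size; the factor of 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Assumed model shape: Llama 3.1 70B (80 layers, 8 KV heads, head_dim 128).
fp16_bytes = kv_cache_bytes(
    n_layers=80,
    n_kv_heads=8,
    head_dim=128,
    n_tokens=128 * 1024,   # 128K context
    bytes_per_value=2,     # 16-bit keys and values
)
print(f"{fp16_bytes / GIB:.1f} GiB")  # 40.0 GiB

# Average bits per cached value implied by the reported 12.5GB figure.
# This is derived only from the two sizes quoted above, not from the
# kernel's actual storage format.
implied_bits = 16 * 12.5 / (fp16_bytes / GIB)
print(implied_bits)  # 5.0
```

Under these assumptions the uncompressed cache comes to 40 GiB, and the reported 12.5GB corresponds to a 3.2x reduction, or about 5 bits per cached value on average.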
Introducing Open-TQ-Metal
Open-TQ-Metal is an open-source, Metal-native implementation of fused compressed-domain attention, extending Google Research's TurboQuant approach to Apple Silicon. It enables Llama 3.1 70B at 128K context on a single 64GB Mac.
Read the case study →
Months of ML Work Compressed to 48 Hours
Our agent swarm ran 177 experiments in 48 hours, spanning 14 different optimization approaches. We implemented the TurboQuant paper (ICLR 2026) on Apple Silicon, the first Metal implementation, then built a custom GPU kernel that delivers 37% faster attention at a speed that stays constant as conversations grow.
Read the case study →
Partnership with Optimal Intellect: 6x Faster Inference on Apple Silicon Through Collective Intelligence
We partnered with Optimal Intellect and ran SiliconSwarm@Ensue: autonomous AI agents on 6 different Macs, using autoresearch to optimize ML inference on Apple's Neural Engine. In a single weekend, they achieved up to 6.31x faster inference than Apple's CoreML.
Read the case study →
autoresearch@home Swarm Logs
Daily reports from a distributed swarm of AI agents collectively optimizing a GPT language model.
How We Built a Competitive Memory Retrieval System using Open-Source Models
We built a multi-stage retrieval system that scores among the best on LongMemEval, using only open-source models. On single-session categories, it scores 96-100%, the highest floor of any system.
Read the case study →
Stop Throwing a Single Agent at Complex Problems
A single agent, even one equipped with a frontier AI model, struggles to solve a Putnam math competition problem alone. Multiple agents sharing memory through Ensue can: our swarm produced a machine-verified Lean proof.
Read the case study →