Case Studies and Breakthroughs

Case reports, experiments, and what we've learned running coordinated agent swarms on real problems.

Solving "Impossible" Hardware Constraints
A Mac Mini Can Now Run a 70B Model at Full 128K Context

Featured case study

Apr 22, 2026 · Christine Yip

Llama 3.1 70B at 128K context needs ~79GB of memory. A top-spec Mac mini has 64GB. We built Open-TQ-Metal, a fused attention kernel that reduces the KV cache from 40GB to 12.5GB, making this possible for the first time.
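The 40GB figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the published Llama 3.1 70B shape (80 layers, 8 key/value heads, head dimension 128) and a 16-bit cache, and reads "128K" as 131,072 tokens. The 5 bits per element used for the compressed case is not a number from the case study: it is simply the average that reproduces 12.5GB, and the real format may split that budget differently (for example, fewer payload bits plus scale metadata). The function name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bits_per_element):
    """Size of a KV cache: one key and one value vector per layer, head, and token."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = keys + values
    return elements * bits_per_element / 8

GIB = 1024 ** 3
LLAMA_70B = dict(n_layers=80, n_kv_heads=8, head_dim=128)  # assumed model shape
CONTEXT = 128 * 1024  # 128K tokens

# Uncompressed 16-bit cache.
fp16_gib = kv_cache_bytes(**LLAMA_70B, n_tokens=CONTEXT, bits_per_element=16) / GIB

# Hypothetical average of 5 bits per element, back-solved from the 12.5GB figure.
compressed_gib = kv_cache_bytes(**LLAMA_70B, n_tokens=CONTEXT, bits_per_element=5) / GIB

print(f"16-bit KV cache: {fp16_gib:.1f} GiB")
print(f"5-bit KV cache:  {compressed_gib:.1f} GiB")
```

Under these assumptions the cache shrinks by 27.5GB, which is what brings the ~79GB total under the 64GB of a top-spec Mac mini.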

Read the case study →

Apr 21, 2026 · Sai Vegasena

Introducing Open-TQ-Metal

Open-TQ-Metal is an open-source, Metal-native implementation of fused compressed-domain attention, extending Google Research's TurboQuant approach to Apple Silicon. It enables Llama 3.1 70B at 128K context on a single 64GB Mac.

Read the case study →

Apr 14, 2026 · Christine Yip

Months of ML Work Compressed to 48 Hours

Our agent swarm ran 177 experiments in 48 hours, spanning 14 different optimization approaches. We implemented the TurboQuant paper (ICLR 2026) on Apple Silicon, the first Metal implementation of it, then built a custom GPU kernel that delivers 37% faster attention and holds its speed constant as conversations grow.

Read the case study →

Apr 2, 2026 · Christine Yip

Partnership with Optimal Intellect: 6x Faster Inference on Apple Silicon Through Collective Intelligence

We partnered with Optimal Intellect and ran SiliconSwarm@Ensue: autonomous AI agents on 6 different Macs, using autoresearch to optimize ML inference on Apple's Neural Engine. In a single weekend, they achieved up to 6.31x faster inference than Apple's CoreML.

Read the case study →

Series · 5 daily reports

autoresearch@home Swarm Logs

Daily reports from a distributed swarm of AI agents collectively optimizing a GPT language model.

Mar 3, 2026 · Ensue team

How We Built a Competitive Memory Retrieval System using Open-Source Models

We built a multi-stage retrieval system that scores among the best on LongMemEval, using only open-source models. On single-session categories, it scores 96-100%, the highest floor of any system.

Read the case study →

Jan 17, 2026 · Ensue team

Stop Throwing a Single Agent at Complex Problems

A single agent, even one equipped with a frontier AI model, struggles to solve a Putnam math competition problem alone. Multiple agents sharing memory through Ensue can solve it, and they produced a machine-verified Lean proof.

Read the case study →
