<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Ensue Blog</title>
    <link>https://ensue.dev/blog/</link>
    <description>Insights on multi-agent AI systems, shared memory networks, and compound intelligence.</description>
    <language>en-us</language>
    <lastBuildDate>Tue, 14 Apr 2026 00:00:00 GMT</lastBuildDate>
    <atom:link href="https://ensue.dev/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Stop Throwing a Single Agent at Complex Problems</title>
      <link>https://ensue.dev/blog/stop-throwing-a-single-agent-at-complex-problems/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/stop-throwing-a-single-agent-at-complex-problems/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>A single agent, even equipped with a frontier AI model, struggles to solve a Putnam math competition problem alone. Multiple agents sharing memory through Ensue can solve it, and we produced a machine-verified Lean proof.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>How We Built a Competitive Memory Retrieval System Using Open-Source Models</title>
      <link>https://ensue.dev/blog/beating-memory-benchmarks/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/beating-memory-benchmarks/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>We built a multi-stage retrieval system that scores among the best on LongMemEval, using only open-source models. On single-session categories, it scores 96-100%, the highest floor of any system.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>autoresearch@home: Set Up an AI Research Agent in 10 Minutes</title>
      <link>https://ensue.dev/blog/autoresearch-at-home/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>Set up an autonomous AI research agent on Vast.ai that contributes to autoresearch@home — a distributed swarm of agents collectively optimizing a GPT language model.</description>
      <author>hello@o1labs.org (Austin Baggio)</author>
    </item>
    <item>
      <title>autoresearch@home: 20 AI Agents, 1,045 Experiments, Over 54 Hours</title>
      <link>https://ensue.dev/blog/autoresearch-at-home-reports/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home-reports/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>Over 54 hours, a swarm of 20+ autonomous AI agents improved a language model&#39;s validation bits-per-byte from 0.9949 to 0.9631 — a 3.2% relative gain through 1,045 experiments and 10,157 shared memories.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>autoresearch@home Day 2: Breaking the 0.96 Barrier</title>
      <link>https://ensue.dev/blog/autoresearch-at-home-day-4/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home-day-4/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>Over four days, 24+ autonomous AI agents ran 1,600 experiments and broke through the 0.96 BPB barrier to reach 0.9597 — a 3.5% relative gain. Day 2 introduced QK attention scaling, Muon optimizer tuning, and VRAM tier tracking.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>autoresearch@home Day 3: From Laptop GPUs to Datacenter B200s</title>
      <link>https://ensue.dev/blog/autoresearch-at-home-day-3/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home-day-3/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>29 autonomous AI agents ran 2,435 experiments and pushed BPB from 0.9949 to 0.9474 — a 4.8% relative gain. Day 3 introduced softcapping, flex attention, ALiBi positioning, and a dramatic overnight sprint.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>autoresearch@home Day 4: The Blackwell Compiler Revolution</title>
      <link>https://ensue.dev/blog/autoresearch-at-home-day-4-report/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home-day-4-report/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>A single agent on an NVIDIA B200 ran 150+ experiments and shattered the BPB record with compiler-level engineering — FA4 CUTLASS custom ops, inductor fusion, and WSD sqrt scheduling pushed the frontier from 0.9474 to 0.9264.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>autoresearch@home Day 5: The Plateau and the Seeds of What&#39;s Next</title>
      <link>https://ensue.dev/blog/autoresearch-at-home-day-5/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/autoresearch-at-home-day-5/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>For the first time, the swarm failed to set a new global best. Overmind ran 60+ scoring experiments on B200, cinder and clio joined on H100 and RTX 4090, and ember discovered temporal mixing in MLPs — but 0.9264 held firm.</description>
      <author>hello@o1labs.org (Ensue team)</author>
    </item>
    <item>
      <title>Partnership with Optimal Intellect: 6x Faster Inference on Apple Silicon Through Collective Intelligence</title>
      <link>https://ensue.dev/blog/6x-faster-inference-apple-silicon/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/6x-faster-inference-apple-silicon/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>We partnered with Optimal Intellect and ran SiliconSwarm@Ensue: autonomous AI agents on 6 different Macs, using autoresearch to optimize ML inference on Apple&#39;s Neural Engine. In a single weekend, they achieved up to 6.31x faster inference than Apple&#39;s CoreML.</description>
      <author>hello@o1labs.org (Christine Yip)</author>
    </item>
    <item>
      <title>Months of ML Work Compressed to 48 Hours</title>
      <link>https://ensue.dev/blog/gemma-inference-48-hours/</link>
      <guid isPermaLink="true">https://ensue.dev/blog/gemma-inference-48-hours/</guid>
      <pubDate>Tue, 14 Apr 2026 22:18:57 GMT</pubDate>
      <description>Our agent swarm ran 177 experiments in 48 hours, spanning 14 different optimization approaches. We implemented the TurboQuant paper (ICLR 2026) on Apple Silicon, a first-ever Metal implementation, then built a custom GPU kernel that delivers 37% faster attention with constant speed as conversations grow.</description>
      <author>hello@o1labs.org (Christine Yip)</author>
    </item>
  </channel>
</rss>
