AGENT SWARMS FOR ML RESEARCH
Agent-driven performance breakthroughs in days, not months.
Inference and model optimization driven by agent swarms, for ambitious teams.
SOLUTIONS
What we optimize.
01 · INFERENCE OPTIMIZATION
Make the model you already have faster.
Your model is fine. It's too slow, or too expensive, or both. A swarm searches the kernel and runtime space across the Apple Neural Engine, GPU kernels, quantization schemes, and compilation targets, and finds the speedup your team would take months to hunt down.
We don't retrain. We don't change the weights. We make the model you already shipped run dramatically faster on the hardware you need to run it on.
6.3× on Apple Neural Engine
02 · MODEL OPTIMIZATION
Train a better model than the one you have.
Your model's quality isn't good enough yet. A swarm runs thousands of coordinated training experiments across RL, fine-tuning, architecture search, and data curation, converging on a model that meaningfully beats your current baseline.
We don't tune one thing at a time. A swarm finds the combination of changes that a single researcher, working alone, would take six months to stumble into.
0 → 60% win rate, six days
CASE STUDIES · PUBLISHED BENCHMARKS
breakthroughs in the field.
10,000+ EXPERIMENTS · 280K MEMORIES · CONTINUOUS AGENT-DRIVEN BREAKTHROUGHS
1st turboquant metal kernel for gemma 4
Days after Google shipped Gemma 4 31B, the #3 open model in the world, an Ensue swarm wrote the first TurboQuant Metal kernel targeting Gemma 4 on Apple Silicon. Gemma 4 31B now runs locally on a 64GB MacBook Pro at 15 tokens per second, 17.5 GB peak memory, and 256K context via 3-bit KV cache compression.
Gemma 4 31B on consumer hardware
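Ensue's TurboQuant kernel itself isn't public. As an illustration of the idea behind 3-bit KV cache compression, here is a minimal per-group linear quantizer in NumPy; the group size, rounding scheme, and all names are our own stand-ins, not the actual kernel.

```python
import numpy as np

def quantize_3bit(x, group_size=32):
    """Per-group 3-bit linear quantization: each group of `group_size`
    values maps to an integer in [0, 7] with its own scale and offset."""
    x = x.reshape(-1, group_size)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / 7.0                 # 7 = 2**3 - 1 quantization steps
    scale[scale == 0] = 1.0                 # guard constant groups
    q = np.round((x - lo) / scale).astype(np.uint8)  # codes in 0..7
    return q, scale, lo

def dequantize_3bit(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 32)).astype(np.float32)   # toy KV cache slice
q, scale, lo = quantize_3bit(kv)
kv_hat = dequantize_3bit(q, scale, lo)
print("max abs error:", float(np.abs(kv - kv_hat).max()))
```

A production kernel would additionally bit-pack the 3-bit codes (eight values per three bytes) rather than storing one code per byte as this sketch does; the memory win is what lets a 256K context fit alongside the weights.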
inference kernel optimization on apple silicon
A coordinated swarm found kernel-level fusions that CoreML doesn't emit, reaching 6.3× faster inference than CoreML on Apple Neural Engine across six generations of Mac hardware.
distributed swarm autoresearch
A swarm of 115 agents collaborated across distributed GPUs, sharing every experiment and every finding through a collective memory network. 3,100 NanoGPT runs, each one compounding on the others.
HOW IT WORKS
your model, your agents, learning with your team.
We do the work. Your team learns alongside. Your data never leaves your infra.
Day 1
We build the eval with your team.
We define the metrics that matter and build the evaluation harness to measure them. The harness becomes the swarm's compass for every experiment that follows.
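As a minimal sketch of what such a harness can look like (the exact-match metric, the `EvalCase` shape, and the stub model here are hypothetical stand-ins, not Ensue's actual harness):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Score a model on a fixed case set; this single number is what
    every subsequent swarm experiment tries to move."""
    hits = sum(model(c.prompt).strip() == c.expected for c in cases)
    return {"exact_match": hits / len(cases), "n": len(cases)}

# Toy usage with a stub model standing in for yours
cases = [EvalCase("2+2=", "4"), EvalCase("capital of France?", "Paris")]
stub = lambda p: {"2+2=": "4"}.get(p, "Paris")
print(run_eval(stub, cases))  # → {'exact_match': 1.0, 'n': 2}
```

The important property is that the harness is deterministic and cheap to rerun, so every experiment in the swarm is scored the same way.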
Every morning
Your agents found something last night.
Open your report over coffee: new strategies, architecture shifts, breakthroughs you wouldn't have thought to try, and the feeling of watching something think in ways you wouldn't have.
Under the hood
We direct the swarm. We break the plateaus.
Every model hits plateaus. Every researcher has blind spots, assumptions so ingrained they're invisible. Our job is to notice when the swarm is bumping against yours, redirect it, and keep the discoveries coming. You'll read about what it found in tomorrow's report.
Overnight: 412 new runs across the swarm.
Best model: 0.68 (Run #1733, +0.07 over baseline)
What moved the needle
· Switched attention pattern on layers 8 to 12 (+0.04)
· Dropped lr from 3e-4 to 1.5e-4 at step 2400 (+0.02)
· Added 3k examples from the retry set (+0.01)
What didn't work
· 3 runs at higher depth collapsed. Abandoned.
Heading into today
· Quantization sweep on Run #1733 (60 configs)
· Scaling the new attention pattern to Run #1711
Questions or priorities? Reply to this email.
Deployment · Two paths
On our cloud
Your swarm, our infrastructure. Start the same day, ship the model when it's ready.