AGENT SWARMS FOR ML RESEARCH
Agent-driven performance
breakthroughs in days, not months.
Inference and model optimization driven by agent swarms, for ambitious teams.
SOLUTIONS
What we optimize.
01 · INFERENCE OPTIMIZATION
Make the model you already have faster.
Your model is fine. It's too slow, or too expensive, or both. A swarm searches the kernel and runtime space across Apple ANE, GPU kernels, quantization schemes, and compilation targets, and finds the speedup your team would take months to hunt down.
We don't retrain. We don't change the weights. We make the model you already shipped run dramatically faster on the hardware you need to run it on.
6.3× on Apple Neural Engine
02 · MODEL OPTIMIZATION
Train a better model than the one you have.
Your model's quality isn't good enough yet. A swarm runs thousands of coordinated training experiments across RL, fine-tuning, architecture search, and data curation, converging on a model that meaningfully beats your current baseline.
We don't tune one thing at a time. A swarm finds the combination of changes that a single researcher, working alone, would take six months to stumble into.
0 → 60% win rate in six days
CASE STUDIES · PUBLISHED BENCHMARKS
breakthroughs in the field.
10,000+ EXPERIMENTS · 280K MEMORIES · CONTINUOUS AGENT-DRIVEN BREAKTHROUGHS
open-tq-metal: fused attention for 70B at 128K
Open-source fused compressed-domain attention on Apple Silicon. Custom Metal kernels compress the KV cache from 40 GB to 12.5 GB, enabling Llama 3.1 70B at 128K context on a single 64 GB Mac - a configuration no other framework can reach.
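The 40 GB figure falls straight out of the published Llama 3.1 70B architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128) at fp16; the 3.2× ratio is derived from the 40 GB → 12.5 GB numbers above. A minimal back-of-envelope sketch:

```python
# Back-of-envelope KV-cache sizing for Llama 3.1 70B at 128K context.
# Architecture numbers are the published 70B config; the 3.2x
# compression ratio is derived from the 40 GB -> 12.5 GB figures.

def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # Factor of 2 covers both keys and values.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

fp16 = kv_cache_gib(80, 8, 128, 128 * 1024, 2)  # ~40 GiB uncompressed
compressed = fp16 / 3.2                          # ~12.5 GiB after compression
print(round(fp16, 1), round(compressed, 1))      # 40.0 12.5
```

At 12.5 GB, the cache plus quantized weights fits inside a 64 GB unified-memory budget, which is what makes the single-Mac configuration possible.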
inference kernel optimization on apple silicon
A coordinated swarm found kernel-level fusions that CoreML doesn't emit, reaching 6.3× faster inference than CoreML on Apple Neural Engine across six generations of Mac hardware.
distributed swarm autoresearch
A swarm of 115 agents collaborated across distributed GPUs, sharing every experiment and every finding through a collective memory network. 3,100 NanoGPT runs, each one compounding on the others.
HOW IT WORKS
your model, your agents, learning with your team.
We do the work. Your team learns alongside. Your data never leaves your infra.
Day 1
We build the eval with your team.
We define the metrics that matter and build the evaluation harness to measure them. The harness becomes the swarm's compass for every experiment that follows.
Every morning
Your agents found something last night.
Open your report over coffee: new strategies, architecture shifts, breakthroughs you wouldn't have thought to try, and the feeling of watching something think in ways you wouldn't have.
Under the hood
We direct the swarm. We break the plateaus.
Every model hits plateaus. Every researcher has blind spots, assumptions so ingrained they're invisible. Our job is to notice when the swarm is bumping against yours, redirect it, and keep the discoveries coming. You'll read about what it found in tomorrow's report.
Overnight: 412 new runs across the swarm.
Best model: 0.68 (Run #1733, +0.07 over baseline)
What moved the needle
· Switched attention pattern on layers 8 to 12 (+0.04)
· Dropped lr from 3e-4 to 1.5e-4 at step 2400 (+0.02)
· Added 3k examples from the retry set (+0.01)
What didn't work
· 3 runs at higher depth collapsed. Abandoned.
Heading into today
· Quantization sweep on Run #1733 (60 configs)
· Scaling the new attention pattern to Run #1711
Questions or priorities? Reply to this email.
Deployment · Two paths
On our cloud
Your swarm, our infrastructure. Start the same day, ship the model when it's ready.