AGENT SWARMS FOR ML RESEARCH
Agent-driven performance breakthroughs in days, not months.
Turn your proprietary data into a SOTA model.
No ML team required.
No six-month research hire.
HOW IT WORKS
A model, packaged for your hardware.
You drop the dataset and name the eval. Agents stand themselves up, propose strategies, run the experiments, and hand back the best model. No operator in the loop.
CASE STUDIES · PUBLISHED BENCHMARKS
Breakthroughs in the field.
10,000+ EXPERIMENTS · 280K MEMORIES · CONTINUOUS AGENT-DRIVEN BREAKTHROUGHS
open-tq-metal: fused attention for 70B at 128K
Open-source fused compressed-domain attention on Apple Silicon. Custom Metal kernels compress the KV cache from 40 GB to 12.5 GB, enabling Llama 3.1 70B at 128K context on a single 64 GB Mac, a configuration no other framework can reach.
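For readers who want to check the 40 GB figure, here is a back-of-envelope sketch. It assumes the publicly documented Llama 3.1 70B configuration (80 layers, 8 key/value heads, head dimension 128) and an FP16 baseline; neither is stated on this page. The "implied bits per value" line is only arithmetic on the two quoted sizes, not a description of how open-tq-metal actually stores the cache.

```python
# Back-of-envelope KV-cache sizing for Llama 3.1 70B at 128K context.
# Architecture numbers are assumed from the public model config,
# not taken from this page.
N_LAYERS = 80
N_KV_HEADS = 8          # grouped-query attention: 8 KV heads
HEAD_DIM = 128
CONTEXT_TOKENS = 128 * 1024

GIB = 1024 ** 3


def kv_cache_bytes(bits_per_value: float, tokens: int = CONTEXT_TOKENS) -> float:
    """Bytes needed to hold keys and values for `tokens` positions."""
    values_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM  # 2 = K and V
    return values_per_token * tokens * bits_per_value / 8


fp16_gib = kv_cache_bytes(16) / GIB

# Effective bits per cached value if the same cache fits in 12.5 GB.
# This folds in any per-block metadata, so it is an average, not a format.
implied_bits = 16 * 12.5 / fp16_gib

print(f"FP16 KV cache: {fp16_gib:.1f} GiB")
print(f"Implied storage at 12.5 GB: {implied_bits:.1f} bits per value")
```

Under these assumptions the uncompressed cache comes to 40 GiB, which is why it cannot sit alongside the model weights on a 64 GB machine without compression.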
inference kernel optimization on apple silicon
A coordinated swarm found kernel-level fusions that CoreML doesn't emit, reaching 6.3× faster inference than CoreML on Apple Neural Engine across six generations of Mac hardware.
distributed swarm autoresearch
A swarm of 115 agents collaborated across distributed GPUs, sharing every experiment and every finding through a collective memory network. 3,100 NanoGPT runs, each one compounding on the others.
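To make "sharing every finding through a collective memory" concrete, here is a toy sketch. Everything in it is hypothetical: `SharedMemory`, `agent_step`, and `toy_loss` are illustrative names, the loss is a stand-in for a real training run, and the agents run one after another in a single process rather than across distributed GPUs. It is not Ensue's implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical collective store that every agent reads and writes."""
    records: list = field(default_factory=list)  # (config, loss) tuples

    def best(self):
        """Return the lowest-loss record seen so far, or None if empty."""
        return min(self.records, key=lambda r: r[1]) if self.records else None

    def add(self, config, loss):
        self.records.append((config, loss))


def toy_loss(config):
    """Stand-in for a training run; minimum at lr=0.3, width=0.7."""
    return (config["lr"] - 0.3) ** 2 + (config["width"] - 0.7) ** 2


def agent_step(memory, rng, step_size=0.1):
    """One agent: start from the swarm's best finding, perturb, evaluate, share."""
    best = memory.best()
    base = best[0] if best else {"lr": rng.random(), "width": rng.random()}
    proposal = {k: v + rng.uniform(-step_size, step_size) for k, v in base.items()}
    memory.add(proposal, toy_loss(proposal))


def run_swarm(n_agents=8, rounds=50, seed=0):
    """Run agents in sequence; each one builds on what the others recorded."""
    rng = random.Random(seed)
    memory = SharedMemory()
    for _ in range(rounds):
        for _ in range(n_agents):
            agent_step(memory, rng)
    return memory


memory = run_swarm()
best_config, best_loss = memory.best()
print(f"{len(memory.records)} experiments, best loss {best_loss:.4f}")
```

The point of the sketch is the data flow: no agent keeps private state, so every experiment starts from the best result any agent has recorded.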
“Ensue has been used by dozens of the world's leading autonomous researchers to solve the hardest problems in math, science, and engineering.”
SOLUTIONS
What we optimize.
01 · INFERENCE OPTIMIZATION
Make the model you already have faster.
Your model is fine. It's too slow, or too expensive, or both. A swarm searches the kernel and runtime space across the Apple Neural Engine, GPU kernels, quantization schemes, and compilation targets, and finds the speedup your team would take months to hunt down.
We don't retrain. We don't change the weights. We make the model you already shipped run dramatically faster on the hardware you need to run it on.
6.3× on Apple Neural Engine
02 · MODEL OPTIMIZATION
Train a better model than the one you have.
Your model's quality isn't good enough yet. A swarm runs thousands of coordinated training experiments across RL, fine-tuning, architecture search, and data curation, converging on a model that meaningfully beats your current baseline.
We don't tune one thing at a time. A swarm finds the combination of changes that a single researcher, working alone, would take six months to stumble into.
0 → 60% win rate, six days
Deployment · Two paths
On our cloud
Your swarm, our infrastructure. Start the same day, ship the model when it's ready.