AGENT SWARMS FOR ML RESEARCH
Agent-driven performance breakthroughs in days, not months.
Inference and model optimization driven by agent swarms, for ambitious teams.
SOLUTIONS
What we optimize.
01 · INFERENCE OPTIMIZATION
Make the model you already have faster.
Your model is fine. It's too slow, or too expensive, or both. A swarm searches the kernel and runtime space across the Apple Neural Engine, GPU kernels, quantization schemes, and compilation targets, and finds the speedup your team would take months to hunt down.
We don't retrain. We don't change the weights. We make the model you already shipped run dramatically faster on the hardware you need to run it on.
6.3× on Apple Neural Engine
02 · MODEL OPTIMIZATION
Train a better model than the one you have.
Your model's quality isn't good enough yet. A swarm runs thousands of coordinated training experiments across RL, fine-tuning, architecture search, and data curation, converging on a model that meaningfully beats your current baseline.
We don't tune one thing at a time. A swarm finds the combination of changes that a single researcher, working alone, would take six months to stumble into.
0 → 60% win rate, six days
CASE STUDIES · PUBLISHED BENCHMARKS
breakthroughs in the field.
10,000+ EXPERIMENTS · 280K MEMORIES · CONTINUOUS AGENT-DRIVEN BREAKTHROUGHS
1st turboquant metal kernel for gemma 4
Days after Google shipped Gemma 4 31B, the #3 open model in the world, an Ensue swarm wrote the first TurboQuant Metal kernel targeting Gemma 4 on Apple Silicon. Gemma 4 31B now runs locally on a 64GB MacBook Pro at 15 tokens per second, 17.5 GB peak memory, and 256K context via 3-bit KV cache compression.
Gemma 4 31B on consumer hardware
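Ensue's TurboQuant kernel itself isn't public. As an illustration of the idea behind 3-bit KV cache compression, here is a minimal per-group linear quantizer in NumPy; the group size, rounding scheme, and all names are our own stand-ins, not the actual kernel.

```python
import numpy as np

def quantize_3bit(x, group_size=32):
    """Per-group 3-bit linear quantization: each group of `group_size`
    values maps to an integer in [0, 7] with its own scale and offset."""
    x = x.reshape(-1, group_size)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / 7.0                 # 7 = 2**3 - 1 quantization steps
    scale[scale == 0] = 1.0                 # guard constant groups
    q = np.round((x - lo) / scale).astype(np.uint8)  # codes in 0..7
    return q, scale, lo

def dequantize_3bit(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 32)).astype(np.float32)   # toy KV cache slice
q, scale, lo = quantize_3bit(kv)
kv_hat = dequantize_3bit(q, scale, lo)
print("max abs error:", float(np.abs(kv - kv_hat).max()))
```

A production kernel would additionally bit-pack the 3-bit codes (eight values per three bytes) rather than storing one code per byte as this sketch does; the memory win is what lets a 256K context fit alongside the weights.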
inference kernel optimization on apple silicon
A coordinated swarm found kernel-level fusions that CoreML doesn't emit, reaching 6.3× faster inference than CoreML on Apple Neural Engine across six generations of Mac hardware.
distributed swarm autoresearch
A swarm of 115 agents collaborated across distributed GPUs, sharing every experiment and every finding through a collective memory network. 3,100 NanoGPT runs, each one compounding on the others.
HOW IT WORKS
your model, your agents, learning with your team.
We do the work. Your team learns alongside. Your data never leaves your infra.
Day 1
We build the eval with your team.
We define the metrics that matter and build the evaluation harness to measure them. The harness becomes the swarm's compass for every experiment that follows.
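As a minimal sketch of what such a harness can look like (the exact-match metric, the `EvalCase` shape, and the stub model here are hypothetical stand-ins, not Ensue's actual harness):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Score a model on a fixed case set; this single number is what
    every subsequent swarm experiment tries to move."""
    hits = sum(model(c.prompt).strip() == c.expected for c in cases)
    return {"exact_match": hits / len(cases), "n": len(cases)}

# Toy usage with a stub model standing in for yours
cases = [EvalCase("2+2=", "4"), EvalCase("capital of France?", "Paris")]
stub = lambda p: {"2+2=": "4"}.get(p, "Paris")
print(run_eval(stub, cases))  # → {'exact_match': 1.0, 'n': 2}
```

The important property is that the harness is deterministic and cheap to rerun, so every experiment in the swarm is scored the same way.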
Every morning
Your agents found something last night.
Open your report over coffee: new strategies, architecture shifts, breakthroughs you wouldn't have thought to try, and the feeling of watching something think in ways you wouldn't have.
Under the hood
We direct the swarm. We break the plateaus.
Every model hits plateaus. Every researcher has blind spots, assumptions so ingrained they're invisible. Our job is to notice when the swarm is bumping against yours, redirect it, and keep the discoveries coming. You'll read about what it found in tomorrow's report.
Overnight: 412 new runs across the swarm.
Best model: 0.68 (Run #1733, +0.07 over baseline)
What moved the needle
· Switched attention pattern on layers 8 to 12 (+0.04)
· Dropped lr from 3e-4 to 1.5e-4 at step 2400 (+0.02)
· Added 3k examples from the retry set (+0.01)
What didn't work
· 3 runs at higher depth collapsed. Abandoned.
Heading into today
· Quantization sweep on Run #1733 (60 configs)
· Scaling the new attention pattern to Run #1711
Questions or priorities? Reply to this email.
Deployment · Two paths
On our cloud
Your swarm, our infrastructure. Start the same day, ship the model when it's ready.