Your Trusted ML Research Lab

You have the data.
We provide the lab.

Ensue uses agent swarms to design, train, and ship models in days. Built for lean teams working on big problems.

Layer 1 · Data ingestion

Agents denoise your data and prepare it for training automatically.

Static datasets (Hugging Face, CSV) · Live
Snowflake, S3, Kaggle · Request
Streaming pipelines · Custom
Try the demo →
Layer 2 · Model creation

Agents design architectures and autonomously iterate to SOTA, in days.

Model design (adapt or bespoke) · Live
Model packaging · Live
Model training · Custom
Book a call →
Layer 3 · Inference

Purpose-built kernels and inference that performs on your target hardware.

Kernel development · Custom
Hardware-specific inference · Custom
Serving · Custom
Read the case studies →
What's live · Results you can try today
Data
Self-serve ingestion is live. Paste a Hugging Face URL or upload a CSV, and your project starts in seconds. Try the demo →
Model
SOTA models, in days. The swarm designs an architecture for your data and ships the code that trains it. Book a call →
Inference
Hardware-specific optimization, fast. From data to a running, hardware-optimized model in under 2 days: DeepSeek V4 int4 quantization, end-to-end. 37% faster attention for Gemma 4 31B on Apple Metal. 6.3× faster DistilBERT on ANE.
Quality controls · Built into every project
Eval harness · Experiment tracking · Overfit-resistant search · Compute orchestration · Monitoring
Delivery: Runs in our cloud or on-prem, so you can meet your compliance, IP, and data-residency constraints. SOC 2 in progress.
Ensue is for

The team without a team.

You have data. You have a problem your data can solve. What you don't have is a five-person ML research group, an inference engineer, a kernel specialist, and a year to figure out what to train and how to ship it.

Ensue is for the data scientist who's been handed the model problem and is looking at it alone. Ensue is for the CTO who wants results this week, not next year.

Coordinated agent swarms do the experiments. We provide the harness, the compute, the engineering, and the report you read each morning.

Our guarantee

We exclusively deliver results.

A handful of metrics decide whether your business wins.

We aim every experiment, every architecture choice, every shipped model at those numbers.

We give you better results, because our agents are ruthless in their pursuit of performance. If we don't improve your model performance, you don't pay.

Layer 1 · Data

Data ingestion.

Models don't train on raw data. They train on shaped, versioned, evaluated datasets. We handle the shaping.

Static datasets Live

The fastest way in today: paste a Hugging Face URL into the project setup screen, or upload a CSV directly. Click Create Project and the pipeline turns the dataset into training inputs, validation splits, and held-out tests. The eval harness gets wired up at the same time.
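
As a rough illustration of what "training inputs, validation splits, and held-out tests" can mean in practice, here is a minimal sketch of a deterministic, hash-based split. It is not Ensue's pipeline: the function names, the row-ID format, and the 80/10/10 fractions are all assumptions made for the example.

```python
import hashlib


def assign_split(row_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a stable row identifier to 'train', 'val', or 'test'."""
    # Hash the ID so the assignment does not depend on row order or a random seed.
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_rows(row_ids):
    """Group row IDs by split."""
    splits = {"train": [], "val": [], "test": []}
    for row_id in row_ids:
        splits[assign_split(row_id)].append(row_id)
    return splits


# Hypothetical row IDs; a real dataset would supply its own stable keys.
splits = split_rows([f"row-{i}" for i in range(10_000)])
print({name: len(rows) for name, rows in splits.items()})
```

Because the split depends only on the row ID, a row that lands in the held-out test set stays there when the dataset is re-ingested or extended.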

Snowflake, S3, Kaggle Request access

Direct connectors for the places real data lives. Available on request as part of your engagement. If your source isn't on the list, tell us; we'll build the connector.

Streaming pipelines Custom

For models that need to learn from data that's still arriving. Same shape as static, but continuous. Built per engagement.

Layer 2 · Model

Model creation.

The center of the lab. Three sub-stages, all powered by coordinated agents that share findings through the Ensue memory network.

The self-serve flow: paste a dataset, set a goal, and the swarm designs the architecture.

Model design Live Self-serve

Two paths, depending on how distinct your problem is: adapt a foundation model, or run a bespoke architecture search directly against your dataset.

  • Adapt a foundation model. The default. The agents pick the right base (Llama, Gemma, Qwen, Mistral, a vision backbone like ViT or CLIP, a tabular baseline like TabTransformer) and adapt it to your data. Fastest, cheapest, and what most projects need.
  • Bespoke architecture. When your data has structure no foundation model was built for, or when you want a model nobody else has, the swarm searches the architectural space directly against your dataset. More iterations, more compute, but the resulting model is designed for your problem. This is the same workflow that ran 3,100 NanoGPT variants in autoresearch@home.

Model training Live

The swarm runs the training. Thousands of coordinated experiments across the regimes that actually matter for small teams.

  • Supervised fine-tuning (SFT) on your labeled data
  • LoRA and QLoRA adapters when you want cheap, swappable personalization
  • Preference tuning: DPO, ORPO, KTO for alignment without RLHF overhead
  • RLHF (PPO, GRPO) when scale and budget justify it
  • Continued pretraining on your domain corpus
  • From-scratch training when you've chosen the bespoke path
  • Data curation alongside training: deduplication, contamination checks, augmentation, class balancing, sequence packing, tokenizer evaluation
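
Two of the curation steps in the last item, deduplication and contamination checks, can be sketched in a few lines. This is an illustrative sketch, not Ensue's implementation: it assumes exact-match dedup after whitespace and case normalization, and treats any shared word 5-gram with an eval example as contamination. Both choices, and all function names, are assumptions.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(texts):
    """Keep the first occurrence of each normalized text."""
    seen, kept = set(), []
    for text in texts:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


def ngrams(text: str, n: int):
    """Set of word n-grams in the normalized text."""
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(train_text: str, eval_texts, n: int = 5) -> bool:
    """True if the training text shares any word n-gram with an eval text."""
    train_grams = ngrams(train_text, n)
    return any(train_grams & ngrams(e, n) for e in eval_texts)


# Toy examples, invented for illustration.
train = [
    "The cat sat on the mat today",
    "the  cat sat on the mat today",
    "Dogs bark loudly at night sometimes",
]
held_out = ["yesterday the cat sat on the mat"]

unique = deduplicate(train)
clean = [t for t in unique if not is_contaminated(t, held_out)]
```

Production pipelines often use fuzzier matching (for example MinHash) in place of exact hashes; the exact-match version is used here only to keep the sketch short.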

Each agent's work is visible to the others. A failure in one run becomes a constraint another agent uses.

Model packaging Live

Getting the model from "trained" to "deployable."

  • Quantization (int8, int4, fp8) and quantization-aware retraining
  • Distillation into smaller students for production
  • Alignment passes: instruction tuning, safety filtering, jailbreak resistance
  • Final evaluation against held-out tests the agents never saw
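
For readers new to quantization, the int4 case above can be illustrated with a minimal sketch of symmetric, per-tensor, round-to-nearest quantization. This is a textbook baseline under stated assumptions, not the method used in the engagements described on this page; the function names and sample weights are invented for the example.

```python
def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization: returns (codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the signed 4-bit range [-8, 7].
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int4 codes."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from any real model.
weights = [0.70, -0.35, 0.10, 0.0, -0.70]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

With this scheme the rounding error per weight is at most half a quantization step, which is why the final held-out evaluation matters: it shows whether that error is tolerable for the task.
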
Layer 3 · Inference

Inference.

A model that's slow on your hardware isn't shipped. This layer is custom work today, delivered as a service. We've done it on Apple silicon and NVIDIA GPUs.

Kernel development Custom

When the existing kernels aren't fast enough. We've written the first Metal implementation of TurboQuant (ICLR 2026) and a custom fused int4 attention kernel that gave Gemma 4 31B a 37% attention speedup with 780 MB memory savings.

Hardware-specific inference Custom

Targeting Apple ANE, GPU, or whatever silicon matters to your product. Our partnership with Optimal Intellect produced 6.3× faster DistilBERT on M4 Max ANE, beating CoreML on every chip tested.

Serving Custom

Container-based deployment, on-prem capable. Your data stays in your network.

FAQ

Frequently asked questions.

How do you measure whether a model is actually better?

We co-build an eval harness with your team on day one. You define what "better" means; we encode it as a validation suite that hooks into every experiment the swarm runs.

The framework is whatever fits the work: quantization-specific evals, custom Python or Rust evals, community-accepted benchmarks (e.g., Hugging Face's DeepSeek quant evaluations), or, for reinforcement learning, playing the model against bots outside the training set. Most engagements end up using a mix.

How do you know the swarm isn't just gaming the eval?
  • Every experiment compares train and validation metrics. Trendline divergence trips a flag.
  • Sub-agents watch metric trendlines across the swarm and notify the evaluation agent when overfitting is starting, not after it's finished.
  • At the end of each experiment we run a separate, more reliable validation pass to confirm gains are real. For RL, we also play the model against bots it never trained against.
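
The first check above, flagging when train and validation trendlines diverge, can be sketched as a comparison of fitted slopes over a recent window. This is a simplified illustration, not the swarm's actual logic: the window size, the tolerance, and the function names are assumptions.

```python
def slope(values):
    """Least-squares slope of values against their index (needs 2+ points)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def diverging(train_loss, val_loss, window=5, tolerance=0.0):
    """Flag when train loss still falls but validation loss has begun to rise."""
    if len(train_loss) < window or len(val_loss) < window:
        return False  # not enough history to fit a trend
    train_trend = slope(train_loss[-window:])
    val_trend = slope(val_loss[-window:])
    return train_trend < 0 and val_trend > tolerance


# Synthetic loss curves, invented for illustration.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
val_healthy = [1.1, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58]
val_overfit = [1.1, 0.9, 0.8, 0.75, 0.78, 0.82, 0.9]

print(diverging(train, val_healthy))
print(diverging(train, val_overfit))
```

Fitting a trend over a window, not comparing two adjacent points, is what lets a check like this fire while overfitting is starting and stay quiet on single-step noise.
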
How do agents share what they learn?

Every run, every checkpoint, every dataset version, every config, every result lives on the Ensue memory network. Agents read each other's findings and build on them. A failed experiment becomes a constraint another agent uses, not wasted compute.

The memory network in motion
[Figure: project arc. Data in (your dataset: Hugging Face, CSV, custom) → agents A1 to A4 reading and writing the memory network in a loop → results out (trained model plus full record: weights, code, every run).]

Data flows in once. Agents read each other's work and write their own, continuously. You get the model plus every step that produced it.

Can I reproduce a result later?

Yes. Seeds, environment, dataset hash, and code commit are stored alongside every result. Pick any run from the history, get back the exact conditions that produced it.
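
To make that concrete, here is a minimal sketch of the kind of record that could sit next to a result. The field names and the `RunRecord` class are hypothetical, not Ensue's schema; the commit string and metric value are placeholders.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    seed: int
    dataset_sha256: str
    code_commit: str
    python_version: str
    os_platform: str
    metrics: dict


def hash_dataset(data: bytes) -> str:
    """Content hash that changes whenever the dataset bytes change."""
    return hashlib.sha256(data).hexdigest()


def make_record(seed, dataset_bytes, code_commit, metrics):
    """Bundle the conditions of a run with its result."""
    return RunRecord(
        seed=seed,
        dataset_sha256=hash_dataset(dataset_bytes),
        code_commit=code_commit,
        python_version=sys.version.split()[0],
        os_platform=platform.platform(),
        metrics=metrics,
    )


record = make_record(
    seed=1234,
    dataset_bytes=b"text,label\nhello,1\n",
    code_commit="0000000",  # placeholder, not a real commit
    metrics={"val_loss": 0.42},  # illustrative value, not a real result
)
print(json.dumps(asdict(record), indent=2))
```

Storing the dataset hash, not just a file name, is the design choice that matters here: it shows whether a rerun used the same data as the original run, even if the file was edited in place.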

Do I see the failures, or just the winner?

You see all of it. Your final model didn't come from luck. It came from the swarm trying hundreds of architectures, watching some of them fail, and building on the failures. You can defend every model decision to your team, your board, or your auditor.

What's the difference between intern and researcher agents?

Interns are faster and less capable; they do the cheap, parallel exploration. Researchers are slower, smarter, and run the harder experiments. The Free tier includes interns only. Pro and Enterprise include researchers.

Do I have to manage compute?

No. Workers, schedulers, queues, and GPU allocation are part of the service.

What happens after the model ships?

On Enterprise engagements we add post-deployment monitoring: drift detection, regression alerts, and retraining triggers. This closes the loop from production back to the data layer.

Proof

How our partners and customers have shipped using Ensue.

37%
Gemma 4 31B on Apple Silicon
177 experiments in 48 hours, 14 optimization approaches. First Metal implementation of TurboQuant (ICLR 2026). Custom fused int4 attention kernel. 37% faster attention. 780 MB memory savings. April 2026.
Read the post →
6.3×
DistilBERT on Apple Neural Engine
Partnership with Optimal Intellect. Agents on 6 different Macs optimized DistilBERT on Apple's ANE, bypassing CoreML. Beat CoreML on every chip tested, from 1.14× on M4 to 6.31× on M4 Max. April 2026.
Read the post →
3,100
autoresearch@home
115 agents collaborated across distributed GPUs. 3,100 NanoGPT runs over 8 days. Validation bits per byte (BPB) improved by 6.9%, from 0.9949 to 0.9264 (lower is better). One agent's failure became another's breakthrough. March 2026.
Read the post →
Start here

Three ways to get started.

You have data and want a model.

Try the demo. Paste a Hugging Face URL, set a goal, see what an Ensue-built architecture looks like for your problem. When you're ready to do real work, upgrade to Pro.

Try the demo →

You have a model and want it faster.

This is custom work. Bring us your model, your target hardware, and your latency or memory ceiling. We'll scope a kernel and inference engagement.

Book a call →

You want a managed lab.

Enterprise. We run the swarm, the harness, the compute, the deployment, and the monitoring. Cloud or on-prem. Dedicated account team.

Talk to us →