Your Trusted ML Research Lab
You have the data.
We provide the lab.
Ensue uses agent swarms to design, train, and ship models in days. Built for lean teams working on big problems.
Agents denoise your data and prepare it for training automatically.
Agents design architectures and autonomously iterate to SOTA, in days.
Purpose-built kernels and inference that performs on your target hardware.
The team without a team.
You have data. You have a problem your data can solve. What you don't have is a five-person ML research group, an inference engineer, a kernel specialist, and a year to figure out what to train and how to ship it.
Ensue is for the data scientist who's been handed the model problem and is looking at it alone. Ensue is for the CTO who wants results this week, not next year.
Coordinated agent swarms do the experiments. We provide the harness, the compute, the engineering, and the report you read each morning.
We exclusively deliver results.
A handful of metrics decide whether your business wins.
We aim every experiment, every architecture choice, every shipped model at those numbers.
We give you better results, because our agents are ruthless in their pursuit of performance. If we don't improve your model performance, you don't pay.
Data ingestion.
Models don't train on raw data. They train on shaped, versioned, evaluated datasets. We handle the shaping.
Static datasets Live
The fastest way in today: paste a Hugging Face URL into the project setup screen, or upload a CSV directly. Click Create Project and the pipeline turns the dataset into training inputs, validation splits, and held-out test sets. The eval harness is wired up at the same time.
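To make the train, validation, and held-out split concrete, here is a minimal sketch of one common approach: hashing a stable row identifier so each row always lands in the same split. It is an illustration only, not a description of the Ensue pipeline; `assign_split`, `split_rows`, the `id` key, and the 80/10/10 ratio are all assumptions.

```python
import hashlib


def assign_split(row_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a stable row identifier to 'train', 'validation', or 'test'.

    Hypothetical helper: hashing the identifier keeps the assignment
    deterministic, so re-ingesting the dataset never moves a row
    from the held-out test set into training.
    """
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


def split_rows(rows, key="id"):
    """Group rows (dicts) by split, using the assumed `key` column as identifier."""
    splits = {"train": [], "validation": [], "test": []}
    for row in rows:
        splits[assign_split(str(row[key]))].append(row)
    return splits


if __name__ == "__main__":
    rows = [{"id": i, "text": f"example {i}"} for i in range(1000)]
    splits = split_rows(rows)
    print({name: len(part) for name, part in splits.items()})
```

Because the assignment depends only on the identifier, adding new rows later does not reshuffle existing ones.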
Snowflake, S3, Kaggle Request access
Direct connectors for the places real data lives. Available on request as part of your engagement. If your source isn't on the list, tell us; we'll build the connector.
Streaming pipelines Custom
For models that need to learn from data that's still arriving. Same shape as static, but continuous. Built per engagement.
Model creation.
The center of the lab. Three sub-stages, all powered by coordinated agents that share findings through the Ensue memory network.
The self-serve flow: paste a dataset, set a goal, the swarm designs the architecture.
Model design Live Self-serve
Two paths, depending on how distinct your problem is. Adapt a foundation model (Llama, Gemma, Qwen, Mistral, a vision backbone, a tabular baseline) or run a bespoke architecture search directly against your dataset.
- Adapt a foundation model. The default. The agents pick the right base (Llama, Gemma, Qwen, Mistral, a vision backbone like ViT or CLIP, a tabular baseline like TabTransformer) and adapt it to your data. Fastest, cheapest, and what most projects need.
- Bespoke architecture. When your data has structure no foundation model was built for, or when you want a model nobody else has, the swarm searches the architectural space directly against your dataset. More iterations, more compute, but the resulting model is designed for your problem. The same workflow that ran 3,100 NanoGPT variants in autoresearch@home.
Model training Live
The swarm runs the training. Thousands of coordinated experiments across the regimes that actually matter for small teams.
- Supervised fine-tuning (SFT) on your labeled data
- LoRA and QLoRA adapters when you want cheap, swappable personalization
- Preference tuning: DPO, ORPO, KTO for alignment without RLHF overhead
- RLHF (PPO, GRPO) when scale and budget justify it
- Continued pretraining on your domain corpus
- From-scratch training when you've chosen the bespoke path
- Data curation alongside training: deduplication, contamination checks, augmentation, class balancing, sequence packing, tokenizer evaluation
Each agent's work is visible to the others. A failure in one run becomes a constraint another agent uses.
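As an illustration of the deduplication step in the data-curation item above, the sketch below drops exact duplicates after light text normalization. It is a simplified stand-in under stated assumptions: `normalize` and `deduplicate` are hypothetical names, and near-duplicate detection (for example MinHash) is out of scope here.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(texts):
    """Keep the first occurrence of each normalized text, preserving order."""
    seen = set()
    kept = []
    for text in texts:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


if __name__ == "__main__":
    samples = ["Hello  world", "hello world", "Something else"]
    print(deduplicate(samples))
```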
Model packaging Live
Getting the model from "trained" to "deployable."
- Quantization (int8, int4, fp8) and quantization-aware retraining
- Distillation into smaller students for production
- Alignment passes: instruction tuning, safety filtering, jailbreak resistance
- Final evaluation against held-out tests the agents never saw
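For readers new to quantization, the sketch below shows symmetric per-tensor int8 quantization, one common scheme. Which schemes the packaging stage actually applies is not stated here, so treat this as a generic illustration; `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Returns the quantized values and the scale needed to recover
    approximate floats. Illustrative only; production code would
    operate on tensors, usually per channel or per group.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(q, scale, worst)
```

The rounding error per weight is bounded by half the scale, which is why a final evaluation after quantization matters: small per-weight errors can still move task metrics.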
Inference.
A model that's slow on your hardware isn't shipped. This layer is custom work today, delivered as a service. We've done it on Apple silicon and Nvidia GPUs.
Kernel development Custom
When the existing kernels aren't fast enough. We've written the first Metal implementation of TurboQuant (ICLR 2026) and a custom fused int4 attention kernel that gave Gemma 4 31B a 37% attention speedup with 780 MB memory savings.
Hardware-specific inference Custom
Targeting Apple ANE, GPU, or whatever silicon matters to your product. Our partnership with Optimal Intellect produced 6.3x faster DistilBERT on M4 Max ANE, beating CoreML on every chip tested.
Serving Custom
Container-based deployment, on-prem capable. Your data stays in your network.
Frequently asked questions.
We co-build an eval harness with your team on day one. You define what "better" means; we encode it as a validation suite that hooks into every experiment the swarm runs.
The framework is whatever fits the work: quantization-specific evals, custom Python or Rust evals, community-accepted benchmarks (e.g., Hugging Face's DeepSeek quant evaluations), or, for reinforcement learning, playing the model against bots outside the training set. Most engagements end up using a mix.
- Every experiment compares train and validation metrics. Trendline divergence trips a flag.
- Sub-agents watch metric trendlines across the swarm and notify the evaluation agent when overfitting is starting, not after it's finished.
- At the end of each experiment we run a separate, more reliable validation pass to confirm gains are real. For RL, we also play the model against bots it never trained against.
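The trendline check described above can be sketched as a simple rule: flag a run once validation loss has risen for several consecutive evaluations while training loss kept falling. This is a minimal illustration, not the logic the sub-agents run; `divergence_step` and the `patience` threshold are assumptions.

```python
def divergence_step(train_loss, val_loss, patience=3):
    """Return the evaluation index at which divergence is flagged, or None.

    Divergence here means: training loss decreased and validation loss
    increased for `patience` consecutive evaluations (hypothetical rule).
    """
    streak = 0
    for i in range(1, min(len(train_loss), len(val_loss))):
        train_improving = train_loss[i] < train_loss[i - 1]
        val_worsening = val_loss[i] > val_loss[i - 1]
        streak = streak + 1 if (train_improving and val_worsening) else 0
        if streak >= patience:
            return i
    return None


if __name__ == "__main__":
    train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
    val = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95]
    print(divergence_step(train, val))
```

A small `patience` flags overfitting earlier at the cost of more false alarms on noisy curves; smoothing the losses first is a common refinement.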
Every run, every checkpoint, every dataset version, every config, every result lives on the Ensue memory network. Agents read each other's findings and build on them. A failed experiment becomes a constraint another agent uses, not wasted compute.
Data flows in once. Agents read each other's work and write their own, continuously. You get the model plus every step that produced it.
Yes. Seeds, environment, dataset hash, and code commit are stored alongside every result. Pick any run from the history, get back the exact conditions that produced it.
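A minimal sketch of what such a record could look like, assuming a flat structure: the field names, `RunRecord`, and `make_record` are hypothetical and do not reflect the actual Ensue schema.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of the conditions that produced one result."""
    seed: int
    dataset_sha256: str
    code_commit: str
    python_version: str
    platform: str


def make_record(seed: int, dataset_bytes: bytes, code_commit: str) -> RunRecord:
    """Capture seed, dataset hash, code commit, and a basic environment description."""
    return RunRecord(
        seed=seed,
        dataset_sha256=hashlib.sha256(dataset_bytes).hexdigest(),
        code_commit=code_commit,
        python_version=sys.version.split()[0],
        platform=platform.platform(),
    )


def to_json(record: RunRecord) -> str:
    """Serialize with sorted keys so identical records produce identical text."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # "abc1234" is a placeholder commit id, not a real one.
    record = make_record(seed=7, dataset_bytes=b"a,b\n1,2\n", code_commit="abc1234")
    print(to_json(record))
```

Hashing the dataset bytes, rather than storing a file name, is what lets a later rerun confirm it is training on exactly the same data.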
You see all of it. Your final model didn't come from luck. It came from the swarm trying hundreds of architectures, watching a portion of them fail, and building on the failures. You can defend every model decision to your team, your board, or your auditor.
Interns are faster but less capable; they do the cheap, parallel exploration. Researchers are slower, smarter, and run the harder experiments. Free includes interns only. Pro and Enterprise include researchers.
No. Workers, schedulers, queues, and GPU allocation are part of the service.
We add post-deployment monitoring on Enterprise engagements: drift detection, regression alerts, retraining triggers. Closes the loop from production back to the data layer.
How our partners and customers have shipped using Ensue.
3 ways to get started.
You have data and want a model.
Try the demo. Paste a Hugging Face URL, set a goal, see what an Ensue-built architecture looks like for your problem. When you're ready to do real work, upgrade to Pro.
Try the demo →
You have a model and want it faster.
This is custom work. Bring us your model, your target hardware, and your latency or memory ceiling. We'll scope a kernel and inference engagement.
Book a call →
You want a managed lab.
Enterprise. We run the swarm, the harness, the compute, the deployment, and the monitoring. Cloud or on-prem. Dedicated account team.
Talk to us →