MIT offshoot Liquid AI releases blueprint for enterprise-grade small-model training
When Liquid AI, a startup founded by MIT computer scientists back in 2023, introduced its Liquid Foundation Models series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation models on the market using the new “liquid” architecture, with training and inference efficiency that made small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI’s GPT series and Google’s Gemini.
The initial release shipped dense checkpoints at 350M, 700M, and 1.2B parameters, a hybrid architecture heavily weighted toward gated short convolutions, and benchmark numbers that placed LFM2 ahead of similarly sized competitors like Qwen3, Llama 3.2, and Gemma 3 on both quality and CPU throughput. The message to enterprises was clear: real-time, privacy-preserving AI on phones, laptops, and vehicles no longer required sacrificing capability for latency.
In the months since that launch, Liquid has expanded LFM2 into a broader product line, adding task-and-domain-specialized variants, a small video ingestion and analysis model, and an edge-focused deployment stack called LEAP. It has also positioned the models as the control layer for on-device and on-prem agentic systems.
Now, with the publication of the detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: making public the architecture search process, training data mixture, distillation objective, curriculum strategy, and post-training pipeline behind those models.
And unlike earlier open models, LFM2 is built around a repeatable recipe: a hardware-in-the-loop search process, a training curriculum that compensates for smaller parameter budgets, and a post-training pipeline tuned for instruction following and tool use.
Rather than just offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference for training their own small, efficient models from scratch, tuned to their own hardware and deployment constraints.
A model family designed around real constraints, not GPU labs
The technical report begins with a premise enterprises are intimately familiar with: real AI systems hit limits long before benchmarks do. Latency budgets, peak memory ceilings, and thermal throttling define what can actually run in production—especially on laptops, tablets, commodity servers, and mobile devices.
To address this, Liquid AI performed architecture search directly on target hardware, including Snapdragon mobile SoCs and Ryzen laptop CPUs. The search converged on a consistent outcome across sizes: a minimal hybrid architecture dominated by gated short convolution blocks with a small number of grouped-query attention (GQA) layers. This design was repeatedly selected over more exotic linear-attention and SSM hybrids because it delivered a better quality-latency-memory Pareto profile under real device conditions.
This matters for enterprise teams in three ways:
Predictability. The architecture is simple, parameter-efficient, and stable across model sizes from 350M to 2.6B.
Operational portability. Dense and MoE variants share the same structural backbone, simplifying deployment across mixed hardware fleets.
On-device feasibility. Prefill and decode throughput on CPUs surpasses comparable open models by roughly 2× in many cases, reducing the need to offload routine tasks to cloud inference endpoints.
Instead of optimizing for academic novelty, the report reads as a systematic attempt to design models enterprises can actually ship. That emphasis is notable, and more practical for enterprises, in a field where many open models quietly assume access to multi-H100 clusters during inference.
A training pipeline tuned for enterprise-relevant behavior
LFM2 adopts a training approach that compensates for the smaller scale of its models with structure rather than brute force. Key elements include:
10–12T token pre-training plus an additional 32K-context mid-training phase, which extends the model’s useful context window without exploding compute costs.
A decoupled Top-K knowledge distillation objective that sidesteps the instability of standard KL distillation when teachers provide only partial logits (see the sketch after this list).
A three-stage post-training sequence of supervised fine-tuning (SFT), length-normalized preference alignment, and model merging, designed to produce more reliable instruction following and tool-use behavior.
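The report spells out its distillation objective in full; the sketch below only conveys the general idea of distilling against a teacher that exposes just its top-K logits, by renormalizing both distributions over that shared support. The function name, shapes, and the plain KL formulation are assumptions for illustration, not the exact LFM2 loss.

```python
import torch
import torch.nn.functional as F

def topk_distillation_loss(student_logits, teacher_topk_logits, teacher_topk_ids):
    """Illustrative top-K distillation term (not the exact LFM2 objective).

    student_logits:      (batch, seq_len, vocab_size)
    teacher_topk_logits: (batch, seq_len, k)   logits the teacher actually reported
    teacher_topk_ids:    (batch, seq_len, k)   token ids those logits correspond to
    """
    # Pick out the student's logits at the teacher's top-K token positions.
    student_topk_logits = torch.gather(student_logits, dim=-1, index=teacher_topk_ids)

    # Renormalize both sides over the shared top-K support, so the loss never
    # compares against probability mass the teacher never reported.
    teacher_probs = F.softmax(teacher_topk_logits, dim=-1)
    student_log_probs = F.log_softmax(student_topk_logits, dim=-1)

    # KL(teacher || student) restricted to the top-K token set.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```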
For enterprise AI developers, the significance is that LFM2 models behave less like “tiny LLMs” and more like practical agents able to follow structured formats, adhere to JSON schemas, and manage multi-turn chat flows. Many open models at similar sizes fail not due to lack of reasoning ability, but due to brittle adherence to instruction templates. The LFM2 post-training recipe directly targets these rough edges.
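Why that matters in practice: agent runtimes typically reject or retry any model output that fails a tool-call schema check, so small gains in format adherence translate directly into fewer failed turns. The snippet below is a generic validation gate of the kind an enterprise pipeline might wrap around any small model's output; the schema and function are hypothetical and not part of LFM2 or its tooling.

```python
import json
from typing import Optional
from jsonschema import validate, ValidationError  # pip install jsonschema

# Hypothetical tool-call schema an agent runtime might enforce on model output.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
    "additionalProperties": False,
}

def parse_tool_call(model_output: str) -> Optional[dict]:
    """Accept a model-produced tool call only if it is valid JSON matching the schema."""
    try:
        candidate = json.loads(model_output)
        validate(instance=candidate, schema=TOOL_CALL_SCHEMA)
        return candidate
    except (json.JSONDecodeError, ValidationError):
        return None  # caller can reprompt, retry, or fall back
```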
In other words: Liquid AI optimized small models for operational reliability, not just scoreboards.
Multimodality designed for device constraints, not lab demos
The LFM2-VL and LFM2-Audio variants reflect another shift: multimodality built around token efficiency.
Rather than embedding a massive vision transformer directly into an LLM, LFM2-VL attaches a SigLIP2 encoder through a connector that aggressively reduces visual token count via PixelUnshuffle. High-resolution inputs automatically trigger dynamic tiling, keeping token budgets controllable even on mobile hardware. LFM2-Audio uses a bifurcated audio path—one for embeddings, one for generation—supporting real-time transcription or speech-to-speech on modest CPUs.
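The token-reduction step is easy to see in isolation. The sketch below uses PyTorch's built-in PixelUnshuffle to merge each 2×2 neighborhood of patch features into a single, wider token; the grid size, channel width, and reduction factor are placeholder values, not the ones LFM2-VL ships with.

```python
import torch
import torch.nn as nn

# Illustrative numbers only; the actual LFM2-VL encoder dimensions and
# reduction factor are documented in the technical report.
batch, channels, grid = 1, 768, 24          # e.g. a 24x24 grid of SigLIP-style patch features
reduction = 2                                # each 2x2 neighborhood becomes one token

features = torch.randn(batch, channels, grid, grid)          # (B, C, H, W) patch-feature map
unshuffle = nn.PixelUnshuffle(downscale_factor=reduction)    # space-to-depth rearrangement

merged = unshuffle(features)                 # (1, 3072, 12, 12): channels x4, spatial /2 per axis
tokens = merged.flatten(2).transpose(1, 2)   # (B, num_tokens, dim) = (1, 144, 3072)

print(grid * grid, "->", tokens.shape[1])    # 576 -> 144 visual tokens fed to the language model
```

Shrinking the visual token count this way cuts the quadratic attention cost and the KV-cache footprint of every image the model ingests, which is what keeps high-resolution inputs viable on mobile hardware.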
For enterprise platform architects, this design points toward a practical future where:
document understanding happens directly on endpoints such as field devices;
audio transcription and speech agents run locally for privacy compliance;
multimodal agents operate within fixed latency envelopes without streaming data off-device.
The through-line is the same: multimodal capability without requiring a GPU farm.
Retrieval models built for agent systems, not legacy search
LFM2-ColBERT extends late-interaction retrieval into a footprint small enough for enterprise deployments that need multilingual RAG without the overhead of specialized vector DB accelerators.
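Late interaction itself is a simple mechanism. The sketch below shows generic ColBERT-style MaxSim scoring, where each query token is matched against its most similar document token and the per-token maxima are summed; it illustrates the retrieval style LFM2-ColBERT belongs to rather than its actual implementation.

```python
import torch
import torch.nn.functional as F

def late_interaction_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim scoring over per-token embeddings.

    query_embs: (num_query_tokens, dim), L2-normalized per token
    doc_embs:   (num_doc_tokens, dim),   L2-normalized per token
    """
    sims = query_embs @ doc_embs.T          # cosine similarity of every query token vs. every doc token
    return sims.max(dim=-1).values.sum()    # best match per query token, summed into one relevance score

# Toy usage with random embeddings standing in for model output.
q = F.normalize(torch.randn(8, 128), dim=-1)
d = F.normalize(torch.randn(200, 128), dim=-1)
score = late_interaction_score(q, d)
```

Because document token embeddings can be precomputed and stored, query-time work reduces to one matrix multiply and a max per document, which is what makes this style of retrieval feasible on the same modest hardware running the reasoning model.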
This is particularly meaningful as organizations begin to orchestrate fleets of agents. Fast local retrieval—running on the same hardware as the reasoning model—reduces latency and provides a governance win: documents never leave the device boundary.
Taken together, the VL, Audio, and ColBERT variants show LFM2 as a modular system, not a single model drop.
The emerging blueprint for hybrid enterprise AI architectures
Across all variants, the LFM2 report implicitly sketches what tomorrow’s enterprise AI stack will look like: hybrid local-cloud orchestration, where small, fast models operating on devices handle time-critical perception, formatting, tool invocation, and judgment tasks, while larger models in the cloud offer heavyweight reasoning when needed.
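A minimal sketch of that orchestration pattern appears below, assuming hypothetical run_local and run_cloud callables standing in for an on-device runtime and a hosted model; the routing rule and threshold are placeholders, not anything prescribed by the LFM2 report.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RoutingPolicy:
    """Route requests between an on-device model and a cloud model (hypothetical interfaces)."""
    run_local: Callable[[str], str]   # e.g. a small on-device model behind a local runtime
    run_cloud: Callable[[str], str]   # e.g. a hosted frontier model for heavyweight reasoning
    max_local_words: int = 2048       # crude word-count proxy for how much the local model should own

    def handle(self, prompt: str, needs_deep_reasoning: bool = False) -> str:
        # Time-critical, well-scoped work stays on the device; only requests flagged as
        # heavyweight, or that exceed the local budget, escalate to the cloud path.
        if needs_deep_reasoning or len(prompt.split()) > self.max_local_words:
            return self.run_cloud(prompt)
        return self.run_local(prompt)
```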
Several trends converge here:
Cost control. Running routine inference locally avoids unpredictable cloud billing.
Latency determinism. Time-to-first-token (TTFT) and decode stability matter in agent workflows; on-device execution eliminates network jitter.
Governance and compliance. Local execution simplifies PII handling, data residency, and auditability.
Resilience. Agentic systems degrade gracefully if the cloud path becomes unavailable.
Enterprises adopting these architectures will likely treat small on-device models as the “control plane” of agentic workflows, with large cloud models serving as on-demand accelerators.
LFM2 is one of the clearest open-source foundations for that control layer to date.
The strategic takeaway: on-device AI is now a design choice, not a compromise
For years, organizations building AI features have accepted that “real AI” requires cloud inference. LFM2 challenges that assumption. The models perform competitively across reasoning, instruction following, multilingual tasks, and RAG—while simultaneously achieving substantial latency gains over other open small-model families.
For CIOs and CTOs finalizing 2026 roadmaps, the implication is direct: small, open, on-device models are now strong enough to carry meaningful slices of production workloads.
LFM2 will not replace frontier cloud models for frontier-scale reasoning. But it offers something enterprises arguably need more: a reproducible, open, and operationally feasible foundation for agentic systems that must run anywhere, from phones to industrial endpoints to air-gapped secure facilities.
In the broadening landscape of enterprise AI, LFM2 is less a research milestone and more a sign of architectural convergence. The future is not cloud or edge—it’s both, operating in concert. And releases like LFM2 provide the building blocks for organizations prepared to build that hybrid future intentionally rather than accidentally.