Ai2’s Olmo 3 family challenges Qwen and Llama with efficient, open reasoning and customization

By Eric November 21, 2025

The Allen Institute for AI (Ai2) has released Olmo 3, the latest version of its Olmo family of large language models, in response to growing demand for customized AI and for transparency in model training. The release emphasizes openness and adaptability, so that organizations can shape the models to their own needs. Olmo 3 comes in three versions: Olmo 3-Think, a reasoning model available in 7B and 32B parameter sizes; Olmo 3-Base, also available in both sizes and suited to programming and long-context reasoning; and Olmo 3-Instruct, a 7B model tuned for instruction following and multi-turn dialogue. Ai2 describes Olmo 3-Think as the “first-ever fully open 32B thinking model that generates explicit reasoning-chain-style content.” It has a context window of 65,000 tokens, which suits longer-running agentic projects and reasoning over long documents.

In an interview with VentureBeat, Noah Smith, Ai2’s senior director of NLP research, stressed the importance of transparency, particularly for organizations in regulated environments. He said customers increasingly want assurance about what went into a model’s training, in contrast to competitors such as Google and OpenAI, which have faced criticism from developers for hiding raw reasoning tokens. Because Olmo 3 is open-sourced under the Apache 2.0 license and ships with its training data and checkpoints, enterprises can inspect how it was trained and retrain it by adding proprietary data to the mix, so that it answers company-specific queries. Ai2 pretrained the Olmo 3 models on Dolma 3, a six-trillion-token dataset that spans web data, scientific literature and code, and Smith said the team optimized Olmo 3 for code.

Ai2 says Olmo 3 represents a significant advance for fully open-source large language models, at least among those developed outside China. The base Olmo 3 model trained with roughly 2.5 times greater compute efficiency, measured in GPU-hours per token, which means lower energy consumption and cost during pre-training. Ai2 said Olmo 3 outperforms other open models, including Stanford’s Marin, LLM360’s K2 and Apertus, although it did not provide benchmark figures. The release reinforces Ai2’s focus on transparency and customization and positions it as a competitor for enterprises that want adaptable models they can audit. Developers can access the models on Hugging Face and the Ai2 Playground.

With its latest release, the Allen Institute for AI (Ai2) hopes to take advantage of growing demand for customized models and of enterprises seeking more transparency from AI models.
Ai2 made the latest addition to its Olmo family of large language models available to organizations, continuing to focus on openness and customization.
Olmo 3 has a longer context window, more reasoning traces and is better at coding than its previous iteration. This latest version, like the other Olmo releases, is open-sourced under the Apache 2.0 license. Enterprises will have complete transparency into and control over the training data and checkpointing.
Ai2 will release three versions of Olmo 3:
Olmo 3-Think, in both 7B and 32B, which Ai2 considers its flagship reasoning models for advanced research
Olmo 3-Base, also in both sizes, which is suited to programming, comprehension, math and long-context reasoning. Ai2 said this version is “ideal for continued pre-training or fine-tuning.”
Olmo 3-Instruct, in 7B, which is optimized for instruction following, multi-turn dialogue and tool use
The company said Olmo 3-Think is the “first-ever fully open 32B thinking model that generates explicit reasoning-chain-style content.” Olmo 3-Think also has a long context window of 65,000 tokens, which suits longer-running agentic projects or reasoning over longer documents.
Noah Smith, Ai2’s senior director of NLP research, told VentureBeat in an interview that many of its customers, from regulated enterprises to research institutions, want to use models that give them assurance about what went into the training.
“The releases from our friends in the tech world are very cool and super exciting, but there are a lot of people for whom data privacy, control over what goes into the model, how the models train and other constraints on how the model can be used are front of mind,” said Smith.
Developers can access the models on Hugging Face and the Ai2 Playground.
Transparency and customization
Smith said the company believes that any organization using models like Olmo 3 has to have control over them and be able to mold them in the way that works best for it.
“We don’t believe in one-size-fits-all solutions,” Smith said. “It’s a known thing in the world of machine learning that if you try and build a model that solves all the problems, it ends up not being really the best model for any one problem. There aren’t formal proofs of that, but it’s a thing that old timers like me have kind of observed.”
He added that models with the ability to specialize “are maybe not as flashy as getting high scores on math exams” but offer more flexibility for enterprises.
Olmo 3 allows enterprises to essentially retrain the model by adding to the data mix it learns from. The idea is that businesses can bring in their proprietary sources to guide the model in answering specific company queries. To help enterprises during this process, Ai2 added checkpoints from every major training phase.
Demand for model customization has grown as enterprises that cannot build their own LLMs want to create company-specific or industry-focused models. Startups like Arcee have begun offering enterprise-focused, customizable small models.
Models like Olmo 3, Smith said, also give enterprises more confidence in the technology. Since Olmo 3 provides the training data, Smith said enterprises can trust that the model did not ingest anything it shouldn’t have.
Ai2 has always claimed to be committed to greater transparency, even launching a tool called OlmoTrace in April that can track a model’s output directly back to the original training data. The company releases open-sourced models and posts its code to repositories like GitHub for anyone to use.
Competitors like Google and OpenAI have faced criticism from developers over moves that hid raw reasoning tokens in favor of summarized reasoning. Those developers say that, without that transparency, they now resort to “debugging blind.”
Ai2 pretrained Olmo 3 on the six-trillion-token open source dataset, Dolma 3. The dataset encompasses web data, scientific literature and code. Smith said they optimized Olmo 3 for code, compared to the focus on math for Olmo 2.
How it stacks up
Ai2 claims that the Olmo 3 family of models represents a significant leap for truly open-source models, at least for open-source LLMs developed outside China. The base Olmo 3 model trained “with roughly 2.5x greater compute efficiency as measured by GPU-hours per token,” meaning it consumed less energy during pre-training and cost less to train.
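To make the efficiency claim concrete, here is a minimal Python sketch of how a figure like "2.5x greater compute efficiency as measured by GPU-hours per token" could be computed. The GPU-hour totals and the baseline run are hypothetical placeholders chosen to yield 2.5x; Ai2 did not publish these inputs in the material quoted here, and the six-trillion-token count is borrowed from the Dolma 3 description purely for illustration.

```python
import math

def gpu_hours_per_token(gpu_hours: float, tokens: float) -> float:
    """Compute cost of training, expressed as GPU-hours spent per token."""
    return gpu_hours / tokens

def efficiency_gain(baseline_rate: float, new_rate: float) -> float:
    """How many times fewer GPU-hours per token the new run needs."""
    return baseline_rate / new_rate

# Illustrative inputs only (assumptions, not Ai2's reported numbers).
TOKENS = 6e12                    # six trillion tokens, for illustration
BASELINE_GPU_HOURS = 2_500_000   # hypothetical earlier training run
NEW_GPU_HOURS = 1_000_000        # hypothetical more efficient run

baseline_rate = gpu_hours_per_token(BASELINE_GPU_HOURS, TOKENS)
new_rate = gpu_hours_per_token(NEW_GPU_HOURS, TOKENS)

gain = efficiency_gain(baseline_rate, new_rate)
# Share of GPU-hours saved for the same number of tokens.
fraction_saved = 1 - 1 / gain

print(f"efficiency gain: {gain:.2f}x")
print(f"GPU-hours saved per token: {fraction_saved:.0%}")
```

Under these assumptions, a 2.5x gain means the same number of tokens costs 60% fewer GPU-hours, which is where the lower energy use and training cost would come from.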
The company said the Olmo 3 models outperformed other open models, such as Marin from Stanford, LLM360’s K2, and Apertus, though Ai2 did not provide figures for the benchmark testing.
“Of note, Olmo 3-Think (32B) is the strongest fully open reasoning model, narrowing the gap to the best open-weight models of similar scale, such as the Qwen 3-32B-Thinking series of models across our suite of reasoning benchmarks, all while being trained on 6x fewer tokens,” Ai2 said in a press release.
The company added that Olmo 3-Instruct performed better than Qwen 2.5, Gemma 3 and Llama 3.1.
