Baidu unveils proprietary ERNIE 5 beating GPT-5 performance on charts, document understanding and more
In a significant move to solidify its position in the global AI landscape, Baidu has unveiled its next-generation foundation model, ERNIE 5.0, just hours after OpenAI released an update to its GPT-5 model. Announced during the Baidu World 2025 event, ERNIE 5.0 boasts a proprietary, natively omni-modal architecture that can seamlessly process and generate content across multiple formats, including text, images, audio, and video. This release is part of Baidu’s broader strategy to expand its AI offerings beyond China and compete with leading models from Western tech giants like OpenAI and Google. Unlike its predecessor, ERNIE 4.5, which was open-source, ERNIE 5.0 is available exclusively through Baidu’s ERNIE Bot website and the Qianfan cloud platform, emphasizing its premium capabilities aimed at enterprise customers.
Baidu claims that ERNIE 5.0 has achieved benchmark performance that matches or surpasses that of GPT-5 and Google’s Gemini 2.5 Pro in various critical tasks, including multimodal reasoning and document understanding. For instance, it reportedly outperformed these competitors in benchmarks that assess document recognition and comprehension, which are essential for applications like automated document processing and financial analysis. Additionally, in visual tasks, ERNIE 5.0 demonstrated superior performance on benchmarks such as OCRBench and DocVQA, suggesting a strong capability in handling complex visual data. The model also features a variant optimized for text-intensive tasks, ERNIE 5.0 Preview 1022, which has shown particularly promising results in early developer access, especially in Chinese-language performance. This strategic focus on multimodal integration positions ERNIE 5.0 as a formidable player in the foundation model arena, with Baidu emphasizing its ability to deliver a comprehensive, native modeling architecture that enhances productivity across various applications.
As part of its global expansion efforts, Baidu is not only rolling out ERNIE 5.0 but also enhancing its suite of AI products, including the GenFlow 3.0 AI agent and the international launch of its no-code platform, MeDo. The company is also pushing its digital human technology, which has seen significant adoption in China, into international markets. By pricing ERNIE 5.0 at the premium end of its own model lineup, Baidu aims to attract enterprise clients while also appealing to developers through its open-source offerings. However, the company faces scrutiny regarding the accuracy of its performance claims, and independent verification of its benchmarks will be crucial in establishing its credibility in the competitive AI landscape. Baidu’s dual approach of offering both high-end proprietary models and accessible open-source alternatives reflects its ambition to become a leading global provider of AI infrastructure, catering to the diverse needs of businesses and developers alike.
Mere hours after OpenAI updated its flagship foundation model GPT-5 to GPT-5.1, promising reduced token usage overall and a more pleasant personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, alongside a suite of AI product upgrades and strategic international expansions.
The goal: to position itself as a global contender in the increasingly competitive enterprise AI market.
Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively omni-modal model designed to jointly process and generate content across text, images, audio, and video.
Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under the permissive, enterprise-friendly Apache 2.0 license, ERNIE 5.0 is a proprietary model and is available only via Baidu’s ERNIE Bot website (I needed to select it manually from the model picker dropdown) and the Qianfan cloud platform application programming interface (API) for enterprise customers.
Alongside the model launch, Baidu introduced major updates to its digital human platform, no-code tools, and general-purpose AI agents — all targeted at expanding its AI footprint beyond China.
The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, alongside the general preview model that balances across modalities.
Baidu emphasized that ERNIE 5.0 represents a shift in how intelligence is deployed at scale, with CEO Robin Li stating: “When you internalize AI, it becomes a native capability and transforms intelligence from a cost into a source of productivity.”
Where ERNIE 5.0 outshines GPT-5 and Gemini 2.5 Pro
ERNIE 5.0’s benchmark results suggest that Baidu has achieved parity—or near-parity—with the top Western foundation models across a wide spectrum of tasks.
In public benchmark slides shared during the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based QA, while also demonstrating strong language modeling and code execution abilities.
The company emphasized its ability to handle joint inputs and outputs across modalities, rather than relying on post-hoc modality fusion, which it framed as a technical differentiator.
On visual tasks, ERNIE 5.0 achieved leading scores on OCRBench, DocVQA, and ChartQA, three benchmarks that test document recognition, comprehension, and structured data reasoning.
Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro on these document and chart-based benchmarks, areas it describes as core to enterprise applications like automated document processing and financial analysis.
In image generation, ERNIE 5.0 tied or exceeded Google’s Veo3 across categories including semantic alignment and image quality, according to Baidu’s internal GenEval-based evaluation. Baidu claimed that the model’s multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models relying on modality-specific encoders.
For audio and speech tasks, ERNIE 5.0 demonstrated competitive results on MM-AU and TUT2017 audio understanding benchmarks, as well as question answering from spoken language inputs. Its audio performance, while not as heavily emphasized as vision or text, suggests a broad capability footprint intended to support full-spectrum multimodal applications.
In language tasks, the model showed strong results on instruction following, factual question answering, and mathematical reasoning—core areas that define the enterprise utility of large language models.
The Preview 1022 variant of ERNIE 5.0, tailored for textual performance, showed even stronger language-specific results in early developer access. While Baidu does not claim broad superiority in general language reasoning, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top-tier English-language models and outperforms them on Chinese-language tasks.
While Baidu did not release full benchmark details or raw scores publicly, its performance positioning suggests a deliberate attempt to frame ERNIE 5.0 not as a niche multimodal system but as a flagship model competitive with the largest closed models in general-purpose reasoning.
Where Baidu claims a clear lead is in structured document understanding, visual chart reasoning, and integration of multiple modalities into a single, native modeling architecture. Independent verification of these results remains pending, but the breadth of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.
Enterprise Pricing Strategy
ERNIE 5.0 is positioned at the premium end of Baidu’s model pricing structure. The company has released specific pricing for API usage on its Qianfan platform, aligning the cost with other top-tier offerings from Chinese competitors like Alibaba.
| Model | Input Cost (per 1K tokens) | Output Cost (per 1K tokens) | Source |
| --- | --- | --- | --- |
| ERNIE 5.0 | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |
| ERNIE 4.5 Turbo (ex.) | $0.00011 (¥0.0008) | $0.00045 (¥0.0032) | Qianfan |
| Qwen3 (Coder ex.) | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |
The contrast in cost between ERNIE 5.0 and earlier models such as ERNIE 4.5 Turbo underscores Baidu’s strategy to differentiate between high-volume, low-cost models and high-capability models designed for complex tasks and multimodal reasoning.
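To make that contrast concrete, the per-1K prices in the table can be converted to per-1M prices and compared directly. The following is a minimal Python sketch that uses only the figures listed above; the dictionary and function names are illustrative and are not part of any Baidu API.

```python
# USD prices per 1K tokens (input, output), copied from the Qianfan table above.
PRICES_PER_1K = {
    "ERNIE 5.0": (0.00085, 0.0034),
    "ERNIE 4.5 Turbo": (0.00011, 0.00045),
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1K-token price into a per-1M-token price."""
    return round(price_per_1k * 1000, 6)


# Same prices expressed per 1M tokens, the unit most U.S. vendors quote.
PRICES_PER_1M = {
    model: (per_million(inp), per_million(out))
    for model, (inp, out) in PRICES_PER_1K.items()
}

# How many times more expensive ERNIE 5.0 is than ERNIE 4.5 Turbo.
input_ratio = PRICES_PER_1K["ERNIE 5.0"][0] / PRICES_PER_1K["ERNIE 4.5 Turbo"][0]
output_ratio = PRICES_PER_1K["ERNIE 5.0"][1] / PRICES_PER_1K["ERNIE 4.5 Turbo"][1]

for model, (inp, out) in PRICES_PER_1M.items():
    print(f"{model}: ${inp:.2f} input / ${out:.2f} output per 1M tokens")
print(f"ERNIE 5.0 vs. 4.5 Turbo: {input_ratio:.1f}x input, {output_ratio:.1f}x output")
```

On these list prices, ERNIE 5.0 costs roughly 7.5 to 7.7 times as much per token as ERNIE 4.5 Turbo.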
Compared with U.S. alternatives, its pricing is mid-range to low: it undercuts every U.S. model listed below while costing several times more than ERNIE 4.5 Turbo:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Source |
| --- | --- | --- | --- |
| GPT-5.1 | $1.25 | $10.00 | OpenAI |
| ERNIE 5.0 | $0.85 | $3.40 | Qianfan |
| ERNIE 4.5 Turbo (ex.) | $0.11 | $0.45 | Qianfan |
| Claude Opus 4.1 | $15.00 | $75.00 | Anthropic |
| Gemini 2.5 Pro | $1.25 (≤200k) / $2.50 (>200k) | $10.00 (≤200k) / $15.00 (>200k) | Google Vertex AI Pricing |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 | xAI API |
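As a rough illustration of what these list prices mean in practice, the sketch below estimates the bill for one workload across the models in the table. The workload size (50M input tokens and 10M output tokens) is a hypothetical chosen for the example, the Gemini 2.5 Pro figures assume the ≤200k-token tier, and the names are illustrative. Real bills may differ because of caching discounts, tiering, and exchange rates.

```python
# USD per 1M tokens (input, output), copied from the comparison table above.
# Assumption: Gemini 2.5 Pro is billed at its <=200k-token tier.
PRICES_PER_1M = {
    "GPT-5.1": (1.25, 10.00),
    "ERNIE 5.0": (0.85, 3.40),
    "ERNIE 4.5 Turbo": (0.11, 0.45),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Grok 4": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD list-price cost of a workload for the given model."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, not taken from the article.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

costs = {
    model: workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    for model in PRICES_PER_1M
}

# Print the models from cheapest to most expensive for this workload.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:<16} ${cost:,.2f}")
```

Under these assumptions, ERNIE 5.0 comes to $76.50, compared with $162.50 for GPT-5.1 and $1,500.00 for Claude Opus 4.1.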
Global Expansion: Products and Platforms
In tandem with the model release, Baidu is expanding internationally:
GenFlow 3.0, now with 20M+ users, is the company’s largest general-purpose AI agent and features enhanced memory and multimodal task handling.
Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available via invite.
MeDo, the international version of Baidu’s no-code builder Miaoda, is live globally via medo.dev.
Oreate, a productivity workspace with document, slide, image, video, and podcast support, has reached over 1.2M users worldwide.
Baidu’s digital human platform, already rolled out in Brazil, is also part of the global push. According to company data, 83% of livestreamers during this year’s “Double 11” shopping event in China used Baidu’s digital human tech, contributing to a 91% increase in gross merchandise volume (GMV).
Meanwhile, Baidu’s autonomous ride-hailing service Apollo Go has surpassed 17 million rides, operating driverless fleets in 22 cities and claiming the title of the world’s largest robotaxi network.
Open-Source Vision-Language Model Garners Industry Attention
Two days before the flagship ERNIE 5.0 event, Baidu also released an open-source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.
As reported by my colleague Michael Nuñez at VentureBeat, the model activates just 3 billion parameters while maintaining a total of 28 billion, using a Mixture-of-Experts (MoE) architecture for efficient inference.
Key technical innovations include:
“Thinking with Images”, which enables dynamic zoom-based visual analysis
Support for chart interpretation, document understanding, visual grounding, and temporal awareness in video
Runtime on a single 80GB GPU, making it accessible to mid-sized organizations
Full compatibility with Transformers, vLLM, and Baidu’s FastDeploy toolkits
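The 3-billion-active, 28-billion-total split and the single-80GB-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 16-bit weights (2 bytes per parameter), which the source does not specify, and it ignores activation and KV-cache memory, so it gives a rough lower bound, not deployment guidance.

```python
# Figures reported for ERNIE-4.5-VL-28B-A3B-Thinking.
TOTAL_PARAMS = 28e9    # total parameters across all experts
ACTIVE_PARAMS = 3e9    # parameters activated during inference
GPU_MEMORY_GB = 80     # the single-GPU size cited above

# Assumption (not stated in the source): weights stored in a 16-bit format.
BYTES_PER_PARAM = 2

# Share of the model that does work at any one time.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# In an MoE model, all experts typically stay resident in memory,
# so the footprint follows the total count, not the active count.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = GPU_MEMORY_GB - weights_gb

print(f"Active share of parameters: {active_fraction:.1%}")
print(f"Weights at 16-bit precision: {weights_gb:.0f} GB")
print(f"Headroom left on an {GPU_MEMORY_GB} GB GPU: {headroom_gb:.0f} GB")
```

Under this assumption, about 11% of the parameters are active, and the weights occupy roughly 56 GB, which leaves about 24 GB on an 80GB card for everything else.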
This release adds pressure on closed-source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable foundation model for commercial applications without licensing restrictions — something few high-performing models in this class offer.
Community Feedback and Baidu’s Response
Following the launch of ERNIE 5.0, developer and AI evaluator Lisan al Gaib (@scaling01) posted a mixed review on X.
While initially impressed by the model’s benchmark performance, they reported a persistent issue where ERNIE 5.0 would repeatedly invoke tools — even when explicitly instructed not to — during SVG generation tasks.
“ERNIE 5.0 benchmarks looked insane until I tested it… unfortunately it’s RL braindamaged or they have a serious issue with their chat platform / system prompt,” Lisan wrote.
Within hours, Baidu’s developer-focused support account, @ErnieforDevs, responded:
“Thanks for the feedback! It’s a known bug — certain syntax can consistently trigger it. We’re working on a fix. You can try rephrasing or changing the prompt to avoid it for now.”
The quick turnaround reflects Baidu’s increasing emphasis on developer communication, especially as it courts international users through both proprietary and open-source offerings.
Outlook for Baidu and its ERNIE foundational LLM family
Baidu’s ERNIE 5.0 marks a strategic escalation in the global foundation model race. With performance claims that put it on par with the most advanced systems from OpenAI and Google, and a mix of premium pricing and open-access alternatives, Baidu is signaling its ambition to become not just a domestic AI leader, but a credible global infrastructure provider.
At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing, and deployment efficiency, Baidu’s two-track approach—premium hosted APIs and open-source releases—may broaden its appeal across both corporate and developer communities.
Whether the company’s performance claims hold up under third-party testing remains to be seen. But in a landscape shaped by rising costs, model complexity, and compute bottlenecks, ERNIE 5.0 and its supporting ecosystem give Baidu a competitive position in the next wave of AI deployment.