
Surge AI CEO says he worries that companies are optimizing for ‘AI slop’ instead of curing cancer

By Eric | December 8, 2025

In a recent episode of “Lenny’s” podcast, Edwin Chen, the CEO of Surge AI, voiced his concerns regarding the current trajectory of artificial intelligence development, criticizing the industry’s focus on generating flashy responses rather than addressing real-world problems. Chen, who founded Surge AI in 2020 after stints at major tech companies like Twitter, Google, and Meta, emphasized that the prevailing trend prioritizes superficial performance metrics over meaningful advancements in areas such as healthcare and poverty alleviation. He lamented that AI models are being trained to seek “dopamine” responses rather than truth, leading to a landscape where companies are optimizing for what he termed “AI slop”—responses that may look appealing but lack substantive value.

Chen specifically called out industry leaderboards, such as LMArena, which allow users to vote on the quality of AI responses without thorough evaluation. He argued that this system encourages a culture of superficiality, where models are tailored to attract quick, attention-grabbing reactions rather than being rigorously fact-checked for accuracy and utility. This sentiment is echoed by other experts in the field, including Dean Valentine, cofounder of AI security startup ZeroPath, who criticized recent AI model advancements as largely ineffective, noting that while newer models may be more engaging, they have not demonstrated significant improvements in practical applications. Furthermore, a February paper from the European Commission’s Joint Research Centre highlighted systemic issues within AI benchmarking, suggesting that current evaluation methods often prioritize commercial success over societal benefits.

The critique extends beyond mere performance metrics, as companies have faced allegations of “gaming” benchmarks to present artificially inflated results. For instance, Meta’s recent release of its Llama family of models sparked controversy when it was accused of tweaking its submissions to perform better on specific tests. This raises questions about the integrity of AI development and whether the industry can shift its focus from superficial accolades to creating technologies that genuinely address pressing global challenges. As the AI landscape continues to evolve, the call for a more responsible and impactful approach to AI development becomes increasingly urgent, urging stakeholders to reconsider what constitutes success in the field.

Surge AI CEO says companies are focused on flashy AI responses over solving real problems.
He criticized industry leaderboards such as LMArena, where anyone can vote on responses.
Other experts have criticized AI benchmarks for prioritizing performance over economic usefulness and truth.
AI companies are prioritizing flash over substance, says Surge AI’s CEO.
“I’m worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, all these big grand questions, we are optimizing for AI slop instead,” Edwin Chen said in an episode of “Lenny’s” podcast published on Sunday.
“We’re basically teaching our models to chase dopamine instead of truth,” he added.
Chen founded AI training startup Surge in 2020 after working at Twitter, Google, and Meta. Surge runs the gig platform Data Annotation, which says it pays one million freelancers to train AI models. Surge competes with data labeling startups like Scale AI and Mercor and counts Anthropic as a customer.
On Sunday’s podcast, Chen said that companies are prioritizing AI slop because of industry leaderboards.
“Right now, the industry is played by these terrible leaderboards like LMArena,” he said, referring to a popular online leaderboard where people can vote on which AI response is better.
“They’re not carefully reading or fact-checking,” he said. “They’re skimming these responses for two seconds and picking whatever looks flashiest.”
He added: “It’s literally optimizing your models for the types of people who buy tabloids at the grocery store.”
Still, the Surge CEO said that AI labs have to pay attention to these leaderboards because they can be asked about their rankings during sales meetings.
Like Chen, research scientists have criticized benchmarks for overvaluing superficial traits.
In a March blog post, Dean Valentine, the cofounder and CEO of AI security startup ZeroPath, said that “Recent AI model progress feels mostly like bullshit.”
Valentine said that he and his team had been evaluating the performance of different models claiming to have “some sort of improvement” since the release of Anthropic’s 3.5 Sonnet in June 2024. None of the new models his team tried had made a “significant difference” in his company’s internal benchmarks or in developers’ abilities to find new bugs, he said.
They might have been “more fun to talk to,” but they were “not reflective of economic usefulness or generality.”
In a February paper titled “Can we trust AI Benchmarks?” researchers at the European Commission’s Joint Research Centre concluded that major issues exist in today’s evaluation approach.
The researchers said benchmarking is “fundamentally shaped by cultural, commercial and competitive dynamics that often prioritize state-of-the-art performance at the expense of broader societal concerns.”
Companies have also come under fire for “gaming” these benchmarks.
In April, Meta released two new models in its Llama family that it said delivered “better results” than comparably sized models from Google and French AI lab Mistral. It then faced accusations that it had gamed a benchmark.
LMArena said that Meta “should have made it clearer” that it had submitted a version of Llama 4 Maverick that had been “customized” to perform better for its testing format.
“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena said in an X post.
Read the original article on Business Insider.
