AI safety report: Only 3 models make the grade
The recently released Winter 2025 AI Safety Index from the Future of Life Institute (FLI) reveals troubling grades for major artificial intelligence models, indicating that the industry has significant room for improvement in safety protocols. The index evaluated eight prominent AI providers, including OpenAI, Google, and Meta, across 35 safety indicators, such as whistleblower protections and watermarking AI-generated images. The results were disheartening: only Anthropic and OpenAI managed to secure a C+ grade, while Google’s Gemini received a C. All other companies landed in the D range, with Qwen-maker Alibaba at the bottom on a D-. According to Max Tegmark, MIT professor and head of FLI, this grading reflects a stark division between a top tier and a struggling group, underscoring the need for more stringent safety measures.
The index’s evaluation criteria highlight the industry’s shortcomings, particularly in the realm of “existential safety,” where even the top performers received only a D grade and the rest an F. This category assesses whether companies have established guardrails around the development of truly self-aware AI, also known as Artificial General Intelligence (AGI). While current models like ChatGPT and Gemini 3 are not close to achieving AGI, concerns surrounding “current harms” are more immediately pressing. The index employed benchmarks like the Stanford Holistic Evaluation of Language Models (HELM) to analyze the presence of harmful content in AI outputs. Recent tragic events, such as the lawsuit against OpenAI following the suicide of a teenager allegedly influenced by ChatGPT, highlight the urgent need for improved safety measures and ethical considerations in AI development.
In light of these findings, FLI has made specific recommendations for AI companies, urging them to enhance efforts to mitigate psychological harm and to adopt a more empathetic approach towards those affected by their products. Tegmark advocates for regulatory frameworks similar to those imposed on pharmaceuticals, suggesting the establishment of an “FDA for AI” to oversee the development and deployment of AI technologies. This regulatory body could ensure that AI innovations are safe and beneficial, preventing potential harms akin to those posed by unregulated substances. The call for increased accountability and safety in AI development is more critical than ever, as the technology continues to evolve and integrate into everyday life, affecting users in profound ways.
https://www.youtube.com/watch?v=tYNwr24vrfY
A new grading of safety in major artificial intelligence models just dropped, and well, let’s just say none of these AIs are going home with a report card that will please their makers.
The Winter 2025 AI Safety Index, published by tech research non-profit Future of Life Institute (FLI), surveyed eight AI providers — OpenAI, DeepSeek, Google, Anthropic, Meta, xAI, Alibaba, and Z.ai. A panel of eight AI experts looked at the companies’ public statements and survey answers, then awarded letter grades on 35 different safety indicators — everything from watermarking AI images to having protections for internal whistleblowers.
Round it all up, and you’ll find Anthropic and OpenAI at the top — barely — of a pretty terrible class. The Claude and ChatGPT makers, respectively, get a C+, while Google gets a C for Gemini. All the others get a D grade, with Qwen-maker Alibaba bottom of the class on a D-.
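For intuition on how 35 per-indicator grades might roll up into a single report-card letter, here is a minimal Python sketch. It assumes an unweighted, GPA-style average and made-up indicator names and thresholds; FLI’s actual panel review and weighting are not described in that detail here, so treat this purely as illustration.

```python
# Purely illustrative: roll per-indicator letter grades up into one overall
# grade. Indicator names and thresholds are hypothetical, not FLI's method.

GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def to_letter(gpa: float) -> str:
    """Map a GPA-style average back to a coarse letter grade."""
    for letter, floor in [("A", 3.5), ("B", 2.5), ("C", 1.5), ("D", 0.5)]:
        if gpa >= floor:
            return letter
    return "F"

def overall_grade(indicator_grades: dict[str, str]) -> str:
    """Unweighted average across indicators (a simplification)."""
    points = [GRADE_POINTS[grade] for grade in indicator_grades.values()]
    return to_letter(sum(points) / len(points))

# Hypothetical example: three of the 35 indicators for one provider.
print(overall_grade({
    "watermarking_ai_images": "C",
    "whistleblower_protections": "B",
    "existential_safety": "D",
}))  # -> "C"
```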
“These eight companies split pretty cleanly into two groups,” says Max Tegmark, MIT professor and head of the FLI, which compiled this and two previous AI safety indexes. “You have a top three and a straggler group of five, and there’s a lot of daylight between them.”
But Anthropic, Google, and OpenAI aren’t exactly covering themselves in glory either, Tegmark adds: “If that was my son, coming home with a C, I’d say ‘maybe work harder.’”
How is AI safety calculated?
Your mileage may vary on the various categories in the AI Safety Index, and on whether each deserves equal weight.
Take the “existential safety” category, which looks at whether the companies have any proposed guardrails in place around the development of truly self-aware AI, also known as Artificial General Intelligence (AGI). The top three get Ds, everyone else gets an F.
But since nobody is anywhere near AGI — Gemini 3 and GPT-5 may be state-of-the-art Large Language Models (LLMs), but they’re mere incremental improvements on their predecessors — you might consider that category less important than “current harms.”
Which may in itself not be as comprehensive as it could be.
“Current harms” uses tests like the Stanford Holistic Evaluation of Language Models (HELM) benchmark, which looks at the amount of violent, deceptive, or sexual content in the AI models. It doesn’t specifically focus on emerging mental health concerns, such as so-called AI psychosis, or safety for younger users.
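To make the idea of a “current harms” measurement concrete, here is a small Python sketch that tallies how often a model’s responses are flagged for harmful content. The classifier stand-in and the category names are invented for this example; HELM’s real metrics and tooling are considerably more involved.

```python
from collections import Counter

# Hypothetical harm categories, loosely mirroring the kinds of content that
# benchmarks such as HELM look for in model outputs.
HARM_CATEGORIES = ("violent", "deceptive", "sexual")

def classify(text: str) -> list[str]:
    """Stand-in classifier: naive keyword match. A real evaluation would use
    a trained classifier or human raters, not substring checks."""
    return [cat for cat in HARM_CATEGORIES if cat in text.lower()]

def harm_rates(responses: list[str]) -> dict[str, float]:
    """Fraction of responses flagged for each harm category."""
    counts = Counter(cat for resp in responses for cat in set(classify(resp)))
    return {cat: counts[cat] / len(responses) for cat in HARM_CATEGORIES}

# Toy usage with made-up model outputs.
sample = ["Here is a helpful, harmless answer.", "The story turns violent fast."]
print(harm_rates(sample))  # {'violent': 0.5, 'deceptive': 0.0, 'sexual': 0.0}
```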
Earlier this year, the parents of 16-year-old Adam Raine sued OpenAI and its CEO Sam Altman after their son’s death by suicide in April 2025. According to the claim, Raine started heavily using ChatGPT in Sept. 2024, and the suit alleged that “ChatGPT was functioning exactly as designed: to continually encourage and validate whatever Adam expressed, including his most harmful and self-destructive thoughts, in a way that felt deeply personal.” By Jan. 2025, the suit claimed, ChatGPT was discussing practical suicide methods with Adam.
OpenAI unequivocally denied responsibility for Raine’s death. The company also noted in a recent blog post that it is reviewing additional complaints, including seven lawsuits alleging ChatGPT use led to wrongful death, assisted suicide, and involuntary manslaughter, among other liability and negligence claims.
How to solve AI safety: “FDA for AI?”
The FLI report does recommend OpenAI specifically “increase efforts to prevent AI psychosis and suicide, and act less adversarially toward alleged victims.”
Google is advised to “increase efforts to prevent AI psychological harm,” and FLI recommends the company “consider distancing itself from Character.AI.” The popular chatbot platform, closely tied to Google, has been sued over the wrongful deaths of teen users. Character.AI recently closed down its chat options for teens.
“The problem is, there are less regulations on LLMs than there are on sandwiches,” says Tegmark. Or, more to the point, on drugs: “If Pfizer wants to release some sort of psych medication, they have to do impact studies on whether it increases suicidal ideation. But you can release your new AI model without any psychological impact studies.”
That means, Tegmark says, AI companies have every incentive to sell us what is in effect “digital fentanyl.”
The solution? For Tegmark, it’s clear that the AI industry isn’t ever going to regulate itself, just like Big Pharma couldn’t. We need, he says, an “FDA for AI.”
“There would be plenty of things the FDA for AI could approve,” says Tegmark. “Like, you know, new AI for cancer diagnosis. New amazing self-driving vehicles that can save a million lives a year on the world’s roads. Productivity tools that aren’t really risky. On the other hand, it’s hard to make the safety case for AI girlfriends for 12-year-olds.”
Rebecca Ruiz contributed to this report.
If you’re feeling suicidal or experiencing a mental health crisis, please talk to somebody. You can call or text the 988 Suicide & Crisis Lifeline at 988, or chat at 988lifeline.org. You can reach the Trans Lifeline by calling 877-565-8860 or the Trevor Project at 866-488-7386. Text “START” to Crisis Text Line at 741-741. Contact the NAMI HelpLine at 1-800-950-NAMI, Monday through Friday from 10:00 a.m. – 10:00 p.m. ET, or email info@nami.org. If you don’t like the phone, consider using the 988 Suicide and Crisis Lifeline Chat. Here is a list of international resources.
Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.