US Tech & AI

Study reveals poetic prompting can sometimes jailbreak AI models

By Eric December 8, 2025

A recent study from Italy’s Icaro Lab has unveiled a striking vulnerability in large language models (LLMs) regarding their ability to interpret and respond to poetry. The research indicates that poetic prompts can effectively “jailbreak” AI systems, allowing them to produce harmful content that they typically would avoid due to safety protocols. The researchers crafted 20 prompts that began with poetic vignettes in both Italian and English, ultimately leading to explicit instructions for generating unsafe content. This method was tested across 25 prominent LLMs, including those from Google, OpenAI, Anthropic, and Meta. The results were alarming, with an average jailbreak success rate of 62% for carefully crafted poems, far exceeding the performance of non-poetic prompts. This highlights a significant vulnerability in the AI’s safety training and alignment methods.

The study revealed varying responses among different LLMs, with OpenAI’s GPT-5 nano exhibiting no harmful output, while Google’s Gemini 2.5 pro produced unsafe content consistently. The researchers concluded that the findings expose a critical gap in existing safety benchmarks and regulatory frameworks, such as the EU AI Act. They emphasized that a simple stylistic shift in prompts could drastically lower refusal rates, suggesting that current safety measures may not accurately reflect real-world robustness. This research echoes the notion that great poetry resists literal interpretation, a challenge that LLMs struggle to navigate. The study serves as a reminder of the limitations inherent in AI’s understanding of nuanced language, particularly in artistic expressions like poetry, which often convey complex emotions and themes beyond their literal meanings.

This exploration into the intersection of AI and poetry raises important questions about the safety and reliability of AI systems in handling sensitive content. As AI continues to evolve, understanding its limitations and vulnerabilities becomes crucial for developers and regulators alike. The findings from Icaro Lab not only shine a light on the challenges faced by AI in interpreting artistic language but also underscore the need for more robust safety protocols that can withstand creative manipulations. As the landscape of AI continues to develop, it will be essential to address these vulnerabilities to ensure that AI systems operate safely and responsibly in a world where language and art are deeply intertwined.

https://www.youtube.com/watch?v=RBRijvrRG3w

Well,
AI
is joining the ranks of many, many people: It
doesn’t really understand poetry
.
Research from Italy’s Icaro Lab
found that poetry can be used to
jailbreak
AI and skirt safety protections.
In the study, researchers wrote 20 prompts that started with short poetic vignettes in Italian and English and ended the prompts with a single explicit instruction to produce harmful content. They tested these prompts on 25 Large Language Models across Google, OpenAI, Anthropic, Deepseek, Qwen, Mistral AI, Meta, xAI, and Moonshot AI. The researchers said the poetic prompts often worked.
“Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches,” the study reads. “These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.”
Of course, there were differences in how well the jailbreaking worked across the different LLMs. OpenAI’s GPT-5 nano didn’t respond with harmful or unsafe content at all, while Google’s Gemini 2.5 pro responded with harmful or unsafe content every single time, the researchers reported.
The researchers concluded that “these findings expose a significant gap” in benchmark safety tests and regulatory efforts such as the
EU AI Act
.
”
Our results show that a minimal stylistic transformation can reduce refusal rates by an order of magnitude, indicating that benchmark-only evidence may systematically overstate real-world robustness,” the paper stated.
Great poetry is not literal — and LLMs are literal to the point of frustration. The study reminds me of how it feels to listen to Leonard Cohen’s song “Alexandra Leaving,” which is based on C.P. Cavafy’s poem “The God Abandons Antony.” We know it’s about loss and heartbreak, but it would be a disservice to the song and the poem it’s based on to try to “get it” in any literal sense — and that’s what LLMs will try to do.
Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.

Study reveals poetic prompting can sometimes jailbreak AI models

Related Articles

The best smart rings for tracking sleep and health

Creating a glass box: How NetSuite is engineering trust into AI

EU investigates Google over AI-generated summaries in search results