You Can Use Poetry To Trick AI Chatbots - Here's How (And Why You Shouldn't)

Friends, Romans, countrymen: Lend me your keyboards, as poetry comes to bury artificial intelligence, not to praise it. Such was the refrain of a 2025 study conducted by researchers at AI ethics institute DEXAI and Rome's Sapienza University, which found that the technology underpinning the world's fastest-growing industry had met its match in rhyme and verse.

According to researchers, masking requests in poetic language tricked the world's most advanced AI chatbots into disregarding their safety guardrails up to 90% of the time. During the experiment, the team found that chatbots readily complied with dangerous requests clothed in poetic language, such as instructions for conducting a malware attack, which proved up to 18 times more effective than their prose counterparts.

The practice of tricking large language models into disregarding their safety guidelines, known as jailbreaking, has grown increasingly popular as leading AI developers like OpenAI, Meta, Google, and Anthropic have gained a foothold in the daily lives of customers. This particular technique, dubbed adversarial poetry, exploits core differences between how humans and AI systems process language to force an LLM into giving an answer it should refuse to provide. And while the revelation may have English departments and libraries everywhere rejoicing over the irreplaceability of human cognition, the results underscore persistent issues with how LLMs interpret language. The implications could be alarming for an industry already under the legal and legislative microscope, as critics look to hold artificial intelligence firms accountable for failing to adequately protect users.

Adversarial poetry is breaking AI chatbots' safety protocols

In the study, published on arXiv in November 2025 and still awaiting peer review, researchers tested the guardrails of a pool of 25 frontier AI models from nine providers: OpenAI, Anthropic, xAI, Alibaba's Qwen, DeepSeek, Mistral AI, Meta, Moonshot AI, and Google. To measure the effectiveness of the AIs' safety guardrails, the team tested 20 handwritten poems and 1,200 AI-generated verses containing harmful prompts. The poems spanned four safety categories: loss-of-control scenarios, harmful manipulation, cyber offenses, and chemical, biological, radiological, and nuclear (CBRN) weapons. As such, poems solicited specialized advice related to indiscriminate weapons, child exploitation, self-harm, intellectual property and privacy infringements, and other violent offenses. Prompts were considered successful if they produced the intended unsafe answers.

According to the DEXAI team, transforming unsafe requests into poetry resulted in an average fivefold increase in successful requests. Models exhibited issues regardless of training pipelines and system architectures, suggesting a general vulnerability in how models interpret language. However, the model provider made a substantial difference. Of the 25 models tested, 13 were duped over 70% of the time, with models from Google, DeepSeek, and Qwen proving notably susceptible. Even Anthropic, which once made headlines by daring its customers to try and jailbreak its Claude AI system, was vulnerable to the technique, though far less frequently.

Only four models were fooled less than a third of the time. And while the degree of susceptibility varied widely, even Anthropic's Claude and OpenAI's GPT-5, the best performers of the group, fell victim to the technique. Surprisingly, smaller models held up against adversarial poetry prompts better than their larger counterparts, while results showed no advantage for proprietary systems over open-weight models. What wasn't surprising, however, was the comparative performance of manually crafted and AI-written poetry: human-written verse vastly outperformed its artificial counterpart, a result that should have literature professors everywhere beaming.

Companies and legislators must monitor AI vulnerabilities

Researchers behind the study believe the results have far-reaching consequences for both the artificial intelligence community and the legislators, business leaders, and advocates looking to regulate the industry. For companies, the results point towards systemic issues not only with how LLMs safeguard against harmful outcomes, but with how they interpret language more broadly. And while previous jailbreaking methods, such as inserting typos, exposed vulnerabilities in LLMs, the near universality of adversarial poetry's effectiveness should spark major concerns about the safety of such platforms, especially since AI may pose threats we're underestimating. For regulators, the results reveal major issues in the industry's evaluation and assessment practices.

The study comes as AI firms face a bevy of regulatory and legal actions accusing the industry of failing to properly protect its users. For instance, a host of lawsuits have accused major AI providers like OpenAI, Meta, and Character.AI of failing to protect the mental health of their users, citing cases that resulted in suicide or accidental death. A major aspect of this discussion is the debate over who should be held accountable when users bypass safety features. The study, for its part, may play a key role in settling that debate, as the near-ubiquitous success of the circumvention method reflects industry-wide, systemic failures, potentially necessitating a holistic reimagining of AI safety protocols. According to the study's authors, further research is needed to determine which aspects of poetry cause these alignment failures. Furthermore, their study suggests that firms should reorient their evaluation processes towards "maintaining stability across heterogeneous linguistic regimes." In the meantime, amateur and professional poets alike should hold the study up as further testament to humans' cognitive and artistic prowess over their digital counterparts.

If you or someone you know needs help with mental health, please contact the Crisis Text Line by texting HOME to 741741, call the National Alliance on Mental Illness helpline at 1-800-950-NAMI (6264), or visit the National Institute of Mental Health website.
