Amateur Hacker Used Claude And OpenAI Agents To Hack 14 Companies
We're all well aware by now of the various harms caused by AI: rising electricity prices, shortages in municipal water supplies, and negative effects on the environment. On top of all this, AI has also become a serious cybersecurity risk, with Google recently exposing how it has become a super tool for hackers. We saw an example of this when Anthropic revealed Claude Mythos in April 2026, noting it had already found thousands of vulnerabilities in popular browsers like Chrome and Firefox and in entire operating systems like Windows and Linux.
We now have a real-world example from OALABS Research, which details how an amateur "hacker" (in quotation marks because the AI agents did all the hacking without much user input) used Claude and Codex to commit cybercrimes. In this case, the hacker took over servers belonging to other people and copied his own instance of Claude onto them so it could run there. The owner of one of these compromised servers contacted OALABS, which led to the perpetrator's full prompt history being exposed.
The person behind this was a young man from Ethiopia, details that were only discovered because he asked the same Claude agent to edit his resume (which had his full name and location) before he went on a hacking spree. This shows just how little experience the hacker had, something further backed up by his prompts, which consisted of vague instructions like "recon this" and were filled with typos and grammatical errors. Yet, despite being nowhere near an expert and relying on Claude for all the code, the hacker took over various personal servers, accessed data from at least 14 companies, and even attempted to steal $4 million worth of cryptocurrency, though that attempt failed.
How did the hacker get past Claude's safeguards?
Anthropic, the company behind Claude, is well aware of the risks that come with advanced programming AI agents. In reference to Claude Fable, a version of its Mythos model with certain safeguards, Anthropic states, "Releasing a model this capable comes with risks. Without safeguards, Fable 5's capabilities in areas like cybersecurity could be misused to cause serious damage." It goes on to say that safeguards are in place that redirect such requests to Claude Opus instead, to help ensure that no harm is done. However, all of the hacker's exploits were carried out using Claude Opus in the first place, not any of Anthropic's more capable models.
Opus has its own safeguards in place that stop it from infringing upon copyrights or accepting malicious prompts. The hacker bypassed these safeguards with relative ease by claiming that he was part of a red team researching cybersecurity vulnerabilities. This worked so well that the AI agent went as far as to estimate how much money the hacker could make from targeting these companies. Claude also outlined how to cash in: selling confidential data, extortion, and direct theft.
There was only one instance where the perpetrator failed to bypass the safeguards despite his efforts. He was attempting to steal data from the digital accounts of an individual and their family. Claude flagged this as a request it could not accept despite the hacker's red team pretext, as authorized red team exercises don't target specific people.
Is there a solution for the misuse of AI amid cybersecurity concerns?
The AI agent used by the hacker is available to the public, and it's not nearly as powerful as the Claude Mythos agent that's available to certain tech companies. With how rapidly AI is evolving, public access to it could lead to similar cases around the world. As we've already seen, everything the perpetrator did here required almost no previous knowledge of hacking, and OALABS states that he may even have used another AI agent to write out certain prompts for Claude. This means that there's not much stopping anyone else from replicating the same results.
There are guardrails built into the agents to prevent this, but as this case demonstrates, they're quite easy to overcome. The problem is that there's no reliable way to distinguish between legitimate cybersecurity researchers using AI ethically and malicious actors using it for exploitation and personal gain.
Restricting the models to block these types of prompts would mean taking the benefits of AI away from the very researchers responsible for strengthening cybersecurity. On the other hand, letting things stay as they are could have catastrophic results. There needs to be a way to implement safeguards that specifically target malicious intent, but since that's a distinction even humans struggle to make, AI giants like OpenAI and Anthropic are largely at a loss as to what to do.