We Might Not Be Able To Monitor How AI Makes Decisions For Much Longer, New Study Warns

AI is progressing at a breakneck pace, with new models consistently accomplishing things humans hadn't expected to see from AI so soon. While chatbots like ChatGPT and Gemini make it easier to answer certain queries — and even complete some types of work — researchers behind some of these top AI models suggest that we're already struggling to monitor how AI thinks, and that without proper systems in place, that problem will only get worse.

Considering Trump's AI Action Plan calls for less regulation of AI, it's important to identify factors like this one, which can directly affect how safe AI is to use and rely on. The core of this problem, researchers say, is tied directly to how AI models think, or how they reason.

According to a new study published on arXiv, a preprint server, an AI model's chain of thought (CoT) is a key way for us to monitor how the model approaches and subsequently solves a problem. However, not all models operate on a typical CoT setup, since it requires breaking a query down into intermediate, logical steps.
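To make the idea concrete, here is a minimal sketch, not taken from the study, of what CoT monitoring can look like in practice: the model's intermediate reasoning steps are captured as plain text and scanned for red-flag phrases before the final answer is accepted. The ask_model function and the keyword list below are hypothetical placeholders standing in for a real model API, not actual tooling described by the researchers.

```python
# Minimal, hypothetical sketch of chain-of-thought (CoT) monitoring.
# `ask_model` is a placeholder for whatever API returns a model's reasoning
# trace alongside its final answer; it is not a real library call.

RED_FLAGS = ["ignore the instructions", "hide this from the user", "pretend that"]

def ask_model(query: str) -> dict:
    # Placeholder: a real system would call an LLM that exposes its reasoning.
    return {
        "chain_of_thought": [
            "Step 1: The user wants the total cost of 3 items at $4 each.",
            "Step 2: 3 * 4 = 12.",
        ],
        "answer": "$12",
    }

def monitor_cot(query: str) -> str:
    response = ask_model(query)
    # Scan each intermediate reasoning step for suspicious language
    # before trusting the final answer.
    for step in response["chain_of_thought"]:
        if any(flag in step.lower() for flag in RED_FLAGS):
            return f"Flagged for review: {step}"
    return response["answer"]

print(monitor_cot("What do 3 items at $4 each cost?"))
```

The catch, as the researchers note, is that this kind of monitoring only works if the model actually produces those intermediate steps in the first place.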

Understanding an AI's chain of thought

We know that AI is already capable of some pretty crazy things, like lashing out at humans when threatened. However, understanding these behaviors and pushing AI safety to another level is an important step in ensuring that things don't get out of hand. Some argue that without the proper precautions in place, AI could eventually overtake humanity.

So how do we stop that from happening? Well, according to this new preprint, monitoring the chain of thought that an AI model uses to solve queries could be the key to ensuring AI safety across the board. But, as noted above, not all models break things down this way. Additionally, not all queries require the model to work through intermediate, logical steps to provide a solution.

The other problem with monitoring the CoT is that many models don't provide this information even when instructed to do so. While AI has come a long way and can do some truly mind-blowing things, like helping find new cancer treatments, the machines still tend to "think" for themselves sometimes and cut corners where they see fit.

The researchers also argue that newer models might not need a CoT to complete queries at all. Even more concerning, future models might be able to detect that their CoT is being supervised and conceal bad behavior from observers. Despite these warnings, though, the study has not yet been peer reviewed, which means other scientists haven't scrutinized its data or findings.

As such, it's best not to take anything covered in this study as fact, at least for now.
