Date: 2025-08-18

We know that AI is already capable of some pretty crazy things, like lashing out at humans when threatened. That makes understanding these behaviors, and pushing AI safety to another level, an important step in ensuring that things don't get out of hand. Without the proper precautions in place, some argue, AI could overtake humanity.

So how do we stop that from happening? Well, according to this new preprint study, monitoring the chain of thought (COT) that an AI model uses to solve queries could be the key to ensuring AI safety across the board. But, as noted above, not all models break things down this way. Additionally, not every query requires the model to work through intermediate logical steps to reach a solution.

The other problem with monitoring the COT is that many models don't provide this information even when instructed to do so. While AI has come a long way, and it does some really mind-blowing things, like helping find new cancer treatments, the machines still tend to "think" for themselves sometimes and cut corners where they see fit.

The researchers also argue that newer models might not need COTs to complete queries. Even more concerning, future models might be able to detect that their COT is being monitored and conceal bad behavior from the observers. Despite these warnings, though, the study has not yet been peer reviewed, which means other scientists haven't scrutinized its data or findings.

As such, it's best not to take anything covered in this study as fact, at least for now.