How A 2,400-Year-Old Problem Shows Us How Close ChatGPT's AI Is To Human Intelligence

When testing how close ChatGPT is to human intelligence, doesn't it make sense to follow the teachings of an intellectual such as the Greek philosopher Socrates? That's the idea behind a study scientists published in the International Journal of Mathematical Education in Science and Technology. They asked ChatGPT a 2,400-year-old math problem: doubling the area of a square. In the original story, the student proposes doubling the length of each side of the square, which actually quadruples the area. After a back-and-forth, Socrates guides them to use the original square's diagonal as the side of the new square instead. But the point was never just getting the math right. Socrates was trying to demonstrate that the student already had the knowledge needed to reach the correct answer through reasoning. Scholars have argued over precisely that question for centuries: whether mathematical knowledge is embedded in us or accessed through logical reasoning and experience. But how do large language models (LLMs) like ChatGPT handle this?
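To make the arithmetic concrete, here is a minimal sketch in Python (not from the study; the values and variable names are illustrative) checking both constructions: doubling each side quadruples the area, while a square built on the original's diagonal has exactly twice the area.

```python
import math

s = 3.0                                  # side length of the original square (any positive value works)
original_area = s ** 2                   # area = s^2

# The student's instinct: double each side -- this quadruples the area.
doubled_sides_area = (2 * s) ** 2        # (2s)^2 = 4 * s^2

# Socrates' construction: use the original square's diagonal as the new side.
diagonal = s * math.sqrt(2)              # d = s * sqrt(2)
diagonal_square_area = diagonal ** 2     # d^2 = 2 * s^2, exactly double (up to float rounding)

print(original_area, doubled_sides_area, diagonal_square_area)  # 9.0, 36.0, ~18.0
```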

Dr. Nadav Marco, study co-leader from the Hebrew University of Jerusalem, worked alongside Andreas Stylianides, professor of Mathematics Education at Cambridge. They reasoned that because ChatGPT is trained on text rather than images, finding the correct geometric solution would support the idea that mathematical ability and reasoning are learned rather than innate. Is this knowledge locked away and merely "retrieved," or is it "generated" through lived experience? The researchers believed there was a low chance the chatbot would get things right. What actually happened is that the bot improvised its way to a solution, even making a mistake similar to the human student's: it incorrectly claimed the diagonal could not be used and that there was no geometric solution.

How do scientists know the chatbot was improvising?

With scientists trying to make AI suffer by feeling simulated pain, truly improvisational AI could sound like a frightening development. But it's not quite like that. Marco says there is a "vanishingly small" possibility that the false claim about the diagonal came from existing training data, which suggests the bot was improvising. ChatGPT was adapting its responses based on the preceding discussion, which indicates that the knowledge was being generated rather than innately accessed. Marco explains that, as humans, our natural instinct is to try to reason things out "based on [...] past experiences." During the experiment, ChatGPT seemed to be doing the same, coming up with its own hypotheses and answers "like a learner or a scholar." It wasn't drawing on experiences, though, just data. Unlocking the neural code behind our reasoning could be the breakthrough that leads to superhuman AI someday, but we're not there yet.

What this also potentially points to is a concept called the zone of proximal development (ZPD). The term describes the natural gap between what a person already knows and what they can come to know, or learn, with the appropriate guidance, usually from someone more knowledgeable or experienced. ChatGPT may be working within a similar framework to build knowledge, solving problems that aren't represented in its training data. As with all studies, however, the team is "cautious" about the results, and further behavioral studies would be necessary to draw definitive conclusions. For now, the behavior seems "learner-like," but there isn't enough evidence to conclude that LLMs reason or work things out the way humans do. The good news for humanity: an unrelated Apple study recently came to a similar conclusion, that advanced reasoning AI doesn't actually reason the way humans do.
