OpenAI May Have To Wipe ChatGPT And Start Over
OpenAI, the company behind the popular generative AI tool ChatGPT, could be forced to wipe its chatbot and start over completely, according to a new report from NPR (via Ars Technica). The wipe could result from a potential lawsuit that might also see OpenAI fined up to $150,000 for each piece of copyrighted material used to train the language model.
ChatGPT has caused quite a stir over the past several months as users have found new ways to put the generative AI tool to work. One of our own, Chris Smith, has used ChatGPT to train for a half-marathon, and we're now seeing it used to create AI-generated reviews, too. But the success of the language model behind the chatbot could come at a very steep cost.
See, language models like GPT-3.5 and GPT-4, which make ChatGPT's generative AI work, are trained using third-party data. OpenAI has even created a web-scraping bot that can pull information from websites to train its GPT models. The problem is that OpenAI isn't relying only on freely available, non-copyrighted material. It's also using copyrighted material to train its AI models, and it's been doing so without permission.

According to a new report from Ars Technica's Ashley Belanger, the New York Times is discussing whether to sue OpenAI after updating its own terms of service to prohibit the scraping of its articles and images to train language models. Exactly what such a lawsuit would mean for OpenAI is unclear, but experts told Ars that the company could end up paying up to $150,000 per infringing piece of content.
Further, the lawsuit could force OpenAI to wipe ChatGPT and start training its language model from scratch, which would essentially undo all of the work the company has put into the model so far. This isn't the first time OpenAI has come under fire in court, either. Well-known authors like Sarah Silverman have banded together to sue the company over similar concerns, all seeking to protect the copyright of the material they created.
It's a really messy situation. If the New York Times sues OpenAI, other companies and websites could make similar moves to protect their work. It's also possible, as NPR notes, that the Times and OpenAI could reach some sort of licensing agreement, under which OpenAI would pay the Times for access to its content, which could then legally be used to train its GPT models.
Whether the Times follows through with a lawsuit, and whether OpenAI has to wipe ChatGPT, remains to be seen. Either way, the dispute adds to a long-running criticism of these language models: that they rely heavily on work others have already put in, which is one reason so many people have problems with them.
