Cloudflare Accuses Perplexity Of Scraping Websites Blocked From AI Scraping
A new report from Cloudflare claims that Perplexity has been scraping content from websites that have opted to block AI web scrapers. The company says that Perplexity's continued attempts to hide its crawling activity have eroded trust among websites that have opted out of sharing their content with AI companies like Perplexity.
In the report, published on Cloudflare's blog, the network service provider says that Perplexity has been crawling stealthily, modifying its user agents and source ASNs (autonomous system numbers, which identify the network a request comes from) to hide its activity, and either ignoring the robots.txt files set up for these websites or failing to fetch them at all.
That file, for those who haven't run a website, relays a website owner's preferences to bots. Because Perplexity has allegedly been ignoring the preferences set by site owners, Cloudflare says it has delisted the company as a verified bot and has added measures to its services to block the stealthy crawling attempts. These accusations could throw a wrench into ongoing plans from third-party companies like Samsung, which might have planned to include Perplexity on its S26 smartphones.
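For readers who have not worked with robots.txt, the sketch below shows roughly how a well-behaved crawler is expected to consult the file before fetching a page, using Python's standard urllib.robotparser module. The bot name ExampleAIBot, the file contents, and the example.com URL are all made up for illustration; none of them come from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner blocks one AI crawler
# (the made-up token "ExampleAIBot") and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed page URL, for illustration only.
PAGE = "https://example.com/article"

# The declared AI crawler is told to stay out.
blocked = is_allowed(ROBOTS_TXT, "ExampleAIBot", PAGE)

# A client presenting a generic browser user agent matches only the
# wildcard rule, so the file has nothing to say against it.
browser_like = is_allowed(ROBOTS_TXT, "Mozilla/5.0", PAGE)

print(blocked, browser_like)
```

The file is only a request: nothing in it stops a crawler that skips the check or presents a different user agent, which is why the allegations center on undeclared crawling.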
Testing the claims
Cloudflare isn't making these accusations blindly. The company says it conducted a series of tests and experiments to determine whether Perplexity was really trying to skirt the boundaries set by the owners of the websites it was scraping. According to the findings the team shared, Perplexity does appear to have been finding ways around the preferences set by those websites.
Cloudflare says it found that, when blocked, Perplexity resorted to using an undeclared user agent intended to mimic Google Chrome on macOS. This undeclared crawler then used multiple IP addresses not listed in Perplexity's official IP range and rotated through them as it ran into restrictions set by the robots.txt file for certain pages.
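To make the distinction between declared and undeclared traffic concrete, here is a minimal sketch of how a site operator might compare a request's user agent and source IP against what a crawler operator publishes. This is not Cloudflare's detection method, which the article does not describe in detail. The bot token, the IP range (a reserved documentation range), and the category labels are all assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical values: a made-up bot token and a reserved documentation
# range standing in for a crawler operator's published IP list.
DECLARED_AGENT_TOKEN = "ExampleAIBot"
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def classify_request(user_agent: str, source_ip: str) -> str:
    """Label a request by whether it matches the declared crawler identity."""
    ip = ipaddress.ip_address(source_ip)
    in_declared_range = any(ip in network for network in DECLARED_RANGES)
    names_declared_bot = DECLARED_AGENT_TOKEN.lower() in user_agent.lower()

    if names_declared_bot and in_declared_range:
        return "declared"
    if names_declared_bot:
        # Claims the bot's name but arrives from an unlisted address.
        return "declared-agent-unlisted-ip"
    # Does not identify as the bot at all; a browser-like user agent from
    # an unlisted address lands here, so simple checks cannot flag it.
    return "unidentified"
```

A crawler that presents a browser-like user agent from unlisted addresses falls into the last bucket, which is why checks this simple cannot catch the behavior Cloudflare describes on their own.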
While a Perplexity spokesperson told TechCrunch that the bot listed in the image shared within the research report isn't one of theirs, that hasn't stopped Cloudflare from standing by its allegations. This is also not the first time that Perplexity has been accused of scraping content without proper authorization.
The concerns over this possible breach of trust have serious implications, as AI web scrapers have been under fire for years over fears that they would plagiarize human-written content to train AI models, thus profiting off of someone else's hard work. And with Perplexity's Comet browser making headlines lately, this kind of accusation could push some users to steer clear.