Cloudflare, the web infrastructure and security company, is launching a new free tool that protects website data from being scraped for artificial intelligence training, TechCrunch reports.
Some AI vendors, including Google, OpenAI, and Apple, let website owners block the bots they use to collect data and train models via robots.txt, a text file that tells crawlers which pages of a site they may access. But, as Cloudflare notes, not all AI crawlers honor these rules.
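As a rough illustration, a robots.txt opt-out might look like the fragment below. The crawler names shown (GPTBot for OpenAI, Google-Extended for Google's AI training, Applebot-Extended for Apple) are the tokens those vendors have documented, though each vendor's documentation should be checked for the current name:

```
# Disallow known AI-training crawlers from the entire site
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```

Note that robots.txt is purely advisory: it only works for crawlers that choose to read and obey it, which is precisely the gap Cloudflare's tool aims to close.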
To build the tool, the company analyzed traffic from AI bots and search crawlers. The tool also accounts for AI bots that try to evade detection by imitating the behavior of a human using a web browser.
“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
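Cloudflare has not published its detection model, but the fingerprinting idea it describes can be sketched in miniature. The signals and threshold below are hypothetical, chosen only to show the general shape: crawling frameworks tend to leave telltale marks in their HTTP requests that simple heuristics can score.

```python
# Toy sketch of request fingerprinting for bot detection.
# The user-agent markers, header list, and threshold are illustrative
# assumptions, not Cloudflare's actual signals.

AUTOMATION_UA_MARKERS = ("python-requests", "scrapy", "curl", "headlesschrome")
BROWSER_HEADERS = ("accept-language", "accept-encoding", "referer")

def looks_like_bot(headers: dict) -> bool:
    """Score a request's headers; flag it when enough bot signals accumulate."""
    lowered = {k.lower(): v for k, v in headers.items()}
    ua = lowered.get("user-agent", "").lower()
    score = 0
    # Many automation tools and frameworks identify themselves in the User-Agent.
    if any(marker in ua for marker in AUTOMATION_UA_MARKERS):
        score += 2
    # Real browsers almost always send these headers; frameworks often omit them.
    score += sum(1 for h in BROWSER_HEADERS if h not in lowered)
    return score >= 2
```

A production system would of course use far richer signals (TLS fingerprints, request timing, IP reputation) and a trained model rather than hand-set weights; this only illustrates the "fingerprint the tooling" approach the quote describes.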
The company has also launched a form to report such AI bots.
The problem of artificial intelligence bots has sharply escalated as the generative AI boom fuels demand for model training data.
Many sites, fearing that companies are training models on their content without notice or compensation, have opted to block AI crawlers entirely. According to one study, about 26% of the 1,000 largest websites on the Internet have blocked OpenAI's bot.
Tools like Cloudflare's can help, but only if they are accurate enough.