Cloud services company Cloudflare has launched a new tool to help prevent AI bots from scraping data from websites without permission.

Amid growing demand for data to train artificial intelligence (AI) models, many AI companies have deployed bots to "scrape" data from websites, raising the risk of copyright infringement and unauthorized use of information.

Although some AI companies such as Google, OpenAI and Apple allow website owners to block their bots through robots.txt files, Cloudflare is concerned that some AI companies may find ways to circumvent these rules and collect data despite content controls.
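As an illustration, a site owner who wants to opt out of AI training crawlers can list the bot User-Agents these companies publish in a robots.txt file. A minimal example, using the documented crawler names (compliance is voluntary on the crawler's side):

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Google's AI-training crawler (normal search indexing is unaffected)
User-agent: Google-Extended
Disallow: /

# Block Apple's AI-training crawler
User-agent: Applebot-Extended
Disallow: /
```

The file is served at the site root (e.g. `example.com/robots.txt`); nothing technically stops a crawler from ignoring it, which is the gap Cloudflare's tool aims to close.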

Speaking to TechCrunch, the company said: "Customers do not want AI bots accessing their websites."

[Figure: Distribution of blocked User-Agents in robots.txt files]

In fact, for OpenAI alone, multiple studies have found that more than 600 news publishers and 26% of the top 1,000 websites have disabled its bots.

Faced with this situation, Cloudflare has researched and analyzed the activity of AI bots to refine its automatic bot-detection algorithm. The algorithm weighs many factors, including whether an AI bot is trying to "impersonate" a real web user.
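Cloudflare does not disclose its detection logic, but one commonly cited impersonation signal can be sketched in a few lines: a request whose User-Agent string claims to be a browser yet lacks headers real browsers always send. The function name, header lists, and threshold below are illustrative assumptions, not Cloudflare's actual algorithm, which combines far more signals (fingerprints, behavior, network reputation).

```python
# Hypothetical, simplified sketch of ONE impersonation signal a
# bot-detection heuristic might use. Not Cloudflare's real algorithm.

BROWSER_HINTS = ("Chrome", "Firefox", "Safari", "Edge")
# Headers that mainstream browsers send on virtually every request:
EXPECTED_BROWSER_HEADERS = {"accept", "accept-language", "accept-encoding"}

def looks_like_impersonation(headers: dict) -> bool:
    """Flag a request that claims a browser UA but omits common browser headers."""
    ua = headers.get("user-agent", "")
    claims_browser = any(hint in ua for hint in BROWSER_HINTS)
    sent = {name.lower() for name in headers}
    missing = EXPECTED_BROWSER_HEADERS - sent
    return claims_browser and bool(missing)

# A headless scraper often spoofs only the User-Agent:
scraper = {"user-agent": "Mozilla/5.0 ... Chrome/126.0 Safari/537.36"}
browser = {
    "user-agent": "Mozilla/5.0 ... Chrome/126.0 Safari/537.36",
    "accept": "text/html",
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",
}
print(looks_like_impersonation(scraper))  # True
print(looks_like_impersonation(browser))  # False
```

In production such a check would only be one weak signal among many, since sophisticated scrapers copy full browser header sets.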

“Bad actors often use tools and frameworks to collect data on a large scale,” Cloudflare said. “Based on these signals, our algorithm can accurately flag AI bots.”

This new tool is completely free for websites using Cloudflare's platform. Website owners can also report suspicious AI bots, which Cloudflare will review and blacklist.

Besides Cloudflare, Reddit has made a similar move, announcing that it would block most automated bots from using its data without a license. According to Mashable, Reddit will modify its robots.txt file to restrict web crawlers.

However, blocking bots is not always effective. Some AI companies have been accused of ignoring robots.txt rules to gain a competitive advantage. AI search engine Perplexity has been accused of impersonating users to "scrape" content, while OpenAI and Anthropic are said to have repeatedly violated robots.txt.

Tools like Cloudflare's could be useful if they prove accurate at detecting disguised AI bots. However, they don't solve the larger problem of publishers having to sacrifice traffic from AI tools like Google's AI Overviews, which exclude sites from listings if they block specific AI crawlers.

In the newly released "State of Application Security 2024" report, Cloudflare said that cybersecurity teams are having difficulty dealing with threats from modern applications.

Accordingly, software supply chains, DDoS attacks and malicious bots pose major challenges for application security teams. Cloudflare says it blocks 209 billion cyberattacks every day for its customers.

Cloudflare co-founder and CEO Matthew Prince said online applications are often built with little regard for security, making them vulnerable to attacks.