
Cloudflare Shares Free Tool to Stop AI Bots From Scraping Websites

The company also shared details about the major bots attempting to scrape data, including bots allegedly run by ByteDance, Amazon, Anthropic and OpenAI.

As bots continue to roam the internet collecting data to train AI models, Cloudflare has released a new tool that lets customers block all AI bots at once.

The tool aims to combat scraping, the process by which bots extract content and data from websites, a practice that has grown with the rise of generative AI. Cloudflare also highlighted web crawling, in which bots traverse the web to index the content of various sites.

Last year, Cloudflare announced an option for its customers to block certain types of AI bots, but the new tool lets them block all of these bots at once.

The company also conducted an analysis of its traffic to monitor the prevalence of scraping bots and claimed that the value of “bulk original content has never been higher.”

“While our analysis identified the most popular crawlers in terms of query volume and number of Internet properties viewed, many customers are likely unaware of the most popular AI crawlers actively crawling their sites,” Cloudflare said in a blog post.

The computing giant also warned that not all AI companies are transparent about their data-harvesting practices. Cloudflare says it has spotted bot operators attempting to impersonate “a real browser using a spoofed user agent.”

“We will continue to monitor and add more bot blocks to our AI scrapers and crawlers policy and evolve our machine learning models to help make the internet a place where content creators can thrive,” Cloudflare said.

The biggest bots

In addition to this new feature, Cloudflare also shared information about some of the most prominent AI bots scraping sites on its network. As of June 2023, the company said it was connected to over 12,000 networks in over 300 cities.

The company said the main AI crawlers making requests to sites on Cloudflare are Bytespider, Amazonbot, ClaudeBot, and GPTBot. Cloudflare said these crawlers are used to train AI models for ByteDance, Amazon, Anthropic, and OpenAI, respectively.
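For site operators curious how these crawlers might be recognized in their own logs, here is a minimal Python sketch that checks a request's User-Agent header for the four bot names listed above. It is an illustration only, not Cloudflare's method: the function names and the simple substring matching are assumptions, and a bot that spoofs a real browser's user agent, as Cloudflare describes, would pass this check undetected.

```python
from typing import Optional

# Crawler names from the article, mapped to the companies Cloudflare
# says they serve. The substring matching below is an assumption made
# for illustration, not a description of Cloudflare's detection.
AI_CRAWLERS = {
    "Bytespider": "ByteDance",
    "Amazonbot": "Amazon",
    "ClaudeBot": "Anthropic",
    "GPTBot": "OpenAI",
}


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name found in the User-Agent string, or None."""
    ua = user_agent.lower()
    for name in AI_CRAWLERS:
        if name.lower() in ua:
            return name
    return None


def should_block(user_agent: str) -> bool:
    """Hypothetical policy: block any request that names a listed crawler."""
    return match_ai_crawler(user_agent) is not None
```

A request whose header contains "GPTBot/1.0" would be flagged, while an ordinary browser string would not. Because the check relies entirely on what the client declares about itself, it only catches bots that identify themselves honestly.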

According to Cloudflare data, Bytespider, GPTBot, and ClaudeBot were the top three bots in terms of share of websites visited.

Data scraping has recently become a concern for several industries as generative AI has grown. In May, Sony Music Group wrote to more than 700 tech companies asking them to refrain from using its content to train AI models.
