Analysis of AI crawler traffic trends from January 2026. Meta-ExternalAgent surged 36%, Googlebot share declined, and dedicated AI training traffic now represents 42% of all AI bot requests. Complete breakdown of crawler market share, industry targeting, and robots.txt blocking patterns based on Cloudflare Radar data.
This report is updated monthly with fresh Cloudflare Radar data. Bookmark this page to track how AI crawlers are reshaping web traffic each month.
Meta's training crawler surged 36% in a single month, Googlebot's share dropped nearly 4 percentage points, and dedicated AI training traffic gained 2.4 percentage points over the previous period. I analyzed 30 days of Cloudflare Radar AI Insights data covering January 9 through February 8, 2026, and the AI crawler landscape is shifting faster than at any point in the past year.
📊 Stats Alert: Five companies -- Google, OpenAI, Meta, Anthropic, and Microsoft -- now control 84.5% of all AI crawler traffic globally, according to Cloudflare Radar AI Insights data from January 2026.
Googlebot remains the dominant AI-related crawler globally with a 38.7% share of all identified AI bot requests, according to Cloudflare Radar AI Insights (get_ai_data endpoint). But its lead is narrowing. GPTBot holds second place at 12.8%, followed closely by Meta-ExternalAgent at 11.6% and ClaudeBot at 11.4%. ClaudeBot is Anthropic's crawler, which powers the Claude Web Search API and model training pipeline. These four crawlers alone account for 74.4% of all AI bot traffic hitting websites on Cloudflare's network, which spans 330+ cities in 125+ countries.
| AI Bot | January 2026 Share (%) | Operator | Primary Purpose |
|---|---|---|---|
| Googlebot | 38.7% | Google | Search indexing + AI training (mixed) |
| GPTBot | 12.8% | OpenAI | Model training |
| Meta-ExternalAgent | 11.6% | Meta | AI training |
| ClaudeBot | 11.4% | Anthropic | Model training |
| Bingbot | 9.7% | Microsoft | Search indexing + AI (mixed) |
| Amazonbot | 4.8% | Amazon | AI training |
| Bytespider | 3.5% | ByteDance | AI training |
| Applebot | 2.5% | Apple | Search + AI features |
| OAI-SearchBot | 2.0% | OpenAI | ChatGPT Search |
The concentration of crawling power tells a clear story. Five companies -- Google, OpenAI, Meta, Anthropic, and Microsoft -- control 84.5% of all AI crawler traffic. If you're a website owner trying to manage AI bot access, your robots.txt decisions about these five operators will determine the vast majority of your AI-related traffic exposure. For context on how this crawling activity translates into actual referral traffic sent back to websites, see our companion Search Engine Referral Report analyzing crawl-to-refer ratios across all major operators.
The most significant shift I found in the January 2026 data is Meta-ExternalAgent's aggressive ramp-up. According to Cloudflare Radar's month-over-month comparison (get_ai_data endpoint, 30d vs 30dcontrol), Meta's crawler jumped from 8.5% to 11.6% of global AI bot traffic -- a 36% relative increase in just 30 days.
| AI Bot | December 2025 Share | January 2026 Share | Change (pp) | Relative Change |
|---|---|---|---|---|
| Googlebot | 42.6% | 38.7% | -3.9 | -9.2% |
| ClaudeBot | 13.0% | 11.4% | -1.6 | -12.4% |
| GPTBot | 12.9% | 12.8% | -0.1 | -0.6% |
| Meta-ExternalAgent | 8.5% | 11.6% | +3.1 | +36.2% |
| Bingbot | 9.4% | 9.7% | +0.3 | +3.3% |
| Amazonbot | 4.4% | 4.8% | +0.3 | +7.9% |
| Bytespider | 2.9% | 3.5% | +0.6 | +21.9% |
| Applebot | 2.4% | 2.5% | +0.2 | +6.6% |
| OAI-SearchBot | 1.5% | 2.0% | +0.4 | +28.7% |
Three trends stand out from this comparison:
Meta is ramping up aggressively. The +3.1 percentage point gain is the largest single-month swing I've seen from any non-Google crawler. This aligns with Meta's public statements about expanding its Llama model training pipeline and suggests we should expect Meta-ExternalAgent to continue climbing through Q1 2026.
Googlebot's share is declining, but not its volume. The -3.9 pp drop doesn't necessarily mean Google is crawling less. Rather, competitors are crawling more, diluting Google's proportional share. Google's crawl volume likely remained stable or grew slightly in absolute terms.
OAI-SearchBot is the quiet climber. A 29% relative increase from 1.5% to 2.0% reflects growing adoption of ChatGPT's search feature. This bot fetches pages in real time when users perform searches through ChatGPT, and its growth signals increasing consumer reliance on AI-powered search alternatives. If you're building applications that rely on similar real-time retrieval, understanding what a web search API is and how these bots operate under the hood is essential.
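The percentage-point and relative changes discussed above can be checked with a few lines of Python. This is a minimal sketch that uses the rounded shares from the month-over-month table; the function name `share_change` is my own. Because the inputs are already rounded to one decimal, the results can differ by a few tenths from the table's Change and Relative Change columns, which appear to be computed from unrounded data.

```python
def share_change(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    pp = current - previous                # absolute shift in share
    relative = (pp / previous) * 100       # shift as a fraction of the old share
    return round(pp, 1), round(relative, 1)


# Rounded shares from the table: (December 2025, January 2026).
shares = {
    "Googlebot": (42.6, 38.7),
    "Meta-ExternalAgent": (8.5, 11.6),
    "GPTBot": (12.9, 12.8),
}

for bot, (december, january) in shares.items():
    pp, rel = share_change(december, january)
    print(f"{bot}: {pp:+.1f} pp, {rel:+.1f}% relative")
```

The distinction matters when reading the trends: a small crawler can post a large relative gain while barely moving in percentage points.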
Not all AI crawling serves the same purpose. Cloudflare Radar classifies AI bot activity into four declared purposes plus an Undeclared bucket, and the January 2026 breakdown reveals that nearly half of all AI crawling serves a dual or ambiguous purpose.
| Crawl Purpose | January 2026 Share | December 2025 Share | Change (pp) |
|---|---|---|---|
| Mixed Purpose | 48.3% | 51.9% | -3.6 |
| Training | 42.0% | 39.7% | +2.4 |
| Search | 6.9% | 6.1% | +0.8 |
| User Action | 2.2% | 1.8% | +0.4 |
| Undeclared | 0.5% | 0.4% | +0.1 |
Here's what each category means for website owners:
Mixed Purpose (48.3%): Crawlers like Googlebot and Bingbot that simultaneously index for search and collect training data. This is the largest category because Google and Bing don't cleanly separate their search indexing from their AI data collection. You can't block one without blocking the other. For developers exploring how to ground AI responses with Google Search alternatives, this dual-purpose crawling is a key consideration.
Training (42.0%): Dedicated training crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent that explicitly collect data to improve AI models. This category grew +2.4 pp month-over-month, driven primarily by Meta's ramp-up.
Search (6.9%): AI-powered search bots like OAI-SearchBot that crawl pages to generate search results. This category grew 13% relative to last month, reflecting the expansion of AI search tools. The rise of dedicated AI search crawlers is closely tied to the growing ecosystem of AI search API alternatives that power these retrieval systems.
User Action (2.2%): Bots like ChatGPT-User that fetch pages in real time when users ask questions. Despite being the smallest category at 2.2%, User Action crawling grew 22.4% month-over-month -- the fastest-growing segment in relative terms.
The shift from Mixed Purpose (-3.6 pp) toward dedicated Training (+2.4 pp) and Search (+0.8 pp) crawlers suggests the AI crawler ecosystem is becoming more specialized. Companies are increasingly deploying purpose-built crawlers rather than relying on multi-purpose bots.
According to Cloudflare Radar AI Insights (get_ai_data industry and vertical dimensions), retail and e-commerce sites absorb the largest share of AI crawling activity globally. I pulled both industry-level and vertical-level breakdowns to get the full picture.
| Industry Vertical | Share of AI Bot Traffic |
|---|---|
| Shopping & General Merchandise | 31.2% |
| Internet and Telecom | 17.0% |
| Computer and Electronics | 15.0% |
| News, Media, and Publications | 8.9% |
| Business and Industry | 5.2% |
| Travel and Tourism | 3.8% |
| Professional Services | 3.5% |
| Gambling | 3.1% |
| Finance | 3.0% |
Retail's dominance at 31.2% makes practical sense. When users ask ChatGPT "what's the best laptop under $1,000" or "compare Nike and Adidas running shoes," the AI needs to crawl current product pages, reviews, and pricing data to generate useful answers. This is a direct bandwidth cost that e-commerce operators should factor into infrastructure planning.
The industry-level data from Cloudflare Radar reveals a more granular breakdown:
| Industry | Share of AI Bot Traffic |
|---|---|
| Retail | 28.1% |
| Computer Software | 13.7% |
| IT and Services | 5.8% |
| Internet | 5.2% |
| Gambling & Casinos | 2.9% |
| Online Media | 2.8% |
| Media | 2.8% |
| Telecommunications | 2.7% |
| Marketing and Advertising | 2.4% |
Computer Software at 13.7% is notable because it reflects the AI industry's recursive nature -- AI companies crawl software documentation and technical content to train models that generate software. If you maintain developer documentation, API references, or technical tutorials, expect AI crawlers to be among your most frequent visitors. This is why choosing the right AI web search API for your applications matters -- these crawlers are the infrastructure behind the search results your users see.
I analyzed Cloudflare Radar's robots.txt data (get_robots_txt_data endpoint) to understand how websites are responding to the AI crawler surge. The data shows that blocking AI crawlers via robots.txt remains uncommon but is slowly growing.
| AI Crawler | Domains Blocking (%) | Operator |
|---|---|---|
| GPTBot | 5.14% | OpenAI |
| CCBot | 4.61% | Common Crawl |
| ClaudeBot | 4.26% | Anthropic |
| Google-Extended | 3.94% | Google |
| Bytespider | 3.69% | ByteDance |
| Googlebot | 3.42% | Google |
| meta-externalagent | 3.24% | Meta |
| Applebot-Extended | 3.04% | Apple |
| Amazonbot | 2.99% | Amazon |
GPTBot is the most commonly blocked AI crawler, with 5.14% of domains including disallow rules for it in their robots.txt. CCBot follows at 4.61% and ClaudeBot at 4.26%. For GPTBot, the blocking rate (5.14%) sits only slightly below the overall mention rate in robots.txt files (5.29%), which means nearly every site that names GPTBot does so to block it.
One pattern I find interesting: Google-Extended, which Google introduced specifically to let website owners opt out of AI training while keeping standard search indexing intact, is blocked by 3.94% of domains. Yet Googlebot itself is blocked by 3.42%. This tells me many site owners either don't know Google-Extended exists or don't trust the separation between Google's search indexing and its AI training pipelines.
Comparing January 2026 to December 2025, the percentage of domains mentioning AI crawlers in their robots.txt barely changed:
| AI Crawler | December 2025 | January 2026 | Change |
|---|---|---|---|
| GPTBot | 5.34% | 5.29% | -0.05 |
| CCBot | 4.42% | 4.40% | -0.02 |
| ClaudeBot | 4.34% | 4.33% | -0.01 |
| Googlebot | 4.03% | 4.13% | +0.10 |
| Google-Extended | 3.94% | 4.04% | +0.10 |
The near-zero movement suggests the robots.txt "blocking wave" that gained momentum through 2025 has plateaued. Most website owners who wanted to block AI crawlers have already done so. The remaining 95%+ of websites either don't care, don't know about AI crawling, or have decided to allow it.
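The difference between a domain that blocks a crawler and one that merely mentions it can be made concrete with a small classifier. The sketch below is an assumption about how such a distinction could be drawn, not a description of Cloudflare Radar's method: it treats a crawler as "blocked" only when a group naming it contains `Disallow: /`, and as "mentioned" when it is named with any other rules. The function name and sample file are hypothetical.

```python
def classify_crawler(robots_txt: str, crawler: str) -> str:
    """Return 'blocked', 'mentioned', or 'absent' for one crawler token."""
    crawler = crawler.lower()
    agents: list[str] = []   # user agents of the group currently being read
    in_rules = False         # True once the current group has a rule line
    mentioned = blocked = False

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a rule was seen, so a new group starts
                agents, in_rules = [], False
            agents.append(value.lower())
            if value.lower() == crawler:
                mentioned = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and crawler in agents:
                blocked = True

    if blocked:
        return "blocked"
    return "mentioned" if mentioned else "absent"


# Hypothetical robots.txt used only for illustration.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""

for name in ("GPTBot", "CCBot", "ClaudeBot", "Bytespider"):
    print(name, classify_crawler(SAMPLE, name))
```

A stricter audit would also handle wildcard groups and partial path rules; this version only separates a site-wide block from everything else.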
Beyond crawlers, Cloudflare Radar tracks usage patterns on Cloudflare Workers AI -- the platform that lets developers run AI models at the edge. The January 2026 data reveals clear preferences in model selection and task distribution.
| Model | Account Share (%) | Developer |
|---|---|---|
| Llama 3 8B Instruct | 41.7% | Meta |
| Stable Diffusion XL Base 1.0 | 13.4% | Stability AI |
| Whisper | 8.5% | OpenAI |
| Llama 4 Scout 17B | 7.7% | Meta |
| M2M-100 1.2B | 5.6% | Meta |
| Llama 3 8B Instruct (AWQ) | 4.7% | Meta |
| FLUX.1 Schnell | 2.4% | Black Forest Labs |
| Whisper Large V3 Turbo | 1.4% | OpenAI |
Meta is far more dominant in the Workers AI model landscape than in the crawling space, where Meta-ExternalAgent holds 11.6% of AI bot requests. Llama 3 8B Instruct alone accounts for 41.7% of all Workers AI accounts, and when you add Llama 4 Scout (7.7%), M2M-100 (5.6%), and the quantized AWQ variant (4.7%), Meta models account for nearly 60% of all Workers AI usage by account share.
The appearance of Llama 4 Scout at 7.7% is worth noting -- this is Meta's newer model already capturing meaningful adoption, suggesting developers are migrating to newer architectures quickly when they become available on the platform.
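The "nearly 60%" figure for Meta comes from summing the rows of the model table by developer. A short sketch of that aggregation, using only the values listed in the table (the function name `share_by_developer` is illustrative):

```python
from collections import defaultdict

# (model, account share in %, developer) as listed in the Workers AI model table.
MODELS = [
    ("Llama 3 8B Instruct", 41.7, "Meta"),
    ("Stable Diffusion XL Base 1.0", 13.4, "Stability AI"),
    ("Whisper", 8.5, "OpenAI"),
    ("Llama 4 Scout 17B", 7.7, "Meta"),
    ("M2M-100 1.2B", 5.6, "Meta"),
    ("Llama 3 8B Instruct (AWQ)", 4.7, "Meta"),
    ("FLUX.1 Schnell", 2.4, "Black Forest Labs"),
    ("Whisper Large V3 Turbo", 1.4, "OpenAI"),
]


def share_by_developer(rows):
    """Sum account share per developer, rounded to one decimal place."""
    totals = defaultdict(float)
    for _model, share, developer in rows:
        totals[developer] += share
    return {developer: round(total, 1) for developer, total in totals.items()}


totals = share_by_developer(MODELS)
for developer, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{developer}: {total}%")
```

Note that these are shares of accounts, so a single account running several models may be counted in more than one row.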
| Task Type | Share of Usage (%) |
|---|---|
| Text Generation | 63.0% |
| Text-to-Image | 19.1% |
| Speech Recognition | 10.0% |
| Translation | 5.6% |
| Text Embeddings | 0.7% |
| Image-to-Text | 0.7% |
| Text Classification | 0.6% |
Text generation remains the primary use case at 63%, but the 19.1% share for text-to-image generation surprised me. Nearly one in five Workers AI accounts is running image generation workloads. Speech recognition at 10.0% -- driven by Whisper model adoption -- reflects growing demand for transcription and voice-to-text applications at the edge.
Based on what I've found in the January 2026 Cloudflare Radar data, here are the actions I'd prioritize:
Monitor Meta-ExternalAgent closely. Its 36% month-over-month growth is the most significant shift in the January data. If you've been focused on managing GPTBot and ClaudeBot, Meta's crawler deserves equal attention. Check whether your robots.txt includes rules for meta-externalagent -- only 3.24% of domains currently block it.
Understand the mixed-purpose crawling problem. Nearly half (48.3%) of all AI bot traffic comes from crawlers like Googlebot that bundle search indexing with AI training. You can't selectively block the AI training component without also losing search visibility. Google-Extended offers a partial solution, but adoption remains low: only 4.04% of domains mention it in their robots.txt.
If you run an e-commerce site, quantify your AI crawler bandwidth costs. Retail and shopping sites absorb 31.2% of all AI bot traffic. With AI crawling growing month-over-month across nearly every operator, the bandwidth and compute costs of serving AI bots will only increase. Consider implementing rate limiting for AI crawlers if bandwidth costs are becoming material.
Watch OAI-SearchBot and User Action crawling. These are the fastest-growing segments in relative terms (+29% and +22% month-over-month respectively). Unlike training crawlers that scrape your content once and move on, User Action bots will return repeatedly as user queries increase. This is traffic you may actually want -- it means real people are finding your content through AI search.
Review your robots.txt strategy. The data shows blocking rates have plateaued. If you haven't made a decision about AI crawler access, January 2026 is a good time to establish a deliberate policy rather than leaving it to default-allow.
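For the rate-limiting suggestion above, one common approach is a token bucket kept per crawler name. The sketch below is a minimal in-memory illustration, not production code: the class name, the thresholds, and the idea of keying on a crawler label are all assumptions, and a real deployment would more likely use the rate-limiting features of your CDN or reverse proxy.

```python
import time


class CrawlerRateLimiter:
    """Token bucket per crawler name (illustrative sketch)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second
        self.burst = burst            # maximum tokens a bucket can hold
        self.clock = clock            # injectable clock, useful for testing
        self._buckets: dict[str, tuple[float, float]] = {}  # name -> (tokens, last seen)

    def allow(self, crawler: str) -> bool:
        """Return True if this request fits within the crawler's budget."""
        now = self.clock()
        tokens, last = self._buckets.get(crawler, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[crawler] = (tokens - 1, now)
            return True
        self._buckets[crawler] = (tokens, now)
        return False


# Usage with a fake clock: 1 request/second sustained, bursts of 2.
fake_time = [0.0]
limiter = CrawlerRateLimiter(rate_per_sec=1, burst=2, clock=lambda: fake_time[0])
first_three = [limiter.allow("GPTBot") for _ in range(3)]
other_bot = limiter.allow("ClaudeBot")   # separate bucket, unaffected by GPTBot
fake_time[0] = 1.0
after_refill = limiter.allow("GPTBot")
print(first_three, other_bot, after_refill)
```

Requests that are refused would typically receive an HTTP 429 response so that well-behaved crawlers back off.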
Every AI crawler in this report exists because AI companies need fresh, structured web data to power their models and search products. At WebSearchAPI.ai, we sit on the other side of this equation -- providing developers and AI agents with a clean, fast, and affordable way to access real-time web data without running their own crawlers.
Here's why this matters in the context of January's trends:
If the data in this report tells you anything, it's that the volume and complexity of AI web crawling is only accelerating. WebSearchAPI.ai is purpose-built for developers who want to harness that web intelligence without becoming a crawling operation themselves. Learn more about what a web search API can do for your stack.
An AI crawler (also called an AI bot or AI spider) is an automated program that visits websites to collect content for training artificial intelligence models or powering AI-powered search features. Unlike traditional search engine crawlers that index pages for search results, AI crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent specifically collect data to train large language models (LLMs). Some crawlers like Googlebot serve both purposes -- indexing for search and collecting training data simultaneously. You can identify AI crawlers by their user-agent strings in your server logs or through tools like Cloudflare Radar.
This report is updated monthly with fresh data from Cloudflare Radar AI Insights. Each edition covers a rolling 30-day window and includes month-over-month comparisons so you can track trends over time. Bookmark this page or check back at the beginning of each month for the latest analysis of AI crawler traffic patterns, market share shifts, and robots.txt blocking trends.
Yes. The primary method is adding disallow rules to your robots.txt file for specific AI crawler user agents. For example, adding User-agent: GPTBot followed by Disallow: / will request that OpenAI's crawler stop visiting your site. However, robots.txt is a voluntary protocol -- crawlers are not technically required to obey it. As of January 2026, only about 5% of domains block GPTBot and 4.3% block ClaudeBot, so the vast majority of the web remains open to AI crawling. Some CDN providers like Cloudflare also offer dashboard-level controls to block or rate-limit AI bots.
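To see how such rules are interpreted, you can test a robots.txt against specific user agents with Python's standard `urllib.robotparser` module. The robots.txt content and the `example.com` URL below are placeholders for illustration, and the helper name `allowed` is my own. Keep in mind that this only shows what a compliant crawler would do; it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: block two AI-related agents, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "Google-Extended", "Googlebot", "ClaudeBot"):
    print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/article"))
```

In this sample, GPTBot and Google-Extended are refused while Googlebot and ClaudeBot fall through to the catch-all group and remain allowed.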
AI training crawlers (like GPTBot, ClaudeBot, and Meta-ExternalAgent) collect web content to build and improve AI models. They typically scrape large volumes of content from many sites. AI search crawlers (like OAI-SearchBot) fetch specific pages in real time when a user performs a search query through an AI tool like ChatGPT. The key difference: training crawlers take your content to make the model smarter, while search crawlers fetch your content to answer a specific user question -- and may drive traffic back to your site. Search crawling is growing quickly (OAI-SearchBot's share rose 29% month-over-month and the Search category rose 13%) but still represents a much smaller share (6.9%) than training crawling (42.0%).
Blocking dedicated AI training crawlers like GPTBot, ClaudeBot, or Meta-ExternalAgent will not affect your rankings in Google, Bing, or other traditional search engines. These crawlers are separate from the search indexing bots. However, blocking Googlebot will remove your site from Google Search entirely since Google uses the same crawler for both search indexing and AI training. Google offers a middle ground with the Google-Extended user agent -- blocking it opts you out of AI training while keeping your search presence intact. Only about 4% of domains currently use this option (3.94% block Google-Extended; 4.04% mention it in robots.txt).
Cloudflare's global network spans 330+ cities in 125+ countries and processes over 81 million HTTP requests per second. Through its Radar platform, Cloudflare identifies and classifies AI bot traffic by analyzing user-agent strings, request patterns, and behavioral signatures across all websites using its network. The data in this report comes from Cloudflare Radar's AI Insights endpoints, which aggregate these signals into share-of-traffic percentages by bot, crawl purpose, industry, and region.
As of January 2026, Meta-ExternalAgent is the fastest-growing major AI crawler in both absolute and relative terms, jumping from 8.5% to 11.6% of AI bot traffic (+3.1 pp, a 36% relative increase) in a single month. OAI-SearchBot posted the next-largest relative gain at +29%, though from a much smaller base (1.5% to 2.0%), followed by ByteDance's Bytespider at +22%. These growth rates suggest that AI companies beyond Google and OpenAI are significantly scaling their data collection efforts heading into 2026.
The percentages in this report represent share of identified AI bot requests, not share of total web traffic. Cloudflare Radar tracks the proportion of AI-related crawler activity relative to other AI bots, providing a competitive landscape view. The actual percentage of total web traffic from AI bots varies by website, but industry estimates suggest AI crawlers now account for a meaningful and growing share of overall internet traffic, particularly for content-heavy sites in retail, technology, and media.
Check your server access logs for known AI bot user-agent strings (GPTBot, ClaudeBot, meta-externalagent, Bytespider, Amazonbot, etc.). Most web analytics platforms filter out bot traffic by default, so log-level analysis gives the most accurate picture. Cloudflare users can view AI bot activity directly in their dashboard. For a structured approach, consider using a web search API to understand how your content appears in AI-powered search results and ensure your most important pages are properly accessible.
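A log-level check can be as simple as counting requests whose user-agent string contains a known bot token. The sketch below assumes the common "combined" access log format, where the user agent is the last quoted field; the sample log lines, IP addresses, and user-agent strings are made up for illustration and are not the exact strings these bots send. The token list covers only bots named in this report.

```python
import re
from collections import Counter

# Bot tokens named in this report; extend the list for your own needs.
AI_BOT_TOKENS = [
    "GPTBot", "ClaudeBot", "meta-externalagent", "Bytespider",
    "Amazonbot", "OAI-SearchBot", "ChatGPT-User", "CCBot",
]

# In the combined log format, the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_bots(log_lines):
    """Count requests per AI bot token, matching case-insensitively."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break   # attribute each request to at most one bot
    return counts


# Made-up sample lines (documentation IP range, simplified user agents).
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Feb/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Feb/2026:10:00:01 +0000] "GET /docs HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Feb/2026:10:00:02 +0000] "GET /a HTTP/1.1" 200 100 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"',
    '203.0.113.5 - - [01/Feb/2026:10:00:03 +0000] "GET /b HTTP/1.1" 200 300 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

bot_counts = count_ai_bots(SAMPLE_LOG)
print(bot_counts.most_common())
```

User-agent matching only catches bots that identify themselves, so treat the counts as a lower bound.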
Understanding where this data comes from -- and what it can and cannot tell you -- is critical for interpreting the trends above. Here's a full breakdown of how Cloudflare Radar collects, classifies, and aggregates the AI crawler data used in this report.
| Metric | Value |
|---|---|
| Global presence | 330+ cities in 125+ countries |
| HTTP requests | 81 million/second average, peaks >129 million/second |
| DNS queries | 67 million/second (authoritative + resolver) |
This scale is what makes Cloudflare Radar one of the most comprehensive sources of internet traffic data available. The data in this report comes from two primary sources:
For routing data, Cloudflare also uses RIPE RIS data from RIPE NCC (BGP route collectors).
Cloudflare uses a layered detection system to identify and classify AI crawlers:
💡 Expert Insight: The multi-layered approach matters because not all AI crawlers identify themselves honestly. User-agent matching catches transparent bots like GPTBot and ClaudeBot, but behavioral analysis and honeypots catch crawlers that try to disguise themselves as regular browsers.
Bots are categorized into these purpose buckets: Mixed Purpose, Training, Search, User Action, and Undeclared.
⚠️ Warning: Keep these limitations in mind when interpreting the data in this report:
This specific edition uses data from Cloudflare Radar's AI Insights endpoint (get_ai_data) and robots.txt analysis endpoint (get_robots_txt_data). The data covers January 9 through February 8, 2026, with month-over-month comparisons using a 30-day control window (December 10, 2025 - January 9, 2026).
I queried bot traffic breakdowns by user agent, crawl purpose, industry, and vertical. Workers AI data covers model and task distribution by account share. Robots.txt analysis covers domain-level crawler policies for AI user agents. All percentages represent share of identified AI bot requests (for crawling data) or share of accounts (for Workers AI data), not share of total web traffic.
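The current and control windows stated above follow from simple date arithmetic: each is a 30-day span, and the control window ends where the current one begins. The sketch below reproduces those dates; it is my own illustration and makes no claim about how Cloudflare Radar defines its 30d and 30dcontrol ranges internally.

```python
from datetime import date, timedelta


def rolling_windows(end: date, days: int = 30):
    """Return ((current_start, current_end), (control_start, control_end))."""
    current_start = end - timedelta(days=days)
    control_start = current_start - timedelta(days=days)
    return (current_start, end), (control_start, current_start)


current, control = rolling_windows(date(2026, 2, 8))
print("current:", current[0].isoformat(), "to", current[1].isoformat())
print("control:", control[0].isoformat(), "to", control[1].isoformat())
```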
Data source: Cloudflare Radar AI Insights -- get_ai_data and get_robots_txt_data endpoints (radar.cloudflare.com). Last updated: February 8, 2026.
About the Author: I'm James Bennett, Lead Engineer at WebSearchAPI.ai, where I architect the core retrieval engine enabling LLMs and AI agents to access real-time, structured web data with over 99.9% uptime and sub-second query latency. With a background in distributed systems and search technologies, I've reduced AI hallucination rates by 45% through advanced ranking and content extraction pipelines for RAG systems. My expertise includes AI infrastructure, search technologies, large-scale data integration, and API architecture for real-time AI applications.
Credentials: B.Sc. Computer Science (University of Cambridge), M.Sc. Artificial Intelligence Systems (Imperial College London), Google Cloud Certified Professional Cloud Architect, AWS Certified Solutions Architect, Microsoft Azure AI Engineer, Certified Kubernetes Administrator, TensorFlow Developer Certificate.