Analysis of AI crawler traffic trends from February 2026. For the first time, dedicated AI training crawlers (45.4%) surpassed mixed-purpose bots (43.9%). Meta-ExternalAgent leapfrogged GPTBot to become the #2 AI crawler at 15.6%, and Googlebot fell another 3.5 pp to 34.6%. Complete breakdown of crawler market share, industry targeting, and robots.txt blocking patterns based on Cloudflare Radar data.
This report is updated monthly with fresh Cloudflare Radar data. Bookmark this page to track how AI crawlers are reshaping web traffic each month.
Something flipped in February. For the first time since I've been tracking this data, dedicated AI training crawlers now generate more traffic than mixed-purpose bots. Training hit 45.4% of all AI bot requests. Mixed Purpose dropped to 43.9%. That crossover didn't happen gradually; it happened fast.
Meta-ExternalAgent leapfrogged GPTBot to become the world's second-largest AI crawler at 15.6%. Googlebot's share fell another 3.5 percentage points to 34.6%. I pulled 30 days of data covering February 11 through March 13, 2026, and the pattern is clear: the AI crawling game has entered a different phase entirely.
📊 Stats Alert: Dedicated AI training crawlers now generate 45.4% of all AI bot traffic, surpassing Mixed Purpose crawlers (43.9%) for the first time ever.
Googlebot still sits at the top with 34.6% share of all identified AI bot requests. But its lead keeps shrinking. The real story this month is what happened behind it: Meta-ExternalAgent surged past GPTBot to claim second place at 15.6%, followed by GPTBot at 12.1% and ClaudeBot at 11.1% (Anthropic's crawler that powers the Claude Web Search API and their model training pipeline). Those four crawlers alone account for 73.5% of all AI bot traffic.
| AI Bot | February 2026 Share (%) | Operator | Primary Purpose |
|---|---|---|---|
| Googlebot | 34.6% | Google | Search indexing + AI training (mixed) |
| Meta-ExternalAgent | 15.6% | Meta | AI training |
| GPTBot | 12.1% | OpenAI | Model training |
| ClaudeBot | 11.1% | Anthropic | Model training |
| Bingbot | 9.3% | Microsoft | Search indexing + AI (mixed) |
| Amazonbot | 5.4% | Amazon | AI training |
| Bytespider | 3.3% | ByteDance | AI training |
| Applebot | 3.1% | Apple | Search + AI features |
| OAI-SearchBot | 2.6% | OpenAI | ChatGPT Search |
The top five companies (Google, Meta, OpenAI, Anthropic, Microsoft) control 82.8% of all AI crawler traffic, down from 84.5% in January. They're not crawling less; second-tier crawlers like Amazonbot (+15.0%), Applebot (+17.0%), and OAI-SearchBot (+16.8%) are simply growing fast enough to chip away at that concentration.
If you're managing AI bot access on your site, your robots.txt decisions about these five operators still control most of your AI traffic exposure. For context on how crawling translates into actual referral traffic back to websites, check our companion Search Engine Referral Report on crawl-to-refer ratios.
Meta-ExternalAgent keeps accelerating. It jumped from 11.9% to 15.6% of global AI bot traffic this month, a 30.8% relative increase and +3.7 percentage points. That's even bigger than January's +3.1 pp gain. Over just two months, Meta-ExternalAgent has roughly doubled from 8.5% to 15.6%.
I didn't expect this pace to hold. It did.
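For reference, the comparisons in this report mix two measures that are easy to conflate: percentage-point change (a simple difference of shares) and relative change (growth against the old share). A quick Python sketch of both; the small discrepancy against the table (31.1% here vs. the published 30.8%) comes from the published shares already being rounded to one decimal:

```python
def share_change(old_pct, new_pct):
    """Return (percentage-point change, relative change in %)."""
    pp = new_pct - old_pct
    rel = pp / old_pct * 100
    return round(pp, 1), round(rel, 1)

# Meta-ExternalAgent: 11.9% in January -> 15.6% in February
print(share_change(11.9, 15.6))  # (3.7, 31.1)
```

Percentage points describe how much of the total pie moved; relative change describes how fast a single crawler is growing from its own base, which is why small bots like OAI-SearchBot can post double-digit relative gains on fractions of a point.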
| AI Bot | January 2026 Share | February 2026 Share | Change (pp) | Relative Change |
|---|---|---|---|---|
| Googlebot | 38.1% | 34.6% | -3.5 | -9.3% |
| Meta-ExternalAgent | 11.9% | 15.6% | +3.7 | +30.8% |
| GPTBot | 13.3% | 12.1% | -1.1 | -8.5% |
| ClaudeBot | 11.1% | 11.1% | +0.0 | +0.3% |
| Bingbot | 9.5% | 9.3% | -0.1 | -1.4% |
| Amazonbot | 4.7% | 5.4% | +0.7 | +15.0% |
| Bytespider | 3.4% | 3.3% | -0.1 | -3.2% |
| Applebot | 2.6% | 3.1% | +0.4 | +17.0% |
| OAI-SearchBot | 2.2% | 2.6% | +0.4 | +16.8% |
Four things jumped out at me:
Meta has overtaken OpenAI. Meta-ExternalAgent (15.6%) now generates more AI bot traffic than GPTBot (12.1%). Two months ago, GPTBot was comfortably ahead. Meta's Llama training pipeline expansion, including Llama 4 Scout, which already shows up on Workers AI, is driving this. At the current trajectory, Meta-ExternalAgent could hit 20% by April 2026.
Googlebot's decline is picking up speed. It lost 4.5 pp in January, then another 3.5 pp in February. Over two months, its share dropped from 42.6% to 34.6%, a cumulative 8-point decline. Google probably isn't crawling less in raw volume. But competitors are crawling so much more that Google's share keeps getting diluted.
ClaudeBot flatlined. After dropping from 13.0% to 11.1% over December-January, ClaudeBot held at exactly 11.1% in February. Anthropic seems to have found a steady crawling cadence while everyone else ramps up or down around them.
The mid-tier grew in lockstep. Amazonbot (+15.0%), Applebot (+17.0%), and OAI-SearchBot (+16.8%) all grew at nearly identical relative rates. That parallel movement tells me this is an industry-wide expansion of AI data collection, not any one company going rogue. If you're building apps that rely on real-time retrieval, understanding what a web search API is and how these bots work under the hood matters more than ever.
This is the big finding of February's data. Dedicated AI training crawlers now generate more traffic than mixed-purpose bots. First time that's happened.
| Crawl Purpose | February 2026 Share | January 2026 Share | Change (pp) |
|---|---|---|---|
| Training | 45.4% | 42.6% | +2.8 |
| Mixed Purpose | 43.9% | 47.6% | -3.7 |
| Search | 8.2% | 7.1% | +1.1 |
| User Action | 2.0% | 2.2% | -0.2 |
| Undeclared | 0.4% | 0.5% | -0.1 |
What does this crossover actually mean if you run a website?
Training is now #1 at 45.4%. GPTBot, ClaudeBot, and Meta-ExternalAgent have crossed the line. The +2.8 pp gain came almost entirely from Meta's ramp-up. The AI industry is deploying more purpose-built training crawlers than it's relying on dual-purpose bots. In plain terms: the content being taken from your site is increasingly going toward model weights, not search indexing. And you can block these crawlers individually, which gives you more control than you had when Googlebot carried the bulk of the traffic.
Mixed Purpose dropped to #2 at 43.9%. Googlebot and Bingbot, which handle both search indexing and AI training simultaneously, lost ground. The -3.7 pp decline mirrors Googlebot's falling share. You still can't separate the AI training from the search indexing with these crawlers. Block Googlebot and you disappear from Google Search. For developers looking at how to ground AI responses with Google Search alternatives, this bundling problem keeps coming up.
Search crawling grew fastest at 8.2%. AI search bots like OAI-SearchBot gained +1.1 pp, a 15.5% relative increase. Second straight month as the fastest-growing category. These bots are tied directly to the growing ecosystem of AI search API alternatives that power real-time retrieval systems.
User Action dipped to 2.0%. Bots like ChatGPT-User that fetch pages when users ask questions slipped -0.2 pp. Could be seasonal. Could be a temporary plateau.
If Training keeps growing at this rate, it'll pass 50% of all AI bot traffic by Q2 2026. That would be a different world from where we were six months ago, when mixed-purpose crawlers dominated.
Retail keeps absorbing the most AI crawling activity globally. The targeting profile barely moved month-over-month.
| Industry Vertical | Share of AI Bot Traffic |
|---|---|
| Shopping & General Merchandise | 31.3% |
| Internet and Telecom | 16.8% |
| Computer and Electronics | 15.1% |
| News, Media, and Publications | 9.0% |
| Business and Industry | 5.0% |
| Travel and Tourism | 4.1% |
| Professional Services | 3.3% |
| Finance | 2.9% |
| Gambling | 2.7% |
Retail at 31.3% (January was 31.2%) makes sense when you think about it. Users asking ChatGPT "what's the best laptop under $1,000" or "compare Nike and Adidas running shoes" generate crawl requests against product pages, reviews, and pricing data. That's real bandwidth cost for e-commerce operators.
One shift I noticed: Travel and Tourism climbed from 3.8% to 4.1%, jumping past Professional Services (3.5% to 3.3%). Probably seasonal. Spring travel planning drives more AI queries toward booking sites and hotel reviews.
The industry-level breakdown gets more granular:
| Industry | Share of AI Bot Traffic |
|---|---|
| Retail | 28.2% |
| Computer Software | 13.8% |
| IT and Services | 5.8% |
| Internet | 4.9% |
| Online Media | 2.7% |
| Telecommunications | 2.7% |
| Media | 2.6% |
| Gambling & Casinos | 2.6% |
| Adult Entertainment | 2.6% |
New entry this month: Adult Entertainment appeared at 2.6%, pushing out Marketing and Advertising (which held 2.4% in January). Computer Software at 13.8% is the second-largest target, and there's something recursive about it. AI companies are crawling software docs and technical content to train models that generate software. If you maintain API references or technical tutorials, AI crawlers are probably some of your most frequent visitors. That's exactly why picking the right AI web search API for your apps matters, since these crawlers are the infrastructure behind the search results your users actually see.
AI training crawlers aren't the only bots scanning the web at this scale, either. Technology detection platforms like Technologychecker.io crawl and fingerprint over 50 million domains using HTTP header analysis, JavaScript fingerprinting, DNS lookups, and headless browser rendering to identify 40,000+ technologies. Unlike AI training crawlers that take content for model weights, technology intelligence crawlers need to re-crawl frequently to track stack changes and new tech adoptions.
After the robots.txt blocking wave plateaued in January, I expected the numbers to stay flat. They didn't. February showed a modest but clear uptick in blocking activity across almost every AI crawler.
| AI Crawler | Domains Referencing (%) | Operator |
|---|---|---|
| GPTBot | 5.45% | OpenAI |
| CCBot | 4.63% | Common Crawl |
| ClaudeBot | 4.62% | Anthropic |
| Google-Extended | 4.36% | |
| Googlebot | 3.77% | |
| Bytespider | 3.73% | ByteDance |
| meta-externalagent | 3.26% | Meta |
| Amazonbot | 3.16% | Amazon |
| Applebot-Extended | 3.10% | Apple |
GPTBot is still the most commonly referenced AI crawler in robots.txt files at 5.45%, up from 5.24% in January. ClaudeBot saw the biggest jump among the top crawlers, rising from 4.33% to 4.62% (+0.29 pp). Google-Extended climbed from 4.06% to 4.36% (+0.30 pp), which tells me more site owners are finding Google's AI training opt-out mechanism.
Here's what bugs me about this data: meta-externalagent sits at only 3.26% despite being the second-largest AI crawler at 15.6% of traffic. That's the biggest gap between traffic share and blocking rate of any crawler on this list. Meta's bot generates nearly as much traffic as GPTBot and ClaudeBot combined, but far fewer domains block it. If you care about AI training crawlers hitting your site, check whether your robots.txt actually includes meta-externalagent.
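One way to audit that is Python's standard-library robots.txt parser. A minimal sketch with a hypothetical robots.txt that blocks GPTBot and ClaudeBot but, like most sites, forgets Meta's bot:

```python
from urllib import robotparser

# Hypothetical robots.txt: blocks OpenAI's and Anthropic's training
# crawlers but has no rule for meta-externalagent
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/article"))              # False
print(rp.can_fetch("meta-externalagent", "https://example.com/article"))  # True
```

With no matching group and no `User-agent: *` fallback, the parser defaults to allowing the fetch, which is exactly the gap described above: a robots.txt written in 2023 for GPTBot says nothing about the crawler that is now #2 by traffic.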
The flat-to-declining trend from January reversed. Nearly every AI crawler saw increased robots.txt mentions in February:
| AI Crawler | January 2026 | February 2026 | Change |
|---|---|---|---|
| GPTBot | 5.24% | 5.45% | +0.21 |
| CCBot | 4.40% | 4.63% | +0.23 |
| ClaudeBot | 4.33% | 4.62% | +0.29 |
| Google-Extended | 4.06% | 4.36% | +0.30 |
| Bytespider | 3.43% | 3.73% | +0.30 |
The +0.30 pp increase for both Google-Extended and Bytespider stands out. Google-Extended adoption suggests growing awareness of Google's opt-out mechanism. Bytespider keeps attracting blocks despite its modest 3.3% traffic share, likely because of ongoing concerns about Chinese AI data collection.
The blocking wave may be entering a second phase, driven by more media coverage of AI crawling practices and more site owners realizing their content feeds model training. But let's be real: more than 94% of domains still allow all AI crawlers unrestricted access. The blocking minority is growing, but it's still a minority.
Workers AI, the platform that lets developers run AI models at the edge, had a new entry this month worth paying attention to.
| Model | Account Share (%) | Developer |
|---|---|---|
| Llama 3 8B Instruct | 40.1% | Meta |
| Stable Diffusion XL Base 1.0 | 13.4% | Stability AI |
| Whisper | 8.3% | OpenAI |
| Llama 4 Scout 17B | 6.7% | Meta |
| M2M-100 1.2B | 5.4% | Meta |
| Llama 3 8B Instruct (AWQ) | 4.9% | Meta |
| FLUX.1 Schnell | 2.5% | Black Forest Labs |
| GPT-OSS 120B | 1.6% | OpenAI |
| Whisper Large V3 Turbo | 1.6% | OpenAI |
Meta dominates here too. Llama 3 8B Instruct powers 40.1% of Workers AI accounts. Add Llama 4 Scout (6.7%), M2M-100 (5.4%), and the AWQ variant (4.9%) and Meta models cover 57.1% of usage by account share. That's down from ~60% in January because a newcomer took some share.
The newcomer: GPT-OSS 120B at 1.6%. OpenAI's open-source 120-billion-parameter model debuted on the leaderboard this month, matching Whisper Large V3 Turbo's share on its first appearance. It's the largest model by parameter count on the platform. Developers are willing to run bigger models at the edge for better inference quality, which tells me something about where edge AI is heading. OpenAI's combined Workers AI footprint (Whisper 8.3% + GPT-OSS 120B 1.6% + Whisper Large V3 Turbo 1.6%) now reaches 11.5%, making OpenAI the second-largest model provider on the platform behind Meta.
Llama 4 Scout dipped from 7.7% to 6.7%. The initial adoption rush is normalizing. It's still growing in absolute terms but losing share to GPT-OSS 120B and other entries competing for developer attention.
| Task Type | Share of Usage (%) |
|---|---|
| Text Generation | 62.6% |
| Text-to-Image | 19.7% |
| Automatic Speech Recognition | 10.1% |
| Translation | 5.4% |
| Text Classification | 0.7% |
| Image-to-Text | 0.6% |
| Text Embeddings | 0.5% |
| Text-to-Speech | 0.4% |
Mostly stable. Text generation held at 62.6% (down slightly from 63.0%), Text-to-Image ticked up to 19.7% (from 19.1%). One new category showed up: Text-to-Speech at 0.4%. Small number. But it's the first time voice synthesis appeared in the Workers AI task distribution, and edge deployment makes a lot of sense for latency-sensitive speech generation. I'll be watching whether this grows in the March data.
Here's what I'd actually prioritize based on this month's data:
Training crawlers are now the majority. If you've been letting all AI crawlers through because you assumed most of that traffic helped your search visibility, that assumption is wrong as of February 2026. Dedicated training crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent) now account for more traffic than mixed-purpose bots. The content leaving your servers is more likely going into model weights than into search indexes.
Add meta-externalagent to your robots.txt. Meta's crawler is the #2 AI bot at 15.6% of traffic. Only 3.26% of domains block it. If you've already blocked GPTBot and ClaudeBot but skipped Meta's bot, you're leaving the door wide open to the fastest-growing training crawler on the web.
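A robots.txt covering the three dedicated training crawlers named in this report might look like the following. The user-agent tokens match each operator's published documentation as of this writing, but verify against their current docs before deploying:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: meta-externalagent
Disallow: /
```

Remember that robots.txt is advisory: well-behaved crawlers honor it, but enforcement requires server- or CDN-level blocking.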
Try Google-Extended if you want a middle ground. Google-Extended adoption rose to 4.36% of domains (+0.30 pp month-over-month). It lets you opt out of Google's AI training while keeping your search rankings. If you want to reduce AI training access without hurting SEO, this is the simplest first step.
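In robots.txt terms, the opt-out is a single extra group. Googlebot stays allowed (so your search indexing is untouched) while the AI-training token is disallowed:

```
User-agent: Google-Extended
Disallow: /
```

Google-Extended is a control token rather than a crawler; Googlebot still fetches your pages, but content is excluded from Gemini training per Google's documentation.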
E-commerce operators: check your bandwidth costs. Retail and shopping sites absorb 31.3% of all AI bot traffic. Meta-ExternalAgent alone grew 31% this month, and the Training category hit 45.4%. If you haven't looked at how much bandwidth AI crawlers consume on your infrastructure, now's a good time.
Keep an eye on GPT-OSS 120B. OpenAI's open-source model showing up on Workers AI tells you something about where the open-source LLM space is going. A 120B-parameter model running at the edge opens up inference quality levels that weren't practical outside centralized API calls before.
Every AI crawler in this report exists because AI companies need fresh, structured web data to power their models and search products. At WebSearchAPI.ai, we sit on the other side of that equation. We give developers and AI agents a clean, fast way to access real-time web data without running their own crawlers.
Why this matters right now:
The volume and complexity of AI web crawling is accelerating, and the balance is shifting toward dedicated training. WebSearchAPI.ai is built for developers who want that web intelligence without becoming a crawling operation. Learn more about what a web search API can do for your stack.
An AI crawler (also called an AI bot or AI spider) is an automated program that visits websites to collect content for training AI models or powering AI-powered search features. Unlike traditional search engine crawlers that index pages for search results, AI crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent specifically collect data to train large language models (LLMs). Some crawlers like Googlebot serve both purposes, indexing for search and collecting training data at the same time. You can identify AI crawlers by their user-agent strings in your server logs or through tools like Cloudflare Radar.
Monthly. Each edition covers a rolling 30-day window and includes month-over-month comparisons so you can track trends over time. Bookmark this page or check back at the beginning of each month for the latest data on AI crawler traffic patterns, market share shifts, and robots.txt blocking trends.
Yes. Add disallow rules to your robots.txt file for specific AI crawler user agents. For example, adding `User-agent: GPTBot` followed by `Disallow: /` will request that OpenAI's crawler stop visiting your site. Robots.txt is voluntary, though: crawlers aren't technically required to obey it. As of February 2026, only about 5.5% of domains block GPTBot and 4.6% block ClaudeBot. The vast majority of the web remains open to AI crawling. Some CDN providers like Cloudflare also offer dashboard-level controls to block or rate-limit AI bots.
AI training crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent) collect web content to build and improve AI models. They scrape large volumes from many sites. AI search crawlers (like OAI-SearchBot) fetch specific pages in real time when someone performs a search through an AI tool like ChatGPT. Training crawlers take your content to make the model smarter. Search crawlers fetch your content to answer a specific question and may send traffic back to your site. As of February 2026, training crawling (45.4%) has overtaken mixed-purpose crawling (43.9%) for the first time, while search crawling continues to grow at 8.2%.
Blocking dedicated AI training crawlers like GPTBot, ClaudeBot, or Meta-ExternalAgent won't affect your rankings in Google, Bing, or other traditional search engines. Those crawlers are separate from search indexing bots. But blocking Googlebot will remove your site from Google Search entirely because Google uses the same crawler for both search indexing and AI training. Google-Extended is the middle ground. Blocking it opts you out of AI training while keeping your search presence. Adoption is growing: 4.36% of domains now use this option, up from 4.06% in January.
Cloudflare's global network spans 330+ cities in 125+ countries and processes over 81 million HTTP requests per second. Through its Radar platform, Cloudflare identifies and classifies AI bot traffic by analyzing user-agent strings, request patterns, and behavioral signatures across all sites on its network. The data in this report comes from Cloudflare Radar's AI Insights endpoints, which aggregate these signals into share-of-traffic percentages by bot, crawl purpose, industry, and region.
Meta-ExternalAgent. For a second consecutive month, it had the largest absolute market share gain, jumping from 11.9% to 15.6% (+30.8% relative increase). Over two months, it roughly doubled from 8.5% to 15.6%. In relative terms, Applebot (+17.0%), OAI-SearchBot (+16.8%), and Amazonbot (+15.0%) all showed strong growth from smaller bases.
The percentages in this report are share of identified AI bot requests, not share of total web traffic. Cloudflare Radar tracks AI crawler activity relative to other AI bots, giving a competitive view. The actual percentage of total web traffic from AI bots varies by site, but AI crawlers now account for a growing share of overall internet traffic, particularly for content-heavy sites in retail, technology, and media.
Check your server access logs for known AI bot user-agent strings (GPTBot, ClaudeBot, meta-externalagent, Bytespider, Amazonbot, etc.). Most web analytics platforms filter out bot traffic by default, so log-level analysis gives the most accurate picture. Cloudflare users can view AI bot activity directly in their dashboard. For a structured approach, consider using a web search API to see how your content appears in AI-powered search results and confirm your most important pages are accessible.
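A minimal sketch of that log-level analysis in Python. The log lines and bot list here are illustrative assumptions; adjust the substrings to the crawlers and log format you actually have:

```python
from collections import Counter

# Substrings that identify common AI crawlers in user-agent headers
AI_BOTS = ["GPTBot", "ClaudeBot", "meta-externalagent",
           "Bytespider", "Amazonbot", "OAI-SearchBot"]

def count_ai_hits(log_lines):
    """Tally requests per AI crawler from raw access-log lines."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break  # count each request line once
    return hits

# Example with combined-log-format style lines (paths are made up)
logs = [
    '1.2.3.4 - - [12/Feb/2026] "GET /pricing HTTP/1.1" 200 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [12/Feb/2026] "GET /docs HTTP/1.1" 200 "-" "meta-externalagent/1.1"',
    '9.9.9.9 - - [12/Feb/2026] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_hits(logs))
```

Substring matching on user agents is a heuristic: it catches self-identified bots but not crawlers that spoof browser strings, so treat the result as a lower bound.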
Where this data comes from and what it can't tell you matters. Here's how Cloudflare Radar collects, classifies, and aggregates the AI crawler data I used in this report.
| Metric | Value |
|---|---|
| Global presence | 330+ cities in 125+ countries |
| HTTP requests | 81 million/second average, peaks >129 million/second |
| DNS queries | 67 million/second (authoritative + resolver) |
This scale is what makes the data useful. Two primary sources feed Cloudflare Radar: HTTP request traffic across Cloudflare's network (averaging 81 million requests per second) and DNS query data from its authoritative and resolver services (67 million queries per second). For routing data, Cloudflare also uses RIPE RIS data from RIPE NCC (BGP route collectors).
Cloudflare uses a layered detection system: user-agent string matching for bots that identify themselves, behavioral analysis of request patterns and signatures, and honeypot traps for crawlers that masquerade as regular browsers.
💡 Expert Insight: The layered approach matters because not all AI crawlers identify themselves honestly. User-agent matching catches transparent bots like GPTBot and ClaudeBot. Behavioral analysis and honeypots catch crawlers that disguise themselves as regular browsers.
Bots fall into these categories: Training, Mixed Purpose, Search, User Action, and Undeclared — the same crawl-purpose taxonomy used in the tables above.
⚠️ Warning: Keep these limitations in mind when interpreting the data in this report: all percentages are shares of identified AI bot traffic rather than of total web traffic; crawlers that spoof browser user agents can be undercounted; and the figures only reflect sites served through Cloudflare's network.
This edition uses data from Cloudflare Radar's AI Insights endpoint (get_ai_data) and robots.txt analysis endpoint (get_robots_txt_data). The data covers February 11 through March 13, 2026, with month-over-month comparisons using a 30-day control window (January 12 - February 11, 2026).
I queried bot traffic breakdowns by user agent, crawl purpose, industry, and vertical. Workers AI data covers model and task distribution by account share. Robots.txt analysis covers domain-level crawler policies for AI user agents. All percentages are share of identified AI bot requests (for crawling data) or share of accounts (for Workers AI data), not share of total web traffic.
Data source: Cloudflare Radar AI Insights, get_ai_data and get_robots_txt_data endpoints (radar.cloudflare.com). Last updated: March 13, 2026.
About the Author: I'm James Bennett, Lead Engineer at WebSearchAPI.ai, where I architect the core retrieval engine enabling LLMs and AI agents to access real-time, structured web data with over 99.9% uptime and sub-second query latency. With a background in distributed systems and search technologies, I've reduced AI hallucination rates by 45% through advanced ranking and content extraction pipelines for RAG systems. My expertise includes AI infrastructure, search technologies, large-scale data integration, and API architecture for real-time AI applications.
Credentials: B.Sc. Computer Science (University of Cambridge), M.Sc. Artificial Intelligence Systems (Imperial College London), Google Cloud Certified Professional Cloud Architect, AWS Certified Solutions Architect, Microsoft Azure AI Engineer, Certified Kubernetes Administrator, TensorFlow Developer Certificate.