Analysis of AI crawler traffic trends from May 2026, tracking the Q2 predictions made last month. Bytespider surged +61% to become the #4 AI crawler at 10.5%, passing ClaudeBot, Bingbot, and Applebot. Applebot's April top-five entry reversed — it fell from 9.1% to 7.0%. Googlebot kept falling to 27.1%. Training crawling plateaued at 51.8%, while Search crawling jumped to 9.6% as Claude-SearchBot tripled and overtook OAI-SearchBot. On Workers AI, Moonshot's Kimi K2.6 exploded from a new entrant to the #2 model at 18.4%. Complete breakdown of crawler market share, industry targeting, crawl-to-refer ratios, robots.txt directives, and Workers AI shifts from Cloudflare Radar data.
This report is updated monthly with fresh Cloudflare Radar data. Bookmark this page to track how AI crawlers are reshaping web traffic each month.
Last month I told website owners to update their Applebot policy because Apple's crawler had leapfrogged Bingbot into the top five. One month later, that move has already unwound: Applebot fell from 9.1% to 7.0% and dropped back to #7. The crawler that displaced it from the top five is one almost nobody flagged a quarter ago: ByteDance's Bytespider surged +61% to 10.5%, vaulting past ClaudeBot, Bingbot, and Applebot to become the fourth-largest AI crawler on the web. Meanwhile Googlebot kept sliding (29.6% → 27.1%), and the "Training will exceed 55% by June" prediction is now in trouble: training crawling plateaued at 51.8%, flat month-over-month, while Search crawling quietly jumped from 7.7% to 9.6%.
I analyzed the latest 28 days of Cloudflare Radar AI Insights data — the window covering May 5 through June 2, 2026 — and compared it against the preceding 28-day window (April 7 through May 5) so every month-over-month figure below is computed on an identical basis. The picture is the same diversification story I've tracked all year, but with two sharp reversals: the bots everyone added to their robots.txt last month (Applebot) cooled, and the model everyone wrote off as a curiosity (Moonshot's Kimi K2.6) exploded from a single-digit new entrant to the #2 model on Workers AI at 18.4%. If you want the foundational primer on the pipelines behind this data, our explainer on how search engines really work walks through crawl budgets, inverted indexes, and learning-to-rank — all the systems these bots are feeding.
📊 Stats Alert: Bytespider grew +61% month-over-month (6.5% → 10.5%) to become the #4 AI crawler. Applebot's April top-five surge reversed, falling from 9.1% to 7.0%. Googlebot dropped to 27.1%. Training crawling plateaued at 51.8% while Search crawling jumped to 9.6%. On Workers AI, Kimi K2.6 rocketed to 18.4% to become the #2 model.
Googlebot remains the largest AI-related crawler globally, but its decline accelerated again — from 29.6% to 27.1%, a -2.5 pp drop and a new all-time low in Cloudflare Radar's tracking. Meta-ExternalAgent stays in second at 13.1%, down for the second straight month. The big reshuffle happened just behind them: GPTBot recovered to 11.5% (#3), Bytespider exploded to 10.5% (#4), and ClaudeBot slipped to 9.5% (#5) — meaning ByteDance's crawler now out-crawls Anthropic's. Applebot's April leap reversed completely, falling -2.0 pp to 7.0% and surrendering the #5 spot it briefly held.
The other quiet headline is in the search tier: Claude-SearchBot tripled to 2.4% and overtook OpenAI's OAI-SearchBot (which fell back into the long tail). Anthropic's dedicated Claude web search crawler is now the single largest AI search crawler on the web — a notable lead for Anthropic in real-time retrieval, distinct from ClaudeBot's training role.
| AI Bot | May 2026 Share (%) | April 2026 Share (%) | Operator | Primary Purpose |
|---|---|---|---|---|
| Googlebot | 27.1% | 29.6% | Google | Search indexing + AI training (mixed) |
| Meta-ExternalAgent | 13.1% | 14.4% | Meta | AI training |
| GPTBot | 11.5% | 10.2% | OpenAI | Model training |
| Bytespider | 10.5% | 6.5% | ByteDance | AI training |
| ClaudeBot | 9.5% | 11.5% | Anthropic | Model training |
| Bingbot | 8.3% | 8.2% | Microsoft | Search indexing + AI (mixed) |
| Applebot | 7.0% | 9.1% | Apple | Search + AI features |
| Amazonbot | 5.3% | 4.8% | Amazon | AI training |
| Claude-SearchBot | 2.4% | <0.8% | Anthropic | Claude web search |
The concentration of crawling power among the traditional top five companies — Google, Meta, OpenAI, Anthropic, and Microsoft — fell to 69.5% in May (counting each company's primary crawler), down from 73.9% in April and continuing a five-month slide from 84.5% in January. The displacement is no longer a "long tail" story — it's ByteDance. Bytespider alone (10.5%) is now the fourth-largest individual AI crawler on the web, out-crawling every bot except Googlebot, Meta-ExternalAgent, and GPTBot — including both Anthropic's ClaudeBot (9.5%) and Microsoft's Bingbot (8.3%). Add ByteDance's second crawler (TikTokSpider, in the long tail) and ByteDance sits firmly among the top handful of AI crawler operators, a status it didn't hold a quarter ago. The diversification I've documented all year has a clear new winner, and it isn't Apple.
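The concentration figure above is a plain sum of each company's primary crawler share. The sketch below reproduces it from the rounded shares in the table; the dictionary and function names are illustrative, not part of any Cloudflare Radar API.

```python
# Shares (%) copied from the May and April 2026 columns of the table above.
# One "primary crawler" per company, as the paragraph above defines it.
MAY_SHARES = {
    "Googlebot": 27.1,           # Google
    "Meta-ExternalAgent": 13.1,  # Meta
    "GPTBot": 11.5,              # OpenAI
    "ClaudeBot": 9.5,            # Anthropic
    "Bingbot": 8.3,              # Microsoft
}
APRIL_SHARES = {
    "Googlebot": 29.6,
    "Meta-ExternalAgent": 14.4,
    "GPTBot": 10.2,
    "ClaudeBot": 11.5,
    "Bingbot": 8.2,
}


def top_five_concentration(shares: dict[str, float]) -> float:
    """Sum the primary-crawler shares and round to one decimal place."""
    return round(sum(shares.values()), 1)


may_total = top_five_concentration(MAY_SHARES)
april_total = top_five_concentration(APRIL_SHARES)
print(f"Top-five concentration: April {april_total}% -> May {may_total}%")
```

Because the inputs are already rounded to one decimal, totals computed this way can differ by a tenth of a point from sums of unrounded Radar data.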
If you're managing AI bot access on your site, the operator that most needs a fresh look this month is ByteDance — Bytespider's traffic has nearly tripled since March (3.6% → 10.5%). For context on how crawling translates into actual referral traffic back to websites, check our companion Search Engine Referral Report on crawl-to-refer ratios — and see the new crawl-to-refer breakdown further down this report.
The headline shift in May is a single-crawler surge offsetting broad declines at the top. Googlebot (-2.5 pp), Applebot (-2.0 pp), ClaudeBot (-2.0 pp), and Meta-ExternalAgent (-1.4 pp) all gave up ground — and almost all of it flowed to Bytespider (+4.0 pp), with smaller gains to GPTBot (+1.3 pp), Amazonbot (+0.5 pp), and the search tier (Claude-SearchBot). This is the inverse of April, when the gains dispersed across many crawlers; in May they concentrated in one.
| AI Bot | April 2026 Share | May 2026 Share | Change (pp) | Relative Change |
|---|---|---|---|---|
| Googlebot | 29.6% | 27.1% | -2.5 | -8.4% |
| Meta-ExternalAgent | 14.4% | 13.1% | -1.4 | -9.5% |
| GPTBot | 10.2% | 11.5% | +1.3 | +12.8% |
| Bytespider | 6.5% | 10.5% | +4.0 | +61.1% |
| ClaudeBot | 11.5% | 9.5% | -2.0 | -17.0% |
| Bingbot | 8.2% | 8.3% | +0.1 | +1.5% |
| Applebot | 9.1% | 7.0% | -2.0 | -22.4% |
| Amazonbot | 4.8% | 5.3% | +0.5 | +10.9% |
| Claude-SearchBot | <0.8% | 2.4% | +1.6 | New top tier |
| OAI-SearchBot | 1.8% | <1.0% | -0.8 | Fell to tail |
Six trends stand out from this comparison:
Bytespider is the story of the month. ByteDance's crawler jumped from 6.5% to 10.5% (+4.0 pp, +61% relative), the largest absolute gain of any AI crawler in May and the largest relative gain among the established top-tier crawlers (only Claude-SearchBot, which tripled from a sub-1% base, grew faster in relative terms). It is also Bytespider's third consecutive month of growth (3.6% in March → 6.5% in April → 10.5% in May). Bytespider now out-crawls ClaudeBot, Bingbot, Applebot, and Amazonbot individually. ByteDance has gone from a rounding error to the fourth-largest AI crawler on the web in one quarter.
Applebot's surge reversed. Last month's most-confident call — "Applebot has entered the top five, update your policy" — didn't hold. Applebot fell -2.0 pp to 7.0%, dropping back below Bingbot to #7. April's spike looks increasingly like a one-month training burst (a backfill of content for Apple Intelligence) rather than a sustained ramp. This is a useful reminder that a single month of crawl data can mislead; the trend matters more than the spike.
ClaudeBot slipped below GPTBot. ClaudeBot fell -2.0 pp to 9.5% (#5) while GPTBot recovered +1.3 pp to 11.5% (#3). Anthropic's training crawler lost ground even as its search crawler (Claude-SearchBot) gained — suggesting Anthropic may be shifting effort from bulk training collection toward real-time retrieval.
Googlebot's decline accelerated again. After a -1.6 pp dip the prior month, Googlebot dropped -2.5 pp to 27.1% — no sign of the floor I speculated about last month. Google has now lost more than 11 percentage points of AI-crawler share since January's 38.7%.
The search tier flipped to Anthropic. Claude-SearchBot tripled to 2.4% and overtook OAI-SearchBot, which fell out of the top nine. For the first time, Anthropic operates the largest dedicated AI search crawler — a mirror image of the training market, where OpenAI's GPTBot leads Anthropic's ClaudeBot. If you're building apps that rely on real-time retrieval, understanding what a web search API is and how these bots work under the hood matters more than ever.
Meta kept contracting. Meta-ExternalAgent fell a second straight month, -1.4 pp to 13.1%. The "plateau at 18-20%" I predicted in the Q1 review is now decisively wrong in the opposite direction — Meta has lost 3.6 pp across April and May combined, the clearest sign that its aggressive Q1 Llama-training crawl was a finite campaign, not a new baseline.
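The two change columns in the month-over-month table measure different things: the percentage-point change is a simple difference of shares, while the relative change divides that difference by the earlier share. A minimal sketch, assuming only the rounded shares published in the table (the function name is illustrative):

```python
def month_over_month(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in %) between two shares."""
    pp_change = current - previous
    relative_change = pp_change / previous * 100
    return round(pp_change, 1), round(relative_change, 1)


# April -> May 2026 shares (%) taken from the table above.
for bot, april, may in [
    ("Googlebot", 29.6, 27.1),
    ("GPTBot", 10.2, 11.5),
    ("Bytespider", 6.5, 10.5),
]:
    pp, rel = month_over_month(april, may)
    print(f"{bot}: {pp:+.1f} pp, {rel:+.1f}% relative")
```

Results computed from the rounded shares can differ from the table by a few tenths (for example, Bytespider comes out at +61.5% rather than +61.1%), presumably because the table was built from unrounded data.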
This is where last month's confident extrapolation broke. In April I projected Training would reach 55% by mid-Q2. Instead, Training plateaued at 51.8% in May — statistically flat (-0.04 pp) after seven straight months of gains. The category that actually moved was Search, which jumped +1.9 pp to 9.6%, its largest single-month gain on record.
| Crawl Purpose | May 2026 Share | April 2026 Share | Change (pp) |
|---|---|---|---|
| Training | 51.8% | 51.8% | 0.0 |
| Mixed Purpose | 35.4% | 37.8% | -2.4 |
| Search | 9.6% | 7.7% | +1.9 |
| User Action | 2.6% | 2.3% | +0.3 |
| Undeclared | 0.6% | 0.5% | +0.1 |
Here's what the May data tells website owners:
Training plateaued at 51.8%. After climbing from 42.0% in January to 51.8% in April, the seven-month growth streak stalled in May. Training is still the clear majority — for every 100 AI bot requests, 52 are explicitly dedicated to training — but the "Training takeover" has reached a near-term ceiling. The simplest read: the Q1–Q2 land-grab of bulk training data is largely complete, and the major labs are crawling for refresh, not first acquisition. Bytespider's surge kept the category from declining; without it, Training would have fallen.
Mixed Purpose kept sliding to 35.4%. Crawlers that simultaneously index for search and collect training data have now lost ground for five consecutive months — from 48.3% in January to 35.4% in May, a cumulative -12.9 pp drop that tracks Googlebot's own decline almost exactly. You still can't separate the AI training from the search indexing with these crawlers — block Googlebot and you disappear from Google Search. For developers looking at how to ground AI responses with Google Search alternatives, this bundling problem keeps coming up.
Search crawling surged to 9.6% — the real shift this month. After holding flat for two months, AI search crawlers jumped +1.9 pp, driven almost entirely by Claude-SearchBot tripling to 2.4% (now the largest dedicated AI search crawler, having passed OAI-SearchBot). This is the strongest signal yet that the AI industry's center of gravity is shifting from "crawl everything once to train" toward "fetch pages in real time to answer queries." The growing ecosystem of AI search API alternatives is the developer-facing side of exactly this trend.
User Action ticked up to 2.6%. Real-time, user-triggered fetches (ChatGPT-User and equivalents) continued their slow climb. This is the category most directly tied to AI assistants' "browse" features, and it has now risen three months running.
The structural story of May is a two-front shift toward retrieval: Search (+1.9 pp) and User Action (+0.3 pp) both gained while Training stalled and Mixed Purpose fell. If April was the month Training crossed the majority line, May is the month that line stopped moving — and real-time retrieval started to.
The industry mix moved meaningfully in May, and in the opposite direction from April. Shopping & General Merchandise fell from 28.2% to 25.0% (-3.2 pp) — the largest single-month drop of any vertical this year — while Computer & Electronics (+1.2 pp) and News, Media & Publications (+1.1 pp) absorbed most of the redistributed crawling. The retail-content land grab that defined Q1 is cooling.
| Industry Vertical | May 2026 Share | April 2026 Share | Change (pp) |
|---|---|---|---|
| Shopping & General Merchandise | 25.0% | 28.2% | -3.2 |
| Internet and Telecom | 21.9% | 21.6% | +0.3 |
| Computer and Electronics | 19.0% | 17.8% | +1.2 |
| News, Media, and Publications | 9.2% | 8.1% | +1.1 |
| Gambling | 6.8% | 6.6% | +0.2 |
| Business and Industry | 3.6% | 3.4% | +0.1 |
| Professional Services | 2.7% | 2.5% | +0.1 |
| Finance | 2.6% | 2.5% | +0.1 |
| Games | 2.3% | 2.2% | 0.0 |
Shopping is still the most-crawled vertical, but its lead over Internet & Telecom narrowed to just 3.1 pp — the tightest gap on record, down from a double-digit lead at the start of the year. The News, Media & Publications rise (+1.1 pp) is worth watching: as Search and User-Action crawling grows (see the crawl-purpose section above), AI assistants are fetching more news and reference content in real time to answer time-sensitive queries — a different pattern than bulk e-commerce training scrapes.
The industry-level breakdown gets more granular:
| Industry | May 2026 Share | April 2026 Share | Change (pp) |
|---|---|---|---|
| Retail | 22.5% | 25.4% | -2.9 |
| Computer Software | 17.2% | 16.0% | +1.2 |
| IT and Services | 7.0% | 6.4% | +0.6 |
| Gambling & Casinos | 6.2% | 6.0% | +0.2 |
| Marketing and Advertising | 5.5% | 6.0% | -0.5 |
| Media | 5.0% | 5.7% | -0.7 |
| Internet | 4.7% | 4.5% | +0.2 |
| Adult Entertainment | 4.6% | 3.6% | +1.0 |
| Telecommunications | 3.0% | 2.9% | +0.1 |
The most notable industry-level move is Retail falling -2.9 pp to 22.5%, mirroring the Shopping vertical's decline, while Computer Software climbed +1.2 pp to 17.2% — AI crawlers are increasingly targeting documentation, code, and developer content over product catalogs. Adult Entertainment also jumped +1.0 pp to 4.6%, a recurring pattern when training crawlers do broad recrawls. If you maintain developer documentation, API references, or technical tutorials, this is why choosing the right AI web search API for your applications matters — these crawlers are the infrastructure behind the search results your users see, and software content is now the second-most-crawled industry on the web.
AI training crawlers aren't the only bots scanning the web at this scale, either. Technology detection platforms like Technologychecker.io crawl and fingerprint over 50 million domains using HTTP header analysis, JavaScript fingerprinting, DNS lookups, and headless browser rendering to identify 40,000+ technologies. Unlike AI training crawlers that take content for model weights, technology intelligence crawlers need to re-crawl frequently to track stack changes and new tech adoptions.
New this month, I pulled Cloudflare Radar's crawl-to-refer ratio — the number of pages an operator crawls for every one referral it sends back to a website. It's the single most honest measure of whether a given AI company is a fair exchange or a pure extractor, and the May numbers are stark.
| Operator | May 2026 (crawls : 1 referral) | April 2026 | Direction |
|---|---|---|---|
| Anthropic | 11,992 : 1 | 12,126 : 1 | Most extractive |
| OpenAI | 1,056 : 1 | 1,034 : 1 | Slightly worse |
| Perplexity | 142 : 1 | 121 : 1 | Worsening |
| Mistral | 60 : 1 | 24 : 1 | Worsening fast |
| Microsoft | 34 : 1 | 30 : 1 | Worsening |
| Yandex | 24 : 1 | 21 : 1 | Worsening |
| Baidu | 12 : 1 | 9 : 1 | Worsening |
| ByteDance | 9 : 1 | 9 : 1 | Flat |
| Google | 5 : 1 | 5 : 1 | Most generous |
| DuckDuckGo | 1.6 : 1 | 1.6 : 1 | Near-even |
According to Cloudflare Radar, Anthropic crawled 11,992 pages in May for every single visitor it referred back, by far the most lopsided ratio of any major operator and consistent with ClaudeBot's training-heavy footprint. OpenAI sits at 1,056:1, while Google returns the most traffic relative to what it takes among the large AI operators, at just 5:1, the structural advantage of running search and AI crawling through bundled infrastructure that still drives clicks. DuckDuckGo, which leans on others' indexes, is nearly even at 1.6:1.
For website owners, this is the number that reframes the "should I block AI crawlers?" question. A 5:1 ratio (Google) is a recognizable search-engine bargain — you give crawl access, you get visitors. A ~12,000:1 ratio (Anthropic) is not a bargain in the traditional sense; it's content acquisition with almost no traffic return. That gap is the clearest data-backed case for treating training crawlers and search crawlers differently in your robots.txt — exactly the separation Anthropic's own ClaudeBot/Claude-SearchBot split now makes possible.
Source: Cloudflare Radar — radar/bots/crawlers/summary/crawl_refer_ratio (radar.cloudflare.com), May 5 – June 2, 2026 vs. April 7 – May 5, 2026.
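To make the ratio concrete, the sketch below shows how a crawl-to-refer ratio is formed from two counts and how the month-over-month movement in the table can be quantified. The ratios are copied from the table; the helper names and the idea of feeding in your own log counts are assumptions for illustration, not a Radar API.

```python
def crawl_to_refer(crawled_pages: int, referrals: int) -> float:
    """Pages crawled per referral sent back (hypothetical inputs from your own logs)."""
    if referrals <= 0:
        raise ValueError("ratio is undefined without at least one referral")
    return crawled_pages / referrals


def ratio_change_pct(april: float, may: float) -> float:
    """Relative change in the ratio; positive means the operator became more extractive."""
    return round((may - april) / april * 100, 1)


# operator: (April 2026 ratio, May 2026 ratio), copied from the table above.
RATIOS = {
    "Anthropic": (12126, 11992),
    "OpenAI": (1034, 1056),
    "Perplexity": (121, 142),
    "Mistral": (24, 60),
    "Microsoft": (30, 34),
    "Yandex": (21, 24),
    "Baidu": (9, 12),
    "ByteDance": (9, 9),
    "Google": (5, 5),
    "DuckDuckGo": (1.6, 1.6),
}

changes = {op: ratio_change_pct(apr, may) for op, (apr, may) in RATIOS.items()}
for op, change in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{op}: {change:+.1f}%")
```

Sorting by relative change rather than by the raw ratio separates "most extractive" (Anthropic) from "worsening fastest" (Mistral), which the Direction column mixes together.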
⚠️ Methodology note: To sidestep the share-denominator problem that complicated last month's edition (Cloudflare refreshed its parsed robots.txt corpus in early Q2), this section reports raw domain counts — the number of distinct domains whose robots.txt explicitly names each crawler — rather than percentages. Counts are directly comparable month-to-month. May figures are the most recent Radar snapshot; April figures are last month's snapshot.
| AI Crawler | Domains (May 2026) | Domains (April 2026) | Change | Operator |
|---|---|---|---|---|
| GPTBot | 632 | 597 | +35 | OpenAI |
| ClaudeBot | 539 | 504 | +35 | Anthropic |
| Google-Extended | 502 | 476 | +26 | Google |
| CCBot | 501 | 493 | +8 | Common Crawl |
| Bytespider | 420 | 410 | +10 | ByteDance |
| Googlebot | 402 | 398 | +4 | Google |
| PerplexityBot | 382 | 358 | +24 | Perplexity |
| meta-externalagent | 371 | 368 | +3 | Meta |
| Amazonbot | 367 | 345 | +22 | Amazon |
| facebookexternalhit | 355 | — | New | Meta |
| ChatGPT-User | 346 | 322 | +24 | OpenAI |
| Applebot-Extended | 338 | 338 | 0 | Apple |
| OAI-SearchBot | 282 | 270 | +12 | OpenAI |
| anthropic-ai | 246 | 245 | +1 | Anthropic |
Two rank changes stand out. Google-Extended overtook CCBot for #3, gaining +26 domains versus Common Crawl's +8 — website owners are increasingly opting out of Google's AI training (Google-Extended) while keeping search visibility (Googlebot). And PerplexityBot climbed to #7 (+24 domains), continuing the pattern where blocking activity tracks a crawler's visibility in logs rather than its raw traffic share.
The most telling non-move is Applebot-Extended, flat at 338 domains despite Applebot's traffic falling -2.0 pp this month. Robots.txt adoption lags traffic by a month or more — the owners who added Applebot-Extended directives during April's surge are still carrying them now that the surge has faded. Adoption is sticky; it accumulates on awareness and rarely reverses.
The meta-externalagent gap persists: despite being the #2 AI crawler by traffic at 13.1%, Meta's bot ranks only #8 by robots.txt references — still one of the largest disparities between traffic share and blocking attention among major crawlers. If you maintain a robots.txt and you're worried about training crawlers, Meta-ExternalAgent and the now-#4 Bytespider are the two highest-traffic bots most under-referenced relative to their footprint.
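One way to quantify "under-referenced relative to footprint" is to compare a crawler's rank by traffic share with its rank by robots.txt references. The sketch below does that with the May figures from the two tables in this report; the rank-gap metric is an illustrative choice of mine, not a measure Cloudflare Radar publishes.

```python
# May 2026 traffic share (%) from the market-share table, keys lowercased.
TRAFFIC_SHARE = {
    "googlebot": 27.1, "meta-externalagent": 13.1, "gptbot": 11.5,
    "bytespider": 10.5, "claudebot": 9.5, "bingbot": 8.3,
    "applebot": 7.0, "amazonbot": 5.3, "claude-searchbot": 2.4,
}
# May 2026 robots.txt domain counts from the table above, keys lowercased.
ROBOTS_DOMAINS = {
    "gptbot": 632, "claudebot": 539, "google-extended": 502, "ccbot": 501,
    "bytespider": 420, "googlebot": 402, "perplexitybot": 382,
    "meta-externalagent": 371, "amazonbot": 367, "facebookexternalhit": 355,
    "chatgpt-user": 346, "applebot-extended": 338, "oai-searchbot": 282,
    "anthropic-ai": 246,
}


def ranks(values: dict[str, float]) -> dict[str, int]:
    """Map each name to its 1-based rank, largest value first."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


traffic_rank = ranks(TRAFFIC_SHARE)
robots_rank = ranks(ROBOTS_DOMAINS)

# Positive gap: the crawler ranks lower in robots.txt attention than in traffic.
rank_gap = {
    bot: robots_rank[bot] - traffic_rank[bot]
    for bot in traffic_rank
    if bot in robots_rank
}
for bot, gap in sorted(rank_gap.items(), key=lambda item: item[1], reverse=True):
    print(f"{bot}: traffic #{traffic_rank[bot]}, robots.txt #{robots_rank[bot]}, gap {gap:+d}")
```

Googlebot also shows a large positive gap here, but as a mixed-purpose crawler it is rarely a blocking candidate, which is why the paragraph above singles out Meta-ExternalAgent and Bytespider.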
Two qualitative observations continue to hold:
Beyond crawlers, Cloudflare Radar tracks usage patterns on Cloudflare Workers AI — the platform that lets developers run AI models at the edge. If April reshuffled the model leaderboard, May rewrote it: Moonshot AI's Kimi K2.6 exploded from a low-single-digit new entrant to the #2 model at 18.4% — the largest single-month gain for any Workers AI model on record — while OpenAI's GPT-OSS 120B, last month's breakout, fell out of the top tier entirely.
| Model | May 2026 Share | April 2026 Share | Change (pp) | Developer |
|---|---|---|---|---|
| Llama 3 8B Instruct | 21.5% | 28.0% | -6.5 | Meta |
| Kimi K2.6 | 18.4% | ~2% (new) | +16 | Moonshot AI |
| Gemma 4 26B-A4B-IT | 13.0% | 8.2% | +4.8 | Google |
| Stable Diffusion XL Base 1.0 | 7.9% | 9.5% | -1.6 | Stability AI |
| Llama 4 Scout 17B | 5.0% | 6.0% | -1.0 | Meta |
| Whisper | 5.0% | 5.7% | -0.7 | OpenAI |
| M2M-100 1.2B | 3.3% | 3.9% | -0.6 | Meta |
| FLUX.1 Schnell | 3.0% | ~3% (tail) | ~0 | Black Forest Labs |
| Llama 3 8B Instruct (AWQ) | 2.6% | 3.5% | -0.9 | Meta |
| GPT-OSS 120B | <2.6% (tail) | 5.5% | -3+ | OpenAI |
The headline is Kimi K2.6. Moonshot AI's model debuted at roughly 2% in April as "the first non-US model to crack the top 11" — a footnote. One month later it's the #2 model on Workers AI at 18.4%, a ~16 pp jump that is the single largest one-month gain Radar has recorded for any model. Combined with Google's Gemma 4 26B climbing to 13.0% (#3), two of the top three Workers AI models are now non-Meta — something that was unthinkable when Llama variants held ~60% of the platform in January.
Llama 3 8B keeps eroding. Still #1, but down to 21.5% from 28.0%, its fourth consecutive monthly decline in this report's data (41.7% in January, 40.1% in February, 37.3% in March, 28.0% in April) and roughly half its 41.7% January peak. Meta's overall provider share fell to roughly 32–33% (Llama 3 8B + Llama 4 Scout + M2M-100 + Llama 3 AWQ), down from ~41% in April on the same four rows and ~60% in January. The Workers AI platform has gone from Meta-dominated to genuinely multipolar in two quarters.
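The provider share quoted above is the sum of the Meta rows in the model table. A minimal sketch that groups the May shares by developer, assuming the rounded table values and omitting GPT-OSS 120B because only an upper bound is reported for it:

```python
from collections import defaultdict

# (model, May 2026 share in %, developer) copied from the table above.
MAY_MODELS = [
    ("Llama 3 8B Instruct", 21.5, "Meta"),
    ("Kimi K2.6", 18.4, "Moonshot AI"),
    ("Gemma 4 26B-A4B-IT", 13.0, "Google"),
    ("Stable Diffusion XL Base 1.0", 7.9, "Stability AI"),
    ("Llama 4 Scout 17B", 5.0, "Meta"),
    ("Whisper", 5.0, "OpenAI"),
    ("M2M-100 1.2B", 3.3, "Meta"),
    ("FLUX.1 Schnell", 3.0, "Black Forest Labs"),
    ("Llama 3 8B Instruct (AWQ)", 2.6, "Meta"),
]


def share_by_developer(rows: list[tuple[str, float, str]]) -> dict[str, float]:
    """Sum model shares per developer, rounded to one decimal place."""
    totals: dict[str, float] = defaultdict(float)
    for _model, share, developer in rows:
        totals[developer] += share
    return {developer: round(total, 1) for developer, total in totals.items()}


provider_share = share_by_developer(MAY_MODELS)
for developer, share in sorted(provider_share.items(), key=lambda item: item[1], reverse=True):
    print(f"{developer}: {share}%")
```

Any Meta models sitting in the long tail are not captured, so the true provider share may be slightly higher than the four-row sum.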
The reversal that mirrors the crawler data: OpenAI's GPT-OSS 120B, which I called "the breakout adoption story of Q2" last month at 5.5%, fell out of the top nine (now under 2.6%). Just as Applebot's traffic surge didn't hold, GPT-OSS 120B's adoption spike didn't either. Both are reminders that a single month of momentum on Cloudflare's platform doesn't guarantee a trend — the developers experimenting with a new model in April moved on to Kimi K2.6 in May.
| Task Type | May 2026 Share | April 2026 Share | Change (pp) |
|---|---|---|---|
| Text Generation | 66.1% | 54.1% | +12.0 |
| Text-to-Image | 18.2% | 25.5% | -7.3 |
| Automatic Speech Recognition | 9.0% | 11.3% | -2.4 |
| Translation | 4.4% | 5.9% | -1.5 |
| Text Classification | 1.0% | 1.1% | -0.1 |
| Text-to-Speech | 0.7% | 1.3% | -0.6 |
| Image-to-Text | 0.6% | 0.6% | 0.0 |
Text Generation surged +12.0 pp to 66.1% — the largest single-month gain in any task category Radar tracks — directly attributable to Kimi K2.6 and Gemma 4 26B, both text-generation workloads, absorbing the share that fled Stable Diffusion XL. Text-to-Image fell -7.3 pp to 18.2% as image generation cooled across the board. The edge-AI workload mix is consolidating around text: between Text Generation and Translation, language tasks now account for more than 70% of all Workers AI inference.
Q1 2026 was the most transformative quarter in the history of AI web crawling. This section compiles data from three consecutive monthly analyses to document the structural shifts reshaping how AI companies interact with the open web.
| Metric | January 2026 | February 2026 | March 2026 | Q1 Change |
|---|---|---|---|---|
| Top crawler (Googlebot) | 38.7% | 34.6% | 31.6% | -7.1 pp |
| #2 crawler | GPTBot (12.8%) | Meta-ExternalAgent (15.6%) | Meta-ExternalAgent (16.7%) | Meta took #2 |
| Training crawl share | 42.0% | 45.4% | 49.9% | +7.9 pp |
| Mixed Purpose crawl share | 48.3% | 43.9% | 39.9% | -8.4 pp |
| Top 5 company concentration | 84.5% | 82.8% | 80.2% | -4.3 pp |
| Domains blocking GPTBot | 5.29% | 5.45% | 5.52% | +0.23 pp |
| Llama 3 8B (Workers AI) | 41.7% | 40.1% | 37.3% | -4.4 pp |
| AI Bot | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change (pp) | Q1 Direction |
|---|---|---|---|---|---|
| Googlebot | 38.7% | 34.6% | 31.6% | -7.1 | Declining |
| Meta-ExternalAgent | 11.6% | 15.6% | 16.7% | +5.1 | Rising |
| GPTBot | 12.8% | 12.1% | 12.0% | -0.8 | Stable/Slow decline |
| ClaudeBot | 11.4% | 11.1% | 11.7% | +0.3 | V-shaped recovery |
| Bingbot | 9.7% | 9.3% | 8.2% | -1.5 | Declining |
| Applebot | 2.5% | 3.1% | 5.8% | +3.3 | Surging |
| Amazonbot | 4.8% | 5.4% | 4.4% | -0.4 | Volatile |
| Bytespider | 3.5% | 3.3% | 3.6% | +0.1 | Flat |
| OAI-SearchBot | 2.0% | 2.6% | 2.2% | +0.2 | Volatile |
Googlebot's sustained decline is the defining trend of Q1. Losing 7.1 pp over three months represents the largest quarterly share loss for any single crawler in Cloudflare Radar's tracking history. This doesn't necessarily mean Google is crawling less — it means competitors are crawling more, faster. At 31.6%, Googlebot is no longer 2x the size of the next-largest crawler; the ratio to Meta-ExternalAgent is now 1.9x and shrinking.
Meta-ExternalAgent was the biggest winner of Q1. Gaining +5.1 pp over three months, Meta's crawler overtook GPTBot in February and never looked back. The growth rate decelerated across the quarter (+3.1 pp → +3.7 pp → +2.3 pp), suggesting Meta may be approaching a near-term equilibrium.
Applebot's March surge was the quarter's biggest surprise. After modest growth in January (+0.2 pp) and February (+0.6 pp), Applebot exploded in March (+3.2 pp, +124% relative). Over Q1, Applebot more than doubled from 2.5% to 5.8%. Apple is now a top-six AI crawler operator — a status no one predicted at the start of the quarter.
The most consequential structural shift of Q1 2026 happened in the crawl purpose data:
| Crawl Purpose | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change (pp) |
|---|---|---|---|---|
| Training | 42.0% | 45.4% | 49.9% | +7.9 |
| Mixed Purpose | 48.3% | 43.9% | 39.9% | -8.4 |
| Search | 6.9% | 8.2% | 7.7% | +0.8 |
| User Action | 2.2% | 2.0% | 2.1% | -0.1 |
| Undeclared | 0.5% | 0.4% | 0.4% | -0.1 |
In January, Mixed Purpose led Training by 6.3 pp (48.3% vs 42.0%). By February, Training had overtaken Mixed Purpose for the first time (45.4% vs 43.9%). By March, the gap had widened to 10 pp (49.9% vs 39.9%).
This crossover represents a fundamental change in how AI companies interact with the web:
The shift gives website owners more control. You can now block the majority of AI training crawling without sacrificing search visibility — something that wasn't possible when Mixed Purpose crawlers dominated.
| Metric | January | February | March |
|---|---|---|---|
| Top 5 companies (Google, Meta, OpenAI, Anthropic, Microsoft) | 84.5% | 82.8% | 80.2% |
| Top 6 companies (+ Apple) | 87.0% | 85.9% | 86.0% |
| Top 4 crawlers share | 74.4% | 73.5% | 72.0% |
The AI crawler market is diversifying. The traditional top five lost 4.3 pp of concentration over Q1, but when you include Apple as a sixth major player, the top-six concentration remained remarkably stable at ~86%. The diversification is happening within the top tier, not from outside it.
| AI Crawler | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change |
|---|---|---|---|---|
| GPTBot | 5.29% | 5.45% | 5.52% | +0.23 pp |
| CCBot | 4.40% | 4.63% | 4.53% | +0.13 pp |
| ClaudeBot | 4.33% | 4.62% | 4.72% | +0.39 pp |
| Google-Extended | 4.04% | 4.36% | 4.37% | +0.33 pp |
| Googlebot | 4.13% | 3.77% | 3.80% | -0.33 pp |
| Bytespider | 3.69% | 3.73% | 3.67% | -0.02 pp |
| meta-externalagent | 3.24% | 3.26% | 3.34% | +0.10 pp |
| Amazonbot | 2.99% | 3.16% | 3.29% | +0.30 pp |
ClaudeBot saw the largest blocking increase across Q1 at +0.39 pp, overtaking CCBot to become the second-most referenced AI crawler in robots.txt files behind GPTBot. The blocking wave is real but slow — over the entire quarter, GPTBot grew from 5.29% to just 5.52%. More than 94% of domains still allow all AI crawlers unrestricted access. At the current rate, it would take years for blocking rates to reach even 10%.
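The "it would take years" claim follows from a straight-line extrapolation of GPTBot's Q1 pace. The sketch below makes that arithmetic explicit; it assumes blocking keeps growing linearly at the Q1 rate, which is a simplification rather than a forecast.

```python
def quarters_to_reach(current: float, target: float, gain_per_quarter: float) -> float:
    """Quarters needed to reach `target` at a constant per-quarter gain (linear assumption)."""
    if gain_per_quarter <= 0:
        raise ValueError("a positive growth rate is required")
    return (target - current) / gain_per_quarter


# GPTBot blocking rate: 5.29% in January -> 5.52% in March (+0.23 pp over Q1).
quarters = quarters_to_reach(current=5.52, target=10.0, gain_per_quarter=0.23)
years = quarters / 4
print(f"About {quarters:.1f} quarters, or roughly {years:.1f} years, to reach 10%")
```

At the Q1 pace the 10% mark is close to five years out; any acceleration in blocking would shorten that considerably.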
| AI Crawler | March Traffic Share | March Blocking Rate | Gap (pp) |
|---|---|---|---|
| Googlebot | 31.6% | 3.80% | 27.8 |
| Meta-ExternalAgent | 16.7% | 3.34% | 13.4 |
| GPTBot | 12.0% | 5.52% | 6.5 |
| ClaudeBot | 11.7% | 4.72% | 7.0 |
| Amazonbot | 4.4% | 3.29% | 1.1 |
Meta-ExternalAgent's 13.4 pp gap is the most actionable — it's a dedicated training crawler with no search indexing benefit, yet it's blocked by fewer domains than GPTBot despite generating more traffic.
| Model | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change |
|---|---|---|---|---|
| Llama 3 8B Instruct | 41.7% | 40.1% | 37.3% | -4.4 pp |
| Stable Diffusion XL Base 1.0 | 13.4% | 13.4% | 12.3% | -1.1 pp |
| Whisper | 8.5% | 8.3% | 7.5% | -1.0 pp |
| Llama 4 Scout 17B | 7.7% | 6.7% | 7.0% | -0.7 pp |
| M2M-100 1.2B | 5.6% | 5.4% | 5.1% | -0.5 pp |
| Llama 3 8B Instruct (AWQ) | 4.7% | 4.9% | 4.4% | -0.3 pp |
| FLUX.1 Schnell | 2.4% | 2.5% | 3.0% | +0.6 pp |
| GPT-OSS 120B | -- | 1.6% | 2.1% | New (+2.1 pp) |
| Whisper Large V3 Turbo | 1.4% | 1.6% | 1.7% | +0.3 pp |
Every top model except FLUX.1 Schnell, GPT-OSS 120B, and Whisper Large V3 Turbo lost share across Q1. Meta's dominance is slowly eroding — from ~60% in January to 53.8% in March. GPT-OSS 120B is the breakout model of Q1, debuting in February and climbing to 2.1% in March. FLUX.1 Schnell is gaining on Stable Diffusion XL, emerging as a credible challenger in the text-to-image space.
The Training Takeover. Training crawlers went from minority (42.0%) to near-majority (49.9%) in a single quarter. The web's content is now primarily being consumed for AI model weights rather than search indexing. Website owners who want to opt out of AI training have clearer tools to do so.
The Google Erosion. Googlebot lost 7.1 pp across Q1. This is almost certainly driven by competitors growing faster rather than Google crawling less, but the proportional shift matters for every website owner's traffic analysis.
The Meta Ascendancy. Meta-ExternalAgent gained +5.1 pp, overtook GPTBot for #2 in February, and finished the quarter at 16.7%. Meta's aggressive Llama training pipeline is driving unprecedented data collection.
Apple's Arrival. Applebot went from 2.5% to 5.8% across Q1, with most growth concentrated in March. Apple is now a top-six AI crawler operator and should be included in any website owner's AI bot management strategy.
The Diversification of Workers AI. The model ecosystem became meaningfully more diverse. Llama 3 8B's share fell from 41.7% to 37.3% as developers adopted newer models and the long tail grew from ~15% to ~20%.
Based on Q1 trends, here's what I forecast for Q2 2026 — scored against May data in the tracker below:
Training crawlers will exceed 55%. The +7.9 pp Q1 trajectory suggests Training could reach 55-57% by June 2026, with Mixed Purpose falling below 35%.
Googlebot will drop below 30%. The current quarterly decline rate would put Googlebot in the 28-30% range by end of Q2.
Applebot will enter the top five. If Apple sustains even half of March's growth rate, Applebot could pass Bingbot (currently 8.2%) by May or June.
Meta-ExternalAgent will plateau around 18-20%. The deceleration trend suggests Meta's growth rate is stabilizing.
GPT-OSS 120B will enter the Workers AI top five. At its current growth rate, it could reach 3-4% by June, approaching FLUX.1 Schnell territory.
Robots.txt blocking will remain below 6% for all crawlers. The slow growth rate means meaningful blocking thresholds remain years away.
With one month of Q2 already scored in April and a second now in, here's where each prediction stands. The pattern from last month inverted: the calls that "hit early" in April (Applebot top-five, GPT-OSS 120B) have since reversed, while the slow-burn structural calls (Googlebot decline) keep compounding.
| # | Q2 Prediction (made in April) | Window | May 2026 Actual | Verdict |
|---|---|---|---|---|
| 1 | Training crawlers will exceed 55% | By June 2026 | Plateaued at 51.8% (flat MoM) | ❌ Off track |
| 2 | Googlebot will drop below 30% | End of Q2 | 27.1% (and still falling) | ✅ Hit |
| 3 | Applebot will enter the top five | May or June | Hit in April, reversed to #7 (7.0%) | ⚠️ Hit, then reversed |
| 4 | Meta-ExternalAgent will plateau at 18-20% | Q2 | Fell again to 13.1% | ❌ Miss |
| 5 | GPT-OSS 120B will enter Workers AI top five | By June | Hit ~5.5% in April, fell out of top 9 | ⚠️ Hit, then reversed |
| 6 | Robots.txt blocking will remain below 6% | Q2 | Most-referenced crawler (GPTBot) is named by 632 domains; raw counts only this month | ✅ Holds |
The training prediction has broken. Last month I said 55% by June was "on track" at a ~+1.7 pp/month pace. May delivered +0.0 pp. Training didn't just decelerate — it stopped. Barring a June surge from Bytespider, this prediction will miss, and the more interesting story (real-time Search crawling rising to fill the gap) is the one I should have been forecasting.
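To put a number on how far off the pace is, here is a minimal sketch of the arithmetic. It assumes a simple linear path and treats June as the single remaining monthly window; the function name `required_gain_pp` is mine, not part of any Radar tooling.

```python
def required_gain_pp(current_share: float, target_share: float, months_left: int) -> float:
    """Percentage points per month needed to reach target_share in months_left."""
    if months_left <= 0:
        raise ValueError("months_left must be positive")
    return (target_share - current_share) / months_left

# Figures from this report: Training sits at 51.8% in May and the target is 55%.
# Assumption: one monthly window (June) remains before the prediction is scored.
needed = required_gain_pp(51.8, 55.0, 1)
prior_pace = 1.7  # approximate pp/month pace cited last month

print(f"needed: {needed:+.1f} pp/month, prior pace: {prior_pace:+.1f} pp/month")
print(f"multiple of prior pace: {needed / prior_pace:.1f}x")
```

Under those assumptions, Training would need to gain nearly twice last month's cited pace in a single window, after a month in which it gained nothing.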
Two predictions hit, then reversed — the lesson of the month. Applebot entered the top five in April (prediction #3, "hit early") and GPT-OSS 120B reached the Workers AI top five (#5, "near hit"). Both unwound in May: Applebot fell to #7, GPT-OSS 120B dropped out of the top nine. A single month above a threshold is not the same as crossing it durably. This is the clearest cautionary tale in two editions of this report — momentum-based "early hits" are the most fragile kind of prediction.
The structural calls keep compounding. Googlebot's sub-30% prediction (#2) didn't just hit — it overshot to 27.1%, with no floor in sight. The Meta plateau prediction (#4) keeps getting more wrong as Meta contracts further. The common thread: predictions grounded in a sustained, multi-month structural trend (Google's proportional erosion, Meta's finite training campaign) are far more reliable than predictions that extrapolate a single steep month.
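One way to make "crossing it durably" concrete is to require the threshold to hold for more than one consecutive month before scoring a hit. The sketch below is my own illustrative rule, not how the tracker above was scored: the two-month minimum and the name `crossed_durably` are assumptions.

```python
from typing import Sequence

def crossed_durably(values: Sequence[float], threshold: float,
                    below: bool = True, min_months: int = 2) -> bool:
    """True if the last `min_months` values all sit on the target side of threshold."""
    if len(values) < min_months:
        return False
    recent = values[-min_months:]
    if below:
        return all(v < threshold for v in recent)
    return all(v > threshold for v in recent)

# April and May figures from this report.
googlebot_share = [29.6, 27.1]  # prediction: share below 30%
applebot_rank = [5, 7]          # prediction: top five, i.e. rank below 6

print("Googlebot durable:", crossed_durably(googlebot_share, 30.0))
print("Applebot durable:", crossed_durably(applebot_rank, 6))
```

With this rule Googlebot's sub-30% call counts as a durable hit, while Applebot's single month at #5 does not.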
Revised forecasts for the rest of Q2 (June):
Based on what I've found in the May 2026 Cloudflare Radar data, here are the actions I'd prioritize:
Make Bytespider your top robots.txt priority this month. ByteDance's Bytespider surged to 10.5% — the #4 AI crawler, ahead of ClaudeBot, Bingbot, and Applebot, and nearly triple its March share. If your robots.txt was written around the "big four" of Google, Meta, OpenAI, and Anthropic, it now misses the fastest-growing major crawler on the web. Add an explicit Bytespider directive.
Don't over-react to last month's Applebot surge. I told you in April to update your Applebot policy because it had entered the top five. It has since fallen back to #7 (9.1% → 7.0%). Keep your Applebot-Extended directive if you added one — adoption is sticky and Apple may ramp again — but Applebot is no longer the urgent case it looked like a month ago. The urgent case is Bytespider.
Separate training from search — the crawl-to-refer data proves why. Anthropic crawls roughly 12,000 pages for every referral it sends back (vs. Google's 5:1). Claude-SearchBot (search, now 2.4% and the largest AI search crawler) is distinct from ClaudeBot (training, 9.5%). If you want Claude's search to surface your content while opting out of training, use separate directives — and apply the same logic to OpenAI's GPTBot/OAI-SearchBot split.
Watch the shift from training to retrieval. Training crawling plateaued at 51.8% while Search jumped to 9.6%. A robots.txt policy written purely to block bulk training scrapers will increasingly miss the real-time search and user-action fetches that actually drive AI-assistant citations back to your site. Decide deliberately whether you want to be in AI answers (allow search crawlers) or out of training sets (block training crawlers) — they're now genuinely separable.
Don't assume Meta-ExternalAgent will keep growing. Meta's bot fell again to 13.1% — its second straight monthly contraction (16.7% in March → 13.1% in May). Meta's aggressive Q1 Llama-training crawl looks like a finite campaign, not a permanent baseline.
Audit your Googlebot assumptions. Googlebot fell to 27.1%, down from ~39% in January. For every 10 AI bot requests hitting your site at the start of the year, Google accounted for ~4; now it's closer to ~3. The other 7 are increasingly dedicated training and search crawlers — not search-indexing bots.
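The directive advice above (an explicit Bytespider rule, plus separate rules for training and search bots) can be sanity-checked before you deploy it. This is a minimal sketch using Python's standard-library `urllib.robotparser`. The policy shown is only one illustrative choice (block training, allow search), `example.com` is a placeholder, and each crawler's own matching logic may differ from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of training crawlers, stay visible to AI search crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /
"""

def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether the given robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

agents = ["GPTBot", "ClaudeBot", "Bytespider", "OAI-SearchBot", "Claude-SearchBot"]
result = allowed_agents(ROBOTS_TXT, agents, "https://example.com/article")
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that this only confirms what the file asks for. Compliance is voluntary on the crawler's side.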
Every AI crawler in this report exists because AI companies need fresh, structured web data to power their models and search products. At WebSearchAPI.ai, we sit on the other side of this equation — providing developers and AI agents with a clean, fast, and affordable way to access real-time web data without running their own crawlers.
Here's why this matters in the context of May's trends:
If the data in this report tells you anything, it's that the volume and complexity of AI web crawling is only accelerating — and the balance is now tilting toward real-time retrieval. WebSearchAPI.ai is purpose-built for developers who want to harness that web intelligence without becoming a crawling operation themselves. Learn more about what a web search API can do for your stack.
An AI crawler (also called an AI bot or AI spider) is an automated program that visits websites to collect content for training artificial intelligence models or for powering AI search features. Unlike traditional search engine crawlers that index pages for search results, AI crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent collect data specifically to train large language models (LLMs). Some crawlers, like Googlebot, serve both purposes: indexing for search and collecting training data at the same time. You can identify AI crawlers by their user-agent strings in your server logs or through tools like Cloudflare Radar.
This report is updated monthly with fresh data from Cloudflare Radar AI Insights. Each edition covers a rolling 28-day window and compares it against the immediately preceding 28-day window, so every month-over-month figure is computed on an identical basis. Quarterly editions add full-quarter trajectory analysis (the Q1 2026 review remains in this post as a historical anchor). Bookmark this page or check back at the beginning of each month for the latest analysis of AI crawler traffic patterns, market share shifts, and robots.txt directives.
Yes. The primary method is adding disallow rules to your robots.txt file for specific AI crawler user agents. For example, adding User-agent: GPTBot followed by Disallow: / will request that OpenAI's crawler stop visiting your site. However, robots.txt is a voluntary protocol — crawlers are not technically required to obey it. As of May 2026, GPTBot and ClaudeBot remain the two most-referenced AI crawlers in robots.txt files (appearing in 632 and 539 domains respectively in Cloudflare Radar's parsed sample), with Google-Extended now third, but absolute adoption is still a small fraction of the web. Some CDN providers like Cloudflare also offer dashboard-level controls to block or rate-limit AI bots.
AI training crawlers (like GPTBot, ClaudeBot, and Meta-ExternalAgent) collect web content to build and improve AI models. They typically scrape large volumes of content from many sites. AI search crawlers (like OAI-SearchBot and the newly observed Claude-SearchBot) fetch specific pages in real time when a user performs a search query through an AI tool like ChatGPT or Claude. The key difference: training crawlers take your content to make the model smarter, while search crawlers fetch your content to answer a specific user question, and may drive traffic back to your site. As of May 2026, training crawling holds a majority at 51.8% of all AI bot traffic but has plateaued, while search crawling jumped to 9.6%, its strongest month yet, led by Anthropic's Claude-SearchBot.
Blocking dedicated AI training crawlers like GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, or Applebot-Extended will not affect your rankings in Google, Bing, or other traditional search engines. These crawlers are separate from the search indexing bots. However, blocking Googlebot will remove your site from Google Search entirely since Google uses the same crawler for both search indexing and AI training. Google offers a middle ground with the Google-Extended user agent — blocking it opts you out of AI training while keeping your search presence intact. Apple offers the same kind of separation with Applebot-Extended, which sits in the top 12 most-referenced robots.txt user agents as of May 2026.
After Applebot leapt from 5.8% (March) to 9.1% (April) and briefly entered the top five, it fell back to 7.0% in May, surrendering the #5 spot and dropping to #7. The most likely explanation is that April's spike was a finite training burst — a backfill of web content to support Apple Intelligence — rather than a sustained crawling ramp. It's a useful reminder that a single month above a threshold doesn't establish a durable trend. Website owners who added Applebot-Extended directives during the April surge can keep them (adoption is sticky and Apple may ramp again), but Applebot is no longer the urgent case it appeared to be a month ago; ByteDance's Bytespider, now the #4 crawler at 10.5%, is.
Anthropic's two crawlers moved in opposite directions in May. ClaudeBot (training) fell to 9.5%, slipping below GPTBot to #5, while Claude-SearchBot (search) tripled to 2.4% — overtaking OpenAI's OAI-SearchBot to become the single largest dedicated AI search crawler on the web. This mirrors OpenAI's GPTBot/OAI-SearchBot split but in reverse: OpenAI leads in training (GPTBot), while Anthropic now leads in search (Claude's web search functionality). The divergence suggests Anthropic is shifting effort from bulk training collection toward real-time retrieval, and it gives website owners a clear reason to set separate directives for the two bots.
Cloudflare's global network spans 330+ cities in 125+ countries and processes over 81 million HTTP requests per second. Through its Radar platform, Cloudflare identifies and classifies AI bot traffic by analyzing user-agent strings, request patterns, and behavioral signatures across all sites on its network. The data in this report comes from Cloudflare Radar's AI Insights endpoints, which aggregate these signals into share-of-traffic percentages by bot, crawl purpose, industry, and region.
In May 2026, Bytespider had the largest absolute growth of any crawler: +4.0 pp (6.5% → 10.5%), a +61% relative gain that made ByteDance's crawler the #4 AI bot on the web. Bytespider has now grown for three consecutive months (3.6% → 6.5% → 10.5%), the most sustained surge of any crawler this year. Applebot, last month's standout, reversed sharply (-2.0 pp to 7.0%), and Claude-SearchBot was the fastest riser in the search tier, tripling to 2.4% to become the largest dedicated AI search crawler.
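Because this report quotes both percentage-point and relative changes, here is a short sketch of how the two differ. The inputs are the rounded April and May shares quoted in this report, so results can differ from the published deltas by about 0.1 pp where Radar's unrounded values were used. The helper name `mom_change` is mine.

```python
def mom_change(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    pp = current - previous
    relative = pp / previous * 100
    return pp, relative

# (April, May) shares of identified AI bot requests, as quoted in this report.
shares = {
    "Bytespider": (6.5, 10.5),
    "Applebot": (9.1, 7.0),
    "Googlebot": (29.6, 27.1),
}

for bot, (april, may) in shares.items():
    pp, rel = mom_change(april, may)
    print(f"{bot}: {pp:+.1f} pp ({rel:+.1f}%)")
```

A crawler starting from a small base can post a large relative gain on a small percentage-point move, which is why both figures are worth reading together.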
The percentages in this report represent share of identified AI bot requests, not share of total web traffic. Cloudflare Radar tracks the proportion of AI-related crawler activity relative to other AI bots, providing a competitive landscape view. The actual percentage of total web traffic from AI bots varies by website, but industry estimates suggest AI crawlers now account for a meaningful and growing share of overall internet traffic, particularly for content-heavy sites in retail, technology, and media.
Check your server access logs for known AI bot user-agent strings (GPTBot, ClaudeBot, meta-externalagent, Applebot, Bytespider, Amazonbot, etc.). Most web analytics platforms filter out bot traffic by default, so log-level analysis gives the most accurate picture. Cloudflare users can view AI bot activity directly in their dashboard. For a structured approach, consider using a web search API to understand how your content appears in AI-powered search results and ensure your most important pages are properly accessible.
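As a starting point for that log check, here is a minimal sketch that counts requests per AI bot by looking for the user-agent tokens listed above. The sample log lines and their user-agent strings are simplified and hypothetical, and token matching only catches bots that identify themselves honestly.

```python
from collections import Counter

# User-agent tokens named in this report; matching is case-insensitive.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "meta-externalagent",
                 "Applebot", "Bytespider", "Amazonbot"]

def count_ai_bots(log_lines):
    """Count log lines whose text contains a known AI bot token."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # count each request once
    return counts

# Hypothetical, simplified access-log lines for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jun/2026:10:00:02 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [01/Jun/2026:10:00:03 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [01/Jun/2026:10:00:05 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

bot_counts = count_ai_bots(sample_log)
for bot, hits in bot_counts.most_common():
    print(bot, hits)
```

To run it against a real log, pass an open file handle instead of `sample_log`; the function only needs an iterable of lines.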
Understanding where this data comes from — and what it can and cannot tell you — is critical for interpreting the trends above. Here's a full breakdown of how Cloudflare Radar collects, classifies, and aggregates the AI crawler data used in this report.
| Metric | Value |
|---|---|
| Global presence | 330+ cities in 125+ countries |
| HTTP requests | 81 million/second average, peaks >129 million/second |
| DNS queries | 67 million/second (authoritative + resolver) |
This scale is what makes Cloudflare Radar one of the most comprehensive sources of internet traffic data available. The data in this report comes from two primary sources:
For routing data, Cloudflare also uses RIPE RIS data from RIPE NCC (BGP route collectors).
Cloudflare uses a layered detection system to identify and classify AI crawlers:
💡 Expert Insight: The layered approach matters because not all AI crawlers identify themselves honestly. User-agent matching catches transparent bots like GPTBot and ClaudeBot. Behavioral analysis and honeypots catch crawlers that disguise themselves as regular browsers.
Bots are categorized into these purpose buckets:
⚠️ Warning: Keep these limitations in mind when interpreting the data in this report:
This edition uses data from Cloudflare Radar's AI Insights endpoint (/radar/ai/bots/summary/*), Workers AI inference endpoint (/radar/ai/inference/summary/*), web-crawler endpoints (/radar/bots/crawlers/summary/{vertical|industry|crawl_refer_ratio}), and robots.txt analysis endpoint (/radar/robots_txt/top/user_agents/directive). The May 2026 monthly data covers the rolling 28-day window of May 5 through June 2, 2026, with every month-over-month comparison computed against the immediately preceding 28-day window (April 7 through May 5, 2026) via the API's dateRange=28d and dateRange=28dControl parameters, so both columns share an identical methodology. The Q1 2026 Quarterly Review compiles data from three consecutive monthly analyses covering January through March 2026.
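For readers who want to reproduce the two-window comparison, here is a rough sketch of how the paired request URLs could be built. The base URL, the `user_agent` dimension standing in for the `*` in the path, and the helper name `radar_url` are all my assumptions for illustration, so check them against Cloudflare's API reference before relying on them. Authentication (an API token header) is not shown.

```python
from urllib.parse import urlencode

# Assumption: Cloudflare's v4 API base URL. Verify against the official API docs.
API_BASE = "https://api.cloudflare.com/client/v4"

def radar_url(path: str, date_range: str) -> str:
    """Build a Radar request URL for one window (e.g. '28d' or '28dControl')."""
    query = urlencode({"dateRange": date_range})
    return f"{API_BASE}{path}?{query}"

# Assumption: 'user_agent' is used here as an example dimension for the '*' wildcard.
dimension = "user_agent"
path = f"/radar/ai/bots/summary/{dimension}"

current_url = radar_url(path, "28d")         # latest 28-day window
control_url = radar_url(path, "28dControl")  # immediately preceding 28-day window

print(current_url)
print(control_url)
```

Fetching both URLs with the same path and only the `dateRange` value changed is what keeps the two columns on an identical basis.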
I queried bot traffic breakdowns by user agent, crawl purpose, industry, and vertical; the new crawl-to-refer ratio by operator; Workers AI model and task distribution by account share; and domain-level robots.txt directives. All percentages represent share of identified AI bot requests (for crawling data) or share of accounts (for Workers AI data), not share of total web traffic. The crawl-to-refer ratio is a plain ratio (crawls per referral) rather than a percentage, and robots.txt figures are raw domain counts.
⚠️ Note on revised values: Cloudflare Radar aggregates and may revise data after publication, and it refreshed its parsed robots.txt corpus in early Q2 2026. To keep month-over-month deltas clean, this edition computes the prior-month ("April 2026") column from the API's 28dControl window rather than re-using last edition's published figures, and reports robots.txt as raw domain counts rather than percentages. Where last month's published numbers differ slightly, the difference reflects Radar's revisions, not a change in this report's method.
Data source: Cloudflare Radar AI Insights and Web Crawlers API endpoints (radar.cloudflare.com), May 5 – June 2, 2026 vs. April 7 – May 5, 2026. Last updated: June 2, 2026.
About the Author: I'm James Bennett, Lead Engineer at WebSearchAPI.ai, where I architect the core retrieval engine enabling LLMs and AI agents to access real-time, structured web data with over 99.9% uptime and sub-second query latency. With a background in distributed systems and search technologies, I've reduced AI hallucination rates by 45% through advanced ranking and content extraction pipelines for RAG systems. My expertise includes AI infrastructure, search technologies, large-scale data integration, and API architecture for real-time AI applications.
Credentials: B.Sc. Computer Science (University of Cambridge), M.Sc. Artificial Intelligence Systems (Imperial College London), Google Cloud Certified Professional Cloud Architect, AWS Certified Solutions Architect, Microsoft Azure AI Engineer, Certified Kubernetes Administrator, TensorFlow Developer Certificate.