
Monthly AI Crawler Report: April 2026 Traffic Trends & Q1 Predictions Scorecard

Analysis of AI crawler traffic trends from April 2026, with a scorecard on the Q1 predictions made last month. Training crawlers crossed 51.5%, past the 50% milestone. Applebot leapfrogged Bingbot to become the #5 AI crawler at 9.1%. Googlebot dropped below 30% for the first time. Bytespider grew 72% month-over-month. Two Q1 predictions hit early, one missed, and the rest are still in play. Complete breakdown of crawler market share, industry targeting, robots.txt blocking patterns, and Workers AI model shifts based on Cloudflare Radar data.

James Bennett
35 minutes read
Monthly AI Crawler Report April 2026 - AI bot traffic trends, market share analysis, Applebot surge to top five, and Q1 predictions scorecard

This report is updated monthly with fresh Cloudflare Radar data. Bookmark this page to track how AI crawlers are reshaping web traffic each month.

Two of the six predictions I made in last month's Q1 review have already been confirmed by April data, a third is on track, and one was decisively wrong. Googlebot dropped below 30% for the first time (29.96%). Applebot kept surging, leapfrogged Bingbot, and is now the fifth-largest AI crawler at 9.1%. Training crawlers crossed the 50% threshold to reach 51.5% of all AI bot traffic, on the way toward the predicted 55%. The miss: Meta-ExternalAgent didn't plateau at 18-20% as expected; it fell from 16.7% to 14.5%.

I analyzed 30 days of Cloudflare Radar AI Insights data covering April 4 through May 4, 2026, and the results confirm what the Q1 review predicted: the diversification of the AI crawler market is accelerating, not stabilizing. Two new crawlers entered the top tier (TikTokSpider and Anthropic's dedicated Claude-SearchBot), Bytespider nearly doubled, and the Workers AI model leaderboard saw three brand-new entrants. If you want the foundational primer on the pipelines behind this data, our explainer on how search engines really work walks through crawl budgets, inverted indexes, and learning-to-rank — all the systems these bots are feeding.

📊 Stats Alert: Dedicated AI training crawlers now generate 51.5% of all AI bot traffic — the first month past the 50% milestone. Applebot reached 9.1% to overtake Bingbot for #5. Googlebot fell below 30% for the first time. Bytespider grew +72% month-over-month.

Who Are the Top AI Crawlers in April 2026?

Googlebot remains the largest AI-related crawler globally, but for the first time in Cloudflare Radar's tracking history, its share dipped below 30% — landing at 29.96%. Meta-ExternalAgent stays in second at 14.5%, but down significantly from March's 16.7%. ClaudeBot held essentially flat at 11.6%, narrowly overtaking GPTBot which slipped to 10.0%. The biggest move in April: Applebot leapfrogged Bingbot to become the #5 AI crawler at 9.1%. And Bytespider had a major resurgence, jumping from 3.6% to 6.2% to take the #7 spot.

Two new crawlers also broke into the top 15 this month: TikTokSpider (1.07%) — ByteDance's second crawler beyond Bytespider — and Claude-SearchBot (0.79%), Anthropic's newly observed dedicated search crawler distinct from ClaudeBot. Anthropic now operates two separate crawlers in the wild, one for training and one specifically for Claude's web search functionality.

| AI Bot | April 2026 Share (%) | Operator | Primary Purpose |
| --- | --- | --- | --- |
| Googlebot | 30.0% | Google | Search indexing + AI training (mixed) |
| Meta-ExternalAgent | 14.5% | Meta | AI training |
| ClaudeBot | 11.6% | Anthropic | Model training |
| GPTBot | 10.0% | OpenAI | Model training |
| Applebot | 9.1% | Apple | Search + AI features |
| Bingbot | 8.2% | Microsoft | Search indexing + AI (mixed) |
| Bytespider | 6.2% | ByteDance | AI training |
| Amazonbot | 4.7% | Amazon | AI training |
| OAI-SearchBot | 1.8% | OpenAI | ChatGPT Search |
| ChatGPT-User | 1.1% | OpenAI | Real-time user queries |
| TikTokSpider | 1.1% | ByteDance | AI training |
| Claude-SearchBot | 0.8% | Anthropic | Claude web search |

The concentration of crawling power among the traditional top five companies — Google, Meta, OpenAI, Anthropic, and Microsoft — stands at 74.3% in April (counting each company's primary crawler), down from 80.2% in March, 82.8% in February, and 84.5% in January. This is the fourth consecutive monthly decline, and the steepest single-month drop of the year. Apple is no longer the only diversification story: ByteDance now operates two crawlers (Bytespider + TikTokSpider) totaling 7.3%, putting it firmly in the top tier alongside Apple. Even adding Applebot (9.1%) and Bytespider (6.2%) to the traditional top five only reaches 89.6% for the top seven — meaningful "long tail" growth in just a quarter.

If you're managing AI bot access on your site, the list of operators you need to account for in your robots.txt got longer again this month — TikTokSpider, Claude-SearchBot, and the now-resurgent Bytespider all warrant explicit policies. For context on how crawling translates into actual referral traffic back to websites, check our companion Search Engine Referral Report on crawl-to-refer ratios.
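One way to run that check is sketched below: it lists which of this report's top crawlers have no explicit `User-agent` group in a robots.txt file. The `WATCHLIST` names come from the share table above; the helper names and the sample file are hypothetical, and the parsing is deliberately simple (it only looks at `User-agent` lines).

```python
# Crawler names taken from the April 2026 share table in this report.
WATCHLIST = [
    "Googlebot", "Meta-ExternalAgent", "ClaudeBot", "GPTBot", "Applebot",
    "Bingbot", "Bytespider", "Amazonbot", "OAI-SearchBot", "ChatGPT-User",
    "TikTokSpider", "Claude-SearchBot",
]


def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent token in the file, lowercased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def missing_policies(robots_txt: str, watchlist=WATCHLIST) -> list[str]:
    """Return watchlist crawlers that have no explicit group in the file."""
    named = named_agents(robots_txt)
    return [bot for bot in watchlist if bot.lower() not in named]


# Hypothetical robots.txt that predates this month's new entrants.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

uncovered = missing_policies(SAMPLE)
print(uncovered)
```

A crawler that appears in `uncovered` is not necessarily unrestricted; it simply falls through to the wildcard group, which may or may not be what you intend.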

Which AI Crawlers Gained or Lost Ground This Month?

AI bot traffic share breakdown showing Googlebot, ClaudeBot, GPTBot, Meta-ExternalAgent, Applebot, and Bytespider market share in April 2026

The headline shift in April is the simultaneous decline of all three "big AI labs" crawlers — Meta-ExternalAgent (-2.2 pp), GPTBot (-2.0 pp), and Googlebot (-1.6 pp) all lost share in the same month for the first time this year. The traffic that vacated didn't go to a new dominant crawler; it dispersed across Applebot (+3.3 pp), Bytespider (+2.6 pp), and the emerging long tail (TikTokSpider, Claude-SearchBot, ChatGPT-User).

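| AI Bot | March 2026 Share | April 2026 Share | Change (pp) | Relative Change |
| --- | --- | --- | --- | --- |
| Googlebot | 31.6% | 30.0% | -1.6 | -5.2% |
| Meta-ExternalAgent | 16.7% | 14.5% | -2.2 | -13.1% |
| GPTBot | 12.0% | 10.0% | -2.0 | -16.5% |
| ClaudeBot | 11.7% | 11.6% | -0.1 | -0.6% |
| Bingbot | 8.2% | 8.2% | 0.0 | 0.0% |
| Applebot | 5.8% | 9.1% | +3.3 | +56.3% |
| Bytespider | 3.6% | 6.2% | +2.6 | +72.6% |
| Amazonbot | 4.4% | 4.7% | +0.3 | +6.0% |
| OAI-SearchBot | 2.2% | 1.8% | -0.4 | -17.2% |
| ChatGPT-User | -- | 1.1% | New | New entrant |
| TikTokSpider | -- | 1.1% | New | New entrant |
| Claude-SearchBot | -- | 0.8% | New | New entrant |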
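The table reports each move two ways: as a percentage-point (pp) change and as a relative change. A small sketch of the difference, using the rounded shares printed above (the function name is illustrative). Because the published relative figures appear to be computed from unrounded shares, results from rounded inputs can differ by a few tenths.

```python
def share_change(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in %) between two shares."""
    pp = current - previous              # absolute move, in percentage points
    relative = pp / previous * 100       # move as a fraction of the starting share
    return round(pp, 1), round(relative, 1)


applebot = share_change(5.8, 9.1)    # March -> April
meta = share_change(16.7, 14.5)
print(applebot, meta)
```

The same 2-3 pp move is a large relative swing for a small crawler and a modest one for a large crawler, which is why both columns are worth reading together.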

Six trends stand out from this comparison:

  1. Applebot leapfrogged Bingbot. Applebot's surge from 5.8% to 9.1% (+3.3 pp, +56% relative) makes it the fifth-largest AI crawler globally — passing Microsoft's Bingbot for the first time. This was the central prediction of last month's quarterly review (we projected this could happen by May or June), and it happened in April. At the current absolute share of 9.1%, Applebot is now generating more crawl traffic than ClaudeBot and GPTBot combined a year ago.

  2. Googlebot finally dropped below 30%. The Q1 review predicted Googlebot would breach 30% in Q2, and it happened in the first month of Q2. At 29.96%, Googlebot has now lost over 8.7 percentage points since January (38.7%). The decline rate slowed (-1.6 pp in April vs -3.0 pp in March and -4.1 pp in February), suggesting Google may be approaching a floor as competitors saturate.

  3. Meta-ExternalAgent reversed direction. After consecutive monthly gains through Q1, Meta-ExternalAgent fell 2.2 pp in April to 14.5%, its first decline since tracking began. The Q1 review predicted a plateau at 18-20%; the actual outcome was a contraction. This is the single biggest miss in last month's predictions and warrants explanation: Meta may have completed a major training data refresh, or shifted crawling to a different unidentified user agent.

  4. Bytespider had a major resurgence. ByteDance's crawler grew from 3.6% to 6.2% (+72% relative). Combined with the new TikTokSpider at 1.1%, ByteDance's crawler footprint reached 7.3% in April. That still trails Microsoft's Bingbot (8.2%) and Applebot (9.1%), and sits well below OpenAI's combined 12.9%, but it puts ByteDance clearly ahead of Amazon (4.7%) and within reach of the top six operators.

  5. OpenAI's combined footprint declined. GPTBot (10.0%), OAI-SearchBot (1.8%), and ChatGPT-User (1.1%) sum to 12.9% — down from 14.2% in March (GPTBot 12.0% + OAI-SearchBot 2.2%). Even with the new ChatGPT-User entrant breaking out, OpenAI's overall share contracted. If you're building apps that rely on real-time retrieval, understanding what a web search API is and how these bots work under the hood matters more than ever.

  6. Anthropic split into two crawlers. ClaudeBot held essentially flat at 11.6%, but the new Claude-SearchBot (0.8%) appeared as a distinct user agent for the first time. This mirrors OpenAI's split between GPTBot (training) and OAI-SearchBot (search) — Anthropic now operates a clear training/search separation, giving website owners the ability to allow one without the other.
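Several of the trends above compare operators rather than individual crawlers. A minimal sketch of that roll-up, summing the rounded April shares from the first table by operator (only the twelve listed crawlers are included, so operators with crawlers outside that list would be undercounted):

```python
from collections import defaultdict

# (crawler, operator, April 2026 share in %) from the top-crawlers table.
APRIL_SHARES = [
    ("Googlebot", "Google", 30.0),
    ("Meta-ExternalAgent", "Meta", 14.5),
    ("ClaudeBot", "Anthropic", 11.6),
    ("GPTBot", "OpenAI", 10.0),
    ("Applebot", "Apple", 9.1),
    ("Bingbot", "Microsoft", 8.2),
    ("Bytespider", "ByteDance", 6.2),
    ("Amazonbot", "Amazon", 4.7),
    ("OAI-SearchBot", "OpenAI", 1.8),
    ("ChatGPT-User", "OpenAI", 1.1),
    ("TikTokSpider", "ByteDance", 1.1),
    ("Claude-SearchBot", "Anthropic", 0.8),
]


def operator_totals(rows):
    """Sum crawler shares per operator and sort from largest to smallest."""
    totals = defaultdict(float)
    for _crawler, operator, share in rows:
        totals[operator] += share
    ranked = [(op, round(total, 1)) for op, total in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


ranking = operator_totals(APRIL_SHARES)
for operator, total in ranking:
    print(f"{operator:<10} {total:>5.1f}%")
```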

What Are AI Bots Actually Doing With the Content They Crawl?

The 50% threshold is in the rearview mirror. In April 2026, dedicated AI training crawlers reached 51.5% of all AI bot traffic — a clear majority for the first time, and a +1.6 pp gain over March. Mixed-purpose crawlers (the Googlebot/Bingbot bundle that does both search indexing and training) continued to slide, reaching 38.2%.

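| Crawl Purpose | April 2026 Share | March 2026 Share | Change (pp) |
| --- | --- | --- | --- |
| Training | 51.5% | 49.9% | +1.6 |
| Mixed Purpose | 38.2% | 39.9% | -1.7 |
| Search | 7.6% | 7.7% | -0.1 |
| User Action | 2.2% | 2.1% | +0.1 |
| Undeclared | 0.4% | 0.4% | 0.0 |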

Here's what the April data tells website owners:

Training crossed the majority threshold at 51.5%. The +1.6 pp gain is more modest than March's +4.5 pp surge (45.4% to 49.9%), but it's the seventh consecutive monthly increase. The deceleration suggests the Q1 acceleration phase has ended and Training is settling into a steadier growth pattern. For every 100 AI bot requests hitting your site, roughly 52 are now explicitly dedicated to training, meaning robots.txt rules targeting training crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent, Applebot, Bytespider) cover the majority of AI traffic, not a fraction.

Mixed Purpose continued its slide to 38.2%. Crawlers that simultaneously index for search and collect training data have lost ground for four consecutive months — from 48.3% in January to 38.2% in April, a cumulative -10.1 pp drop. The decline mirrors Googlebot's -1.6 pp and matches the trajectory predicted in the Q1 review. You still can't separate the AI training from the search indexing with these crawlers — block Googlebot and you disappear from Google Search. For developers looking at how to ground AI responses with Google Search alternatives, this bundling problem keeps coming up.

Search crawling stabilized at 7.6%. After declining in March, AI search crawlers held essentially flat in April. OAI-SearchBot fell to 1.8% but was offset by the new Claude-SearchBot at 0.8%. The growing ecosystem of AI search API alternatives continues to drive demand for these retrieval systems.

User Action ticked up to 2.2%. ChatGPT-User now shows up explicitly in the top crawler list at 1.1%, accounting for the slight uptick in real-time user-query bots. This is the category most directly tied to ChatGPT's "browse" feature.

The 13.3-percentage-point gap between Training (51.5%) and Mixed Purpose (38.2%) is a decisive structural shift. Training crawlers aren't just the majority; they're pulling away. At April's pace (+1.6 pp and -1.7 pp per month respectively), Training would approach 55% by the end of Q2 2026 (roughly 54.7% in June), while Mixed Purpose may fall below 35% by June.

Which Industries Are AI Bots Targeting Most?

AI crawlers showed remarkable consistency in which industries they targeted in April. Retail and e-commerce sites continue to absorb the largest share of AI crawling activity globally — but the standout April change is Internet and Telecom jumping from 16.9% to 18.3% (+1.4 pp), the largest single-month gain across any vertical this year.

| Industry Vertical | April 2026 Share | March 2026 Share | Change (pp) |
| --- | --- | --- | --- |
| Shopping & General Merchandise | 31.7% | 31.7% | 0.0 |
| Internet and Telecom | 18.3% | 16.9% | +1.4 |
| Computer and Electronics | 14.2% | 14.4% | -0.2 |
| News, Media, and Publications | 8.7% | 8.9% | -0.2 |
| Business and Industry | 4.9% | 5.0% | -0.1 |
| Travel and Tourism | 3.8% | 4.0% | -0.2 |
| Professional Services | 3.2% | 3.3% | -0.1 |
| Finance | 2.8% | 2.9% | -0.1 |
| Gambling | 2.4% | 2.9% | -0.5 |
| Career and Education | 2.1% | -- | New top 10 |

Shopping held at exactly 31.7%. Retail has now stayed above 31% for four consecutive months, confirming that e-commerce content is the single most crawled category on the web. The notable Internet and Telecom rise (+1.4 pp) coincides with Applebot's surge, which suggests (but does not prove) that Apple is heavily targeting telecom and SaaS content as it builds out Apple Intelligence. Career and Education entered the top 10 verticals at 2.1%, just behind Gambling, which slipped to 2.4%.

The industry-level breakdown gets more granular:

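| Industry | April 2026 Share | March 2026 Share | Change (pp) |
| --- | --- | --- | --- |
| Retail | 28.7% | 28.6% | +0.1 |
| Computer Software | 13.0% | 13.2% | -0.2 |
| IT and Services | 5.9% | 5.8% | +0.1 |
| Internet | 5.2% | 5.1% | +0.1 |
| Telecommunications | 3.8% | 2.8% | +1.0 |
| Adult Entertainment | 2.6% | 2.7% | -0.1 |
| Online Media | 2.6% | 2.6% | 0.0 |
| Media | 2.4% | 2.5% | -0.1 |
| Marketing and Advertising | 2.3% | -- | New top 10 |
| Gambling & Casinos | 2.3% | 2.7% | -0.4 |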

The most notable industry-level change is Telecommunications growing from 2.8% to 3.8% (+1.0 pp). This is the largest single-month gain in any industry category in the past quarter, and it correlates almost exactly with Applebot's surge. Marketing and Advertising also entered the top 10 industries at 2.3%, replacing Hospitality. If you maintain developer documentation, API references, or technical tutorials, this is why choosing the right AI web search API for your applications matters — these crawlers are the infrastructure behind the search results your users see.

AI training crawlers aren't the only bots scanning the web at this scale, either. Technology detection platforms like Technologychecker.io crawl and fingerprint over 50 million domains using HTTP header analysis, JavaScript fingerprinting, DNS lookups, and headless browser rendering to identify 40,000+ technologies. Unlike AI training crawlers that take content for model weights, technology intelligence crawlers need to re-crawl frequently to track stack changes and new tech adoptions.

How Are Websites Fighting Back Against AI Crawlers?

Chart showing percentage of websites blocking AI crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent via robots.txt in April 2026

⚠️ Methodology note: Cloudflare Radar's robots.txt parsing corpus was refreshed in early Q2 2026, which materially changed the share denominator used for ranking AI crawler directives. The March/February/January percentages reported in last month's edition reflect the prior, larger sample and are preserved unchanged below for historical accuracy. April 2026 percentages use the new sample (4,102 parsed robots.txt files at the most recent snapshot). Because the samples differ, direct percentage-to-percentage comparisons between April and earlier months should be treated as directional, not arithmetic. Relative ranks remain comparable.

Most Referenced AI Crawlers in robots.txt — April 2026

| AI Crawler | Domains Referencing | % of Parsed Files | Operator |
| --- | --- | --- | --- |
| GPTBot | 597 | 14.55% | OpenAI |
| ClaudeBot | 504 | 12.29% | Anthropic |
| CCBot | 493 | 12.02% | Common Crawl |
| Google-Extended | 476 | 11.60% | Google |
| Bytespider | 410 | 9.99% | ByteDance |
| Googlebot | 398 | 9.70% | Google |
| meta-externalagent | 368 | 8.97% | Meta |
| PerplexityBot | 358 | 8.73% | Perplexity |
| Amazonbot | 345 | 8.41% | Amazon |
| Applebot-Extended | 338 | 8.24% | Apple |
| ChatGPT-User | 322 | 7.85% | OpenAI |
| OAI-SearchBot | 270 | 6.58% | OpenAI |
| anthropic-ai | 245 | 5.97% | Anthropic |
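The percentage column is simply the domain count divided by the sample size from the methodology note (4,102 parsed files). A short sketch of that calculation for the top four rows, assuming the denominator is exactly 4,102; the published figures may differ in the last digit if Radar's snapshot used a slightly different count.

```python
PARSED_FILES = 4102  # April 2026 sample size stated in the methodology note

# Domain counts for the top four crawlers, from the table above.
REFERENCES = {"GPTBot": 597, "ClaudeBot": 504, "CCBot": 493, "Google-Extended": 476}


def reference_rate(domains: int, parsed_files: int = PARSED_FILES) -> float:
    """Share of parsed robots.txt files that reference a crawler, in percent."""
    return round(domains / parsed_files * 100, 2)


rates = {bot: reference_rate(count) for bot, count in REFERENCES.items()}
print(rates)
```

This also makes the methodology caveat concrete: the same 597 domains would produce a much smaller percentage against a larger corpus, which is why April's figures should not be compared directly with the January to March series.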

The April rankings preserve March's top four order — GPTBot, ClaudeBot, CCBot, Google-Extended — but the gap between #1 and #5 narrowed. PerplexityBot climbed to #8 with 358 domains referencing it (8.73% of parsed files), up two positions from March. Applebot-Extended also jumped into the top 10 at 8.24% — a meaningful response to Applebot's traffic surge: as more website owners notice Applebot in their logs, they're adding the Applebot-Extended directive (which controls AI training opt-out independently of search visibility).

Notably, OpenAI now occupies three slots in the top 12: GPTBot (#1), ChatGPT-User (#11), and OAI-SearchBot (#12). No other operator has three crawlers explicitly referenced this often.

The meta-externalagent gap persists. Despite being the #2 AI crawler by traffic at 14.5%, Meta's bot only ranks #7 in robots.txt references — still one of the largest disparities between traffic share and blocking rate among major crawlers.

For continuity, the previously published Q1 2026 robots.txt percentages remain as historical anchors:

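| AI Crawler | January 2026 | February 2026 | March 2026 | April 2026 (new methodology) |
| --- | --- | --- | --- | --- |
| GPTBot | 5.29% | 5.40% | 5.52% | 14.55%* |
| ClaudeBot | 4.33% | 4.57% | 4.72% | 12.29%* |
| CCBot | 4.40% | 4.64% | 4.53% | 12.02%* |
| Google-Extended | 4.04% | 4.38% | 4.37% | 11.60%* |
| Bytespider | 3.69% | 3.70% | 3.67% | 9.99%* |
| Amazonbot | 2.99% | 3.12% | 3.29% | 8.41%* |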

*April 2026 percentages reflect a refreshed Cloudflare Radar corpus with a smaller sample of parsed robots.txt files. Use these to compare April crawlers to one another, not to the January–March series.

Despite the methodology shift, two qualitative observations hold across both samples:

  1. Selective adoption continues. Website owners are not adopting blanket AI crawler blocks; they're adding rules for specific crawlers. The fact that Applebot-Extended (8.24%) entered the top 10 in April mirrors the way PerplexityBot did in March — blocking activity tracks awareness, not traffic.
  2. The blocking minority is still a minority. Even with April's higher percentages (a product of the smaller sample), the absolute number of parsed robots.txt files referencing the top crawler is 597: meaningful, but a small fraction of the global web.

What's Happening on Cloudflare Workers AI?

Beyond crawlers, Cloudflare Radar tracks usage patterns on Cloudflare Workers AI — the platform that lets developers run AI models at the edge. The April 2026 data reveals the most dramatic single-month reshuffling of the model leaderboard since tracking began: three brand-new models entered the top 11, Llama 3 8B's share dropped 8.5 pp in a single month, and OpenAI's GPT-OSS 120B more than doubled.

| Model | April 2026 Share | March 2026 Share | Change (pp) | Developer |
| --- | --- | --- | --- | --- |
| Llama 3 8B Instruct | 28.8% | 37.3% | -8.5 | Meta |
| Stable Diffusion XL Base 1.0 | 9.7% | 12.3% | -2.6 | Stability AI |
| Gemma 4 26B-A4B-IT (NEW) | 6.7% | -- | New entrant | Google |
| Llama 4 Scout 17B | 6.1% | 7.0% | -0.9 | Meta |
| Whisper | 5.8% | 7.5% | -1.7 | OpenAI |
| GPT-OSS 120B | 5.5% | 2.1% | +3.4 | OpenAI |
| M2M-100 1.2B | 3.9% | 5.1% | -1.2 | Meta |
| Llama 3.2 1B Instruct (NEW) | 3.6% | -- | New entrant | Meta |
| Llama 3 8B Instruct (AWQ) | 3.5% | 4.4% | -0.9 | Meta |
| FLUX.1 Schnell | 3.2% | 3.0% | +0.2 | Black Forest Labs |
| Kimi K2.6 (NEW) | 2.3% | -- | New entrant | Moonshot AI |

The headline trend in April is the collapse of Llama 3 8B's dominance — its share fell from 37.3% to 28.8% in a single month, the largest single-month decline for any Workers AI model on record. The model still leads, but its lead has compressed dramatically: in February it commanded 40.1%; in April, 28.8%. The "other" category (models outside the top 11) reached 20.8%, continuing the long-tail expansion.

Three brand-new models entered the top 11 simultaneously: Google's Gemma 4 26B debuted at 6.7% (immediately #3, the highest debut of any model this year), Meta's lightweight Llama 3.2 1B Instruct at 3.6%, and Moonshot AI's Kimi K2.6 at 2.3% (the only model in the top 11 from a China-based developer). These three new entrants together account for 12.6% of all Workers AI accounts.

Meta's overall provider share contracted again. Combining Llama 3 8B (28.8%), Llama 4 Scout (6.1%), M2M-100 (3.9%), Llama 3.2 1B (3.6%), and Llama 3 AWQ (3.5%), Meta models account for 45.9% of all Workers AI accounts, down from 53.8% in March. Meta has lost roughly 14 percentage points of provider share since January, when it stood at about 60%. Google, by contrast, is suddenly meaningful at 6.7% (Gemma 4 26B alone), on a leaderboard where it barely registered last month.

GPT-OSS 120B's growth accelerated dramatically, climbing from 2.1% to 5.5% (+3.4 pp, +160% relative). OpenAI's open-source 120B model is the breakout adoption story of Q2 — it's now the #6 model on Workers AI, having ranked #8 just one month ago. Combined with Whisper (5.8%) and OpenAI's other models, OpenAI's Workers AI footprint reaches roughly 11-12%, holding its #2 provider position.

Task Distribution

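| Task Type | April 2026 Share | March 2026 Share | Change (pp) |
| --- | --- | --- | --- |
| Text Generation | 69.5% | 64.2% | +5.3 |
| Text-to-Image | 16.9% | 19.3% | -2.4 |
| Automatic Speech Recognition | 7.5% | 9.3% | -1.8 |
| Translation | 3.9% | 5.1% | -1.2 |
| Text-to-Speech | 0.8% | 0.5% | +0.3 |
| Text Classification | 0.7% | 0.8% | -0.1 |
| Image-to-Text | 0.5% | 0.7% | -0.2 |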

Text Generation surged to 69.5%, up +5.3 pp from March. This is the largest single-month gain in any task category since tracking began, and it's directly attributable to GPT-OSS 120B's adoption surge plus Gemma 4 26B's debut — both are text-generation workloads. Text-to-Image declined to 16.9%, the lowest share of the year, as Stable Diffusion XL slid from 12.3% to 9.7%.

Text-to-Speech jumped to 0.8% (up from 0.5%), continuing its consistent growth and confirming sustained developer interest in voice synthesis at the edge. While still a small category, this is the largest relative increase (+60%) of any task category in April; Text Generation's +5.3 pp gain is bigger in absolute terms but only about +8% relative.
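To compare task categories on relative growth rather than percentage points, the shares in the task table can be ranked directly. A minimal sketch using the rounded March and April values (variable and function names are illustrative):

```python
# task: (March 2026 share, April 2026 share), in percent, from the task table.
TASKS = {
    "Text Generation": (64.2, 69.5),
    "Text-to-Image": (19.3, 16.9),
    "Automatic Speech Recognition": (9.3, 7.5),
    "Translation": (5.1, 3.9),
    "Text-to-Speech": (0.5, 0.8),
    "Text Classification": (0.8, 0.7),
    "Image-to-Text": (0.7, 0.5),
}


def relative_change(march: float, april: float) -> float:
    """Month-over-month change as a percentage of the March share."""
    return (april - march) / march * 100


# Rank categories from fastest-growing to fastest-shrinking.
ranked = sorted(TASKS, key=lambda task: relative_change(*TASKS[task]), reverse=True)
for task in ranked:
    print(f"{task:<30} {relative_change(*TASKS[task]):+.1f}%")
```

Small categories swing widely in relative terms, so a ranking like this is best read alongside the absolute shares.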

Q1 2026 Quarterly Review: The Great Reshuffling

Q1 2026 was the most transformative quarter in the history of AI web crawling. This section compiles data from three consecutive monthly analyses to document the structural shifts reshaping how AI companies interact with the open web.

Q1 2026 at a Glance

| Metric | January 2026 | February 2026 | March 2026 | Q1 Change |
| --- | --- | --- | --- | --- |
| Top crawler (Googlebot) | 38.7% | 34.6% | 31.6% | -7.1 pp |
| #2 crawler | GPTBot (12.8%) | Meta-ExternalAgent (15.6%) | Meta-ExternalAgent (16.7%) | Meta took #2 |
| Training crawl share | 42.0% | 45.4% | 49.9% | +7.9 pp |
| Mixed Purpose crawl share | 48.3% | 43.9% | 39.9% | -8.4 pp |
| Top 5 company concentration | 84.5% | 82.8% | 80.2% | -4.3 pp |
| Domains blocking GPTBot | 5.29% | 5.45% | 5.52% | +0.23 pp |
| Llama 3 8B (Workers AI) | 41.7% | 40.1% | 37.3% | -4.4 pp |

How Each Crawler Moved Across the Quarter

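| AI Bot | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change (pp) | Q1 Direction |
| --- | --- | --- | --- | --- | --- |
| Googlebot | 38.7% | 34.6% | 31.6% | -7.1 | Declining |
| Meta-ExternalAgent | 11.6% | 15.6% | 16.7% | +5.1 | Rising |
| GPTBot | 12.8% | 12.1% | 12.0% | -0.8 | Stable/Slow decline |
| ClaudeBot | 11.4% | 11.1% | 11.7% | +0.3 | V-shaped recovery |
| Bingbot | 9.7% | 9.3% | 8.2% | -1.5 | Declining |
| Applebot | 2.5% | 3.1% | 5.8% | +3.3 | Surging |
| Amazonbot | 4.8% | 5.4% | 4.4% | -0.4 | Volatile |
| Bytespider | 3.5% | 3.3% | 3.6% | +0.1 | Flat |
| OAI-SearchBot | 2.0% | 2.6% | 2.2% | +0.2 | Volatile |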

Googlebot's sustained decline is the defining trend of Q1. Losing 7.1 pp over three months represents the largest quarterly share loss for any single crawler in Cloudflare Radar's tracking history. This doesn't necessarily mean Google is crawling less — it means competitors are crawling more, faster. At 31.6%, Googlebot is no longer 2x the size of the next-largest crawler; the ratio to Meta-ExternalAgent is now 1.9x and shrinking.

Meta-ExternalAgent was the biggest winner of Q1. Gaining +5.1 pp over three months, Meta's crawler overtook GPTBot in February and never looked back. The growth rate decelerated sharply across the quarter (+4.0 pp in February, then +1.1 pp in March), suggesting Meta may be approaching a near-term equilibrium.

Applebot's March surge was the quarter's biggest surprise. After modest growth in January (+0.2 pp) and February (+0.6 pp), Applebot exploded in March (+2.7 pp, +87% relative, from 3.1% to 5.8%). Over Q1, Applebot more than doubled from 2.5% to 5.8%. Apple is now a top-six AI crawler operator, a status no one predicted at the start of the quarter.

The Training-Mixed Purpose Crossover

The most consequential structural shift of Q1 2026 happened in the crawl purpose data:

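| Crawl Purpose | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change (pp) |
| --- | --- | --- | --- | --- |
| Training | 42.0% | 45.4% | 49.9% | +7.9 |
| Mixed Purpose | 48.3% | 43.9% | 39.9% | -8.4 |
| Search | 6.9% | 8.2% | 7.7% | +0.8 |
| User Action | 2.2% | 2.0% | 2.1% | -0.1 |
| Undeclared | 0.5% | 0.4% | 0.4% | -0.1 |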

In January, Mixed Purpose led Training by 6.3 pp (48.3% vs 42.0%). By February, Training had overtaken Mixed Purpose for the first time (45.4% vs 43.9%). By March, the gap had widened to 10 pp (49.9% vs 39.9%).

This crossover represents a fundamental change in how AI companies interact with the web:

  • Before Q1 2026: The largest share of AI bot traffic came from dual-purpose crawlers (Googlebot, Bingbot) that bundled search indexing with AI training. Website owners couldn't separate the two.
  • After Q1 2026: The largest share, nearly half of all AI bot traffic, comes from purpose-built crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent, Applebot) that primarily collect AI training data. Website owners can block these selectively.

The shift gives website owners more control. You can now block the majority of AI training crawling without sacrificing search visibility — something that wasn't possible when Mixed Purpose crawlers dominated.

Market Concentration Is Declining

| Metric | January | February | March |
| --- | --- | --- | --- |
| Top 5 companies (Google, Meta, OpenAI, Anthropic, Microsoft) | 84.5% | 82.8% | 80.2% |
| Top 6 companies (+ Apple) | 87.0% | 85.9% | 86.0% |
| Top 4 crawlers share | 74.4% | 73.5% | 72.0% |

The AI crawler market is diversifying. The traditional top five lost 4.3 pp of concentration over Q1, but when you include Apple as a sixth major player, the top-six concentration remained remarkably stable at ~86%. The diversification is happening within the top tier, not from outside it.

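| AI Crawler (robots.txt reference rate) | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change |
| --- | --- | --- | --- | --- |
| GPTBot | 5.29% | 5.45% | 5.52% | +0.23 pp |
| CCBot | 4.40% | 4.63% | 4.53% | +0.13 pp |
| ClaudeBot | 4.33% | 4.62% | 4.72% | +0.39 pp |
| Google-Extended | 4.04% | 4.36% | 4.37% | +0.33 pp |
| Googlebot | 4.13% | 3.77% | 3.80% | -0.33 pp |
| Bytespider | 3.69% | 3.73% | 3.67% | -0.02 pp |
| meta-externalagent | 3.24% | 3.26% | 3.34% | +0.10 pp |
| Amazonbot | 2.99% | 3.16% | 3.29% | +0.30 pp |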

ClaudeBot saw the largest blocking increase across Q1 at +0.39 pp, overtaking CCBot to become the second-most referenced AI crawler in robots.txt files behind GPTBot. The blocking wave is real but slow — over the entire quarter, GPTBot grew from 5.29% to just 5.52%. More than 94% of domains still allow all AI crawlers unrestricted access. At the current rate, it would take years for blocking rates to reach even 10%.
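The "it would take years" estimate can be checked with a simple linear extrapolation. This is a back-of-the-envelope sketch that assumes GPTBot's Q1 gain repeats every quarter, which real adoption curves rarely do; the function name is illustrative.

```python
def quarters_to_reach(target: float, current: float, gain_per_quarter: float) -> float:
    """Quarters needed to reach `target` if the share keeps rising linearly."""
    return (target - current) / gain_per_quarter


q1_gain = 5.52 - 5.29                          # GPTBot: January -> March, in pp
quarters = quarters_to_reach(10.0, 5.52, q1_gain)
years = quarters / 4
print(f"{quarters:.1f} quarters (~{years:.1f} years) to reach 10%")
```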

The Traffic-Blocking Gap

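| AI Crawler | March Traffic Share | March Blocking Rate | Gap (pp) |
| --- | --- | --- | --- |
| Googlebot | 31.6% | 3.80% | 27.8 |
| Meta-ExternalAgent | 16.7% | 3.34% | 13.4 |
| GPTBot | 12.0% | 5.52% | 6.5 |
| ClaudeBot | 11.7% | 4.72% | 7.0 |
| Amazonbot | 4.4% | 3.29% | 1.1 |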

Meta-ExternalAgent's 13.4 pp gap is the most actionable — it's a dedicated training crawler with no search indexing benefit, yet it's blocked by fewer domains than GPTBot despite generating more traffic.
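The gap column is traffic share minus robots.txt reference rate. A small sketch that recomputes it and picks out the largest gap among dedicated training crawlers; treating Googlebot as the only mixed-purpose crawler in this table is an assumption based on the purpose labels used earlier in this report.

```python
# crawler: (March traffic share %, March robots.txt reference rate %)
MARCH = {
    "Googlebot": (31.6, 3.80),
    "Meta-ExternalAgent": (16.7, 3.34),
    "GPTBot": (12.0, 5.52),
    "ClaudeBot": (11.7, 4.72),
    "Amazonbot": (4.4, 3.29),
}
MIXED_PURPOSE = {"Googlebot"}  # blocking it also removes search visibility

gaps = {bot: round(traffic - blocked, 1) for bot, (traffic, blocked) in MARCH.items()}

# Largest gap among crawlers that can be blocked without a search penalty.
training_only = {bot: gap for bot, gap in gaps.items() if bot not in MIXED_PURPOSE}
most_actionable = max(training_only, key=training_only.get)
print(gaps, most_actionable)
```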

Workers AI Model Ecosystem Evolution

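| Model | Jan 2026 | Feb 2026 | Mar 2026 | Q1 Change |
| --- | --- | --- | --- | --- |
| Llama 3 8B Instruct | 41.7% | 40.1% | 37.3% | -4.4 pp |
| Stable Diffusion XL Base 1.0 | 13.4% | 13.4% | 12.3% | -1.1 pp |
| Whisper | 8.5% | 8.3% | 7.5% | -1.0 pp |
| Llama 4 Scout 17B | 7.7% | 6.7% | 7.0% | -0.7 pp |
| M2M-100 1.2B | 5.6% | 5.4% | 5.1% | -0.5 pp |
| Llama 3 8B Instruct (AWQ) | 4.7% | 4.9% | 4.4% | -0.3 pp |
| FLUX.1 Schnell | 2.4% | 2.5% | 3.0% | +0.6 pp |
| GPT-OSS 120B | -- | 1.6% | 2.1% | New (+2.1 pp) |
| Whisper Large V3 Turbo | 1.4% | 1.6% | 1.7% | +0.3 pp |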

Every top model except FLUX.1 Schnell, GPT-OSS 120B, and Whisper Large V3 Turbo lost share across Q1. Meta's dominance is slowly eroding — from ~60% in January to 53.8% in March. GPT-OSS 120B is the breakout model of Q1, debuting in February and climbing to 2.1% in March. FLUX.1 Schnell is gaining on Stable Diffusion XL, emerging as a credible challenger in the text-to-image space.

Five Structural Shifts That Defined Q1 2026

  1. The Training Takeover. Training crawlers went from minority (42.0%) to near-majority (49.9%) in a single quarter. The web's content is now primarily being consumed for AI model weights rather than search indexing. Website owners who want to opt out of AI training have clearer tools to do so.

  2. The Google Erosion. Googlebot lost 7.1 pp across Q1. This is almost certainly driven by competitors growing faster rather than Google crawling less, but the proportional shift matters for every website owner's traffic analysis.

  3. The Meta Ascendancy. Meta-ExternalAgent gained +5.1 pp, overtook GPTBot for #2 in February, and finished the quarter at 16.7%. Meta's aggressive Llama training pipeline is driving unprecedented data collection.

  4. Apple's Arrival. Applebot went from 2.5% to 5.8% across Q1, with most growth concentrated in March. Apple is now a top-six AI crawler operator and should be included in any website owner's AI bot management strategy.

  5. The Diversification of Workers AI. The model ecosystem became meaningfully more diverse. Llama 3 8B's share fell from 41.7% to 37.3% as developers adopted newer models and the long tail grew from ~15% to ~20%.

Q2 2026 Predictions

Based on Q1 trends, here's what I expect to see in Q2 2026:

  1. Training crawlers will exceed 55%. The +7.9 pp Q1 trajectory suggests Training could reach 55-57% by June 2026, with Mixed Purpose falling below 35%.

  2. Googlebot will drop below 30%. The current quarterly decline rate would put Googlebot in the 28-30% range by end of Q2.

  3. Applebot will enter the top five. If Apple sustains even half of March's growth rate, Applebot could pass Bingbot (currently 8.2%) by May or June.

  4. Meta-ExternalAgent will plateau around 18-20%. The deceleration trend suggests Meta's growth rate is stabilizing.

  5. GPT-OSS 120B will enter the Workers AI top five. At its current growth rate, it could reach 3-4% by June, approaching FLUX.1 Schnell territory.

  6. Robots.txt blocking will remain below 6% for all crawlers. The slow growth rate means meaningful blocking thresholds remain years away.

Q1 Predictions Scorecard: April 2026 Reality Check

Now that April data is in, here's how each Q1 prediction held up. Two of six predictions were confirmed in the very first month of Q2, one missed in the opposite direction, two are still in flight, and one can't be scored yet because of a methodology change.

| # | Q1 Prediction | Predicted Window | April 2026 Actual | Verdict |
| --- | --- | --- | --- | --- |
| 1 | Training crawlers will exceed 55% | By June 2026 | 51.5% (passed 50% threshold) | 🟡 On track |
| 2 | Googlebot will drop below 30% | End of Q2 | 29.96% | Hit (early) |
| 3 | Applebot will enter the top five | May or June | 9.1%, passed Bingbot | Hit (early) |
| 4 | Meta-ExternalAgent will plateau at 18-20% | Q2 | Fell to 14.5% | Miss |
| 5 | GPT-OSS 120B will enter Workers AI top five | By June | 5.5% (#6, near top 5) | 🟡 Near hit |
| 6 | Robots.txt blocking will remain below 6% | Q2 | Methodology shift (see note) | ⚠️ Inconclusive |

The predictions that hit early. Applebot's leap into the top five and Googlebot's sub-30% drop both happened in April rather than May/June. The trajectory of both crawlers in March was steep enough that the Q2 timeline was conservative — Apple's continued buildout of Apple Intelligence appears to be a sustained, not transient, ramp.

The miss that matters. The Meta-ExternalAgent prediction was the most confidently stated and the most decisively wrong. Q1 data showed three months of consecutive gains decelerating — a textbook plateau pattern — but April produced a -2.2 pp contraction, the first decline since tracking began. The simplest hypothesis: Meta completed a major Llama training data refresh and dialed back crawl intensity. The alternative: Meta is shifting traffic to a not-yet-identified user agent. Either way, the lesson is that "decelerating gains" don't always precede a plateau — they can also precede a contraction.

The training prediction is on track. Training reached 51.5% in April, +1.6 pp month-over-month. To hit 55% by June would require a ~+1.7 pp/month pace, which is roughly the current trajectory. This prediction looks likely to land within its window.
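The required pace is straightforward arithmetic, sketched here under the assumption that growth is linear and that two monthly data points (May and June) remain in the prediction window. Function names are illustrative.

```python
def required_monthly_gain(current: float, target: float, months_left: int) -> float:
    """Average pp gain per month needed to reach `target` in `months_left` months."""
    return (target - current) / months_left


def project(current: float, monthly_gain: float, months: int) -> float:
    """Linear projection of a share after `months` at a constant monthly gain."""
    return current + monthly_gain * months


needed = required_monthly_gain(51.5, 55.0, months_left=2)   # May and June
june_at_april_pace = project(51.5, 1.6, months=2)            # April's +1.6 pp repeated
print(f"needed: +{needed:.2f} pp/month, June at April's pace: {june_at_april_pace:.1f}%")
```

At exactly April's pace the projection lands just under 55% in June, so the prediction needs a slight re-acceleration to hit inside its window.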

The robots.txt prediction is inconclusive. Cloudflare Radar's parsed-corpus refresh changed the share denominator (see methodology note above), so the "below 6%" threshold can't be cleanly evaluated against the new sample. The underlying behavior — slow, selective adoption — does appear to continue.

The takeaway: when a prediction is built on sustained directional momentum (Googlebot decline, Applebot growth), it tends to hit. When it's built on extrapolating a slowdown into a plateau (Meta-ExternalAgent), it's risky — the same deceleration that looks like a soft landing can also look like the leading edge of a reversal.

Based on what I've found in the April 2026 Cloudflare Radar data, here are the actions I'd prioritize:

Add Bytespider and TikTokSpider to your robots.txt review. ByteDance now operates two AI crawlers that together generate 7.3% of all AI bot traffic, more than Amazonbot (4.7%) and not far behind Bingbot (8.2%). Bytespider grew 72% in April (3.6% → 6.2%), and TikTokSpider made its top-15 debut at 1.1%. If you maintain a list of crawlers covered by your robots.txt, both should now be on it.

Update your Applebot policy — Applebot is now the #5 crawler, not the #6. Apple's crawler reached 9.1% of AI bot traffic in April and passed Bingbot to take the fifth spot. Applebot-Extended also entered the top 10 in robots.txt references this month, suggesting more website operators are adding it. Apple offers Applebot-Extended as a separate user agent for opting out of AI training while maintaining Apple Search visibility.

Distinguish ClaudeBot from Claude-SearchBot. Anthropic now operates two separate crawlers: ClaudeBot (training, 11.6%) and Claude-SearchBot (search, 0.8% — first observed in April). This mirrors OpenAI's GPTBot/OAI-SearchBot split. If you want to allow Claude's search functionality to surface your content while opting out of training, you can now do that with separate directives.

Recognize that training crawlers are now the clear majority. At 51.5%, dedicated training crawlers crossed the 50% threshold for the first time in April. A robots.txt policy that only addresses search crawlers now covers less than half of AI bot traffic on your site.

Don't assume Meta-ExternalAgent will keep growing. Meta's bot fell from 16.7% to 14.5% in April, the first contraction on record. If your capacity planning, or your decision on a meta-externalagent rule, assumed Meta's share would keep climbing, revisit it: the trajectory has changed. Meta is still the #2 AI crawler, but the trend isn't a one-way street.

Audit your Googlebot assumptions. Googlebot dropped below 30% for the first time in April. For every 10 AI bot requests in Cloudflare's January data, Google accounted for about 4; in April, it's closer to 3. The other 7 increasingly come from dedicated training crawlers, not search indexing bots.
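Several of the actions above come down to separate robots.txt groups for training and search crawlers. The sketch below shows one possible policy and checks it with Python's standard-library parser. The policy itself is an illustrative choice, not a recommendation, and the standard-library parser matches user agents by substring, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of training-oriented tokens, keep AI search.
POLICY = """\
# Training-oriented crawlers and tokens
User-agent: ClaudeBot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: TikTokSpider
Disallow: /

# AI search crawler
User-agent: Claude-SearchBot
Allow: /
"""


def is_allowed(policy: str, agent: str, url: str = "https://example.com/article") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(policy.splitlines())
    return parser.can_fetch(agent, url)


for agent in ["ClaudeBot", "Claude-SearchBot", "Applebot-Extended", "Applebot", "Bytespider"]:
    print(f"{agent:18} allowed={is_allowed(POLICY, agent)}")
```

Applebot is not named in the policy and there is no wildcard group, so it stays allowed while Applebot-Extended is disallowed, which is the separation Apple describes.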

How WebSearchAPI.ai Fits Into the AI Crawler Ecosystem

Every AI crawler in this report exists because AI companies need fresh, structured web data to power their models and search products. At WebSearchAPI.ai, we sit on the other side of this equation — providing developers and AI agents with a clean, fast, and affordable way to access real-time web data without running their own crawlers.

Here's why this matters in the context of April's trends:

  • Training crawlers crossed the 50% majority threshold. At 51.5%, dedicated training crawlers now command the majority of AI bot traffic. The web's content is increasingly being consumed for model weights rather than search indexing. Instead of building and maintaining your own crawling infrastructure, WebSearchAPI.ai gives you instant access to structured search results, content extraction, and real-time web intelligence through a single API call.
  • The crawler landscape is fragmenting faster than it's consolidating. With Apple now #5, ByteDance running two crawlers, Anthropic split into ClaudeBot and Claude-SearchBot, and the top-five concentration falling to 74.3%, the complexity of managing web data access is growing each month. WebSearchAPI.ai handles the retrieval layer so you can focus on your application logic.
  • Sub-second latency, 99.9% uptime, and structured responses mean your AI applications get the data they need without the infrastructure headaches that come with managing crawler fleets.

If the data in this report tells you anything, it's that the volume and complexity of AI web crawling are only accelerating, and that the balance has now decisively shifted toward dedicated training. WebSearchAPI.ai is purpose-built for developers who want to harness that web intelligence without becoming a crawling operation themselves. Learn more about what a web search API can do for your stack.

Frequently Asked Questions

What is an AI crawler?

An AI crawler (also called an AI bot or AI spider) is an automated program that visits websites to collect content for training artificial intelligence models or for powering AI search features. Unlike traditional search engine crawlers that index pages for search results, AI crawlers like GPTBot, ClaudeBot, and Meta-ExternalAgent specifically collect data to train large language models (LLMs). Some crawlers like Googlebot serve both purposes at once: indexing for search and collecting training data. You can identify AI crawlers by their user-agent strings in your server logs or through tools like Cloudflare Radar.

How often is this AI crawler report updated?

This report is updated monthly with fresh data from Cloudflare Radar AI Insights. Each edition covers a rolling 30-day window and includes month-over-month comparisons so you can track trends over time. Quarterly editions (like last month's Q1 review) also include full-quarter trajectory analysis. Bookmark this page or check back at the beginning of each month for the latest analysis of AI crawler traffic patterns, market share shifts, and robots.txt blocking trends.

Can I block AI crawlers from my website?

Yes. The primary method is adding disallow rules to your robots.txt file for specific AI crawler user agents. For example, adding User-agent: GPTBot followed by Disallow: / will request that OpenAI's crawler stop visiting your site. However, robots.txt is a voluntary protocol — crawlers are not technically required to obey it. As of April 2026, GPTBot and ClaudeBot remain the two most-referenced AI crawlers in robots.txt files, but absolute adoption is still a small fraction of the web. Some CDN providers like Cloudflare also offer dashboard-level controls to block or rate-limit AI bots.

What is the difference between AI training crawlers and AI search crawlers?

AI training crawlers (like GPTBot, ClaudeBot, and Meta-ExternalAgent) collect web content to build and improve AI models. They typically scrape large volumes of content from many sites. AI search crawlers (like OAI-SearchBot and the newly observed Claude-SearchBot) fetch pages so that an AI tool like ChatGPT or Claude can surface them in answers to search queries; closely related user-action bots (like ChatGPT-User) fetch a specific page in real time when a user's query calls for it. The key difference: training crawlers take your content to make the model smarter, while search crawlers fetch your content to answer user questions and may drive traffic back to your site. As of April 2026, training crawling commands a clear majority at 51.5% of all AI bot traffic, while search crawling holds at 7.6%.

Will blocking AI crawlers affect my SEO or search rankings?

Blocking dedicated AI training crawlers like GPTBot, ClaudeBot, Meta-ExternalAgent, or Bytespider, or disallowing the Applebot-Extended token, will not affect your rankings in Google, Bing, or other traditional search engines. These are separate from the search indexing bots. However, blocking Googlebot will remove your site from Google Search, since Google uses the same crawler for both search indexing and AI training. Google offers a middle ground with the Google-Extended robots.txt token: disallowing it opts you out of AI training while keeping your search presence intact. Apple offers the same kind of separation with Applebot-Extended, which entered the top 10 most-referenced robots.txt user agents for the first time in April 2026.

Why did Applebot keep surging in April 2026?

Applebot grew from 5.8% in March to 9.1% in April, taking the #5 spot from Bingbot. The surge is consistent with Apple's continued expansion of Apple Intelligence across its ecosystem. As Apple integrates more AI features into Siri, Safari, and iOS/macOS, the company needs fresh web content to power these capabilities. The April industry data also shows Telecommunications and Internet & Telecom verticals receiving disproportionate Applebot traffic, suggesting Apple is targeting SaaS, telecom, and connectivity content as part of its build-out. Website owners should now treat Applebot as a top-tier AI crawler and add Applebot-Extended directives if they want to opt out of AI training while maintaining Apple Search visibility.

What changed with Anthropic's crawlers in April 2026?

In April 2026, Cloudflare Radar identified a new Anthropic user agent — Claude-SearchBot — operating distinctly from the existing ClaudeBot crawler. ClaudeBot continues to handle training and held essentially flat at 11.6%, while Claude-SearchBot debuted at 0.8% as a search-focused crawler powering Claude's web search functionality. Anthropic now mirrors OpenAI's separation of GPTBot (training) and OAI-SearchBot (search), giving website owners the ability to allow Claude's search to surface their content while opting out of training, or vice versa.

How does Cloudflare track AI crawler traffic?

Cloudflare's global network spans 330+ cities in 125+ countries and processes over 81 million HTTP requests per second. Through its Radar platform, Cloudflare identifies and classifies AI bot traffic by analyzing user-agent strings, request patterns, and behavioral signatures across all sites on its network. The data in this report comes from Cloudflare Radar's AI Insights endpoints, which aggregate these signals into share-of-traffic percentages by bot, crawl purpose, industry, and region.

Which AI crawler is growing the fastest in 2026?

In April 2026, Bytespider had the largest relative growth at +72.6% (3.6% → 6.2%), and Applebot had the largest absolute gain at +3.3 pp (5.8% → 9.1%). Over the rolling four-month window from January to April 2026, Applebot has been the standout grower overall — climbing from 2.5% to 9.1%, a +6.6 pp gain that is the largest absolute increase of any AI crawler this year. Meta-ExternalAgent, which led the Q1 growth ranking, reversed direction in April with a -2.2 pp contraction.
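The answer depends on whether growth is measured in percentage points or relative to the starting share. The short sketch below computes both from the rounded shares quoted in this report; because the inputs are rounded, results can differ slightly from figures derived from the underlying data.

```python
def growth(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in pp, relative change in %), each rounded to 0.1."""
    absolute_pp = after - before
    relative_pct = (after / before - 1) * 100
    return round(absolute_pp, 1), round(relative_pct, 1)


# Rounded March -> April shares from this report.
shares = {
    "Bytespider": (3.6, 6.2),
    "Applebot": (5.8, 9.1),
    "Meta-ExternalAgent": (16.7, 14.5),
}

for bot, (march, april) in shares.items():
    pp, pct = growth(march, april)
    print(f"{bot:20} {pp:+.1f} pp  {pct:+.1f}%")
```

A small crawler can top the relative ranking with a modest absolute gain, which is why Bytespider and Applebot each lead on a different measure.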

What percentage of web traffic comes from AI bots?

The percentages in this report represent share of identified AI bot requests, not share of total web traffic. Cloudflare Radar tracks the proportion of AI-related crawler activity relative to other AI bots, providing a competitive landscape view. The actual percentage of total web traffic from AI bots varies by website, but industry estimates suggest AI crawlers now account for a meaningful and growing share of overall internet traffic, particularly for content-heavy sites in retail, technology, and media.

How can I monitor AI crawler activity on my own website?

Check your server access logs for known AI bot user-agent strings (GPTBot, ClaudeBot, meta-externalagent, Applebot, Bytespider, Amazonbot, etc.). Most web analytics platforms filter out bot traffic by default, so log-level analysis gives the most accurate picture. Cloudflare users can view AI bot activity directly in their dashboard. For a structured approach, consider using a web search API to understand how your content appears in AI-powered search results and ensure your most important pages are properly accessible.
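As a starting point for log-level analysis, the sketch below counts hits per AI crawler in access-log lines. It assumes the common "combined" log format, where the user agent is the last quoted field, and it uses simple substring matching on the tokens listed above. The sample lines and their user-agent strings are simplified illustrations, not exact copies of what each crawler sends.

```python
import re
from collections import Counter

AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "meta-externalagent", "Applebot", "Bytespider", "Amazonbot"]

# In the combined log format the user agent is the final quoted field on the line.
_LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines) -> Counter:
    """Count requests per AI bot token, matching case-insensitively on the user agent."""
    counts = Counter()
    for line in log_lines:
        match = _LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # line does not end with a quoted user-agent field
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
    return counts


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Apr/2026:12:00:01 +0000] "GET /post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Apr/2026:12:00:02 +0000] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [10/Apr/2026:12:00:03 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Apr/2026:12:00:04 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/125.0"',
]

hits = count_ai_bot_hits(SAMPLE_LOG)
print(hits.most_common())
```

User-agent matching only catches crawlers that identify themselves, so treat these counts as a lower bound on AI bot activity.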

Cloudflare Radar Data Source & Methodology

Understanding where this data comes from — and what it can and cannot tell you — is critical for interpreting the trends above. Here's a full breakdown of how Cloudflare Radar collects, classifies, and aggregates the AI crawler data used in this report.

Network Scale

Metric | Value
Global presence | 330 cities in 125+ countries
HTTP requests | 81 million/second average, peaks >129 million/second
DNS queries | 67 million/second (authoritative + resolver)

This scale is what makes Cloudflare Radar one of the most comprehensive sources of internet traffic data available. The data in this report comes from two primary sources:

  1. Cloudflare's global network — real-time traffic data from HTTP requests flowing through their infrastructure
  2. 1.1.1.1 public DNS resolver — aggregated and anonymized DNS query data

For routing data, Cloudflare also uses RIPE RIS data from RIPE NCC (BGP route collectors).

How AI Bots Are Identified

Cloudflare uses a layered detection system to identify and classify AI crawlers:

  1. User-agent string matching — the most basic method; identifies bots that transparently announce themselves (GPTBot, ClaudeBot, etc.)
  2. Verified Bot Directory — manual approval process requiring bots to maintain public robots.txt commitments, use dedicated/verifiable IPs, unique user-agents, and honor crawl-delay settings
  3. Machine learning — supervised ML system that assigns a Bot Score (1-99)
  4. Heuristics — tailored rulesets for AI bot classification
  5. Behavioral analysis — pattern recognition from request sequences
  6. AI Labyrinth honeypot — hidden links to AI-generated decoy pages; bots that follow them are identified with high confidence since human visitors never see these links
  7. ai.robots.txt list — used as the basis for which AI bots to track

💡 Expert Insight: The layered approach matters because not all AI crawlers identify themselves honestly. User-agent matching catches transparent bots like GPTBot and ClaudeBot. Behavioral analysis and honeypots catch crawlers that disguise themselves as regular browsers.

How Crawl Purpose Is Classified

Bots are categorized into these purpose buckets:

  • Training: dedicated training crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent)
  • Search: AI search bots (OAI-SearchBot)
  • User Action: bots fetching pages for real-time user queries (ChatGPT-User)
  • Mixed Purpose: bots serving dual roles like search indexing + AI training (Googlebot, Bingbot)
  • Undeclared: purpose not identifiable
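The buckets above amount to a lookup from bot name to purpose, with "Undeclared" as the fallback. The sketch below encodes only the example bots listed here and rolls up a few shares quoted in this report. It illustrates the idea and is not Cloudflare's actual classifier, and because it covers only three bots the totals are partial.

```python
from collections import defaultdict

# Example bots from the list above; keys are lowercased for case-insensitive lookup.
PURPOSE_BY_BOT = {
    "gptbot": "Training",
    "claudebot": "Training",
    "meta-externalagent": "Training",
    "oai-searchbot": "Search",
    "chatgpt-user": "User Action",
    "googlebot": "Mixed Purpose",
    "bingbot": "Mixed Purpose",
}


def crawl_purpose(bot_name: str) -> str:
    """Map a bot name to its purpose bucket, defaulting to Undeclared."""
    return PURPOSE_BY_BOT.get(bot_name.lower(), "Undeclared")


def share_by_purpose(shares: dict[str, float]) -> dict[str, float]:
    """Sum per-bot shares (in %) into per-purpose totals."""
    totals = defaultdict(float)
    for bot, share in shares.items():
        totals[crawl_purpose(bot)] += share
    return dict(totals)


# April shares quoted in this report (a partial set, so totals are partial too).
APRIL_SHARES = {"Googlebot": 29.96, "Meta-ExternalAgent": 14.5, "ClaudeBot": 11.6}

totals = share_by_purpose(APRIL_SHARES)
print(totals)
```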

Data Aggregation Methods

  • 7-day trailing average used to smooth daily fluctuations
  • IPv4 addresses aggregated into /20 prefixes for visualization
  • HTML traffic is separately classified into human, AI bot, and non-AI bot categories
  • Normalization: data expressed as percentage of total requests (not absolute counts)
  • Methodologies remain unchanged year-over-year for valid comparisons, with the exception of the April 2026 robots.txt corpus refresh described under Report Parameters
  • API data available under CC BY-NC 4.0 license
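To show what the smoothing step does, here is a minimal 7-day trailing average. The daily values are hypothetical, and the handling of the first six days (averaging over whatever days are available) is an assumption; Cloudflare's exact treatment of partial windows is not described in this report.

```python
def trailing_average(values: list[float], window: int = 7) -> list[float]:
    """Average each value with up to `window - 1` preceding values."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily share values (in %) with one outlier day.
daily = [6.0, 6.1, 5.9, 6.2, 9.5, 6.0, 6.1, 6.3]

for day, (raw, avg) in enumerate(zip(daily, trailing_average(daily)), start=1):
    print(f"day {day}: raw={raw:.1f} smoothed={avg:.2f}")
```

The outlier on day 5 is spread across the following week instead of appearing as a one-day spike, which is the point of smoothing before comparing months.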

Caveats & Limitations

⚠️ Warning: Keep these limitations in mind when interpreting the data in this report:

  • Countries with insufficient data volume are excluded from trend reporting
  • Some metrics are available only at worldwide level, not per-country
  • Mobile device categorization relies on User-Agent headers (accuracy limitations)
  • Speed test data excludes locations with fewer than 100 tests per week
  • The "location" filter corresponds to the billing country of the Cloudflare customer whose site received the traffic, not where the crawler is physically located
  • Cloudflare sees traffic only to sites behind its network, not the entire internet, so the data is representative but not exhaustive

Report Parameters

This edition uses data from Cloudflare Radar's AI Insights endpoint (/radar/ai/bots/summary/*), Workers AI inference endpoint (/radar/ai/inference/summary/*), and robots.txt analysis endpoint (/radar/robots_txt/top/user_agents/directive). The April 2026 monthly data covers April 4 through May 4, 2026, with month-over-month comparisons against the prior 30-day window (March 4 - April 4, 2026). The Q1 2026 Quarterly Review compiles data from three consecutive monthly analyses covering January through March 2026.

I queried bot traffic breakdowns by user agent, crawl purpose, industry, and vertical. Workers AI data covers model and task distribution by account share. Robots.txt analysis covers domain-level crawler policies for AI user agents. All percentages represent share of identified AI bot requests (for crawling data) or share of accounts (for Workers AI data), not share of total web traffic.
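For readers who want to pull a comparable breakdown, the sketch below builds a request for the bot summary path named above. Several details are assumptions to verify against Cloudflare's API reference before use: the `https://api.cloudflare.com/client/v4` base URL, the `user_agent` dimension, the `dateStart`, `dateEnd`, and `format` query parameters, and bearer-token authentication. The token value is a placeholder, and the request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.cloudflare.com/client/v4"  # assumed base URL


def build_bot_summary_request(dimension: str, date_start: str, date_end: str, token: str) -> Request:
    """Build (but do not send) a GET request for /radar/ai/bots/summary/<dimension>."""
    query = urlencode({"dateStart": date_start, "dateEnd": date_end, "format": "json"})
    url = f"{API_BASE}/radar/ai/bots/summary/{dimension}?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# The reporting window used in this edition; the token is a placeholder.
request = build_bot_summary_request(
    "user_agent",
    date_start="2026-04-04T00:00:00Z",
    date_end="2026-05-04T00:00:00Z",
    token="YOUR_API_TOKEN",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; the shape of the JSON response is not shown here because it is not described in this report.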

⚠️ Methodology change for April 2026: Cloudflare Radar's robots.txt parsed-corpus was refreshed in early Q2 2026, changing the share denominator. April robots.txt percentages are computed against 4,102 parsed robots.txt files at the most recent snapshot and are not directly comparable to January–March percentages. This change affects only the robots.txt section; AI bot traffic data, Workers AI data, and industry data use the same methodology as prior editions.

Data source: Cloudflare Radar AI Insights API endpoints (radar.cloudflare.com). Last updated: May 9, 2026.

About the Author: I'm James Bennett, Lead Engineer at WebSearchAPI.ai, where I architect the core retrieval engine enabling LLMs and AI agents to access real-time, structured web data with over 99.9% uptime and sub-second query latency. With a background in distributed systems and search technologies, I've reduced AI hallucination rates by 45% through advanced ranking and content extraction pipelines for RAG systems. My expertise includes AI infrastructure, search technologies, large-scale data integration, and API architecture for real-time AI applications.

Credentials: B.Sc. Computer Science (University of Cambridge), M.Sc. Artificial Intelligence Systems (Imperial College London), Google Cloud Certified Professional Cloud Architect, AWS Certified Solutions Architect, Microsoft Azure AI Engineer, Certified Kubernetes Administrator, TensorFlow Developer Certificate.