
How Search Engines Really Work: Crawling, Indexing, Ranking, and Serving

A technical breakdown of web search systems based on ByteByteGo's explainer — from crawl budgets and inverted indexes to learning-to-rank models and globally distributed serving clusters.

James Bennett
14 minute read

Web search runs on four tightly coupled systems: crawlers that map the open web, indexers that turn pages into retrievable tokens, ranking models trained on human-rated results, and globally distributed serving clusters. ByteByteGo's 9-minute explainer walks through every stage, and this post unpacks what matters for engineers building search-driven AI.

Video Summary and Key Insights

ByteByteGo, the YouTube channel from the authors of the System Design Interview books, published a 9:17 overview of how modern search engines actually work end to end. The video traces a web page's journey from crawl queue to ranked result, covering crawl budgets, JavaScript rendering, inverted indexes, learning-to-rank models, query classification, and the distributed serving infrastructure that answers billions of queries a day. The single most important takeaway: search is not one algorithm. It's four pipelines — crawl, index, rank, serve — each with its own failure modes, and each evolving under machine learning pressure.

Key Insights:

  • Crawlers can only touch a fraction of the web each day. Search engines allocate a "crawl budget" per site based on architecture, sitemaps, and internal link quality. New sites might be hit every few minutes. Stale ones might see a bot once a month.
Even with their immense processing power, search engines can only crawl a fraction of the Internet daily. They carefully allocate a crawl budget based on site architecture, site maps, and internal link quality.
ByteByteGo, System Design Channel
  • JavaScript-heavy sites get crawled twice. Modern crawlers first fetch static HTML, then execute JavaScript in a second rendering pass. That render pass is expensive, which is why SPAs without SSR often lag behind server-rendered competitors in indexing speed.

  • The inverted index is the heart of the system. Instead of storing documents and scanning them, search engines flip the data structure — they map each word to every document it appears in. That's what makes sub-second full-text lookup across billions of pages possible.

  • Ranking is done by machine learning, not hand-written rules. Google, Bing, and the rest use learning-to-rank models trained on massive datasets of queries paired with human-rated results. Classical signals (link graph, content quality, freshness) become features, not rules.

  • Queries get classified into three intent buckets. Search engines tag incoming queries as navigational (find a specific site), informational (answer a question), or transactional (complete an action). The ranking blend changes based on which bucket you land in.

Queries are often categorized as navigational, informational, or transactional, helping the engine tailor its results accordingly.
ByteByteGo, System Design Channel
  • The search index is too big for any single machine. It's sharded across numerous servers, replicated for redundancy, and spread across data centers on multiple continents. New content is typically indexed in a side structure before being merged into the main index.

  • Personalization layers on top of base ranking. Location, prior searches, and engagement history all feed back into what you see — balanced against a deliberate diversity constraint so results don't collapse into a filter bubble.

Why Search Engine Internals Matter if You're Building with AI

Most AI engineers I talk to treat search as a black box. You hit an API, you get links, you pass them to a language model. That works until the retrieval quality drops — and when it drops, you have no vocabulary to describe why.

This video is a good refresher because it spells out the four layers that every search system, including the web search API powering WebSearchAPI.ai, has to solve. Once you know where crawl budgets, inverted indexes, and learning-to-rank models live in the pipeline, you can reason about why a page is missing, why a result is stale, or why your RAG agent keeps pulling the same outdated answer. That's the angle I'm taking here: quote the video, then tie each concept back to decisions an AI engineer actually has to make.

How Does Web Crawling Actually Work?

Crawling is the first and most resource-bound stage of search. A crawler starts with a list of seed URLs and walks the link graph outward, combining breadth-first and depth-first traversals so it discovers both new domains and deeper content inside sites it already knows.

Search engine deploys advanced crawlers that combine breadth first and depth first strategies to efficiently explore web pages. These crawlers begin with seed URLs and follow hyperlinks to discover new content.
ByteByteGo, System Design Channel

Two details from the video matter more than they sound. First, crawlers don't just fetch pages — they score them before queueing, using signals like external link count, update frequency, and perceived authority. That's why a well-linked domain gets crawled aggressively and a brand-new blog with no backlinks can sit uncrawled for weeks. Second, the URL queue itself is the bottleneck. You have essentially infinite URLs and finite bandwidth, so the scheduler has to balance discovery of new content against refresh of existing content. Our monthly AI crawler report tracks which bots are actually hitting your site and how that mix has shifted over the last year — a useful companion to this video if you publish content you want AI engines to cite.

Here's how the video describes the refresh cadence:

New sites might be crawled every few minutes while less frequently updated pages might only see a crawler once a month.
ByteByteGo, System Design Channel

Two practical consequences if you're publishing content for AI engines to cite:

  1. Update signals matter. A page that changes once and never again looks stale to the scheduler. Pages that get pinged with meaningful updates get more crawl attention.
  2. Internal link quality is a crawl-budget lever, not just a ranking one. Good internal links help the crawler find and prioritize your pages. Orphan pages with no inbound internal links rarely get re-crawled.
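The trade-off the scheduler faces can be sketched as a priority queue draining a fixed daily budget. Everything below, the class name, the weights, the signals, is illustrative, not how any real engine computes crawl priority:

```python
import heapq
import time

class CrawlScheduler:
    """Toy crawl scheduler: one shared budget, priority-ordered queue.

    Priority mixes authority (external links) with staleness, so a
    well-linked page that hasn't been refreshed recently outranks a
    freshly crawled orphan page. All weights are made up.
    """

    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.queue = []  # min-heap of (-priority, url)

    def enqueue(self, url, external_links, last_crawled_ts):
        staleness_days = (time.time() - last_crawled_ts) / 86400
        priority = 2.0 * external_links + 1.0 * staleness_days
        heapq.heappush(self.queue, (-priority, url))

    def next_batch(self):
        batch = []
        while self.queue and len(batch) < self.daily_budget:
            _, url = heapq.heappop(self.queue)
            batch.append(url)
        return batch  # everything past the budget waits for tomorrow

scheduler = CrawlScheduler(daily_budget=2)
now = time.time()
scheduler.enqueue("https://popular.example/", external_links=500, last_crawled_ts=now - 86400)
scheduler.enqueue("https://new-blog.example/post", external_links=0, last_crawled_ts=now - 86400 * 30)
scheduler.enqueue("https://orphan.example/page", external_links=0, last_crawled_ts=now - 3600)
batch = scheduler.next_batch()
print(batch)  # the recently crawled orphan page misses the cut
```

Even this toy version shows the consequence described above: with a budget of two, the unlinked, just-crawled page simply never makes the batch.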

The JavaScript Rendering Problem

[Image: JavaScript logo on a dark background, illustrating how modern crawlers must render dynamic content before indexing]

Modern sites push a lot of content into client-side JavaScript. That breaks the classic crawler model, because the HTML the server sends contains almost nothing useful. Search engines now run a two-phase approach: fetch static HTML first, then render JavaScript in a headless browser to capture the post-hydration DOM.

To address this, crawlers use a two phase approach. First crawling static HTML, then rendering JavaScript to capture the full page content. This process is computationally intensive.
ByteByteGo, System Design Channel

"Computationally intensive" is understating it. The video doesn't spell it out, but render passes are an order of magnitude more expensive than HTML fetches, which is why sites that rely on pure client-side rendering often show up in search results days or weeks after an SSR equivalent would. If you're building an AI-facing site, server-side rendering or static generation is the safer bet. It's not a hypothetical — it's a budget decision the crawler is making on every URL.

Duplicate Detection and URL Normalization

Crawlers spend a surprising amount of effort not crawling things. URL normalization collapses different URL forms that point to the same page (trailing slashes, UTM parameters, session IDs, http vs https), and content fingerprinting detects near-duplicates so the system doesn't index the same article from five different domains. Both of those checks happen before a page hits the index, which is how search engines keep billions of pages manageable instead of trillions.
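Both checks fit in a short sketch. The normalization rules and tracking-parameter list below are illustrative assumptions; real crawlers apply far larger rule sets and use near-duplicate fingerprints like SimHash rather than an exact hash:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that change the URL string but not the content.
# Hypothetical list; real systems maintain much longer ones.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "fbclid"}

def normalize_url(url):
    """Collapse equivalent URL forms: scheme, host case, tracking
    params, trailing slashes. A simplified rule set."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    scheme = "https" if scheme in ("http", "https") else scheme
    netloc = netloc.lower()
    path = path.rstrip("/") or "/"
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(sorted(kept)), ""))

def fingerprint(text):
    """Crude content fingerprint: hash of whitespace-normalized text.
    Catches exact duplicates only; near-duplicate detection needs
    shingling or SimHash."""
    return hashlib.sha256(" ".join(text.split()).encode()).hexdigest()

a = normalize_url("HTTP://Example.com/post/?utm_source=x&id=7")
b = normalize_url("https://example.com/post?id=7")
print(a == b)  # both collapse to the same canonical form
```

The payoff is that the two URL variants map to one queue entry and one index entry, which is exactly how "billions instead of trillions" happens.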

How Do Search Engines Turn Pages Into an Index?

Once a page is crawled, indexing begins. The job is to convert unstructured text into a data structure that supports sub-second retrieval across billions of documents. The video walks through this in three stages: tokenization, linguistic normalization, and inverted index construction.

Tokenization breaks text into words and phrases. That sounds trivial until you hit languages without whitespace boundaries, like Chinese or Japanese, where deciding where one word ends and the next begins is itself a machine learning problem. English makes it easy. Most other languages don't.

The next step is stemming and lemmatization — collapsing "running," "runs," and "ran" into the same concept. Context analysis handles ambiguity: does "jaguar" mean the animal or the car? The engine uses surrounding text to disambiguate before writing the term to the index.

Then comes the part that actually makes search fast:

The processed tags feed into the indexing pipeline with the inverted index at its core. This powerful data structure enables rapid retrieval of documents containing specific terms, essentially mapping which words appear in which documents.
ByteByteGo, System Design Channel

The inverted index is the reason you can type a query and get results in milliseconds instead of minutes. Conceptually it's a hash map from each term to a posting list of the document IDs where that term appears, plus positional data for phrase matching. When you search for web search api, the engine intersects the three posting lists and ranks the result — it never touches documents that don't contain all three terms.
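The structure is simple enough to sketch in a few lines. This toy version assumes whitespace tokenization and skips positional data, but the term-to-posting-list mapping and the intersection are the real mechanism:

```python
from collections import defaultdict

def tokenize(text):
    # Real tokenizers handle stemming, Unicode, and per-language rules;
    # lowercased word splitting is enough to show the structure.
    return text.lower().split()

def build_index(docs):
    """Map each term to a sorted posting list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """AND-query: intersect posting lists, smallest first, so each
    step shrinks the candidate set as fast as possible."""
    postings = [set(index.get(t, ())) for t in tokenize(query)]
    if not postings:
        return []
    postings.sort(key=len)
    result = postings[0]
    for p in postings[1:]:
        result &= p
    return sorted(result)

docs = {
    1: "web search api for agents",
    2: "inverted index powers web search",
    3: "cooking recipes for the web",
}
index = build_index(docs)
print(search(index, "web search api"))  # only doc 1 contains all three terms
```

Note that the query never scans document text at all; it only walks the posting lists, which is why cost scales with list length rather than corpus size.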

The Index Size Problem

Challenge → How search engines handle it

  • Index too large for memory → compression via variable-byte encoding and gap encoding of posting lists
  • Different languages and scripts → per-language tokenizers and stemmers, plus Unicode normalization
  • Duplicate content → URL normalization and content fingerprinting before indexing
  • Constant updates → separate "fresh" indexes merged periodically into the main index
  • Quality filtering → low-quality pages held in a pre-index staging area for evaluation

The video mentions that some engines use machine learning to dynamically optimize compression based on content characteristics. That's the kind of detail that matters if you're ever forced to build your own search index — the classic assumption that "compression is a solved problem" doesn't hold when your posting lists run into the billions.
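Gap encoding plus variable-byte encoding fit in a few lines and show why sorted posting lists compress so well. This is a textbook sketch, not any engine's actual codec:

```python
def varint_encode(n):
    """Variable-byte encoding: 7 payload bits per byte, high bit marks
    continuation. Small numbers take one byte, which is the point."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def compress_postings(doc_ids):
    """Gap-encode a sorted posting list, then varint each gap.
    Sorted IDs make gaps small, and small gaps compress well."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += varint_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)

postings = [100, 105, 108, 1_000_000]   # sorted doc IDs
raw = len(postings) * 8                 # 8 bytes each, uncompressed
packed = compress_postings(postings)    # gaps: 100, 5, 3, 999892
print(len(packed), "bytes vs", raw, "uncompressed")
```

The three small gaps each cost one byte and only the large jump costs three, so the list shrinks from 32 bytes to 6. At billions of postings per term, that ratio is the difference between an index that fits in RAM and one that doesn't.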

How Do Ranking Algorithms Decide Which Page Wins?

Retrieval gets you candidates. Ranking decides the order. This is where modern search differs most from the PageRank-era systems some developers still picture when they hear "search algorithm."

Modern ranking systems rely heavily on advanced machine learning models. These models are trained on massive datasets of search queries and human rated results, learning to recognize what makes a result relevant.
ByteByteGo, System Design Channel

The key phrase there is "learning to rank." Instead of engineers writing a weighted formula like score = 0.3 * link_authority + 0.2 * freshness + 0.5 * content_match, teams train models on query-document pairs labeled by human raters. The classical signals — link quality, content depth, freshness, click-through rate — become input features, not rules. The model learns its own weights and captures patterns no human would code by hand. If you want the deeper backstory on how attention-based models power modern ranking, our transformer architecture explained post walks through why the 2017 Attention Is All You Need paper reshaped every ranking system that followed it.
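A toy pointwise version of learning to rank makes the "features, not rules" distinction concrete. The features, labels, and training loop below are entirely synthetic; production systems use gradient-boosted trees or neural rankers with pairwise or listwise losses:

```python
# Each training row: a feature vector for one (query, document) pair
# plus a human-rated relevance label. All values are synthetic.
FEATURES = ["link_authority", "freshness", "content_match"]

def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))

def train_pointwise(data, epochs=500, lr=0.01):
    """Minimal pointwise learning-to-rank: fit weights that predict
    the human relevance label from the features via plain SGD."""
    weights = [0.0] * len(FEATURES)
    for _ in range(epochs):
        for x, label in data:
            err = score(weights, x) - label
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights

# Synthetic raters who reward on-topic content above all else.
train = [
    ([0.9, 0.1, 0.9], 1.0),  # authoritative and on-topic -> relevant
    ([0.8, 0.9, 0.1], 0.2),  # fresh but off-topic        -> not relevant
    ([0.1, 0.2, 0.8], 0.8),  # small site, on-topic       -> relevant
    ([0.2, 0.8, 0.2], 0.1),
]
w = train_pointwise(train)
print({f: round(wi, 2) for f, wi in zip(FEATURES, w)})
```

No one told the model that content_match matters most; it recovers that weighting from the labeled data, which is the whole argument for learned ranking over hand-tuned formulas.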

The video lists the signal categories ranking systems examine:

  1. Relevance — topic coverage, keyword presence, semantic match to query
  2. Quality and authority — site reputation, content depth, intent match
  3. User engagement — click-through rates, dwell time, return visits
  4. Technical health — page speed, mobile friendliness, Core Web Vitals
  5. Link analysis — inbound link count and source quality, natural vs artificial patterns
  6. Freshness — prioritized for news queries, not for evergreen topics
  7. Personalization — location, search history, balanced against diversity constraints

No single signal dominates. That's the shift from the PageRank era: in 2002, link count was the killer feature. In 2026, it's one feature among dozens, and its weight depends on the query class.

[Image: Two Google search results showing location-based personalization, with soccer preferred in one region and American football in another]

The personalization point from the video is a good example of why this matters. The same query for "football" returns American football in the US and soccer in most of Europe. That's not a bug — it's the ranker reading a location feature and picking the interpretation its user model expects.

How Do Search Engines Understand Your Query?

Queries are short. The video pins the number at "a few words long," which is the polite way of saying search engines have almost no context to work with. Query understanding is the pipeline stage that turns three ambiguous words into a structured intent.

[Image: Google search bar showing a partial query during parsing and intent detection]

The video breaks query understanding into three jobs: parsing (spell correction, tokenization), intent classification (navigational, informational, transactional), and query expansion (adding related terms to catch rare or ambiguous searches).

They correct spelling errors, expand queries with related terms, and use advanced analysis methods to handle rare and ambiguous searches.
ByteByteGo, System Design Channel

Intent classification is the most important of the three for downstream ranking. A navigational query (github login) wants one specific URL. An informational query (how does pagerank work) wants an explainer. A transactional query (buy noise cancelling headphones) wants product pages. The ranker applies different signal weights to each bucket — freshness matters for news-adjacent informational queries but barely registers for navigational ones.
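A rule-based sketch with hypothetical trigger words shows the shape of the problem, even though production engines learn this classification from click logs rather than word lists:

```python
# Crude intent classifier. Buckets come from the video; the trigger
# words and precedence order are illustrative assumptions.
NAVIGATIONAL_HINTS = {"login", "homepage", "website", "official"}
TRANSACTIONAL_HINTS = {"buy", "price", "cheap", "order", "deal"}
INFORMATIONAL_HINTS = {"how", "what", "why", "who", "guide", "vs"}

def classify_intent(query):
    terms = set(query.lower().split())
    if terms & TRANSACTIONAL_HINTS:
        return "transactional"
    if terms & NAVIGATIONAL_HINTS:
        return "navigational"
    if terms & INFORMATIONAL_HINTS:
        return "informational"
    return "informational"  # default bucket for ambiguous queries

print(classify_intent("github login"))                     # navigational
print(classify_intent("how does pagerank work"))           # informational
print(classify_intent("buy noise cancelling headphones"))  # transactional
```

Even a classifier this crude would let an agent pick between "fetch one URL," "retrieve and summarize," and "search product data," which is more routing than many RAG stacks do today.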

If you're building an AI agent or RAG system, this is the part of the stack you most often end up reimplementing. Your agent needs to know whether a user wants a URL, a fact, or a product before it picks a tool. Most agents fail here: they treat every input as one generic "search," and the results reflect that. Google's own approach — documented in our grounding with Google Search in the Gemini API breakdown — leans on dynamic retrieval scores to decide when an LLM should fall back to live web data. That's essentially query classification running inside the model rather than in front of it.

How Is Search Served at Billions of Queries a Day?

[Image: Distributed database clusters on a globe, showing search index replication across data centers]

The last layer is the boring-but-critical one: infrastructure. A full web-scale index is too large for any single machine, so it's sharded across clusters, replicated for redundancy, and spread across data centers on multiple continents.

The search index itself is too vast for a single machine, so it's distributed across numerous servers with redundancy for reliability. These serving clusters span multiple data centers globally.
ByteByteGo, System Design Channel

[Image: Two globally distributed server clusters illustrating search index replication across multiple data centers]

A typical query hits a load balancer, gets routed to the nearest serving cluster, fans out across index shards in parallel, and collects partial results before a ranker merges them. All in under 200 milliseconds. The video makes a small but important point about freshness: new content is usually written to a separate "fresh index" first and merged into the main index on a schedule, because touching the main index on every new URL would be too expensive.
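The scatter-gather step is easy to sketch: fan out, take each shard's local top-k, merge into a global top-k. Shard contents and scores below are made up; the point is that no shard ever ships its full posting data to the merger:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Each shard holds (doc_id, score) pairs for its slice of the index.
# Contents and scores are synthetic.
SHARDS = [
    [(101, 0.91), (102, 0.40)],
    [(201, 0.77), (202, 0.73)],
    [(301, 0.95), (302, 0.10)],
]

def query_shard(shard, k):
    """Each shard returns only its local top-k, not everything."""
    return heapq.nlargest(k, shard, key=lambda pair: pair[1])

def scatter_gather(query, k=3):
    """Fan the query out to every shard in parallel, then merge the
    partial top-k lists into one global top-k. (In this sketch the
    query string is unused; real shards score docs against it.)"""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, k), SHARDS)
    merged = [pair for partial in partials for pair in partial]
    return heapq.nlargest(k, merged, key=lambda pair: pair[1])

print(scatter_gather("web search api", k=3))
```

The correctness guarantee is what makes this safe: if a document belongs in the global top-k, it must be in its own shard's top-k, so merging the partials never misses a result.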

This is the shape of every production search system I've worked on, including the one behind WebSearchAPI.ai. You don't run one big index. You run a fresh tier, a main tier, and a cold tier, and you accept that newly published pages take time to propagate through all three. The same tiered pattern shows up when you look at how Anthropic's Claude web search API serves live results to Claude sessions — it's the only sane way to balance freshness against retrieval cost.

Frequently Asked Questions

What is a crawl budget and why does it matter?

A crawl budget is the number of URLs a search engine is willing to fetch from your site in a given window. According to ByteByteGo's explainer, crawlers allocate budget based on site architecture, sitemap quality, and internal link structure. New or low-authority sites get smaller budgets, which is why publishing 10,000 pages on a fresh domain won't get them all indexed — the crawler will pick the most important ones and skip the rest.

Why can't search engines just crawl the entire web every day?

Scale and bandwidth. ByteByteGo notes that even with massive infrastructure, search engines can only touch a fraction of the Internet each day. The open web contains hundreds of billions of URLs, most of which are low-value duplicates or stale. The scheduler prioritizes pages most likely to be useful to users rather than trying to achieve full coverage.

What is an inverted index and why is it important?

An inverted index is a data structure that maps each word to the list of documents containing it, rather than storing documents and scanning them at query time. ByteByteGo describes it as "the heart of the indexing pipeline" because it's what makes sub-second full-text search across billions of pages possible. Querying web search api intersects three posting lists rather than scanning every document.

How do search engines handle JavaScript-rendered pages?

They use a two-phase crawl: first fetch the static HTML, then run the JavaScript in a headless rendering engine to capture the fully hydrated DOM. ByteByteGo calls this process "computationally intensive," which is why pure client-side rendered sites typically get indexed slower than server-rendered ones. Server-side rendering or static generation remains the safer choice for SEO-sensitive content.

What's the difference between navigational, informational, and transactional queries?

ByteByteGo's video defines these as the three query classes search engines use for intent detection. Navigational queries want a specific site (github login), informational queries want an answer or explanation (what is an inverted index), and transactional queries want to complete an action (buy bluetooth headphones). The ranker applies different signal weights to each class — freshness matters more for news, authority more for informational queries, and product data more for transactional ones.

How do modern ranking algorithms use machine learning?

According to the video, ranking is now driven by learning-to-rank models trained on massive datasets of queries paired with human-rated results. Classical signals like link authority, content depth, freshness, and click-through rate become input features to the model rather than hand-weighted rules. The model learns weights from training data, which captures patterns engineers couldn't code manually.

Why does location affect my search results?

Personalization is one of the ranking signals. Your location, search history, and engagement patterns feed into the ranker to tailor results — which is why "football" returns American football in the US and soccer in Europe. ByteByteGo notes that search engines balance personalization against a need for diverse perspectives to avoid pure filter bubbles.

How do search engines keep their index fresh if new content is added constantly?

Most large search engines index new content into a separate "fresh" data structure before merging it into the main index on a schedule. ByteByteGo describes this as an "ongoing challenge" in the video — modifying the main distributed index on every new URL would be prohibitively expensive, so fresh content lives in a side tier until a merge job moves it.

Key Takeaways

  • Search is four pipelines, not one algorithm: crawl, index, rank, serve. Each has its own budget, failure modes, and optimization pressure.
  • Crawlers operate on a budget. Internal link quality, sitemap health, and update frequency determine how much of that budget your site gets.
  • JavaScript rendering is expensive enough to create an indexing lag. SSR or static generation remains the safer path for content you want indexed fast.
  • The inverted index is what makes sub-second full-text search across billions of pages possible. Every modern retrieval system, including vector-augmented ones, still relies on this idea.
  • Ranking is machine-learned. Classical signals — links, freshness, authority, CTR — are features in the model, not hand-coded rules.
  • Query intent classification (navigational, informational, transactional) changes how the ranker weights signals. AI agents that ignore this layer retrieve worse results than classic search would.
  • The index is sharded, replicated, and spread across global data centers, with a fresh tier for new content. Full index propagation takes time — don't expect a new URL to rank worldwide in minutes.

Want to go deeper on the retrieval layer that feeds AI agents? Compare vendors with our writeups on Exa AI alternatives and Tavily alternatives for AI agents, read our breakdown of Google AI search experience (AI Mode and AI Overviews), or see where traffic is actually coming from in our search engine referral insights report.

This post is based on How Search Really Works by ByteByteGo.