
Grounding with Google Search: How the Gemini API Delivers Real-Time AI Answers

Learn how Grounding with Google Search works in the Gemini API, with Python code examples, dynamic retrieval thresholds, pricing details, and production best practices from Google DeepMind.

James Bennett
17 minute read

Grounding with Google Search is a Gemini API feature that injects real-time Google Search results into model responses at inference time. A prediction classifier scores each query 0 to 1, deciding whether search results would improve the answer. According to Google's documentation, this connects Gemini to live web content across all languages, reducing hallucinations with verifiable citations.

Shrestha Basu Mallick, Group Product Manager at Google DeepMind, sat down with Stephanie Wong, Head of Developer Marketing at Google Cloud, to demo the feature live. This 16-minute conversation covers the full technical pipeline, cost-accuracy tradeoffs, and production best practices. Below, I break down the key moments, add implementation code, and connect the dots to what this means if you're building AI agents that need web search.

Stephanie Wong and Shrestha Basu Mallick discussing Grounding with Google Search on Google for Developers

What Is Grounding with Google Search in 2026?

Shrestha Basu Mallick explaining the motivation behind Grounding with Google Search in the Gemini API

Grounding with Google Search connects a Gemini model to Google's live search index so that responses include current, cited information instead of relying solely on training data. Google shipped it as a developer tool (not always-on) because not every query benefits from web search, and every grounded request costs more.

Basu Mallick demonstrated the problem with a straightforward question: "Who won the 2024 Emmy Award for outstanding comedy series?" Without grounding, Gemini said Ted Lasso, the 2023 winner. With grounding enabled, it returned the correct answer (Hacks) and included source links.

That side-by-side demo is the clearest argument for why grounding matters on time-sensitive queries. The model's training data had a cutoff before the 2024 ceremony. Search grounding filled the gap.

The word "optionality" came up repeatedly. Google built this as an opt-in tool with a granularity dial on top. Developers choose when to ground and how aggressively.

We wanted to use the power of Google Search to help developers get better responses in whatever applications they are building. This better here could mean more accurate, more current, with richer detail, but we wanted developers to have this optionality.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

How Gemini API Grounding with Google Search Works

Technical architecture diagram showing how Grounding with Google Search processes queries through the classifier and retrieval pipeline

The full pipeline runs in six stages. Basu Mallick walked through each one in about two minutes during the video. Here's what happens under the hood:

Stage 1: Enable search grounding. In Google AI Studio, flip the toggle. In the API, pass google_search as a tool in the request configuration.

Stage 2: The prediction classifier scores the query. Every incoming query gets a score between 0 and 1 representing how likely it is to benefit from search grounding. A question about 2024 Emmy winners scores high. "What is 2+2?" scores low.

Stage 3: Compare the score against the dynamic retrieval threshold. If the prediction score exceeds the developer-set threshold, the query gets grounded. Otherwise, the model answers from its training data alone.

Stage 4: Query rewriting. The original prompt gets rewritten into one or more search-optimized queries. User prompts are conversational, not search-friendly. This step bridges that gap.

Stage 5: Search, extract, rerank, blend. The rewritten queries go to Google Search. Results come back, get reranked for relevance, and are blended into an optimized context block injected into the model's prompt.

Stage 6: Generate a grounded response. The output includes the answer, supporting source links, and Google Search suggestions.

| Pipeline Stage | What Happens | Developer Control |
| --- | --- | --- |
| Toggle on/off | Enable grounding as a tool | Full control (on/off) |
| Prediction classifier | Scores query 0-1 on grounding benefit | None (automatic) |
| Dynamic retrieval | Compares score vs. threshold | Threshold slider (0-1, default 0.7) |
| Query rewriting | Converts prompt to search queries | None (automatic) |
| Search + rerank | Extracts and ranks search results | None (automatic) |
| Context injection | Blends results into prompt | None (automatic) |
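The decision in stages 2 and 3 reduces to a single comparison. Here's a minimal sketch, where `should_ground` stands in for logic that runs inside Google's infrastructure (neither the classifier nor the comparison is exposed to developers):

```python
def should_ground(prediction_score: float, threshold: float = 0.7) -> bool:
    """Stage 3: ground only when the classifier's score clears the
    developer-set dynamic retrieval threshold.

    Using ">=" makes a 0.0 threshold always ground and a 1.0 threshold
    ground only on a perfect score (practically never), matching the
    slider behavior described in the video. Whether Google's actual
    comparison is strict is an internal detail.
    """
    return prediction_score >= threshold

print(should_ground(0.92))  # time-sensitive query scores high -> True
print(should_ground(0.05))  # "What is 2+2?" scores low -> False
```

Everything else in the pipeline happens on Google's side; this comparison is the only step you can tune.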

We use the query to get the appropriate information from search, and we then add it into the prompt that we are sending to the model. We find that this act of getting that additional information from search and adding it to the model ends up in the response just being an overall better answer.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

This is Google's managed version of retrieval-augmented generation. You skip the embedding model, vector database, chunking strategy, and retrieval logic entirely. The tradeoff: you can't control what gets retrieved or how it's ranked. For teams that need that control, a custom RAG pipeline using a web search API with your own reranking logic is the alternative approach.

The Dynamic Retrieval Classifier and Threshold Settings

The dynamic retrieval system is the single most important developer-facing control in Grounding with Google Search. It determines which queries get grounded and which ones don't.

The prediction classifier is a model Google trained to estimate how much a query would benefit from search results. It outputs a float between 0 and 1. Developers set a threshold, and the system compares every query's score against that threshold.

The higher the value on the slider, the more selective the system is going to be with deciding whether to ground or not. If you have the slider at zero, it's always going to ground. If you have it at one, it's never going to ground.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

Google tested internally and set the default at 0.7, which targets queries needing recent information while filtering out ones the model can handle from training data alone.

Here's how different threshold values affect behavior:

| Threshold | Behavior | Best For |
| --- | --- | --- |
| 0.0 | Always ground every query | Maximum accuracy, highest cost |
| 0.3 | Ground most queries, skip only the most basic | Research tools, fact-checking apps |
| 0.7 (default) | Ground only queries that clearly benefit | General-purpose applications |
| 0.9 | Very selective, ground rarely | Apps where most queries are knowledge-based |
| 1.0 | Never ground | Disables grounding entirely |

Basu Mallick recommended testing the threshold on your own evaluation set. The right value depends entirely on your application's query patterns. A customer support bot handling questions about shipping policies doesn't need the same threshold as a financial news aggregator.
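A sketch of that evaluation loop, assuming you've hand-labeled whether grounding actually improved each answer (the classifier scores below are made up for illustration; adapt the shape to your own eval data):

```python
# Each entry pairs a query's (assumed) classifier score with a human
# judgment of whether grounding improved the answer.
eval_set = [
    {"score": 0.95, "grounding_helped": True},   # current-events query
    {"score": 0.80, "grounding_helped": True},
    {"score": 0.40, "grounding_helped": False},  # stable-knowledge query
    {"score": 0.10, "grounding_helped": False},
]

def grounded_precision(threshold: float) -> float:
    """Of the queries a given threshold would send to search,
    what fraction actually benefited?"""
    sent = [q for q in eval_set if q["score"] >= threshold]
    if not sent:
        return 0.0
    return sum(q["grounding_helped"] for q in sent) / len(sent)

for t in (0.0, 0.3, 0.7, 0.9):
    print(f"threshold={t}: precision={grounded_precision(t):.2f}")
```

Sweeping thresholds this way shows where your query mix stops benefiting from search, which is exactly the tradeoff the slider controls.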

How to Implement Grounding with Google Search in Python

Shrestha Basu Mallick demoing search grounding in Google AI Studio compare mode

The Gemini API documentation provides a straightforward implementation pattern. Here's the Python SDK approach:

from google import genai
from google.genai import types
 
# Initialize the client
client = genai.Client()
 
# Define the grounding tool
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)
 
# Configure the request with grounding enabled
config = types.GenerateContentConfig(
    tools=[grounding_tool]
)
 
# Make a grounded request
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Who won the 2024 Emmy Award for outstanding comedy series?",
    config=config
)
 
# Access the grounded response
print(response.text)
 
# Access grounding metadata and sources
for candidate in response.candidates:
    if candidate.grounding_metadata and candidate.grounding_metadata.grounding_chunks:
        for chunk in candidate.grounding_metadata.grounding_chunks:
            print(f"Source: {chunk.web.title} - {chunk.web.uri}")

To add dynamic retrieval with a custom threshold, use the google_search_retrieval tool with a dynamic_retrieval_config (at the time of writing, the dynamic retrieval configuration is documented for Gemini 1.5 models):

from google import genai
from google.genai import types

client = genai.Client()

# Configure dynamic retrieval with a custom threshold
config = types.GenerateContentConfig(
    tools=[
        types.Tool(
            google_search_retrieval=types.GoogleSearchRetrieval(
                dynamic_retrieval_config=types.DynamicRetrievalConfig(
                    mode="MODE_DYNAMIC",
                    # 0.0 = always ground, 1.0 = never ground (default 0.7)
                    dynamic_threshold=0.3,
                )
            )
        )
    ]
)

response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents="What are the latest developments in quantum computing?",
    config=config
)

print(response.text)

As of early 2026, Grounding with Google Search works with Gemini 2.0 Flash, Gemini 2.0 Pro, and earlier models including Gemini 1.5 Flash and Gemini 1.5 Pro. According to Google's tooling update announcement, developers can now combine function calling with built-in tools like Google Search in a single API call, which opens the door to more complex agentic applications.

If you're building agents that need web data beyond what Google's managed pipeline provides, you can pair this with a dedicated search endpoint. Our quickstart guide shows how to set up a custom retrieval layer in under five minutes.

Inside the Technical Pipeline: Query Rewriting to Response Blending

The query rewriting stage deserves more attention than the video gave it. User prompts don't go to Google Search verbatim. The system transforms conversational input into one or more search-optimized queries before hitting the search index.

This matters because the quality of the search results depends entirely on the quality of the queries. A prompt like "tell me about that thing Elon said yesterday about Mars" gets rewritten into something search engines can actually work with. Google's own search infrastructure handles this transformation, and it's the same type of query understanding that powers Google's AI search features.

After search results come back, the system reranks them for relevance and blends them into a context block. This block gets injected into the model's prompt alongside the original user query. The model then generates a response that draws on both its training data and the fresh search context.
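To make that injection step concrete, here's a toy version of the blending, assuming snippets have already been retrieved and reranked (the real blending and ranking logic is internal to Google and not exposed):

```python
def build_grounded_prompt(user_query: str, snippets: list) -> str:
    """Blend retrieved snippets into a numbered context block placed
    ahead of the user's question, RAG-style."""
    context = "\n".join(
        f"[{i + 1}] {s['title']}: {s['text']}"
        for i, s in enumerate(snippets)
    )
    return (
        "Answer using the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {user_query}"
    )

# Illustrative snippet, standing in for real reranked search results.
snippets = [
    {"title": "Variety",
     "text": "Hacks won the 2024 Emmy for outstanding comedy series."},
]
print(build_grounded_prompt("Who won the 2024 comedy Emmy?", snippets))
```

The model then answers from this augmented prompt, which is why the grounded response can cite sources its training data never contained.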

The extraction and reranking steps are where Google's proprietary advantage is strongest. They control the search index, the ranking algorithms, and the extraction pipeline. No third party can replicate this exact flow because no third party has access to Google's search index.

For developers who need more control over what gets retrieved, how it's ranked, or which sources are prioritized, building a custom RAG pipeline gives you that flexibility. You choose the search provider, define your own reranking logic, and control the context window. I've written about this tradeoff in detail in our Grounding Google Search Alternatives guide.

Real-World Use Cases Including Translation and Research Tools

Shrestha Basu Mallick discussing use cases for Grounding with Google Search including research, translation, and coding

Basu Mallick highlighted three categories where search grounding adds the most value:

Research tools. Any application with a high bar for factuality. The grounded response provides more detail, and the source links let users verify claims independently. According to Google's documentation, grounding reduces model hallucinations by basing responses on real-world information.

Translation. This was the unexpected one. Developers use search grounding to improve translation accuracy by pulling cultural and linguistic context from search results. Grounded translation doesn't just map words; it roots them in how the target language community actually uses those terms.

Coding and troubleshooting. Search grounding helps find relevant documentation for newer frameworks or recent updates that fall outside the model's training data. If you're building a coding assistant, this closes the gap between the model's knowledge cutoff and today's latest release notes.

The pattern across all three: these are scenarios where the model's training data falls short. If your application asks about stable, well-documented topics (basic math, established historical facts), grounding adds cost without value. If your queries touch anything time-sensitive or rapidly changing, grounding is the difference between a usable answer and a hallucinated one.

For a broader look at how different AI models and APIs handle web search grounding, including options beyond Google's ecosystem, see our comparison of Claude web search API implementations.

What Were the Hardest Technical Challenges?

Shrestha Basu Mallick explaining the technical challenges of building Grounding with Google Search across Google teams

Three major Google organizations had to align: Google Cloud (which ships the Gemini API and AI Studio), Google DeepMind (which builds the models), and Google Search (which provides the retrieval infrastructure).

Several different groups inside of Google had to come together and work together to make this feature available to developers. There is Google Cloud, we've been working closely with Google DeepMind in terms of making the model better, and then there's Google Search and all the thoroughness that search brings to any product that it's involved in.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

Basu Mallick described "a lot of opinions, a lot of requirements, a lot of controls" that needed alignment. The Google Search team brings a specific quality bar and set of policies around how search results can be displayed. The requirement to show source links and search suggestions in production apps comes directly from that collaboration.

This cross-organizational complexity is also the competitive moat. No other AI provider can offer Google Search as a native grounding tool. OpenAI and Anthropic build web search integrations, but they use third-party search APIs or their own crawlers, not the same index that processes billions of queries daily.

Understanding Search Suggestions in Grounded Responses

Where to find more information about Grounding with Google Search in AI Studio

One feature the video touched on but deserves standalone coverage: Search Suggestions. When Grounding with Google Search returns a response, it can also include search suggestion chips, the equivalent of "related searches" you see at the bottom of a Google results page.

This feature has almost a twofold benefit. First is the actual answer is better, but second is all the sources that we link out to, which you can click on to get more information from the original sources.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

These suggestions serve two purposes: they help users explore related topics, and they provide a bridge back to Google Search for deeper exploration. According to the Vertex AI grounding documentation, displaying search suggestions in your application is recommended and subject to specific service terms.

For production apps, Google expects you to surface these suggestions in your UI. This isn't just a quality recommendation. It's part of the publisher ecosystem commitment Basu Mallick referenced. The suggestions drive traffic back to publishers and to Google Search itself.

In code, search suggestions appear in the grounding metadata of the response object. You can render them as clickable chips or links in your application's interface.
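A minimal sketch of pulling that payload out of a response. The `search_entry_point.rendered_content` field name follows the Gemini API's grounding metadata (it carries pre-rendered HTML for the suggestion chips); the stub object below stands in for a real SDK response so the example runs offline:

```python
from types import SimpleNamespace

def extract_search_suggestions(response):
    """Return the pre-rendered Search Suggestions HTML, if present,
    walking the response the same way as the citation example above."""
    for candidate in response.candidates:
        meta = candidate.grounding_metadata
        if meta and getattr(meta, "search_entry_point", None):
            return meta.search_entry_point.rendered_content
    return None

# Stub mimicking the SDK response shape, for illustration only.
stub = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(
        search_entry_point=SimpleNamespace(
            rendered_content="<div>suggestion chips</div>"
        )
    )
)])
print(extract_search_suggestions(stub))
```

Whatever HTML comes back should be embedded as-is, since Google's service terms constrain how the chips are displayed.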

Gemini API Grounding vs. Vertex AI Grounding: What's Different?

Both the Gemini API (via Google AI Studio) and Vertex AI offer Grounding with Google Search, but the implementations serve different audiences.

| Feature | Gemini API (AI Studio) | Vertex AI |
| --- | --- | --- |
| Target audience | Individual developers, startups | Enterprise teams, production at scale |
| Setup | API key, minimal config | Google Cloud project, IAM roles |
| Dynamic retrieval | Supported | Supported with additional config |
| Billing | Per-request pricing | Google Cloud billing with enterprise discounts |
| Compliance | Standard terms | Enterprise compliance (SOC 2, HIPAA eligible) |
| Regional availability | Global | Region-specific deployment options |
| Search suggestions | Included | Included with service terms |

If you're prototyping or building a smaller application, the Gemini API through AI Studio gives you the fastest path to working grounded responses. For production systems that need enterprise compliance, regional data handling, or tighter cost controls, Vertex AI is the intended path.

The underlying grounding technology is the same. The prediction classifier, query rewriting, and search result blending all work identically. The difference is packaging, billing, and the enterprise controls layered on top.

For teams exploring options outside Google's ecosystem, our Gemini developer guide covers broader API usage patterns, and our analysis of AI search API alternatives compares the available options for web-connected AI applications.

Grounding with Google Search Pricing and Cost Optimization

Grounding costs more than standard Gemini API calls because each grounded request triggers a Google Search query behind the scenes. Basu Mallick was direct about this during the video: developers need to "think through what is the best cost performance tradeoff."

The primary cost lever is the dynamic retrieval threshold. Setting it higher (closer to 1.0) means fewer queries get grounded, which directly reduces cost. Setting it lower (closer to 0.0) increases accuracy but increases spend proportionally.

Here's the practical framework:

  • Set threshold to 0.7 (default) and monitor. Track what percentage of your queries actually get grounded. If it's above 80%, your queries are mostly time-sensitive and the cost is justified.
  • Raise to 0.8-0.9 for stable knowledge domains. If your app primarily answers questions about established topics (programming fundamentals, math, history), most queries won't benefit from grounding.
  • Lower to 0.3-0.5 for news, finance, or current events. These domains change constantly and the accuracy gains outweigh the cost increase.
  • Use 0.0 sparingly. Grounding every single query wastes money on prompts like "explain what a for loop is."
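The monitoring step in the first bullet can be sketched in a few lines, using stub objects in place of real SDK responses (the `grounding_metadata` attribute name follows the Gemini Python SDK):

```python
from types import SimpleNamespace

def grounding_rate(responses) -> float:
    """Fraction of responses that were actually grounded, judged by the
    presence of grounding metadata on any candidate."""
    if not responses:
        return 0.0
    grounded = sum(
        1 for r in responses
        if any(getattr(c, "grounding_metadata", None) for c in r.candidates)
    )
    return grounded / len(responses)

# Stubs standing in for logged API responses, for illustration.
grounded_resp = SimpleNamespace(
    candidates=[SimpleNamespace(grounding_metadata=object())])
plain_resp = SimpleNamespace(
    candidates=[SimpleNamespace(grounding_metadata=None)])
print(grounding_rate([grounded_resp, plain_resp]))  # -> 0.5
```

Log this rate over a window of production traffic; if it stays far above or below what your query mix warrants, adjust the threshold accordingly.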

I've worked with search infrastructure for years at WebSearchAPI.ai, and the cost-quality tradeoff Basu Mallick describes mirrors what we see with our own customers. Teams that tune their retrieval thresholds based on query patterns typically reduce unnecessary search calls by 30-40% without measurable accuracy loss.

Best Practices for Production Apps Using Grounding

Best practices for building apps with Grounding with Google Search in AI Studio

Basu Mallick gave specific recommendations for shipping grounded applications. Here are the actionable ones:

Display Citation Sources

The linked sources returned with grounded responses aren't decorative. They let users verify the model's claims and give credit to publishers whose content powered the response. Basu Mallick was explicit about this: "We hugely benefit from our publisher ecosystem. We want to make sure they get all the credit they deserve."

Show Search Suggestions

The search suggestion chips at the bottom of grounded responses are "recommended" for production apps. Google's service terms for grounding include specific requirements around how these suggestions are displayed.

Test the Threshold on Your Evaluation Set

Google's default 0.7 works well for general queries, but your application's query distribution is unique. Build an eval set of 100-200 representative queries, run them with grounding at different thresholds, and compare answer quality against a human-labeled ground truth.

Use Compare Mode in AI Studio

Before writing code, test your prompts in Google AI Studio's compare mode. Run the same prompt with and without grounding side by side. This gives you an immediate sense of which queries benefit most.

| Best Practice | Category | Why It Matters |
| --- | --- | --- |
| Test dynamic retrieval threshold on your eval set | Quality | Google's default 0.7 may not fit your use case |
| Display source links | Quality + Policy | Publisher credit and user verification |
| Show search suggestions | Quality + Policy | Required by Google for production apps |
| Tune threshold to filter unnecessary grounding | Cost | Avoid paying for search on queries that don't benefit |
| Use compare mode in AI Studio for testing | Quality | Side-by-side grounded vs. ungrounded comparison |

Shrestha Basu Mallick demoing Grounding with Google Search for Halloween costume ideas in AI Studio

Basu Mallick ran a second demo asking for "on trend Halloween costumes for 2024." Without grounding, the model returned generic, undated costume ideas. With grounding, it cited Beetlejuice characters, movie characters, and viral personalities that were actually trending in late 2024.

I can literally click on the first link here, which is the top 25 trending Halloween costumes of 2024 as declared by the Smithsonian Magazine. So I can be sure that this is providing fresh information for me as I decide my Halloween costume.

— Shrestha Basu Mallick, Group Product Manager, Google DeepMind

The real value was in the sources. Clicking through to a Smithsonian Magazine article listing "top 25 trending Halloween costumes of 2024" provides the kind of authoritative, time-stamped source that proves the grounded answer is current.

This demo was lighter in tone, but it made a point the Emmy Awards demo didn't: grounding isn't only for factual accuracy. It handles freshness on any topic where "current" matters, even when the stakes are low.

Key Takeaways

  • Grounding with Google Search injects live search results into Gemini API responses, fixing hallucinations on time-sensitive queries where the ungrounded model gets the answer wrong.
  • Dynamic retrieval is the primary developer control. The threshold slider goes from 0 (always ground) to 1 (never ground), with Google's default at 0.7.
  • The pipeline rewrites user prompts into search queries, extracts results, reranks them, and blends them into the model's context. Developers don't control the retrieval steps.
  • Three Google organizations collaborated to build this: Google Cloud, Google DeepMind, and Google Search. The Search team's quality standards drive the requirement to show source links and search suggestions.
  • Gemini 2.0 Flash, 2.0 Pro, 1.5 Flash, and 1.5 Pro all support grounding. The feature works through both the Gemini API and Vertex AI with similar capabilities.
  • Display source links and search suggestions in production. This is both a quality recommendation and a contractual publisher ecosystem obligation.
  • Build a custom retrieval pipeline if you need control. Google's managed RAG is zero-infrastructure but zero-customization. For teams needing source control, custom reranking, or multi-provider search, a dedicated web search API paired with your own logic gives you that flexibility.

Frequently Asked Questions

What Is Google Web Grounding?

Grounding with Google Search is a Gemini API feature that connects the model to Google's live search index during inference. Instead of answering only from training data, the model retrieves current search results, extracts relevant information, and generates a response with source citations. Basu Mallick described it as making responses "more accurate, more current, with richer detail." The feature works across all languages supported by Google Search.

Is Grounding with Google Search Good?

For time-sensitive queries, yes. The Emmy Awards demo showed Gemini giving the wrong answer without grounding and the correct answer with it. For stable knowledge queries (math, well-documented topics), grounding adds cost without improving quality. The dynamic retrieval threshold lets you control this tradeoff per-application. In my experience building search infrastructure, the accuracy improvement on current-events queries justifies the cost for most production applications.

How Does Grounding with Google Search Work in the Gemini API?

You pass google_search as a tool in the GenerateContentConfig. The system then scores each query with a prediction classifier (0 to 1), compares that score to your threshold, rewrites qualifying queries for search, retrieves and reranks results, and injects them into the model's context. The response includes the grounded answer plus source links and optional search suggestions.

How Much Does Grounding with Google Search Cost?

Each grounded request costs more than an ungrounded one because it triggers a Google Search query. Google hasn't published fixed per-query grounding prices separate from model token costs, but the dynamic retrieval threshold is your primary cost lever. Setting the threshold at 0.7 (default) filters most basic queries. Lowering it increases both accuracy and spend. Teams building on Vertex AI can access enterprise pricing through Google Cloud billing.

What Are the Main Use Cases for Grounding with Google Search?

Basu Mallick highlighted research tools (fact-checking, academic search), translation (culturally-grounded translations using search context), and coding assistants (finding docs for new frameworks). The Halloween costume demo showed freshness on low-stakes trending topics. Any query where "current" matters more than "established" benefits from grounding.

How Does Vertex AI Grounding with Google Search Reduce Hallucinations?

The mechanism is the same across Gemini API and Vertex AI. When the prediction classifier determines a query would benefit from search results, the system retrieves current data from Google's search index and includes it in the model's context. The model generates a response grounded in this retrieved data rather than relying on potentially outdated training knowledge. The source citations let users verify every claim. According to Google, this approach anchors responses in verified, real-time information sources.

How Does Grounding with Google Search Compare to Building Your Own RAG Pipeline?

Google's grounding is a managed RAG solution. You skip vector databases, embedding models, chunking strategies, and reranking logic. The tradeoff is zero customization: you can't control which sources get retrieved, how they're ranked, or what context the model sees. A custom pipeline using a web search API gives you full control over retrieval, ranking, and source filtering. If you need to restrict sources to specific domains, apply your own relevance scoring, or blend search results with proprietary data, building your own pipeline through a service like WebSearchAPI.ai is the way to go.

Which Gemini Models Support Grounding with Google Search?

As of early 2026, Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Flash, and Gemini 1.5 Pro all support grounding. Basu Mallick confirmed during the original video that both Pro and Flash model families are supported. Google has continued expanding model support with each new release. Check the Gemini API grounding docs for the latest list.


This analysis is based on Grounding with Google Search now in Google AI Studio and the Gemini API by Google for Developers. Video published October 2024; article updated April 2026 with current model support, pricing context, and implementation examples.