Learn how Grounding with Google Search works in the Gemini API, with Python code examples, dynamic retrieval thresholds, pricing details, and production best practices from Google DeepMind.
Grounding with Google Search is a Gemini API feature that injects real-time Google Search results into model responses at inference time. A prediction classifier scores each query from 0 to 1 to decide whether search results would improve the answer. According to Google's documentation, this connects Gemini to live web content across all languages, reducing hallucinations with verifiable citations.
Shrestha Basu Mallick, Group Product Manager at Google DeepMind, sat down with Stephanie Wong, Head of Developer Marketing at Google Cloud, to demo the feature live. This 16-minute conversation covers the full technical pipeline, cost-accuracy tradeoffs, and production best practices. Below, I break down the key moments, add implementation code, and connect the dots to what this means if you're building AI agents that need web search.


Grounding with Google Search connects a Gemini model to Google's live search index so that responses include current, cited information instead of relying solely on training data. Google shipped it as a developer tool (not always-on) because not every query benefits from web search, and every grounded request costs more.
Basu Mallick demonstrated the problem with a straightforward question: "Who won the 2024 Emmy Award for outstanding comedy series?" Without grounding, Gemini answered Ted Lasso, a winner from earlier ceremonies. With grounding enabled, it returned the correct answer (Hacks) and included source links.
That side-by-side demo is the clearest argument for why grounding matters on time-sensitive queries. The model's training data had a cutoff before the 2024 ceremony. Search grounding filled the gap.
The word "optionality" came up repeatedly. Google built this as an opt-in tool with a granularity dial on top. Developers choose when to ground and how aggressively.
> We wanted to use the power of Google Search to help developers get better responses in whatever applications they are building. This better here could mean more accurate, more current, with richer detail, but we wanted developers to have this optionality.

The full pipeline runs in six stages. Basu Mallick walked through each one in about two minutes during the video. Here's what happens under the hood:
Stage 1: Enable search grounding. In Google AI Studio, flip the toggle. In the API, pass google_search as a tool in the request configuration.
Stage 2: The prediction classifier scores the query. Every incoming query gets a score between 0 and 1 representing how likely it is to benefit from search grounding. A question about 2024 Emmy winners scores high. "What is 2+2?" scores low.
Stage 3: Compare the score against the dynamic retrieval threshold. If the prediction score exceeds the developer-set threshold, the query gets grounded. Otherwise, the model answers from its training data alone.
Stage 4: Query rewriting. The original prompt gets rewritten into one or more search-optimized queries. User prompts are conversational, not search-friendly. This step bridges that gap.
Stage 5: Search, extract, rerank, blend. The rewritten queries go to Google Search. Results come back, get reranked for relevance, and are blended into an optimized context block injected into the model's prompt.
Stage 6: Generate a grounded response. The output includes the answer, supporting source links, and Google Search suggestions.
| Pipeline Stage | What Happens | Developer Control |
|---|---|---|
| Toggle on/off | Enable grounding as a tool | Full control (on/off) |
| Prediction classifier | Scores query 0-1 on grounding benefit | None (automatic) |
| Dynamic retrieval | Compares score vs. threshold | Threshold slider (0-1, default 0.7) |
| Query rewriting | Converts prompt to search queries | None (automatic) |
| Search + rerank | Extracts and ranks search results | None (automatic) |
| Context injection | Blends results into prompt | None (automatic) |
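The decision logic in the classifier and dynamic retrieval stages reduces to a simple comparison. The sketch below illustrates it; `classifier_score` is a hypothetical stand-in for the output of Google's internal prediction classifier, which you never see directly:

```python
def should_ground(classifier_score: float, threshold: float = 0.7) -> bool:
    """Decide whether a query gets grounded.

    classifier_score: the prediction classifier's 0-1 estimate of how much
    the query would benefit from search (hypothetical stand-in for Google's
    internal model).
    threshold: the developer-set dynamic retrieval threshold. A score above
    the threshold triggers grounding; at 0.0 every query grounds, at 1.0
    none do.
    """
    return classifier_score > threshold

# "Who won the 2024 Emmy?" scores high -> grounded at the 0.7 default
print(should_ground(0.92))       # True
# "What is 2+2?" scores low -> answered from training data alone
print(should_ground(0.05))       # False
# Threshold 0.0 grounds everything
print(should_ground(0.05, 0.0))  # True
```

This is why the slider at zero "always grounds" and at one "never grounds": a 0-1 score can always beat a threshold of zero and can never exceed one.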
> We use the query to get the appropriate information from search, and we then add it into the prompt that we are sending to the model. We find that this act of getting that additional information from search and adding it to the model ends up in the response just being an overall better answer.
This is Google's managed version of retrieval-augmented generation. You skip the embedding model, vector database, chunking strategy, and retrieval logic entirely. The tradeoff: you can't control what gets retrieved or how it's ranked. For teams that need that control, a custom RAG pipeline using a web search API with your own reranking logic is the alternative approach.
The dynamic retrieval system is the single most important developer-facing control in Grounding with Google Search. It determines which queries get grounded and which ones don't.
The prediction classifier is a model Google trained to estimate how much a query would benefit from search results. It outputs a float between 0 and 1. Developers set a threshold, and the system compares every query's score against that threshold.
> The higher the value on the slider, the more selective the system is going to be with deciding whether to ground or not. If you have the slider at zero, it's always going to ground. If you have it at one, it's never going to ground.
Google tested internally and set the default at 0.7, which targets queries needing recent information while filtering out ones the model can handle from training data alone.
Here's how different threshold values affect behavior:
| Threshold | Behavior | Best For |
|---|---|---|
| 0.0 | Always ground every query | Maximum accuracy, highest cost |
| 0.3 | Ground most queries, skip only the most basic | Research tools, fact-checking apps |
| 0.7 (default) | Ground only queries that clearly benefit | General-purpose applications |
| 0.9 | Very selective, ground rarely | Apps where most queries are knowledge-based |
| 1.0 | Never ground | Disables grounding entirely |
Basu Mallick recommended testing the threshold on your own evaluation set. The right value depends entirely on your application's query patterns. A customer support bot handling questions about shipping policies doesn't need the same threshold as a financial news aggregator.
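That threshold test can be automated as a simple sweep. The sketch below assumes two hypothetical helpers you'd supply yourself: `run_grounded_query`, which would call the Gemini API at a given threshold, and `grade_answer`, which would compare a response to a human-labeled ground truth:

```python
def sweep_thresholds(eval_set, run_grounded_query, grade_answer,
                     thresholds=(0.3, 0.5, 0.7, 0.9)):
    """Return accuracy per threshold over an eval set of (query, expected)
    pairs. run_grounded_query and grade_answer are caller-supplied
    stand-ins, not part of any SDK."""
    results = {}
    for t in thresholds:
        correct = 0
        for query, expected in eval_set:
            answer = run_grounded_query(query, threshold=t)
            if grade_answer(answer, expected):
                correct += 1
        results[t] = correct / len(eval_set)
    return results
```

Plot accuracy against threshold for your 100-200 representative queries and pick the highest threshold before accuracy drops, since every step lower also raises cost.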

The Gemini API documentation provides a straightforward implementation pattern. Here's the Python SDK approach:
```python
from google import genai
from google.genai import types

# Initialize the client (reads the API key from the environment)
client = genai.Client()

# Define the grounding tool
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Configure the request with grounding enabled
config = types.GenerateContentConfig(
    tools=[grounding_tool]
)

# Make a grounded request
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Who won the 2024 Emmy Award for outstanding comedy series?",
    config=config,
)

# Access the grounded response
print(response.text)

# Access grounding metadata and sources
for candidate in response.candidates:
    if candidate.grounding_metadata:
        for chunk in candidate.grounding_metadata.grounding_chunks:
            print(f"Source: {chunk.web.title} - {chunk.web.uri}")
```

To add dynamic retrieval with a custom threshold:
```python
from google import genai
from google.genai import types

client = genai.Client()

# Configure dynamic retrieval with a custom threshold.
# Dynamic retrieval is set through the google_search_retrieval tool,
# which Google's docs pair with Gemini 1.5 models; Gemini 2.0 uses the
# plain google_search tool shown above.
config = types.GenerateContentConfig(
    tools=[
        types.Tool(
            google_search_retrieval=types.GoogleSearchRetrieval(
                dynamic_retrieval_config=types.DynamicRetrievalConfig(
                    mode="MODE_DYNAMIC",
                    # Threshold runs 0.0 to 1.0:
                    # lower = more grounding, higher = more selective
                    dynamic_threshold=0.6,
                )
            )
        )
    ],
)

response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents="What are the latest developments in quantum computing?",
    config=config,
)
```

As of early 2026, Grounding with Google Search works with Gemini 2.0 Flash, Gemini 2.0 Pro, and earlier models including Gemini 1.5 Flash and Gemini 1.5 Pro. According to Google's tooling update announcement, developers can now combine function calling with built-in tools like Google Search in a single API call, which opens the door to more complex agentic applications.
If you're building agents that need web data beyond what Google's managed pipeline provides, you can pair this with a dedicated search endpoint. Our quickstart guide shows how to set up a custom retrieval layer in under five minutes.
The query rewriting stage deserves more attention than the video gave it. User prompts don't go to Google Search verbatim. The system transforms conversational input into one or more search-optimized queries before hitting the search index.
This matters because the quality of the search results depends entirely on the quality of the queries. A prompt like "tell me about that thing Elon said yesterday about Mars" gets rewritten into something search engines can actually work with. Google's own search infrastructure handles this transformation, and it's the same type of query understanding that powers Google's AI search features.
After search results come back, the system reranks them for relevance and blends them into a context block. This block gets injected into the model's prompt alongside the original user query. The model then generates a response that draws on both its training data and the fresh search context.
The extraction and reranking steps are where Google's proprietary advantage is strongest. They control the search index, the ranking algorithms, and the extraction pipeline. No third party can replicate this exact flow because no third party has access to Google's search index.
For developers who need more control over what gets retrieved, how it's ranked, or which sources are prioritized, building a custom RAG pipeline gives you that flexibility. You choose the search provider, define your own reranking logic, and control the context window. I've written about this tradeoff in detail in our Grounding Google Search Alternatives guide.
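The custom-pipeline alternative can be sketched in a few lines. Everything here is illustrative: `search_web` stands in for whatever search API you choose, and `rerank` for your own relevance scoring:

```python
def build_grounded_prompt(user_query, search_web, rerank, top_k=3):
    """Blend reranked search snippets into a prompt, RAG-style.

    search_web and rerank are hypothetical caller-supplied functions:
    search_web(query) returns dicts with "title", "url", and "snippet";
    rerank(query, results) orders them by your own relevance logic.
    """
    results = search_web(user_query)
    ranked = rerank(user_query, results)[:top_k]
    context = "\n".join(
        f"- {r['title']} ({r['url']}): {r['snippet']}" for r in ranked
    )
    return (
        "Answer using the web results below and cite their URLs.\n"
        f"Web results:\n{context}\n\n"
        f"Question: {user_query}"
    )
```

The resulting string goes to the model as an ordinary prompt, which is exactly the "inject context before generation" step Google performs for you in the managed flow, minus the proprietary index and ranking.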

Basu Mallick highlighted three categories where search grounding adds the most value:
Research tools. Any application with a high bar for factuality. The grounded response provides more detail, and the source links let users verify claims independently. According to Google's documentation, grounding reduces model hallucinations by basing responses on real-world information.
Translation. This was the unexpected one. Developers use search grounding to improve translation accuracy by pulling cultural and linguistic context from search results. The model doesn't just swap words; grounding roots its translations in how the target language community actually uses those terms.
Coding and troubleshooting. Search grounding helps find relevant documentation for newer frameworks or recent updates that fall outside the model's training data. If you're building a coding assistant, this closes the gap between the model's knowledge cutoff and today's latest release notes.
The pattern across all three: these are scenarios where the model's training data falls short. If your application asks about stable, well-documented topics (basic math, established historical facts), grounding adds cost without value. If your queries touch anything time-sensitive or rapidly changing, grounding is the difference between a usable answer and a hallucinated one.
For a broader look at how different AI models and APIs handle web search grounding, including options beyond Google's ecosystem, see our comparison of Claude web search API implementations.

Three major Google organizations had to align: Google Cloud (which ships the Gemini API and AI Studio), Google DeepMind (which builds the models), and Google Search (which provides the retrieval infrastructure).
> Several different groups inside of Google had to come together and work together to make this feature available to developers. There is Google Cloud, we've been working closely with Google DeepMind in terms of making the model better, and then there's Google Search and all the thoroughness that search brings to any product that it's involved in.
Basu Mallick described "a lot of opinions, a lot of requirements, a lot of controls" that needed alignment. The Google Search team brings a specific quality bar and set of policies around how search results can be displayed. The requirement to show source links and search suggestions in production apps comes directly from that collaboration.
This cross-organizational complexity is also the competitive moat. No other AI provider can offer Google Search as a native grounding tool. OpenAI and Anthropic build web search integrations, but they use third-party search APIs or their own crawlers, not the same index that processes billions of queries daily.

One feature the video touched on but deserves standalone coverage: Search Suggestions. When Grounding with Google Search returns a response, it can also include search suggestion chips, the equivalent of "related searches" you see at the bottom of a Google results page.
> This feature has almost a twofold benefit. First is the actual answer is better, but second is all the sources that we link out to, which you can click on to get more information from the original sources.
These suggestions serve two purposes: they help users explore related topics, and they provide a bridge back to Google Search for deeper exploration. According to the Vertex AI grounding documentation, displaying search suggestions in your application is recommended and subject to specific service terms.
For production apps, Google expects you to surface these suggestions in your UI. This isn't just a quality recommendation. It's part of the publisher ecosystem commitment Basu Mallick referenced. The suggestions drive traffic back to publishers and to Google Search itself.
In code, search suggestions appear in the grounding metadata of the response object. You can render them as clickable chips or links in your application's interface.
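A minimal sketch of pulling the suggestion chips out of a response follows. In the current google-genai SDK the pre-styled chip HTML is exposed as `search_entry_point.rendered_content` on the grounding metadata; verify the field names against the docs for your SDK version:

```python
def extract_search_suggestions(response):
    """Return the rendered Search Suggestions HTML from a grounded
    response, or None if the response wasn't grounded.

    Assumes the google-genai response shape, where each candidate may
    carry grounding_metadata.search_entry_point.rendered_content.
    """
    for candidate in response.candidates:
        meta = candidate.grounding_metadata
        if meta and meta.search_entry_point:
            return meta.search_entry_point.rendered_content
    return None
```

The returned HTML is meant to be embedded as-is, since Google's service terms constrain how the chips are styled and displayed.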
Both the Gemini API (via Google AI Studio) and Vertex AI offer Grounding with Google Search, but the implementations serve different audiences.
| Feature | Gemini API (AI Studio) | Vertex AI |
|---|---|---|
| Target audience | Individual developers, startups | Enterprise teams, production at scale |
| Setup | API key, minimal config | Google Cloud project, IAM roles |
| Dynamic retrieval | Supported | Supported with additional config |
| Billing | Per-request pricing | Google Cloud billing with enterprise discounts |
| Compliance | Standard terms | Enterprise compliance (SOC 2, HIPAA eligible) |
| Regional availability | Global | Region-specific deployment options |
| Search suggestions | Included | Included with service terms |
If you're prototyping or building a smaller application, the Gemini API through AI Studio gives you the fastest path to working grounded responses. For production systems that need enterprise compliance, regional data handling, or tighter cost controls, Vertex AI is the intended path.
The underlying grounding technology is the same. The prediction classifier, query rewriting, and search result blending all work identically. The difference is packaging, billing, and the enterprise controls layered on top.
For teams exploring options outside Google's ecosystem, our Gemini developer guide covers broader API usage patterns, and our analysis of AI search API alternatives compares the available options for web-connected AI applications.
Grounding costs more than standard Gemini API calls because each grounded request triggers a Google Search query behind the scenes. Basu Mallick was direct about this during the video: developers need to "think through what is the best cost performance tradeoff."
The primary cost lever is the dynamic retrieval threshold. Setting it higher (closer to 1.0) means fewer queries get grounded, which directly reduces cost. Setting it lower (closer to 0.0) increases accuracy but increases spend proportionally.
Here's how that tradeoff plays out in practice:
I've worked with search infrastructure for years at WebSearchAPI.ai, and the cost-quality tradeoff Basu Mallick describes mirrors what we see with our own customers. Teams that tune their retrieval thresholds based on query patterns typically reduce unnecessary search calls by 30-40% without measurable accuracy loss.
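A back-of-envelope model makes the lever concrete. The per-request prices below are placeholders, not Google's published rates; the point is that spend scales linearly with the fraction of queries that cross the threshold:

```python
def monthly_grounding_cost(queries_per_month, grounded_fraction,
                           base_cost=0.001, grounding_surcharge=0.035):
    """Estimate monthly spend: every query pays base_cost, and grounded
    queries additionally pay a per-search surcharge. All prices are
    illustrative placeholders."""
    grounded = queries_per_month * grounded_fraction
    return queries_per_month * base_cost + grounded * grounding_surcharge

# Raising the threshold so the grounding rate drops from 60% to 40%
# (a ~33% cut in search calls, in line with the 30-40% figure above)
high = monthly_grounding_cost(100_000, 0.60)
low = monthly_grounding_cost(100_000, 0.40)
print(high, low)
```

Whatever the real rates turn out to be, the structure holds: the surcharge term dominates, so the grounding rate, set by your threshold, is the number to manage.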

Basu Mallick gave specific recommendations for shipping grounded applications. Here are the actionable ones:
The linked sources returned with grounded responses aren't decorative. They let users verify the model's claims and give credit to publishers whose content powered the response. Basu Mallick was explicit about this: "We hugely benefit from our publisher ecosystem. We want to make sure they get all the credit they deserve."
The search suggestion chips at the bottom of grounded responses are "recommended" for production apps. Google's service terms for grounding include specific requirements around how these suggestions are displayed.
Google's default 0.7 works well for general queries, but your application's query distribution is unique. Build an eval set of 100-200 representative queries, run them with grounding at different thresholds, and compare answer quality against a human-labeled ground truth.
Before writing code, test your prompts in Google AI Studio's compare mode. Run the same prompt with and without grounding side by side. This gives you an immediate sense of which queries benefit most.
| Best Practice | Category | Why It Matters |
|---|---|---|
| Test dynamic retrieval threshold on your eval set | Quality | Google's default 0.7 may not fit your use case |
| Display source links | Quality + Policy | Publisher credit and user verification |
| Show search suggestions | Quality + Policy | Required by Google for production apps |
| Tune threshold to filter unnecessary grounding | Cost | Avoid paying for search on queries that don't benefit |
| Use compare mode in AI Studio for testing | Quality | Side-by-side grounded vs. ungrounded comparison |

Basu Mallick ran a second demo asking for "on trend Halloween costumes for 2024." Without grounding, the model returned generic, undated costume ideas. With grounding, it cited Beetlejuice characters, movie characters, and viral personalities that were actually trending in late 2024.
> I can literally click on the first link here, which is the top 25 trending Halloween costumes of 2024 as declared by the Smithsonian Magazine. So I can be sure that this is providing fresh information for me as I decide my Halloween costume.
The real value was in the sources. Clicking through to a Smithsonian Magazine article listing "top 25 trending Halloween costumes of 2024" provides the kind of authoritative, time-stamped source that proves the grounded answer is current.
This demo was lighter in tone, but it made a point the Emmy Awards demo didn't: grounding isn't only for factual accuracy. It handles freshness on any topic where "current" matters, even when the stakes are low.
Grounding with Google Search is a Gemini API feature that connects the model to Google's live search index during inference. Instead of answering only from training data, the model retrieves current search results, extracts relevant information, and generates a response with source citations. Basu Mallick described it as making responses "more accurate, more current, with richer detail." The feature works across all languages supported by Google Search.
For time-sensitive queries, yes. The Emmy Awards demo showed Gemini giving the wrong answer without grounding and the correct answer with it. For stable knowledge queries (math, well-documented topics), grounding adds cost without improving quality. The dynamic retrieval threshold lets you control this tradeoff per-application. In my experience building search infrastructure, the accuracy improvement on current-events queries justifies the cost for most production applications.
You pass google_search as a tool in the GenerateContentConfig. The system then scores each query with a prediction classifier (0 to 1), compares that score to your threshold, rewrites qualifying queries for search, retrieves and reranks results, and injects them into the model's context. The response includes the grounded answer plus source links and optional search suggestions.
Each grounded request costs more than an ungrounded one because it triggers a Google Search query. Google hasn't published fixed per-query grounding prices separate from model token costs, but the dynamic retrieval threshold is your primary cost lever. Setting the threshold at 0.7 (default) filters most basic queries. Lowering it increases both accuracy and spend. Teams building on Vertex AI can access enterprise pricing through Google Cloud billing.
Basu Mallick highlighted research tools (fact-checking, academic search), translation (culturally-grounded translations using search context), and coding assistants (finding docs for new frameworks). The Halloween costume demo showed freshness on low-stakes trending topics. Any query where "current" matters more than "established" benefits from grounding.
The mechanism is the same across Gemini API and Vertex AI. When the prediction classifier determines a query would benefit from search results, the system retrieves current data from Google's search index and includes it in the model's context. The model generates a response grounded in this retrieved data rather than relying on potentially outdated training knowledge. The source citations let users verify every claim. According to Google, this approach anchors responses in verified, real-time information sources.
Google's grounding is a managed RAG solution. You skip vector databases, embedding models, chunking strategies, and reranking logic. The tradeoff is zero customization: you can't control which sources get retrieved, how they're ranked, or what context the model sees. A custom pipeline using a web search API gives you full control over retrieval, ranking, and source filtering. If you need to restrict sources to specific domains, apply your own relevance scoring, or blend search results with proprietary data, building your own pipeline through a service like WebSearchAPI.ai is the way to go.
As of early 2026, Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Flash, and Gemini 1.5 Pro all support grounding. Basu Mallick confirmed during the original video that both Pro and Flash model families are supported. Google has continued expanding model support with each new release. Check the Gemini API grounding docs for the latest list.
This analysis is based on Grounding with Google Search now in Google AI Studio and the Gemini API by Google for Developers. Video published October 2024; article updated April 2026 with current model support, pricing context, and implementation examples.