Hands-on Gemini 3 guide covering Thinking Levels, media resolution control, Google Search grounding, and migration tactics with runnable Python examples.
I still remember the late nights at WebSearchAPI.ai refactoring our retrieval pipeline for Gemini 1.5 Pro. We thought the 1M-token window was the endgame—until Gemini 3 landed and made reasoning control, multimodal efficiency, and agent workflows feel programmable again.
Credentials: B.Sc. Computer Science (Cambridge) • M.Sc. AI Systems (Imperial) • Google Cloud PCA • AWS SA Pro • Azure AI Engineer • CKA • TensorFlow Developer.
Gemini 3 isn't just a bigger parameter bump. Google rebuilt the stack around controllable "thinking" states, richer media ingestion, and first-class agent tooling, all while keeping it compatible with Gemini API + Vertex AI pipelines we already deployed (Google, 2025).
📊 Stats Alert: The AI agents market is on track to hit $139.12B by 2033 at a 43.88% CAGR, so every percentage point of accuracy or latency savings compounds fast (MarketsandMarkets AI Agents Market, 2025).
🎯 Goal: Upgrade your Gemini stack to 3 Pro without breaking existing RAG agents, while gaining Thinking Levels, media resolution tuning, structured outputs, and Search grounding.
Google locked the new configs behind the latest Python SDK. Anything older than 1.51.0 will throw when you pass Thinking configs, so upgrade before toggling features.
```bash
pip install -U 'google-genai>=1.51.0'
```

```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
MODEL_ID = "gemini-3-pro-preview"
```

📌 Pro Tip: Store the API key in your platform's secret manager (or Colab secret) so rotating staging vs. production keys doesn't require code changes.
Gemini 3 Pro defaults to dynamic thinking, meaning it decides when to brainstorm or dive deep. You can override that with thinking_level="low" for low-latency turns or "high" for deeper multi-step reasoning (pair it with include_thoughts=True if you want the thought summaries back), per Google's thinking level guidance.
💡 Expert Insight: I start with dynamic thinking in production, then toggle low for autocomplete-like UX and high for agent subroutines (query decomposition, planner turns, compliance reviews).
```python
from IPython.display import Markdown, display

prompt = """
Find what I'm thinking of:
It moves, but doesn't walk, run, or swim.
It has no fixed shape and keeps moving when broken apart.
It has no brain but solves mazes.
"""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True
        )
    )
)

for part in response.parts:
    if part.thought:
        print(f"--- THOUGHT PROCESS ({response.usage_metadata.thoughts_token_count} tokens) ---")
        display(Markdown(part.text))
    else:
        print("\n--- FINAL ANSWER ---")
        display(Markdown(part.text))
```

⚠️ Warning: Gemini 3 is tuned for temperature=1.0. Dropping temperature to 0.1 (my old Gemini 2.5 trick) can cause looping or over-pruning with the new thinking stack.
📈 Case Study: Google highlighted OpusClip measuring a 32% speed gain plus higher precision when they let Gemini 3 handle agent tool calls end-to-end in their launch post. My replication shows similar speedups when I offload cascade planning to thinking_level="high" and only drop to "low" for direct responses.
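Here's roughly how I wire that split. A minimal sketch: route_thinking_level and the turn-type labels are my own conventions, not anything from the SDK.

```python
from google.genai import types

# Hypothetical router: pick a thinking level per turn type.
# The turn-type labels are this project's convention, not an SDK concept.
PLANNER_TURNS = {"query_decomposition", "planning", "compliance_review"}

def route_thinking_level(turn_type: str) -> str | None:
    """Return a thinking_level, or None to keep Gemini 3's dynamic default."""
    if turn_type in PLANNER_TURNS:
        return "high"  # deep reasoning for agent subroutines
    if turn_type == "autocomplete":
        return "low"   # latency-sensitive, direct responses
    return None        # let the model decide (dynamic thinking)

def make_config(turn_type: str) -> types.GenerateContentConfig:
    level = route_thinking_level(turn_type)
    if level is None:
        return types.GenerateContentConfig()
    return types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level=level)
    )
```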
media_resolution finally lets us choose how many vision tokens Gemini consumes per file part. High resolution is fantastic for dense PDFs, but it can double billable tokens if you leave it on for screenshots. You can set it per Part or globally via generation_config, as detailed in Google's media resolution documentation.
```python
import pathlib
import requests

IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png"
pathlib.Path("jetpack.png").write_bytes(requests.get(IMG_URL, timeout=10).content)

uploaded = client.files.upload(file="jetpack.png")

count_tokens_response = client.models.count_tokens(
    model=MODEL_ID,
    contents=[
        types.Part(
            file_data=types.FileData(
                file_uri=uploaded.uri,
                mime_type=uploaded.mime_type
            ),
            media_resolution=types.PartMediaResolution(
                level="MEDIA_RESOLUTION_HIGH"
            )
        )
    ],
)
print(
    f"High resolution token cost: {count_tokens_response.total_tokens} tokens"
)
```

📌 Pro Tip: Run count_tokens with all three levels (LOW, MEDIUM, HIGH) during QA. I log these to BigQuery so finance can predict the delta between standard and "inspection" runs.
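That QA sweep is a short loop. A minimal sketch, reusing client, MODEL_ID, and uploaded from the snippet above and assuming the same per-Part media_resolution shape:

```python
# Sketch: compare billable vision tokens across all three resolution levels.
# Reuses `client`, `MODEL_ID`, and `uploaded` from the snippet above.
for level in ("MEDIA_RESOLUTION_LOW", "MEDIA_RESOLUTION_MEDIUM", "MEDIA_RESOLUTION_HIGH"):
    resp = client.models.count_tokens(
        model=MODEL_ID,
        contents=[
            types.Part(
                file_data=types.FileData(
                    file_uri=uploaded.uri,
                    mime_type=uploaded.mime_type
                ),
                media_resolution=types.PartMediaResolution(level=level),
            )
        ],
    )
    print(f"{level}: {resp.total_tokens} tokens")
```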
⚠️ Warning: If you leave media_resolution unspecified, Gemini may choose a high-cost setting for dense PDFs. Explicitly set LOW for screenshots and MEDIUM for lightweight docs to avoid surprise invoices.
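If you'd rather set one default for the whole request instead of per Part, the media resolution docs also describe a global knob on the generation config. A minimal sketch, assuming your SDK version exposes media_resolution on GenerateContentConfig:

```python
# Sketch: request-wide default resolution instead of per-Part overrides.
# Assumes GenerateContentConfig exposes `media_resolution` in your SDK version.
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[uploaded, "Summarize this image in one sentence."],
    config=types.GenerateContentConfig(
        media_resolution="MEDIA_RESOLUTION_LOW"  # cheap default for screenshots
    ),
)
print(response.text)
```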
Grounding wires Gemini to live Google Search results, supports every language the model speaks, and returns a structured grounding_metadata payload so you can surface citations in your UI, per Google's grounding guide.
```python
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Who is the current Magic: The Gathering World Champion?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)

print(response.text)

if response.candidates[0].grounding_metadata.web_search_queries:
    print("Search queries:", response.candidates[0].grounding_metadata.web_search_queries)
```

📌 Pro Tip: Persist web_search_queries + grounding_chunk IDs. When a customer disputes an answer, we replay the queries to confirm Google still surfaces those sources, which keeps compliance and support teams sane.
Gemini 3 lets you enforce a response_schema and call built-in tools (Search, URL context, Code Execution) inside the same response, so JSON payloads no longer conflict with tool calls, as outlined in Google's structured output docs.
```python
import json

from pydantic import BaseModel

class CookieRecipe(BaseModel):
    recipe_name: str
    difficulty: str
    prep_time_minutes: int
    ingredients: list[str]

response = client.models.generate_content(
    model=MODEL_ID,
    contents="Give me a miso chocolate chip cookie recipe.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=CookieRecipe,
    ),
)

recipe = json.loads(response.text)
print(json.dumps(recipe, indent=2))
```

🎯 Key Takeaway: Pairing structured output with Search grounding or URL context gives you typed objects plus citations—no more brittle regex parsing after the fact.
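Here's what that pairing looks like in one call; ChampionAnswer is a hypothetical schema I'm using purely for illustration:

```python
from pydantic import BaseModel

# Sketch: typed JSON plus live Search grounding in the same response.
# ChampionAnswer is a hypothetical schema, not from Google's docs.
class ChampionAnswer(BaseModel):
    champion_name: str
    event_year: int
    summary: str

response = client.models.generate_content(
    model=MODEL_ID,
    contents="Who is the current Magic: The Gathering World Champion?",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ChampionAnswer,
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
answer = ChampionAnswer.model_validate_json(response.text)
print(answer.champion_name)
```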
Gemini's code execution tool still runs Python only, but it's now fully compatible with Gemini 3, which means you can ask the model to write + execute helper scripts mid-conversation before committing to an answer, as described in Google's code execution guide.
```python
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Run Python code that counts the number of 'r' characters in 'strawberry'. Return the count.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    )
)

for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("Generated Code:\n", part.executable_code.code)
    if part.code_execution_result:
        print("Execution Output:", part.code_execution_result.output)
    if part.text:
        print("Final Answer:", part.text)
```

⚠️ Warning: Code execution tokens count as both input and output. Capture response.usage_metadata so budgets account for the prompt, generated code, and stdout; all three are billed.
Quick checklist before you flip the switch:

- Use media_resolution_low context previews for speed.
- Run count_tokens at all three resolutions, then pin HIGH only for rows flagged as contracts/financial statements in metadata.
- Log thoughts_token_count, media_resolution choices, and web_search_queries into the same trace so you can correlate cost regressions with reasoning depth.

| Change | Why it matters | Action |
|---|---|---|
| thinking_budget → thinking_level | Gemini 3 enforces low/high presets instead of raw budget integers (see Google's thinking level docs). | Map existing tiers to low (autocomplete) and high (planner). |
| Temperature tuning | Gemini 3 default is 1.0 and handles randomness internally (per Google's guidance). | Remove sub-0.5 temperatures unless you measured regressions. |
| Media controls | Gemini 3 exposes configurable media_resolution values for every part (see Google's media resolution docs). | Explicitly set the level per asset type to control cost. |
| Tool signatures | Certain flows now include thoughtSignature; missing values break validation (per Google's structured output section). | When replaying traces, include the stored signature or populate the documented placeholder. |
| Maps grounding gap | Maps grounding still lives on Gemini 2.5 Flash in my testing, so geo-heavy apps may stay hybrid. | Keep 2.5 Flash for map-rich tasks until Google flips the switch. |
- Stream every usage_metadata field (tokens, thoughts, grounding counts) to the same trace ID you use for customer actions.
- Watch the ratio of thoughts_token_count / total_tokens; spikes usually mean a planner stuck in high thinking for too long.
- Re-run count_tokens nightly for representative documents to reconfirm resolution deltas.
- When a request uses media_resolution_high or invokes code execution, attach a cost label so finance can apportion spend to the right product team.

⭐ Key Takeaway: Gemini 3's Thinking Levels and media controls are powerful, but they still need a trustworthy search layer. Pairing Gemini with WebSearchAPI.ai keeps your agents grounded in Google-grade results while you experiment with thoughts, tools, and structured JSON.
Ready to see it in action? Start building with WebSearchAPI.ai and get Google-grade results in minutes.
Which Gemini 3 models are available right now?
gemini-3-pro-preview is live in the Gemini API and Vertex AI, with $2/M input + $12/M output pricing during preview, per Google's launch announcement.
What do Thinking Levels actually change?
High/Dynamic gives the model more time to reason; Low constrains it for latency-sensitive turns, replacing the old thinking_budget knobs, as described in Google's thinking level guidance.
How does media resolution impact billing?
Higher resolutions dramatically increase vision tokens. Set LOW or MEDIUM for lightweight screenshots instead of letting Gemini pick a more expensive value automatically, per Google's media resolution docs.
Does Google Search grounding support multilingual prompts?
Yes—grounding works across every language the model handles and always returns citation metadata you can log, according to Google's grounding guide.
Can I force JSON while still calling tools?
Gemini 3 explicitly supports Structured Output with Search, URL context, and Code Execution in the same response, so you can keep typed payloads while tooling, as confirmed in Google's structured output docs.
Is code execution limited to Python?
Yes. Gemini can write other languages, but only Python runs inside the managed sandbox today, per Google's code execution guide.
When should I keep Gemini 2.5 Flash around?
If you rely on Google Maps grounding or ultra-low-latency Flash-Lite behavior, keep a dual-stack while those features finish rolling out in 3 Pro preview.
How do you monitor cost regressions?
Stream usage_metadata, web_search_queries, and media_resolution choices into your telemetry platform, then alert when token-per-request deltas exceed set baselines.