Complete Gemini 3 guide covering Pro, Flash & Image models, Thinking Levels, Thought Signatures, media resolution, native image generation, and migration tactics with Python examples. Updated December 2025.
I still remember the late nights at WebSearchAPI.ai refactoring our retrieval pipeline for Gemini 1.5 Pro. We thought the 1M-token window was the endgame—until Gemini 3 landed and made reasoning control, multimodal efficiency, and agent workflows feel programmable again.
Credentials: B.Sc. Computer Science (Cambridge) • M.Sc. AI Systems (Imperial) • Google Cloud PCA • AWS SA Pro • Azure AI Engineer • CKA • TensorFlow Developer.
Gemini 3 isn't just a bigger parameter bump. Google rebuilt the stack around controllable "thinking" states, richer media ingestion, native image generation, and first-class agent tooling—all while staying compatible with the Gemini API + Vertex AI pipelines we already deployed. With the December 2025 release of Gemini 3 Flash, the family now offers frontier-class models for every workload: Pro for complex reasoning, Flash for high throughput at 3x the speed of 2.5 Pro, and Pro Image for native generation (Google, 2025).
📊 Stats Alert: The AI agents market is on track to hit $139.12B by 2033 at a 43.88% CAGR, so every percentage point of accuracy or latency savings compounds fast (MarketsandMarkets AI Agents Market, 2025).
🎯 Goal: Upgrade your Gemini stack to the 3 family without breaking existing RAG agents, while gaining Thinking Levels, Thought Signatures, media resolution tuning, native image generation, and Search grounding.
The Gemini 3 family now includes three production-ready models, each optimized for different workloads:
| Model | Model ID | Input/Output Pricing | Best For |
|---|---|---|---|
| Gemini 3 Pro | gemini-3-pro-preview | $2/$12 per 1M tokens (under 200k context); $4/$18 (over 200k) | Complex reasoning, agentic coding, multimodal analysis |
| Gemini 3 Flash | gemini-3-flash-preview | $0.50/$3 per 1M tokens | High-throughput, cost-sensitive workloads, rapid iteration |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | $2 text input; $0.134 per image output | Native image generation and editing |
All models support a 1M token input context with up to 64k tokens of output, and have a knowledge cutoff of January 2025.
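If you want those pricing tiers in code, here's a minimal cost estimator derived from the table above. The PRICING dict and estimate_cost helper are illustrative, not part of any SDK, and they ignore the over-200k-context Pro tier and per-image pricing.
# Hypothetical helper derived from the pricing table above (not part of the SDK).
# Rates are USD per 1M tokens (input, output), under-200k-context tier.
PRICING = {
    "gemini-3-pro-preview": (2.00, 12.00),
    "gemini-3-flash-preview": (0.50, 3.00),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough per-call cost; ignores the >200k Pro tier and image output pricing."""
    in_rate, out_rate = PRICING[model_id]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${estimate_cost('gemini-3-flash-preview', 50_000, 2_000):.4f}")  # ~$0.0310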
Google locked the new configs behind the latest Python SDK. Anything older than 1.51.0 will throw when you pass Thinking configs, so upgrade before toggling features.
pip install -U 'google-genai>=1.51.0'
import os
from google import genai
from google.genai import types
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
# Choose your model based on workload
MODEL_PRO = "gemini-3-pro-preview" # Complex reasoning, agentic coding
MODEL_FLASH = "gemini-3-flash-preview" # High-throughput, cost-sensitive
MODEL_IMAGE = "gemini-3-pro-image-preview" # Native image generation
MODEL_ID = MODEL_PRO  # Default to Pro for this guide
📌 Pro Tip: Store the API key in your platform's secret manager (or Colab secret) so rotating staging vs. production keys doesn't require code changes. Flash has a free tier in the Gemini API—great for prototyping before scaling to Pro.
Gemini 3 uses dynamic thinking by default, meaning it decides when to brainstorm or dive deep. The thinking_level parameter lets you override this behavior—and Flash offers even more granular control than Pro.
| Thinking Level | Available On | Use Case |
|---|---|---|
| minimal | Flash only | Matches "no thinking" for most queries—fastest responses |
| low | Pro & Flash | Minimizes latency/cost for simple tasks like summarization |
| medium | Flash only | Balanced reasoning depth between low and high |
| high | Pro & Flash | Maximizes reasoning depth for complex analysis (default for Pro) |
💡 Expert Insight: I use minimal or low on Flash for autocomplete-like UX, then switch to high on Pro for agent subroutines (query decomposition, planner turns, compliance reviews). Flash's medium is a sweet spot for structured data extraction where you need some reasoning but not full chain-of-thought.
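To make that routing concrete, here's a tiny hypothetical dispatcher. The task labels are invented; the model/level pairings mirror the split above.
# Hypothetical router for the workload split described above (task labels invented).
def pick_model_and_level(task: str) -> tuple[str, str]:
    if task in {"autocomplete", "summarize"}:
        return MODEL_FLASH, "minimal"  # fastest path, no visible reasoning
    if task == "structured_extraction":
        return MODEL_FLASH, "medium"   # Flash-only balanced level
    return MODEL_PRO, "high"           # planner turns, compliance reviews

model, level = pick_model_and_level("structured_extraction")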
⚠️ Warning: You cannot use both thinking_level and the legacy thinking_budget in the same request. Google recommends migrating to thinking_level for more predictable performance.
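A minimal before/after sketch of that migration, assuming your 2.5-era code passed a raw token budget:
# Before (Gemini 2.5 era): raw token budget. Never combine with thinking_level.
legacy_config = types.ThinkingConfig(thinking_budget=1024)
# After (Gemini 3): preset levels only.
gemini3_config = types.ThinkingConfig(thinking_level="low")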
from IPython.display import Markdown, display
prompt = """
Find what I'm thinking of:
It moves, but doesn't walk, run, or swim.
It has no fixed shape and keeps moving when broken apart.
It has no brain but solves mazes.
"""
response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True
        )
    )
)
for part in response.candidates[0].content.parts:
    if part.thought:
        print(f"--- THOUGHT PROCESS ({response.usage_metadata.thoughts_token_count} tokens) ---")
        display(Markdown(part.text))
    else:
        print("\n--- FINAL ANSWER ---")
        display(Markdown(part.text))
⚠️ Warning: Gemini 3 is tuned for temperature=1.0. Dropping temperature to 0.1 (my old Gemini 2.5 trick) can cause looping or over-pruning with the new thinking stack.
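If your config templates pin temperature explicitly, keep it at the tuned default rather than a low value. A minimal sketch:
# Keep temperature at the tuned default (1.0); lower values can loop with the thinking stack.
config = types.GenerateContentConfig(
    temperature=1.0,
    thinking_config=types.ThinkingConfig(thinking_level="low"),
)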
📈 Case Study: Google highlighted Opus Clip measuring a 32% speed gain plus higher precision when they let Gemini 3 handle agent tool calls end-to-end in their launch post. My replication shows similar speedups when I offload cascade planning to thinking_level="high" and only drop to low for direct responses.
media_resolution controls how many tokens Gemini allocates per image or video frame. Higher resolutions improve the model's ability to read fine text and identify small details, but increase both token usage and latency. You can set it per Part or globally via generation_config, as detailed in Google's media resolution documentation.
| Media Type | Recommended Level | Max Tokens | Notes |
|---|---|---|---|
| Images (general) | media_resolution_high | ~1120 | Best for detailed analysis |
| PDFs | media_resolution_medium | ~560 | Good balance for documents |
| Video (general) | media_resolution_low or medium | ~70/frame | Cost-effective for video understanding |
| Video (text-heavy) | media_resolution_high | ~280/frame | When reading on-screen text matters |
| Screenshots | media_resolution_low | ~70 | Sufficient for UI screenshots |
📌 New in December 2025: media_resolution_ultra_high is now available for individual parts (not globally) when you need maximum fidelity for dense technical diagrams or fine print.
import pathlib
import requests
IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png"
pathlib.Path("jetpack.png").write_bytes(requests.get(IMG_URL, timeout=10).content)
uploaded = client.files.upload(file="jetpack.png")
count_tokens_response = client.models.count_tokens(
    model=MODEL_ID,
    contents=[
        types.Part(
            file_data=types.FileData(
                file_uri=uploaded.uri,
                mime_type=uploaded.mime_type
            ),
            media_resolution=types.PartMediaResolution(
                level="MEDIA_RESOLUTION_HIGH"
            )
        )
    ],
)
print(
    f"High resolution token cost: {count_tokens_response.total_tokens} tokens"
)
📌 Pro Tip: Run count_tokens with all three levels (LOW, MEDIUM, HIGH) during QA. I log these to BigQuery so finance can predict the delta between standard and "inspection" runs.
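Here's a sketch of that QA sweep, reusing the uploaded file and the PartMediaResolution shape from the snippet above. Resolution is set per Part, so MEDIA_RESOLUTION_ULTRA_HIGH is valid in this loop even though it can't be set globally.
# Sweep token cost across all four per-part resolution levels.
for level in (
    "MEDIA_RESOLUTION_LOW",
    "MEDIA_RESOLUTION_MEDIUM",
    "MEDIA_RESOLUTION_HIGH",
    "MEDIA_RESOLUTION_ULTRA_HIGH",  # per-part only, per the December 2025 note
):
    resp = client.models.count_tokens(
        model=MODEL_ID,
        contents=[
            types.Part(
                file_data=types.FileData(
                    file_uri=uploaded.uri,
                    mime_type=uploaded.mime_type,
                ),
                media_resolution=types.PartMediaResolution(level=level),
            )
        ],
    )
    print(f"{level}: {resp.total_tokens} tokens")  # log to BigQuery or similar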
⚠️ Warning: If you leave media_resolution unspecified, Gemini may choose a high-cost setting for dense PDFs. Explicitly set LOW for screenshots and MEDIUM for lightweight docs to avoid surprise invoices.
Grounding wires Gemini to live Google Search results, supports every language the model speaks, and returns a structured grounding_metadata payload so you can surface citations in your UI, per Google's grounding guide.
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Who is the current Magic: The Gathering World Champion?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)
print(response.text)
if response.candidates[0].grounding_metadata.web_search_queries:
    print("Search queries:", response.candidates[0].grounding_metadata.web_search_queries)
📌 Pro Tip: Persist web_search_queries + grounding_chunk IDs. When a customer disputes an answer, we replay the queries to confirm Google still surfaces those sources, which keeps compliance and support teams sane.
Gemini 3 lets you enforce a response_schema and call built-in tools (Search, URL context, Code Execution) inside the same response, so JSON payloads no longer conflict with tool calls, as outlined in Google's structured output docs.
from pydantic import BaseModel
import json
class CookieRecipe(BaseModel):
    recipe_name: str
    difficulty: str
    prep_time_minutes: int
    ingredients: list[str]

response = client.models.generate_content(
    model=MODEL_ID,
    contents="Give me a miso chocolate chip cookie recipe.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=CookieRecipe,
    ),
)
recipe = json.loads(response.text)
print(json.dumps(recipe, indent=2))
🎯 Key Takeaway: Pairing structured output with Search grounding or URL context gives you typed objects plus citations—no more brittle regex parsing after the fact.
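A minimal sketch of that pairing; the Champion schema and the prompt are invented for illustration.
# Typed JSON + Search grounding in one request (schema and prompt are illustrative).
class Champion(BaseModel):
    name: str
    year: int

grounded = client.models.generate_content(
    model=MODEL_ID,
    contents="Who is the current Magic: The Gathering World Champion? Answer as JSON.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
        response_mime_type="application/json",
        response_schema=Champion,
    ),
)
print(grounded.text)  # typed payload; grounding_metadata still carries citations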
Gemini's code execution tool still runs Python only, but it's now fully compatible with Gemini 3, which means you can ask the model to write + execute helper scripts mid-conversation before committing to an answer, as described in Google's code execution guide.
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Run Python code that counts the number of 'r' characters in 'strawberry'. Return the count.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    )
)
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("Generated Code:\n", part.executable_code.code)
    if part.code_execution_result:
        print("Execution Output:", part.code_execution_result.output)
    if part.text:
        print("Final Answer:", part.text)
⚠️ Warning: Code execution tokens count as both input and output. Capture response.usage_metadata so budgets account for the prompt, generated code, and stdout—all three are billed.
Starting with Gemini 3, Google introduced Thought Signatures—encrypted representations of the model's internal thought process. These signatures are essential for maintaining reasoning context across multi-turn conversations.
# Thought signatures are returned in the response
response = client.models.generate_content(
    model=MODEL_ID,
    contents="Analyze this complex problem step by step...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    )
)
# The thought_signature is embedded in content parts
# Pass it back in subsequent turns to maintain the reasoning chain
for part in response.candidates[0].content.parts:
    if getattr(part, "thought_signature", None):
        # Store this for the next turn
        print("Thought signature received:", part.thought_signature[:50], "...")
⚠️ Critical Validation Rules: signatures are mandatory for function calling and image generation in multi-turn flows; replaying history without them returns HTTP 400. Return each signature exactly as received, in the same part it arrived in.
📌 Pro Tip: If you use the official Google SDKs with standard chat history management, Thought Signatures are handled automatically. Custom implementations must explicitly capture and replay these signatures.
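For custom history management, a hedged sketch: the key move is appending the model's previous Content object untouched so its thought_signature parts travel back with it.
# Custom multi-turn replay: reuse the model's Content as-is so signatures survive.
history = [
    types.Content(role="user", parts=[types.Part(text="Analyze this complex problem step by step...")]),
    response.candidates[0].content,  # carries thought_signature parts back unmodified
    types.Content(role="user", parts=[types.Part(text="Now weigh the trade-offs you found.")]),
]
follow_up = client.models.generate_content(model=MODEL_ID, contents=history)
print(follow_up.text)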
Gemini 3 Pro Image (gemini-3-pro-image-preview) introduces native image generation directly through the API—no more switching to separate image models. It supports 4K resolution output, accurate text rendering, and conversational editing.
from google.genai import types

# Generate an image
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Generate a photorealistic image of a robot barista making latte art in a cozy coffee shop",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE", "TEXT"]
    )
)
# Access the generated image
for part in response.candidates[0].content.parts:
    if part.inline_data:
        # Save or display the image
        with open("generated_image.png", "wb") as f:
            f.write(part.inline_data.data)
🎯 Key Capabilities: 4K resolution output, accurate in-image text rendering, and conversational editing across turns.
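Conversational editing is just another turn: send the generated image back with an instruction. A sketch reusing the file saved above:
# Conversational edit: resend the saved image with an edit instruction.
with open("generated_image.png", "rb") as f:
    image_bytes = f.read()
edit_response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        types.Part(text="Keep the scene, but redraw the latte art as a fern leaf."),
    ],
    config=types.GenerateContentConfig(response_modalities=["IMAGE", "TEXT"]),
)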
thinking_level="minimal" for maximum throughput.count_tokens at all four resolutions (low, medium, high, ultra_high), then pin high only for rows flagged as contracts/financial statements in metadata.thoughts_token_count, thought_signature presence, media_resolution choices, and web_search_queries into the same trace so you can correlate cost regressions with reasoning depth.medium thinking level often matches Pro's low quality at lower cost.| Change | Why it matters | Action |
|---|---|---|
| thinking_budget → thinking_level | Gemini 3 enforces preset levels instead of raw budget integers. Flash adds minimal and medium options. Cannot use both params together (docs). | Map existing tiers: low (autocomplete), medium (Flash balanced), high (planner). |
| Temperature tuning | Gemini 3 default is 1.0 and handles randomness internally. Lower values may cause looping or degraded performance (docs). | Remove sub-0.5 temperatures unless you measured regressions. |
| Media resolution levels | Four levels now available: low, medium, high, ultra_high. Set per part or globally (docs). | Explicitly set level per asset type—see recommendations table above. |
| Thought Signatures | Mandatory for function calling and image generation in multi-turn flows. HTTP 400 if missing (docs). | Capture and replay thought_signature from response parts, or use official SDKs. |
| Search grounding pricing | Billing changes Jan 5, 2026: now $14 per 1k search queries (was $35/1k prompts flat rate). | Update cost projections; consider caching search results where appropriate. |
| New model selection | Flash outperforms 2.5 Pro at 3x speed. Pro Image enables native image generation. Deep Think for complex reasoning. | Evaluate Flash for high-throughput workloads; add Pro Image for generation tasks. |
| Maps grounding gap | Google Maps grounding and Computer Use still not supported on Gemini 3 (docs). | Keep 2.5 Flash for map-rich tasks and computer use until support is added. |
- Stream the usage_metadata field (tokens, thoughts, grounding counts) to the same trace ID you use for customer actions.
- Watch the thoughts_token_count / total_tokens ratio; spikes usually mean a planner stuck in high thinking for too long.
- Re-run count_tokens nightly for representative documents to reconfirm resolution deltas.
- When a request uses media_resolution_high or invokes code execution, attach a cost label so finance can apportion spend to the right product team.

⭐ Key Takeaway: Gemini 3's Thinking Levels and media controls are powerful, but they still need a trustworthy search layer. Pairing Gemini with WebSearchAPI.ai keeps your agents grounded in Google-grade results while you experiment with thoughts, tools, and structured JSON. For alternatives to Google's grounding capabilities, explore our grounding Google Search alternatives guide.
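The thinking-ratio alert from the checklist above, as a hedged sketch; the 0.6 threshold is an invented baseline, so calibrate it against your own traces.
# Thinking-ratio alert (threshold illustrative); works on any generate_content response.
def thinking_ratio(usage) -> float:
    thoughts = usage.thoughts_token_count or 0
    total = usage.total_token_count or 1
    return thoughts / total

ratio = thinking_ratio(response.usage_metadata)
if ratio > 0.6:
    print(f"ALERT: {ratio:.0%} of tokens spent thinking; planner stuck in high?")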
Ready to see it in action? Start building with WebSearchAPI.ai and get Google-grade results in minutes.
Which Gemini 3 models are available right now?
Three models are production-ready: gemini-3-pro-preview ($2/$12 per 1M tokens), gemini-3-flash-preview ($0.50/$3), and gemini-3-pro-image-preview for native image generation. Flash has a free tier in the Gemini API; Pro does not, per Google's pricing docs.
What's the difference between Gemini 3 Pro and Flash?
Flash is 3x faster than 2.5 Pro while matching its quality on most benchmarks. It's ideal for high-throughput, cost-sensitive workloads. Pro excels at complex reasoning and agentic coding. Flash also offers additional thinking levels (minimal, medium) for finer control, per Google's Flash announcement.
What are Thought Signatures and do I need them?
Thought Signatures are encrypted representations of Gemini's internal reasoning. They're mandatory for function calling and image generation in multi-turn flows—omitting them returns HTTP 400. The official SDKs handle this automatically; custom implementations must explicitly capture and replay them, per Google's docs.
What do Thinking Levels actually change?
They control reasoning depth. Pro supports low and high; Flash adds minimal (fastest, no thinking) and medium (balanced). The legacy thinking_budget still works but cannot be used alongside thinking_level, per Google's guidance.
How does media resolution impact billing?
Higher resolutions dramatically increase vision tokens. Use low (~70 tokens) for screenshots, medium (~560) for PDFs, high (~1120) for detailed images, and ultra_high for dense technical diagrams. If unspecified, Gemini picks optimal defaults based on media type, per Google's docs.
Does Google Search grounding support multilingual prompts?
Yes—grounding works across every language the model handles and always returns citation metadata you can log, according to Google's grounding guide.
Can I force JSON while still calling tools?
Gemini 3 explicitly supports Structured Output with Search, URL context, and Code Execution in the same response, so you can keep typed payloads while tooling, as confirmed in Google's structured output docs.
Is code execution limited to Python?
Yes. Gemini can write other languages, but only Python runs inside the managed sandbox today. December 2025 added support for code execution with images (return images from executed code), per Google's code execution guide.
When should I keep Gemini 2.5 around?
If you rely on Google Maps grounding or Computer Use—both are still unsupported on Gemini 3 as of December 2025. Keep 2.5 Flash for these specific use cases until Google adds support.
How do you monitor cost regressions?
Stream usage_metadata, web_search_queries, thought_signature presence, and media_resolution choices into your telemetry platform. Alert when token-per-request deltas exceed set baselines. Note that Search grounding billing changes to $14/1k queries on Jan 5, 2026.