
Gemini 3 Developer Guide: Examples, Cookbook & Migration Strategies

Complete Gemini 3 guide covering Pro, Flash & Image models, Thinking Levels, Thought Signatures, media resolution, native image generation, and migration tactics with Python examples. Updated December 2025.

James Bennett
12 minute read
Gemini 3 developer guide showing thinking levels and media resolution controls

I still remember the late nights at WebSearchAPI.ai refactoring our retrieval pipeline for Gemini 1.5 Pro. We thought the 1M-token window was the endgame—until Gemini 3 landed and made reasoning control, multimodal efficiency, and agent workflows feel programmable again.

Credentials: B.Sc. Computer Science (Cambridge) • M.Sc. AI Systems (Imperial) • Google Cloud PCA • AWS SA Pro • Azure AI Engineer • CKA • TensorFlow Developer.

Introduction: Gemini 3 Resets Controllable Reasoning

Gemini 3 isn't just a bigger parameter bump. Google rebuilt the stack around controllable "thinking" states, richer media ingestion, native image generation, and first-class agent tooling—all while keeping it compatible with the Gemini API + Vertex AI pipelines we already deployed. With the December 2025 release of Gemini 3 Flash, the family now offers frontier-class models for every workload: Pro for complex reasoning, Flash for high-throughput work at 3x the speed of 2.5 Pro, and Pro Image for native generation (Google, 2025).

📊 Stats Alert: The AI agents market is on track to hit $139.12B by 2033 at a 43.88% CAGR, so every percentage point of accuracy or latency savings compounds fast (MarketsandMarkets AI Agents Market, 2025).

🎯 Goal: Upgrade your Gemini stack to the 3 family without breaking existing RAG agents, while gaining Thinking Levels, Thought Signatures, media resolution tuning, native image generation, and Search grounding.

Diagram of Gemini 3 orchestrating media inputs, search grounding, and structured outputs

Gemini 3 Launch Snapshot

The Gemini 3 family now includes three production-ready models, each optimized for different workloads:

| Model | Model ID | Input/Output Pricing | Best For |
| --- | --- | --- | --- |
| Gemini 3 Pro | gemini-3-pro-preview | $2/$12 per 1M tokens (under 200k context); $4/$18 (over 200k) | Complex reasoning, agentic coding, multimodal analysis |
| Gemini 3 Flash | gemini-3-flash-preview | $0.50/$3 per 1M tokens | High-throughput, cost-sensitive workloads, rapid iteration |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | $2 text input; $0.134 per image output | Native image generation and editing |

All models support a 1M token input context with up to 64k tokens of output, and have a knowledge cutoff of January 2025.

  • Gemini 3 Pro (Nov 18, 2025): Google's flagship reasoning model—state-of-the-art on MMMU-Pro and Video MMMU benchmarks, optimized for complex agentic workflows per Google's announcement.
  • Gemini 3 Flash (Dec 17, 2025): Frontier-class performance at 3x the speed of 2.5 Pro. Achieves 90.4% on GPQA Diamond and 33.7% on Humanity's Last Exam. Now the default model in the Gemini app and AI Mode in Search per Google's Flash launch.
  • Gemini 3 Deep Think: Enhanced reasoning mode for Google AI Ultra subscribers—uses iterative rounds of reasoning to explore multiple hypotheses simultaneously.
  • Agent-friendly tooling: Google Antigravity, Gemini CLI, Android Studio, Cursor, JetBrains, GitHub Copilot, Replit, and Manus integrations all available per Google's developer docs.

Environment Setup: Google Gen AI SDK 1.51.0+

Google locked the new configs behind the latest Python SDK. Anything older than 1.51.0 will throw when you pass Thinking configs, so upgrade before toggling features.

pip install -U 'google-genai>=1.51.0'

import os
from google import genai
from google.genai import types
 
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
 
# Choose your model based on workload
MODEL_PRO = "gemini-3-pro-preview"      # Complex reasoning, agentic coding
MODEL_FLASH = "gemini-3-flash-preview"  # High-throughput, cost-sensitive
MODEL_IMAGE = "gemini-3-pro-image-preview"  # Native image generation
 
MODEL_ID = MODEL_PRO  # Default to Pro for this guide

📌 Pro Tip: Store the API key in your platform's secret manager (or Colab secret) so rotating staging vs. production keys doesn't require code changes. Flash has a free tier in the Gemini API—great for prototyping before scaling to Pro.

Feature 1: Thinking Levels for Controllable Reasoning

Gemini 3 uses dynamic thinking by default, meaning it decides when to brainstorm or dive deep. The thinking_level parameter lets you override this behavior—and Flash offers even more granular control than Pro.

| Thinking Level | Available On | Use Case |
| --- | --- | --- |
| minimal | Flash only | Matches "no thinking" for most queries; fastest responses |
| low | Pro & Flash | Minimizes latency/cost for simple tasks like summarization |
| medium | Flash only | Balanced thinking approach |
| high | Pro & Flash | Maximizes reasoning depth for complex analysis (default for Pro) |

💡 Expert Insight: I use minimal or low on Flash for autocomplete-like UX, then switch to high on Pro for agent subroutines (query decomposition, planner turns, compliance reviews). Flash's medium is a sweet spot for structured data extraction where you need some reasoning but not full chain-of-thought.

⚠️ Warning: You cannot use both thinking_level and the legacy thinking_budget in the same request. Google recommends migrating to thinking_level for more predictable performance.

from IPython.display import Markdown, display
 
prompt = """
Find what I'm thinking of:
  It moves, but doesn't walk, run, or swim.
  It has no fixed shape and keeps moving when broken apart.
  It has no brain but solves mazes.
"""
 
response = client.models.generate_content(
  model=MODEL_ID,
  contents=prompt,
  config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(
      thinking_level="High",
      include_thoughts=True
    )
  )
)
 
for part in response.parts:
  if part.thought:
    print(f"--- THOUGHT PROCESS ({response.usage_metadata.thoughts_token_count} tokens) ---")
    display(Markdown(part.text))
  else:
    print("\n--- FINAL ANSWER ---")
    display(Markdown(part.text))

⚠️ Warning: Gemini 3 is tuned for temperature=1.0. Dropping temperature to 0.1 (my old Gemini 2.5 trick) can cause looping or over-pruning with the new thinking stack.

📈 Case Study: Google highlighted OpusClip measuring a 32% speed gain plus higher precision when it let Gemini 3 handle agent tool calls end-to-end, per the launch post. My replication shows similar speedups when I offload cascade planning to thinking_level="high" and only drop to low for direct responses.

Feature 2: Media Resolution Control

media_resolution controls how many tokens Gemini allocates per image or video frame. Higher resolutions improve the model's ability to read fine text and identify small details, but increase both token usage and latency. You can set it per Part or globally via generation_config, as detailed in Google's media resolution documentation.

| Media Type | Recommended Level | Max Tokens | Notes |
| --- | --- | --- | --- |
| Images (general) | media_resolution_high | ~1120 | Best for detailed analysis |
| PDFs | media_resolution_medium | ~560 | Good balance for documents |
| Video (general) | media_resolution_low or medium | ~70/frame | Cost-effective for video understanding |
| Video (text-heavy) | media_resolution_high | ~280/frame | When reading on-screen text matters |
| Screenshots | media_resolution_low | ~70 | Sufficient for UI screenshots |

📌 New in December 2025: media_resolution_ultra_high is now available for individual parts (not globally) when you need maximum fidelity for dense technical diagrams or fine print.

Workflow diagram showing how Gemini 3 processes PDFs, images, video frames, and structured outputs

import pathlib
import requests
 
IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png"
pathlib.Path("jetpack.png").write_bytes(requests.get(IMG_URL, timeout=10).content)
uploaded = client.files.upload(file="jetpack.png")
 
count_tokens_response = client.models.count_tokens(
  model=MODEL_ID,
  contents=[
    types.Part(
      file_data=types.FileData(
        file_uri=uploaded.uri,
        mime_type=uploaded.mime_type
      ),
      media_resolution=types.PartMediaResolution(
        level="MEDIA_RESOLUTION_HIGH"
      )
    )
  ],
)
 
print(
  f"High resolution token cost: {count_tokens_response.total_tokens} tokens"
)

📌 Pro Tip: Run count_tokens at each level (LOW, MEDIUM, HIGH, plus ULTRA_HIGH on parts where you use it) during QA. I log these to BigQuery so finance can predict the delta between standard and "inspection" runs.

⚠️ Warning: If you leave media_resolution unspecified, Gemini may choose a high-cost setting for dense PDFs. Explicitly set LOW for screenshots and MEDIUM for lightweight docs to avoid surprise invoices.

Feature 3: Google Search Grounding

Grounding wires Gemini to live Google Search results, supports every language the model speaks, and returns a structured grounding_metadata payload so you can surface citations in your UI, per Google's grounding guide.

response = client.models.generate_content(
  model=MODEL_ID,
  contents="Who is the current Magic: The Gathering World Champion?",
  config=types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())]
  )
)
 
print(response.text)
metadata = response.candidates[0].grounding_metadata
if metadata and metadata.web_search_queries:
  print("Search queries:", metadata.web_search_queries)

📌 Pro Tip: Persist web_search_queries + grounding_chunk IDs. When a customer disputes an answer, we replay the queries to confirm Google still surfaces those sources, which keeps compliance and support teams sane.
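Continuing the grounding example above, here's a minimal logging sketch; the log_record shape and its key names are my own convention, not an SDK structure:

import json
 
# Flatten grounding metadata into one loggable record
log_record = {
  "queries": list(metadata.web_search_queries or []),
  "sources": [
    {"title": chunk.web.title, "uri": chunk.web.uri}
    for chunk in (metadata.grounding_chunks or [])
    if chunk.web
  ],
}
print(json.dumps(log_record, indent=2))  # ship this to your trace store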

Feature 4: Structured Output + Tool Chaining

Gemini 3 lets you enforce a response_schema and call built-in tools (Search, URL context, Code Execution) inside the same response, so JSON payloads no longer conflict with tool calls, as outlined in Google's structured output docs.

from pydantic import BaseModel
import json
 
class CookieRecipe(BaseModel):
  recipe_name: str
  difficulty: str
  prep_time_minutes: int
  ingredients: list[str]
 
response = client.models.generate_content(
  model=MODEL_ID,
  contents="Give me a miso chocolate chip cookie recipe.",
  config=types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema=CookieRecipe,
  ),
)
 
recipe = json.loads(response.text)
print(json.dumps(recipe, indent=2))

🎯 Key Takeaway: Pairing structured output with Search grounding or URL context gives you typed objects plus citations—no more brittle regex parsing after the fact.

Feature 5: Code Execution for Deterministic Math

Gemini's code execution tool still runs Python only, but it's now fully compatible with Gemini 3, which means you can ask the model to write + execute helper scripts mid-conversation before committing to an answer, as described in Google's code execution guide.

response = client.models.generate_content(
  model=MODEL_ID,
  contents="Run Python code that counts the number of 'r' characters in 'strawberry'. Return the count.",
  config=types.GenerateContentConfig(
    tools=[types.Tool(code_execution=types.ToolCodeExecution())]
  )
)
 
for part in response.candidates[0].content.parts:
  if part.executable_code:
    print("Generated Code:\n", part.executable_code.code)
  if part.code_execution_result:
    print("Execution Output:", part.code_execution_result.output)
  if part.text:
    print("Final Answer:", part.text)

⚠️ Warning: Code execution tokens count as both input and output. Capture response.usage_metadata so budgets account for the prompt, generated code, and stdout—all three are billed.
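A minimal sketch of that capture, using usage_metadata fields from the current SDK (some are None when the corresponding feature isn't used):

usage = response.usage_metadata
print("Prompt tokens:  ", usage.prompt_token_count)
print("Output tokens:  ", usage.candidates_token_count)
print("Thinking tokens:", usage.thoughts_token_count)
print("Tool-use tokens:", usage.tool_use_prompt_token_count)
print("Total billed:   ", usage.total_token_count)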

Feature 6: Thought Signatures for Multi-Turn Reasoning

Starting with Gemini 3, Google introduced Thought Signatures—encrypted representations of the model's internal thought process. These signatures are essential for maintaining reasoning context across multi-turn conversations.

# Thought signatures are returned in the response
response = client.models.generate_content(
  model=MODEL_ID,
  contents="Analyze this complex problem step by step...",
  config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="High")
  )
)
 
# The thought_signature is embedded in content parts.
# Pass it back in subsequent turns to maintain the reasoning chain.
for part in response.candidates[0].content.parts:
  if part.thought_signature:
    # Signatures are bytes; store them for replay on the next turn
    print("Thought signature received:", part.thought_signature[:50])

⚠️ Critical Validation Rules:

  • Function calling: Returns HTTP 400 if thought signature is missing in multi-turn flows
  • Image generation/editing: Returns HTTP 400 if thought signature is missing
  • Text/chat: Recommended but not strictly enforced

📌 Pro Tip: If you use the official Google SDKs with standard chat history management, Thought Signatures are handled automatically. Custom implementations must explicitly capture and replay these signatures.
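For instance, the SDK's chat helper resends the full history, signature-bearing parts included, on every turn, so the validation rules above are satisfied without manual bookkeeping. A minimal sketch:

# The chat object replays prior turns (with their thought signatures)
# automatically, preserving the reasoning chain across turns.
chat = client.chats.create(
  model=MODEL_ID,
  config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
  )
)
chat.send_message("Analyze this complex problem step by step...")
followup = chat.send_message("Now stress-test your second assumption.")
print(followup.text)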

Feature 7: Native Image Generation

Gemini 3 Pro Image (gemini-3-pro-image-preview) introduces native image generation directly through the API—no more switching to separate image models. It supports 4K resolution output, accurate text rendering, and conversational editing.

from google.genai import types
 
# Generate an image
response = client.models.generate_content(
  model="gemini-3-pro-image-preview",
  contents="Generate a photorealistic image of a robot barista making latte art in a cozy coffee shop",
  config=types.GenerateContentConfig(
    response_modalities=["IMAGE", "TEXT"]
  )
)
 
# Access generated image
for part in response.candidates[0].content.parts:
  if part.inline_data:
    # Save or display the image
    with open("generated_image.png", "wb") as f:
      f.write(part.inline_data.data)

🎯 Key Capabilities:

  • Grounded generation: Combine with Google Search to generate images based on real-world references
  • Conversational editing: "Make the coffee cup larger" in follow-up turns (see the sketch below)
  • Text rendering: Accurate text overlay for marketing assets and diagrams
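Conversational editing maps naturally onto the chat interface, since each turn carries the previous image forward. A minimal sketch (prompts are illustrative):

chat = client.chats.create(
  model="gemini-3-pro-image-preview",
  config=types.GenerateContentConfig(response_modalities=["IMAGE", "TEXT"])
)
chat.send_message("Generate a robot barista making latte art.")
edited = chat.send_message("Make the coffee cup larger.")
for part in edited.candidates[0].content.parts:
  if part.inline_data:
    with open("edited_image.png", "wb") as f:
      f.write(part.inline_data.data)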

Implementation Cookbook: Field-Tested Patterns

  1. Query Decomposition Agents: High-thinking Pro planner splits a multi-part customer query into WebSearchAPI.ai sub-queries, each executed with Flash at thinking_level="minimal" for maximum throughput.
  2. PDF Intake Lanes: Run count_tokens at all four resolutions (low, medium, high, ultra_high), then pin high only for rows flagged as contracts/financial statements in metadata.
  3. Search-Grounded JSON: Combine Google Search grounding with structured output so support agents see both citations and typed fields in their CRM.
  4. Code Execution Health Checks: When Gemini suggests infrastructure commands via the CLI, require the model to execute a Python validator script first (e.g., confirm only safe tables get truncated).
  5. Observability Hooks: Log thoughts_token_count, thought_signature presence, media_resolution choices, and web_search_queries into the same trace so you can correlate cost regressions with reasoning depth.
  6. Flash-First Routing: Default to Flash for all requests, then route to Pro only when complexity metrics (token count, tool call depth, domain flags) exceed thresholds. Flash's medium thinking level often matches Pro's low quality at lower cost; see the routing sketch after this list.
  7. Image Generation Pipelines: Use Pro Image for marketing asset generation with Search grounding to ensure brand-accurate visuals, then conversationally refine in follow-up turns.
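Pattern 6 in code, as a minimal sketch: estimate_complexity and the threshold are placeholders for your own metrics, not anything Google ships.

def estimate_complexity(query: str, tool_depth: int) -> int:
  # Placeholder heuristic: swap in real token counts, tool-call depth,
  # and domain flags from your own pipeline.
  return len(query.split()) + 10 * tool_depth

def route(query: str, tool_depth: int = 0) -> str:
  # Flash-first: only escalate to Pro past the complexity threshold
  if estimate_complexity(query, tool_depth) < 50:
    model, level = MODEL_FLASH, "medium"
  else:
    model, level = MODEL_PRO, "high"
  response = client.models.generate_content(
    model=model,
    contents=query,
    config=types.GenerateContentConfig(
      thinking_config=types.ThinkingConfig(thinking_level=level)
    ),
  )
  return response.text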

Migration Checklist: Gemini 2.5 → 3

| Change | Why it matters | Action |
| --- | --- | --- |
| thinking_budget → thinking_level | Gemini 3 enforces preset levels instead of raw budget integers. Flash adds minimal and medium options. Cannot use both params together (docs). | Map existing tiers: low (autocomplete), medium (Flash balanced), high (planner). |
| Temperature tuning | Gemini 3 default is 1.0 and handles randomness internally. Lower values may cause looping or degraded performance (docs). | Remove sub-0.5 temperatures unless you measured regressions. |
| Media resolution levels | Four levels now available: low, medium, high, ultra_high. Set per part or globally (docs). | Explicitly set the level per asset type; see the recommendations table above. |
| Thought Signatures | Mandatory for function calling and image generation in multi-turn flows. HTTP 400 if missing (docs). | Capture and replay thought_signature from response parts, or use official SDKs. |
| Search grounding pricing | Billing changes Jan 5, 2026: now $14 per 1k search queries (was $35/1k prompts flat rate). | Update cost projections; consider caching search results where appropriate. |
| New model selection | Flash outperforms 2.5 Pro at 3x the speed. Pro Image enables native image generation. Deep Think for complex reasoning. | Evaluate Flash for high-throughput workloads; add Pro Image for generation tasks. |
| Maps grounding gap | Google Maps grounding and Computer Use are still not supported on Gemini 3 (docs). | Keep 2.5 Flash for map-rich tasks and computer use until support is added. |
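The first row of that checklist in code, a minimal before/after sketch (the 2.5-era budget value is illustrative):

# Gemini 2.5 style: raw token budget (still accepted, but never alongside
# thinking_level in the same request)
legacy_config = types.GenerateContentConfig(
  thinking_config=types.ThinkingConfig(thinking_budget=2048)
)
 
# Gemini 3 style: preset levels
new_config = types.GenerateContentConfig(
  thinking_config=types.ThinkingConfig(thinking_level="high")
)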

Observability & Cost Guardrails

  • Log every usage_metadata field (tokens, thoughts, grounding counts) to the same trace ID you use for customer actions.
  • Set alerting thresholds on the ratio thoughts_token_count / total_tokens; spikes usually mean a planner stuck at high thinking for too long (see the sketch after this list).
  • Batch count_tokens nightly for representative documents to reconfirm resolution deltas.
  • Gating logic: if a request hits media_resolution_high or invokes code execution, attach a cost label so finance can apportion spend to the right product team.
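The ratio alert from the second bullet as a minimal sketch; the 0.6 threshold is an arbitrary starting point, not a recommendation from Google:

usage = response.usage_metadata
ratio = (usage.thoughts_token_count or 0) / max(usage.total_token_count or 1, 1)
if ratio > 0.6:  # tune against your own baselines
  print(f"ALERT: thinking ratio {ratio:.2f} exceeded baseline")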

Build Smarter Search With Us

Key Takeaway: Gemini 3's Thinking Levels and media controls are powerful, but they still need a trustworthy search layer. Pairing Gemini with WebSearchAPI.ai keeps your agents grounded in Google-grade results while you experiment with thoughts, tools, and structured JSON. For alternatives to Google's grounding capabilities, explore our grounding Google Search alternatives guide.

Ready to see it in action? Start building with WebSearchAPI.ai and get Google-grade results in minutes.

Frequently Asked Questions

Which Gemini 3 models are available right now? Three models are production-ready: gemini-3-pro-preview ($2/$12 per 1M tokens), gemini-3-flash-preview ($0.50/$3), and gemini-3-pro-image-preview for native image generation. Flash has a free tier in the Gemini API; Pro does not, per Google's pricing docs.

What's the difference between Gemini 3 Pro and Flash? Flash is 3x faster than 2.5 Pro while matching its quality on most benchmarks. It's ideal for high-throughput, cost-sensitive workloads. Pro excels at complex reasoning and agentic coding. Flash also offers additional thinking levels (minimal, medium) for finer control, per Google's Flash announcement.

What are Thought Signatures and do I need them? Thought Signatures are encrypted representations of Gemini's internal reasoning. They're mandatory for function calling and image generation in multi-turn flows—omitting them returns HTTP 400. The official SDKs handle this automatically; custom implementations must explicitly capture and replay them, per Google's docs.

What do Thinking Levels actually change? They control reasoning depth. Pro supports low and high; Flash adds minimal (fastest, no thinking) and medium (balanced). The legacy thinking_budget still works but cannot be used alongside thinking_level, per Google's guidance.

How does media resolution impact billing? Higher resolutions dramatically increase vision tokens. Use low (~70 tokens) for screenshots, medium (~560) for PDFs, high (~1120) for detailed images, and ultra_high for dense technical diagrams. If unspecified, Gemini picks optimal defaults based on media type, per Google's docs.

Does Google Search grounding support multilingual prompts? Yes—grounding works across every language the model handles and always returns citation metadata you can log, according to Google's grounding guide.

Can I force JSON while still calling tools? Gemini 3 explicitly supports Structured Output with Search, URL context, and Code Execution in the same response, so you can keep typed payloads while tooling, as confirmed in Google's structured output docs.

Is code execution limited to Python? Yes. Gemini can write other languages, but only Python runs inside the managed sandbox today. December 2025 added support for code execution with images (return images from executed code), per Google's code execution guide.

When should I keep Gemini 2.5 around? If you rely on Google Maps grounding or Computer Use—both are still unsupported on Gemini 3 as of December 2025. Keep 2.5 Flash for these specific use cases until Google adds support.

How do you monitor cost regressions? Stream usage_metadata, web_search_queries, thought_signature presence, and media_resolution choices into your telemetry platform. Alert when token-per-request deltas exceed set baselines. Note that Search grounding billing changes to $14/1k queries on Jan 5, 2026.