Why 1M Context Changes Code Generation
Most code generation tasks that fail with 128k-context models fail because the relevant code does not fit in the prompt, not because the model lacks capability. Gemini 1.5 Pro's 1M-token context (roughly 750,000 words of English text) changes what is tractable:
| Task | 128k context | 1M context |
|---|---|---|
| Single file review | Yes | Yes |
| 50-file module review | Borderline | Yes |
| Full repo analysis | No | Yes (most repos) |
| Cross-repo dependency analysis | No | Only if the combined repos fit |
| "Find all usages of X across codebase" | No | Yes |
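How much code fits depends on line length: 1M tokens holds roughly 50,000 lines at a full 80 characters per line, and closer to 100,000 lines at more typical line lengths. Production codebases at the lower end of the common 50k–300k-line range fit whole; larger ones need filtering first (for example, dropping tests, vendored dependencies, and generated files).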
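Whether a particular repo fits can be checked before sending anything. The sketch below is a rough estimate only: it assumes about 4 characters per token, which is a common rule of thumb and not Gemini's actual tokenizer, and the names `estimate_repo_tokens`, `fits_in_context`, and the 100k-token safety margin are illustrative choices.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4        # assumption: rule-of-thumb ratio, not Gemini's tokenizer
CONTEXT_LIMIT = 1_000_000  # the 1M-token window discussed above


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(repo_path: str, pattern: str = "*.py") -> int:
    """Sum the estimated tokens of every file matching pattern under repo_path."""
    total = 0
    for path in Path(repo_path).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            total += estimate_tokens(text)
    return total


def fits_in_context(token_count: int, margin: int = 100_000) -> bool:
    """True if the estimate fits with a margin to absorb estimation error."""
    return token_count + margin <= CONTEXT_LIMIT
```

For an exact figure, the SDK's `model.count_tokens(...)` call reports the tokenizer's own count in `total_tokens`; the heuristic is only useful as a quick offline pre-check.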
Google AI Studio for Prototyping
Before writing any code, AI Studio lets you drag-and-drop files into the prompt and interactively test Gemini 1.5 Pro with real repo content. This is useful for validating that your prompting approach works before building an API integration.
Python SDK
import os
from pathlib import Path

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro-latest")


def read_repo_files(repo_path: str) -> list[str]:
    """Read every Python file under repo_path into a labeled text chunk.

    Nothing is uploaded here; the chunks are inlined into the prompt below.
    """
    files = []
    for py_file in Path(repo_path).rglob("*.py"):
        # errors="replace" keeps one badly encoded file from aborting the run
        content = py_file.read_text(encoding="utf-8", errors="replace")
        files.append(f"# File: {py_file}\n{content}\n")
    return files


repo_files = read_repo_files("./my_project")
combined = "\n\n".join(repo_files)

response = model.generate_content(
    f"""Analyze this Python codebase and identify:
1. Security vulnerabilities (SQL injection, hardcoded secrets, etc.)
2. Performance bottlenecks
3. Missing error handling

Codebase:
{combined}""",
    generation_config={"temperature": 0.1, "max_output_tokens": 4096},
)
print(response.text)
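A recursive glob like the one above also picks up virtual environments and other vendored code, which can use up the context window on files nobody wants reviewed. One possible variant, sketched under assumptions: the `SKIP_DIRS` set, the `collect_sources` name, and the 3M-character budget (about 750k tokens at a rough 4 characters per token) are all illustrative, not part of the SDK.

```python
from pathlib import Path

# Assumption: directory names that usually hold vendored or generated code.
SKIP_DIRS = {".git", ".venv", "venv", "node_modules", "__pycache__"}


def collect_sources(repo_path: str, max_chars: int = 3_000_000) -> tuple[str, list[str]]:
    """Concatenate .py files under repo_path, staying within max_chars.

    Returns the combined text and the relative paths left out for lack of room.
    """
    root = Path(repo_path)
    chunks, left_out, used = [], [], 0
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if SKIP_DIRS.intersection(rel.parts):
            continue  # ignore vendored or generated directories entirely
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"# File: {rel.as_posix()}\n{text}\n"
        if used + len(chunk) + 2 > max_chars:  # +2 for the "\n\n" separator
            left_out.append(rel.as_posix())
            continue
        chunks.append(chunk)
        used += len(chunk) + 2
    return "\n\n".join(chunks), left_out
```

Returning the omitted paths makes it visible when the analysis covered only part of the repo, so a clean report is not mistaken for a clean codebase.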
Code Execution Tool
Gemini 1.5 Pro can execute Python code in a sandboxed environment during generation. This means it can write code, run it, see the output, and iterate — all in a single turn:
model_with_tools = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    tools="code_execution",
)
response = model_with_tools.generate_content(
    "Write a function to calculate the Fibonacci sequence, then verify it produces correct results for n=10"
)
# Each part carries one of: text, executable_code, or code_execution_result.
# hasattr() is true for every field on these objects, so test which field is set.
for part in response.candidates[0].content.parts:
    if "executable_code" in part:
        print("Code:", part.executable_code.code)
    elif "code_execution_result" in part:
        print("Result:", part.code_execution_result.output)
    else:
        print("Text:", part.text)
Multimodal Code Debugging
One of Gemini's underused capabilities: you can paste a screenshot of an error message (a terminal, a browser console, a monitoring dashboard) and ask it to debug the code responsible.
import PIL.Image

error_screenshot = PIL.Image.open("error_screenshot.png")
with open("suspect_file.py", "r") as f:
    code = f.read()

response = model.generate_content([
    error_screenshot,
    f"This error appears when running the following code. Identify the bug and provide the fix:\n\n{code}",
])
print(response.text)
Pricing
- Standard context (up to 128k tokens): $1.25/1M input, $5/1M output
- Long context (128k–1M tokens): $3.50/1M input, $10.50/1M output
For a 500k-token repo analysis request, the input alone costs roughly $1.75 (500k tokens at $3.50 per 1M), plus a few cents for the output. That is reasonable for a periodic code review but expensive if run on every commit. The practical pattern is to run full-repo analysis daily or on PRs, and use smaller-context models for per-file work.
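The arithmetic can be wrapped in a small helper for budgeting. This is a sketch that uses only the rates listed above; the `estimate_cost` name is illustrative, and it assumes the tier is selected by prompt size and applies to the whole request.

```python
LONG_CONTEXT_THRESHOLD = 128_000  # tokens; tier boundary from the list above

# USD per 1M tokens, copied from the list above: (input, output)
STANDARD_RATES = (1.25, 5.00)
LONG_RATES = (3.50, 10.50)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: prompts above the threshold are billed entirely at the
    long-context rates, for both input and output.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


per_query = estimate_cost(500_000, 4_096)
print(f"per query: ${per_query:.2f}")
print(f"daily for 30 days: ${per_query * 30:.2f}")
```

With a 4,096-token response the 500k-token query comes to about $1.79, so a daily run costs a little under $54 per month, while running on every commit scales that by the commit count.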
Gemini vs GPT-4o for Coding
GPT-4o is stronger on HumanEval and SWE-Bench for standard code generation tasks. Gemini 1.5 Pro wins clearly when the task requires context beyond 128k tokens. The decision is largely about context length requirements — if your task fits in 128k, GPT-4o or Claude 3.5 Sonnet are both competitive or better.