A generic code review prompt returns a list of observations that sound impressive and help nothing. "The function is well-structured. Consider adding error handling." To get code review that is actually useful — the kind a strong senior engineer would give — you need to constrain focus, demand severity tiers, and frame the task adversarially.
The Problem With Generic Code Review Prompts
"Review this code" produces the model's general-purpose code review, which is optimized to be comprehensive and non-offensive. It will find obvious things, miss subtle things, and hedge every observation. It will not tell you that your mutex is held too long because that requires taking a position. It will not flag the race condition because that requires adversarial thinking. It will tell you to add comments because that is always defensible advice.
The fix is to be specific about what "good" means for your code and your situation.
Specify the Focus Area
The most important instruction in a code review prompt is what to focus on. Every real review has a focus — security, performance, correctness, testability, maintainability. A prompt without a focus gets all of these at shallow depth rather than any of them at useful depth.
Review the following code focusing exclusively on:
1. Security vulnerabilities (injection, authentication bypass, sensitive data exposure)
2. Race conditions and concurrency issues
3. Error handling — what happens when external calls fail
Do NOT comment on:
- Code style or formatting
- Naming conventions
- Documentation or comments
- Performance unless there is a severe O(n^2) or worse issue
Code:
[code here]
The "do not comment on" list is as important as the focus list. Without it, the model fills its response with style observations because those are easier and safer to make.
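If you run focused reviews often, you can keep the focus and exclusion lists as data instead of retyping them. Below is a minimal Python sketch. It assumes you send the resulting string to whatever model API you already use, and build_review_prompt is a hypothetical helper, not part of any library:

```python
def build_review_prompt(code: str, focus: list[str], exclude: list[str]) -> str:
    """Assemble a focused review prompt with an explicit exclusion list.

    build_review_prompt is a hypothetical helper, not a library function.
    """
    if not focus:
        # A prompt with no focus area is exactly the generic review to avoid.
        raise ValueError("a review prompt needs at least one focus area")
    lines = ["Review the following code focusing exclusively on:"]
    lines += [f"{i}. {area}" for i, area in enumerate(focus, start=1)]
    if exclude:
        lines.append("Do NOT comment on:")
        lines += [f"- {item}" for item in exclude]
    lines += ["Code:", code]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_review_prompt(
        code="def transfer(src, dst, amount): ...",
        focus=[
            "Security vulnerabilities (injection, authentication bypass, sensitive data exposure)",
            "Race conditions and concurrency issues",
            "Error handling: what happens when external calls fail",
        ],
        exclude=["Code style or formatting", "Naming conventions", "Documentation or comments"],
    ))
```

When the exclusion list sits next to the focus list in code, it is harder to drop by accident. That matters because the exclusions do as much work as the focus areas.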
Severity Tiers
Without severity labels, every observation looks equally important. A missing semicolon and a SQL injection vulnerability appear in the same list with the same presentation. Require explicit severity tiers:
For each issue you find, label it with one of:
- MUST FIX: Will cause a bug, security vulnerability, data loss, or crash in production
- CONSIDER: Not a bug today but will likely cause problems as the codebase grows
- OPTIONAL: Minor improvement; reasonable engineers would disagree on this
Format each finding as:
[SEVERITY] Issue description
Why: One sentence explanation of the actual risk
Fix: Specific code change, not general advice
The "Fix: specific code change" instruction is critical. Without it, you get "consider adding validation" rather than the actual validation code. Asking for specific fixes forces the model to commit to a concrete recommendation.
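A fixed output format also lets you process findings mechanically. For example, you can sort them so MUST FIX items come first, and flag any "fix" that is still general advice. The minimal sketch below assumes the model follows the format exactly: one "[SEVERITY] ..." header per finding, with "Why:" and "Fix:" on their own lines. parse_findings and vague_fixes are hypothetical names, and real model output will likely need more tolerant parsing:

```python
import re
from dataclasses import dataclass

# Lower number = more urgent; used to sort findings.
SEVERITY_ORDER = {"MUST FIX": 0, "CONSIDER": 1, "OPTIONAL": 2}
HEADER_RE = re.compile(r"^\[(MUST FIX|CONSIDER|OPTIONAL)\]\s*(.+)$")


@dataclass
class Finding:
    severity: str
    issue: str
    why: str = ""
    fix: str = ""


def parse_findings(text: str) -> list[Finding]:
    """Parse '[SEVERITY] issue / Why: ... / Fix: ...' blocks, most severe first."""
    findings: list[Finding] = []
    current = None
    in_fix = False
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = Finding(header.group(1), header.group(2).strip())
            findings.append(current)
            in_fix = False
        elif current is None:
            continue  # preamble before the first finding
        elif line.startswith("Why:"):
            current.why = line[len("Why:"):].strip()
            in_fix = False
        elif line.startswith("Fix:"):
            current.fix = line[len("Fix:"):].strip()
            in_fix = True
        elif in_fix and line:
            # Fixes are often multi-line code; keep continuation lines verbatim.
            current.fix = f"{current.fix}\n{raw}" if current.fix else raw
    # sort() is stable, so findings keep their original order within a tier.
    findings.sort(key=lambda f: SEVERITY_ORDER[f.severity])
    return findings


def vague_fixes(findings: list[Finding]) -> list[Finding]:
    """Flag findings whose Fix is missing or is advice ("Consider ...") rather than code."""
    return [f for f in findings if not f.fix or f.fix.lower().startswith("consider")]
```

Anything vague_fixes returns is a candidate for a follow-up message asking for the concrete change.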
Ask for Specific Fix Examples
Explicit fix code is the difference between review that requires a second conversation and review you can act on immediately:
For every MUST FIX issue, provide:
1. The problematic code (quoted directly)
2. The corrected code
3. One sentence explaining why the fix works
Do not provide general guidance. Provide the actual corrected code.
This instruction makes the output directly actionable. The model knows how to fix most issues; the prompt just forces it to show its work instead of gesturing at a solution.
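Here is what a usable MUST FIX answer looks like: the quoted problem and the corrected code, side by side. It is written as runnable Python against the standard library's sqlite3 module. The table and function names are hypothetical:

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Problematic: user input is interpolated directly into the SQL text.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Corrected: the ? placeholder makes sqlite3 bind name as a value, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row: the payload rewrote the WHERE clause
    print(find_user_safe(conn, payload))    # no rows: nobody is literally named that
```

The one-sentence explanation the prompt asks for would be something like: the placeholder makes the driver pass name as data, so quote characters in it can no longer change the structure of the query.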
Adversarial Review Framing
The most powerful frame for security-focused review is adversarial: ask the model to find every possible issue rather than giving a balanced assessment:
You are a security researcher performing a security review of this code before it handles production user data. Your job is to find every possible vulnerability, attack vector, or dangerous assumption. Be adversarial. Assume the caller is malicious. Assume the environment is hostile.
Do not soften findings. Do not say "consider" for things that are actual vulnerabilities. If something is exploitable, say it is exploitable and explain how.
Code:
[code here]
The framing shift, from "helpful reviewer" to "adversarial security researcher", tends to produce more thorough security findings because it counteracts the model's default tendency toward balanced, non-alarming assessments.
For general correctness review, a similar frame works:
Your goal is to find every way this code could fail in production. Assume edge cases are common, not rare. Assume external services fail. Assume inputs are malformed. Assume concurrent access. What breaks?
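As an illustration of the kind of finding this framing is meant to surface, consider a hypothetical upload helper that trusts its filename argument. An adversarial reviewer assumes the caller will pass something like "../../etc/passwd"; a balanced reviewer may not. The sketch below is Python written with POSIX paths in mind and uses only os.path. Both function names are invented for the example:

```python
import os


def resolve_upload_unsafe(base_dir: str, filename: str) -> str:
    # Dangerous assumption: filename is a bare name. "../" segments or an
    # absolute path let a malicious caller point outside base_dir.
    return os.path.join(base_dir, filename)


def resolve_upload_safe(base_dir: str, filename: str) -> str:
    base = os.path.realpath(base_dir)
    # realpath collapses "..", follows symlinks, and os.path.join discards
    # base entirely if filename is absolute, so check the resolved result.
    target = os.path.realpath(os.path.join(base, filename))
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError(f"refusing path outside upload directory: {filename!r}")
    return target


if __name__ == "__main__":
    # Prints /srv/uploads/../../etc/passwd, which the OS resolves to /etc/passwd.
    print(resolve_upload_unsafe("/srv/uploads", "../../etc/passwd"))
    try:
        resolve_upload_safe("/srv/uploads", "../../etc/passwd")
    except ValueError as exc:
        print(exc)
```

The fix resolves the path first and then checks containment, instead of searching the input for "..". A string filter for ".." would miss both absolute paths and symlinks that point out of the directory.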
Diff Review vs Full-File Review
The right scope depends on what you are reviewing.
Diff review is appropriate for PRs and incremental changes. It limits the model's attention to what actually changed, which produces more relevant feedback:
Review this diff. Focus only on the changed lines (lines marked with + or - in the diff). Do not comment on unchanged code.
Specifically:
- Are the new changes correct?
- Do the new changes introduce any regressions?
- Are there edge cases in the changed logic that are not handled?
Diff:
[git diff output]
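If you want findings that cite exact lines, you can pre-process the diff so each changed line carries its line number. The minimal Python sketch below parses standard unified-diff hunk headers ("@@ -old,count +new,count @@"), and changed_lines is a hypothetical helper. It keeps removed lines as well as added ones, since deleting a check can be a regression in its own right:

```python
import re

# "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> list[tuple[str, int, str]]:
    """Return (marker, line_number, text) for every added or removed line.

    Added lines are numbered in the new file, removed lines in the old file.
    """
    results = []
    old_no = new_no = old_left = new_left = 0
    for line in diff_text.splitlines():
        hunk = HUNK_RE.match(line)
        if hunk:
            old_no, new_no = int(hunk.group(1)), int(hunk.group(3))
            old_left = int(hunk.group(2) or 1)
            new_left = int(hunk.group(4) or 1)
            continue
        if old_left <= 0 and new_left <= 0:
            # Outside any hunk: "diff --git", "index", "---" and "+++" headers.
            continue
        if line.startswith("+"):
            results.append(("+", new_no, line[1:]))
            new_no += 1
            new_left -= 1
        elif line.startswith("-"):
            results.append(("-", old_no, line[1:]))
            old_no += 1
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file"
        else:
            # Context line: present in both versions.
            old_no += 1
            new_no += 1
            old_left -= 1
            new_left -= 1
    return results


def format_for_prompt(changes: list[tuple[str, int, str]]) -> str:
    return "\n".join(f"{marker} line {no}: {text}" for marker, no, text in changes)


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/app.py",
        "+++ b/app.py",
        "@@ -10,4 +10,5 @@ def handler(req):",
        "     user = load(req)",
        "-    if not user.is_admin:",
        "-        raise Forbidden()",
        "+    if user is None:",
        "+        raise NotFound()",
        "+    audit(user)",
        "     return render(user)",
    ])
    print(format_for_prompt(changed_lines(sample)))
```

In that sample, the finding that matters is the dropped is_admin check. It appears only on "-" lines, which is why the sketch keeps deletions.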
Full-file review is appropriate when you want to review an entire module or when the change is too large to isolate:
Review this entire file. Assume it will be deployed to production and must be production-ready.
Focus on:
- Correctness of the core logic
- Error handling completeness
- Any assumption that breaks under load or with malicious input
Do not mix the two. Pasting a whole file and asking whether "the changes" are good leaves the model guessing which lines changed, so it cannot focus on them.
Language and Framework-Specific Review
Generic review prompts miss language-specific footguns. Add explicit language context:
This is a Node.js async function using the MongoDB driver. Specifically check for:
- Unhandled promise rejections (missing try/catch or .catch())
- Callback vs promise API mixing (older MongoDB driver versions support both)
- Missing await on async calls
- Memory leaks from unclosed cursors or connections
For memory-unsafe languages like C and C++:
This is C code that processes untrusted input. Check specifically for:
- Buffer overflows (strcpy, sprintf without bounds)
- Integer overflow in size calculations before allocation
- Use-after-free patterns
- Missing NULL checks after malloc
Language-specific checklists steer the model toward the failure modes that actually occur in that stack, which generic security reviews tend to skip.
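You can keep checklists like these as reusable data, so every review of a given stack gets the same targeted questions. Below is a minimal Python sketch. The registry keys and the language_review_prompt name are hypothetical, and the checklist text is taken from the two prompts above, lightly reworded:

```python
# Hypothetical registry: keys are labels you choose; each entry pairs a
# context sentence with the checks that matter for that stack.
LANGUAGE_CHECKLISTS = {
    "node-mongodb": (
        "This is a Node.js async function using the MongoDB driver.",
        [
            "Unhandled promise rejections (missing try/catch or .catch())",
            "Callback vs promise API mixing (older driver versions support both)",
            "Missing await on async calls",
            "Memory leaks from unclosed cursors or connections",
        ],
    ),
    "c-untrusted-input": (
        "This is C code that processes untrusted input.",
        [
            "Buffer overflows (strcpy, sprintf without bounds)",
            "Integer overflow in size calculations before allocation",
            "Use-after-free patterns",
            "Missing NULL checks after malloc",
        ],
    ),
}


def language_review_prompt(stack: str, code: str) -> str:
    """Prefix a review request with the checklist registered for `stack`."""
    if stack not in LANGUAGE_CHECKLISTS:
        known = ", ".join(sorted(LANGUAGE_CHECKLISTS))
        raise ValueError(f"no checklist for {stack!r}; known stacks: {known}")
    context, checks = LANGUAGE_CHECKLISTS[stack]
    lines = [f"{context} Check specifically for:"]
    lines += [f"- {check}" for check in checks]
    lines += ["Code:", code]
    return "\n".join(lines)
```

Raising on an unknown stack, instead of falling back to a generic prompt, keeps a missing checklist visible.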
Handling Long Code Files
For long files (roughly 500 lines or more), review quality tends to drop for code that appears late in the file. Two approaches:
Section-by-section review: Split the file into logical sections and review each separately. "Review only the authentication middleware (lines 45-130)."
Targeted review: If you know what concerns you, focus there. "I'm most concerned about the transaction logic in the processPayment function (lines 200-280). Review that section in depth."
For very large codebases, targeted review beats comprehensive review for finding real issues. The model's attention is a limited resource — spend it where it matters.
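When the file is Python, the standard library's ast module can find the logical sections for you. Each top-level function or class comes back with its line range, and you can review them one at a time. The sketch below is minimal: logical_sections and section_prompt are hypothetical names, and for other languages you would substitute a parser for that language or pick line ranges by hand:

```python
import ast


def logical_sections(source: str) -> list[tuple[str, int, int]]:
    """Return (name, first_line, last_line) for each top-level def or class.

    Works only on Python source; other languages need their own parser.
    """
    tree = ast.parse(source)
    sections = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the def line, so start at the earliest one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            sections.append((node.name, first, node.end_lineno))
    return sections


def section_prompt(source: str, name: str, first: int, last: int) -> str:
    """Build a review prompt for one section, keeping the file's real line numbers."""
    body = source.splitlines()[first - 1:last]
    numbered = "\n".join(f"{n}: {text}" for n, text in enumerate(body, start=first))
    return (
        f"Review only {name} (lines {first}-{last}). "
        "Cite line numbers in every finding.\n"
        f"Code:\n{numbered}"
    )


if __name__ == "__main__":
    sample = (
        "import os\n\n@cache\ndef load(path):\n    return open(path).read()\n\n"
        "class Store:\n    def save(self):\n        pass\n"
    )
    for name, first, last in logical_sections(sample):
        print(section_prompt(sample, name, first, last))
```

Each section keeps the file's real line numbers, so a finding that cites "line 212" points to the same line in your editor.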
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.