Prompting for Code Generation: Techniques That Actually Improve Output Quality

Code prompting is different because outputs are verifiably correct or wrong. Six techniques that consistently improve code quality, with specific examples and the most underused application.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 17, 2026

9 min read

Tags: #code-generation #prompt-engineering #llm #typescript #python


Prompting for code generation differs from prompting for writing or analysis in one crucial way: the output is verifiably right or wrong. A generated essay can be "pretty good" on a spectrum. A generated function either handles null inputs correctly or it does not. This verifiability changes how you should prompt for code, because you can measure whether techniques actually work rather than estimating quality.

The techniques below have produced measurable improvements in code correctness, security, and maintainability when applied consistently across real production codebases.
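Because correctness is checkable, two prompting approaches can be compared by pass rate rather than by impression. The sketch below is one minimal way to do that, assuming the generated candidates have already been loaded as Python callables. `pass_rate`, `CASES`, and the two sample functions are hypothetical names made up for illustration.

```python
from typing import Any, Callable, Sequence


def pass_rate(candidate: Callable[..., Any],
              cases: Sequence[tuple[tuple, Any]]) -> float:
    """Return the fraction of (args, expected) cases the candidate passes."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed case rather than aborting the run.
            pass
    return passed / len(cases)


# Hypothetical cases for a "strip and lowercase" helper.
CASES = [(("  Hi ",), "hi"), (("OK",), "ok"), ((None,), "")]


def naive(value):
    # Stand-in for output from a prompt that never mentioned null input.
    return value.strip().lower()


def null_safe(value):
    # Stand-in for output from a prompt that named the null case.
    return (value or "").strip().lower()
```

Running both candidates against the same cases turns "this prompt seems better" into a number you can track across prompt revisions.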

Technique 1: Specify Language, Version, and Constraints Upfront

Vague code requests produce code that works in some context but not necessarily yours. Be specific before describing what you want.

Vague:

Write a function to validate email addresses.

Output: varies wildly in language, library usage, and edge case handling.

Specific:

Write a Python 3.12 function to validate email addresses. Requirements:
- No external libraries (standard library only)
- Must handle: valid addresses, missing @, multiple @, leading/trailing spaces, internationalized domains
- Use type hints
- Return a tuple of (is_valid: bool, error_message: str | None)
- Maximum 20 lines

The constraints do three things: they eliminate solution paths you do not want (no external libraries), they define the edge cases the function must handle, and they set a complexity ceiling (20 lines) that prevents over-engineering.

Technique 2: Describe What the Code Should NOT Do

Specifying what to avoid is often more effective than describing what to do, because it rules out the most common over-engineering patterns.

Write a function to fetch user data from our PostgreSQL database. It should:
- Accept a user_id as a parameter
- Return a User object or None
- Use async/await

It should NOT:
- Use an ORM (raw SQL with asyncpg only)
- Cache results (caching is handled elsewhere)
- Handle database connection management (the connection is passed in as a parameter)
- Add logging (a decorator handles that)

The "should NOT" list eliminates entire categories of generated code you would have to review and remove. The model often adds caching, logging, and connection management "helpfully" when not told otherwise.

Technique 3: Provide Input/Output Examples (Few-Shot for Code)

For data transformation functions, showing example input/output pairs is more reliable than describing the transformation:

Write a Python function that transforms order data from the warehouse format to the reporting format.

Input example:
{"order_id": "W-1234", "cust": "ACME Corp", "items": [{"sku": "ABC-1", "qty": 3, "unit_price": 29.99}], "ship_date": "2026-05-15T14:30:00Z"}

Expected output:
{"id": "W-1234", "customer_name": "ACME Corp", "line_items": [{"product_code": "ABC-1", "quantity": 3, "price": 29.99, "total": 89.97}], "order_total": 89.97, "shipped_on": "2026-05-15"}

Handle: multiple items, missing ship_date (return None for shipped_on), quantities of zero (exclude from output).

The concrete examples define field mappings, naming conventions, computed fields (total, order_total), and edge cases more precisely than any prose description.
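As a sketch of what the example pair pins down, here is one function that reproduces the expected output from the sample input. The name `to_reporting_format` is hypothetical. Rounding totals to two decimals and taking the date portion of the timestamp as-is (no timezone conversion) are assumptions inferred from the single example.

```python
def to_reporting_format(order: dict) -> dict:
    """Convert a warehouse-format order into the reporting format."""
    line_items = []
    for item in order["items"]:
        if item["qty"] == 0:
            continue  # zero-quantity items are excluded from the output
        line_items.append({
            "product_code": item["sku"],
            "quantity": item["qty"],
            "price": item["unit_price"],
            "total": round(item["qty"] * item["unit_price"], 2),
        })
    ship_date = order.get("ship_date")
    return {
        "id": order["order_id"],
        "customer_name": order["cust"],
        "line_items": line_items,
        "order_total": round(sum(i["total"] for i in line_items), 2),
        # Assumes an ISO 8601 timestamp; keeps only the YYYY-MM-DD part.
        "shipped_on": ship_date[:10] if ship_date else None,
    }
```

The example pair doubles as a regression test: feed the sample input in and compare the result with the expected output.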

Technique 4: Ask for Tests Alongside the Implementation

Asking for the implementation and tests together produces better code because the model writes the code knowing it will also write tests for it. This biases toward testable, modular code.

Write a TypeScript function `parseSchedule` that parses a recurring meeting schedule string (e.g., "Every Monday at 2pm", "Weekdays at 9am", "Every other Thursday at 3:30pm") into a structured object. Also write Jest tests covering: valid inputs of each type, invalid strings, edge cases around times (noon, midnight, AM/PM ambiguity).

Return the function and the test suite as separate code blocks.

Requesting both in one prompt also changes the tests themselves. When tests are requested as a follow-up, they often cover only the happy path the implementation already handles; when they are requested alongside the implementation, the named edge cases shape both.
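To show the shape of a combined response, here is a deliberately reduced Python analogue. It covers only the time-of-day part of the schedule string, not the full `parseSchedule` from the prompt. The name `parse_time_of_day` and the test functions are hypothetical, and plain `assert` stands in for Jest.

```python
import re

_TIME_RE = re.compile(r"^(\d{1,2})(?::(\d{2}))?\s*(am|pm)$", re.IGNORECASE)


def parse_time_of_day(text: str) -> tuple[int, int]:
    """Parse strings like '2pm' or '3:30pm' into (hour, minute), 24-hour clock."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    hour, minute = int(match.group(1)), int(match.group(2) or 0)
    if not 1 <= hour <= 12 or minute > 59:
        raise ValueError(f"out-of-range time: {text!r}")
    # 12am is midnight (hour 0); 12pm is noon (hour 12).
    hour %= 12
    if match.group(3).lower() == "pm":
        hour += 12
    return hour, minute


def test_valid_times():
    assert parse_time_of_day("2pm") == (14, 0)
    assert parse_time_of_day("9am") == (9, 0)
    assert parse_time_of_day("3:30pm") == (15, 30)


def test_noon_and_midnight():
    assert parse_time_of_day("12pm") == (12, 0)
    assert parse_time_of_day("12am") == (0, 0)


def test_invalid_strings():
    for bad in ["2", "13pm", "3:75pm", "soon"]:
        try:
            parse_time_of_day(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

The noon and midnight cases named in the prompt show up both as a branch in the implementation and as a dedicated test.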

Technique 5: Ask for the Simplest Solution First, Then Iterate

By default, models tend to aim for a complete, production-style solution with error handling, logging, documentation, and flexibility for future changes. This is rarely what you want for a first draft.

Write the simplest possible function that checks if a given date string is a weekday. No error handling, no edge cases, just the core logic. I will add robustness in a second pass.

Once you have the simple version, you can iterate:

Here is the function we wrote:
[code]

Now add: proper input validation with meaningful error messages for invalid date strings, handling for timezone-aware inputs (treat them as UTC), and a docstring.

Iterating from simple to complete is more efficient than requesting a complete solution upfront, because each iteration has a clear, narrow goal.
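The two passes might look like the sketch below. It assumes ISO 8601 date strings, which the prompt does not specify, and reads "treat them as UTC" as "convert timezone-aware inputs to UTC before checking the day". The function names are hypothetical.

```python
from datetime import date, datetime, timezone


# Pass 1: core logic only, as requested by the first prompt.
def is_weekday_simple(date_string: str) -> bool:
    return date.fromisoformat(date_string).weekday() < 5


# Pass 2: the same logic with validation, timezone handling, and a docstring.
def is_weekday(date_string: str) -> bool:
    """Return True if the ISO 8601 date or datetime falls on Monday to Friday.

    Timezone-aware inputs are converted to UTC before the check.
    Raises TypeError for non-string input and ValueError for unparseable strings.
    """
    if not isinstance(date_string, str):
        raise TypeError(f"expected str, got {type(date_string).__name__}")
    try:
        parsed = datetime.fromisoformat(date_string.strip())
    except ValueError as exc:
        raise ValueError(f"not an ISO 8601 date: {date_string!r}") from exc
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc)
    return parsed.weekday() < 5
```

The second version differs from the first only in the additions the follow-up prompt asked for, which keeps each review small.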

Technique 6: Chain of Thought for Algorithms

For algorithmic problems, asking the model to think through edge cases before writing code consistently produces better results:

I need a function to find all duplicate files in a directory tree based on content (not name). Before writing the code:
1. List the edge cases this function needs to handle
2. Describe the algorithm you will use and why (vs. alternatives)
3. Identify any performance considerations for large directory trees

Then write the Python implementation.

This produces code that handles the edge cases the model identified rather than just the happy path. It also surfaces algorithmic choices (hashing strategy, recursive vs. iterative traversal) that you can evaluate before committing to the implementation.
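One implementation that could come out of such a prompt is sketched below. The specific choices are assumptions made for illustration: group by file size first, hash only same-size files with SHA-256 in chunks, skip symlinks, and skip unreadable files. They are not the only reasonable answers to the three questions.

```python
import hashlib
import os
from collections import defaultdict


def _sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of paths under root whose contents are identical."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):  # iterative traversal
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # edge case: do not follow symlinks
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # edge case: file vanished or is unreadable

    by_hash: dict[tuple[int, str], list[str]] = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing
        for path in paths:
            try:
                by_hash[(size, _sha256(path))].append(path)
            except OSError:
                continue
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

The size pre-filter is the kind of performance decision step 3 of the prompt is meant to surface: most files in a large tree are never read at all.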

The Most Underused Application: The Code Review Prompt

Most developers use LLMs to write new code. Far fewer use them for code review, which is arguably the higher-value application. A specific, role-primed code review prompt surfaces bugs and vulnerabilities that generic questions miss:

You are a security-focused code reviewer. Review the following TypeScript API route for:
1. Security vulnerabilities (injection, authentication bypass, data exposure, CSRF)
2. Input validation gaps (what inputs are not validated before use?)
3. Error handling issues (what errors are silently swallowed?)
4. Type safety violations (any `as` casts or `any` types that could hide errors?)

For each issue found, state: the vulnerability type, the specific line or pattern, the potential impact, and the exact fix.

If there are no issues in a category, say "None found" for that category.

```typescript
// [code to review here]
```

The specificity of the review categories forces systematic coverage. A vague "review this code for bugs" prompt covers whatever the model happens to notice. A structured prompt with explicit categories covers all four categories on every review.

## When to Use Claude Code vs. Browser Claude vs. API for Coding Tasks

**Claude Code (or Cursor, Copilot):** Use for tasks that benefit from knowing the full codebase context, navigating multiple files, or running commands to verify the code works. The ambient context of your project makes these tools significantly better for tasks that touch multiple files.

**Browser Claude / ChatGPT:** Use for isolated algorithms, utility functions, or questions that do not require codebase context. Faster for quick questions. Paste the specific code you want reviewed or improved rather than hoping the model infers what matters.

**Direct API:** Use when you need to integrate code generation into a pipeline, customize the prompt systematically, or process many code snippets at scale. The programmatic control over temperature, system prompt, and response format is worth the extra setup.

---

## Keep Reading
- [Role Prompting: How to Use Personas to Get Better LLM Outputs](/blog/role-prompting-guide) — The code reviewer persona is one of the most effective role prompts; details on why specific roles outperform generic ones
- [Prompt Chaining: How to Break Complex Tasks Into Reliable Steps](/blog/prompt-chaining-guide) — Complex code generation tasks (spec, design, implement, test, review) are a natural fit for chaining
- [Getting Structured Output From LLMs](/blog/structured-output-prompting-guide) — For code generation pipelines that parse and apply generated code programmatically

