The question is not whether to use LLMs in your development workflow. The question is where they save real time and where they waste it. LLMs consistently perform well on a defined set of tasks: writing tests, generating documentation, explaining unfamiliar code, producing first-draft implementations, and handling regex and SQL. They reliably struggle on a different set: complex architectural decisions, debugging subtle logic errors, and understanding large codebases without well-structured context.
This guide is about the systematic approach: how to identify which parts of your workflow to hand to an LLM, how to evaluate whether the output is correct, and which tools fit which working styles.
Use Cases That Consistently Save Time
Writing Tests
Test generation is the single highest-return LLM use case for most developers. Given a function and its expected behavior, a model like Claude 3.5 Sonnet or GPT-4o can produce a comprehensive test suite in seconds. The tests cover happy paths, edge cases, and error conditions you might not have thought of.
The practical workflow: write the function, paste it into the LLM with the prompt "Write comprehensive unit tests for this function. Cover edge cases and error conditions." Review the tests, remove any that test implementation details instead of behavior, and add any cases the model missed.
Time saved: writing tests for a moderately complex function manually takes 20 to 40 minutes. LLM-generated tests take 5 to 10 minutes to review and adjust.
Documentation
Documentation is the most universally disliked part of software development, and it is one of the tasks LLMs do most reliably. Paste a function, module, or API endpoint definition and ask for a docstring, README section, or API documentation.
The quality is consistently high for standard documentation formats (JSDoc, docstrings, OpenAPI). The limitation is that the model cannot document your design intent or the non-obvious reasons behind architectural decisions, only what the code visibly does.
Understanding Unfamiliar Code
When you encounter a function, file, or library you have not seen before, pasting it into an LLM and asking "Explain what this does, including any non-obvious parts" is faster than reading the code cold. For well-written code, the LLM explanation is accurate. For poorly-written or obfuscated code, ask follow-up questions.
This is especially useful for: understanding third-party library internals, onboarding to a new codebase, and parsing dense code written by someone who did not write comments.
First-Draft Implementations
For a well-specified task, LLMs produce working first drafts faster than writing from scratch. The typical workflow: describe the function in detail, specify edge cases, get the draft, then review and refine.
The key word is "well-specified." Vague specifications produce vague implementations. If you cannot write a precise description of what the function should do, the model cannot implement it correctly.
Regex and SQL
Regex and SQL are two domains where the gap between "what I need" and "what I can write" is often large. LLMs are excellent at both. "Write a regex that matches valid email addresses but excludes subaddressing" or "Write a SQL query that finds users who placed more than 3 orders in the last 30 days but have not placed one in the last 7 days" — these are exactly the kind of precisely-specified problems models handle well.
Explaining Error Messages
Paste an error message and stack trace and ask "What caused this error and how do I fix it?" For common errors, the answer is usually correct and faster than reading documentation or Stack Overflow. For obscure or library-specific errors, the answer is a starting point, not a final answer.
Use Cases Where LLMs Still Struggle
Complex Architectural Decisions
Should you use event sourcing or a standard CRUD model for this feature? Should this be a microservice or a module in a monolith? Should you use a relational database or a document store here?
LLMs can describe the trade-offs of each option, and that description is often useful. But the decision depends on your team's expertise, your current infrastructure, your performance requirements, and a dozen other contextual factors the model cannot assess. Use LLMs to generate a list of trade-offs, then make the decision yourself.
Debugging Subtle Logic Errors
For an error that says "TypeError: cannot read property of undefined," the LLM will usually find the fix. For a logic bug where the code runs without errors but produces slightly wrong results under specific conditions, LLMs are much less reliable. Subtle logic errors require understanding the full execution context, often across many files, which exceeds the practical context management of most models.
Understanding Large Codebases Without Context
Pasting a single file and asking "Why is this code written this way?" rarely produces a useful answer. The answer depends on the other files, the history of the project, and the constraints that shaped the design decisions. Without that context, the model reasons about the code in isolation.
Tools like Claude Code and Cursor address this by indexing your codebase and giving the model access to related files when you ask a question. That makes large-codebase questions much more tractable.
The CLAUDE.md Pattern
One of the most effective practices for getting consistent LLM output is creating a project context file that you give to the model at the start of every conversation.
The pattern, popularized by Claude Code's use of CLAUDE.md, involves a file in your project root that describes: the tech stack and package manager, the coding conventions your team follows, key architectural decisions and why they were made, how authentication and database access work, and common patterns the codebase uses.
When you start a conversation about the codebase, include the contents of this file in your system prompt or as the first message. The model now has context that makes its answers significantly more accurate and consistent with your codebase conventions.
Tool Recommendations by Workflow
Terminal-first developers: Claude Code runs in the terminal, reads and writes files in your project, and understands your codebase through a project context file. It is the right choice if you spend most of your time in the terminal and want an AI assistant that fits into that workflow without switching to a browser or a separate application.
VS Code users who want deep integration: Cursor is a VS Code fork with multi-file AI editing, codebase indexing, and support for multiple models. It knows your file structure, can make edits across files in response to a single instruction, and keeps the full IDE experience intact.
Editor-agnostic developers: Continue.dev is an open-source extension for VS Code and JetBrains that connects to any model via API. You bring your own model (Claude, GPT-4o, local Ollama), and Continue handles the integration. Best if you want control over your model choice and do not want to pay for a proprietary tool.
How to Evaluate LLM-Generated Code Before Using It
The most important habit: never ship LLM-generated code you have not read and understood.
The evaluation checklist:
- Does the code do what you asked it to do? Read it against the specification.
- Are there edge cases the code does not handle? Think through empty inputs, null values, large inputs, concurrent access.
- Does the code follow your codebase's conventions? Check variable naming, error handling patterns, logging, and code organization.
- Are there security issues? Check for SQL injection, unvalidated input, improper error exposure, and credential handling.
- Do the tests actually test the behavior, not just the implementation? A test that passes only because it mirrors the implementation is not a useful test.
Running the tests is necessary but not sufficient. Tests catch obvious errors. Reading the code catches the subtle ones.
The goal is not to be suspicious of LLM-generated code. It is to maintain the same review standard you apply to code from any source.
Keep Reading
- Best LLM for Coding in 2026: Real Benchmark Scores Compared — Benchmark data on which model to use for development tasks
- Prompt Engineering Complete Guide 2026 — How to write prompts that produce better code output
- GPT-4o vs Claude 3.5 Sonnet: Which Is Better in 2026? — The head-to-head for the two most common coding LLMs
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.