The question is not whether to use LLMs in your development workflow but where they save real time and where they waste it. LLMs consistently perform well on a defined set of tasks: writing tests, generating documentation, explaining unfamiliar code, producing first-draft implementations, writing regex and SQL, and explaining error messages. They reliably struggle on a different set: complex architectural decisions, debugging subtle logic errors, and understanding large codebases without well-structured context.
This guide is about the systematic approach: how to identify which parts of your workflow to hand to an LLM, how to evaluate whether the output is correct, and which tools fit which working styles.
Use Cases That Consistently Save Time
Writing Tests
Test generation is the single highest-return LLM use case for most developers. Given a function and its expected behavior, a model like Claude 3.5 Sonnet or GPT-4o can produce a comprehensive test suite in seconds. The tests cover happy paths, edge cases, and error conditions you might not have thought of.
The practical workflow: write the function, paste it into the LLM with the prompt "Write comprehensive unit tests for this function. Cover edge cases and error conditions." Review the tests, remove any that test implementation details instead of behavior, and add any cases the model missed.
Time saved: writing tests for a moderately complex function by hand takes roughly 20 to 40 minutes, while reviewing and adjusting LLM-generated tests takes roughly 5 to 10.
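To make the review step concrete, here is a minimal sketch of what a trimmed, generated test suite might look like. The `parse_duration` function and its input format are hypothetical, invented for illustration; the point is the difference between tests that check behavior and tests that check implementation details.

```python
import re
import unittest

# Hypothetical function under test; the name and format are assumptions.
_TOKEN = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert strings like '1h30m' or '45s' to a number of seconds."""
    # Anything left over after removing valid tokens makes the input invalid.
    if not text or _TOKEN.sub("", text) != "":
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _SECONDS[unit] for n, unit in _TOKEN.findall(text))


class ParseDurationTests(unittest.TestCase):
    # Behavior tests: these survive a refactor, so keep them.
    def test_combined_units(self):
        self.assertEqual(parse_duration("1h30m"), 5400)

    def test_zero(self):
        self.assertEqual(parse_duration("0s"), 0)

    def test_rejects_empty_and_malformed_input(self):
        for bad in ("", "abc", "10x", "1h 30m"):
            with self.assertRaises(ValueError):
                parse_duration(bad)

    # A generated test asserting that _TOKEN is a compiled regex would check
    # an implementation detail. It breaks on refactoring without catching
    # bugs, so it is the kind of test to delete during review.
```

Saved to a file, the suite runs with `python -m unittest`.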
Documentation
Documentation is the most universally disliked part of software development, and it is one of the tasks LLMs do most reliably. Paste a function, module, or API endpoint definition and ask for a docstring, README section, or API documentation.
The quality is consistently high for standard documentation formats (JSDoc, docstrings, OpenAPI). The limitation is that the model cannot document your design intent or the non-obvious reasons behind architectural decisions, only what the code visibly does.
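The sketch below illustrates that limitation with a hypothetical `retry_delay` helper (the function, its defaults, and the 30-second cap are assumptions for the example). The docstring is the kind of output a model can derive from the code alone; the comment marks the part it cannot supply.

```python
def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return the delay in seconds before a retry.

    The delay doubles with each attempt, starting at ``base`` and never
    exceeding ``cap``.

    Args:
        attempt: Zero-based retry attempt number.
        base: Delay for the first retry, in seconds.
        cap: Upper bound on the returned delay, in seconds.

    Returns:
        ``min(cap, base * 2 ** attempt)``.

    Raises:
        ValueError: If ``attempt`` is negative.
    """
    # Design intent the model cannot infer from the code: WHY the cap is
    # 30 seconds (for example, an upstream timeout). You have to add that.
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * 2 ** attempt)
```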
Understanding Unfamiliar Code
When you encounter a function, file, or library you have not seen before, pasting it into an LLM and asking "Explain what this does, including any non-obvious parts" is faster than reading the code cold. For well-written code, the LLM explanation is usually accurate. For poorly written or obfuscated code, treat the first explanation as provisional and ask follow-up questions.
This is especially useful for understanding third-party library internals, onboarding to a new codebase, and parsing dense code written by someone who did not write comments.
First-Draft Implementations
For a well-specified task, LLMs produce working first drafts faster than writing from scratch. The typical workflow: describe the function in detail, specify edge cases, get the draft, then review and refine.
The key word is "well-specified." Vague specifications produce vague implementations. If you cannot write a precise description of what the function should do, the model cannot implement it correctly.
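As an illustration of what "well-specified" can mean in practice, the sketch below puts a numbered spec in the docstring and follows it with the kind of draft a model might return. The `slugify` function and its rules are hypothetical; each rule maps to one line of the implementation, which is what makes the draft easy to review.

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Hypothetical spec, as it might be given to the model:

    1. Lowercase the input.
    2. Strip accents (for example, "é" becomes "e").
    3. Replace every run of non-alphanumeric characters with one hyphen.
    4. Remove leading and trailing hyphens.
    5. Truncate to max_length characters without leaving a trailing hyphen.
    6. Return "untitled" if nothing is left.
    """
    # Rules 1 and 2: decompose accented characters, drop non-ASCII marks.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    # Rules 3 and 4.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")
    # Rule 5.
    slug = slug[:max_length].rstrip("-")
    # Rule 6.
    return slug or "untitled"
```

A vague prompt such as "write a slugify function" leaves rules 2, 5, and 6 for the model to guess.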
Regex and SQL
Regex and SQL are two domains where the gap between "what I need" and "what I can write" is often large, and LLMs are excellent at both. Requests such as "Write a regex that matches valid email addresses but excludes subaddressing" or "Write a SQL query that finds users who placed more than 3 orders in the last 30 days but have not placed one in the last 7 days" are exactly the kind of precisely specified problems models handle well.
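For reference, here is one plausible answer to each of those two prompts, wrapped in Python so it can be run and checked. This is a sketch under assumptions: the regex is a deliberately simplified email pattern, not a full RFC 5322 validator, and the `orders(user_id, placed_at)` table, the SQLite dialect, and the sample data are all hypothetical.

```python
import re
import sqlite3

# Simplified email check: the local part may not contain "+", which rules
# out subaddressing such as user+tag@example.com.
EMAIL_NO_SUBADDRESS = re.compile(
    r"[A-Za-z0-9._%-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def is_plain_email(address: str) -> bool:
    return EMAIL_NO_SUBADDRESS.fullmatch(address) is not None


# More than 3 orders in the last 30 days, none in the last 7.
# placed_at is assumed to hold ISO dates (YYYY-MM-DD) stored as text.
LAPSED_BUYERS_SQL = """
SELECT user_id
FROM orders
WHERE placed_at >= date(:today, '-30 days')
GROUP BY user_id
HAVING COUNT(*) > 3
   AND MAX(placed_at) < date(:today, '-7 days')
ORDER BY user_id
"""


def lapsed_buyers(conn: sqlite3.Connection, today: str) -> list[int]:
    rows = conn.execute(LAPSED_BUYERS_SQL, {"today": today}).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, placed_at TEXT)")
    sample = {
        1: ["2024-06-01", "2024-06-05", "2024-06-10", "2024-06-15"],  # qualifies
        2: ["2024-06-01", "2024-06-05", "2024-06-10", "2024-06-28"],  # ordered recently
        3: ["2024-06-01", "2024-06-02"],                              # too few orders
    }
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(uid, day) for uid, days in sample.items() for day in days],
    )
    return conn
```

Running generated SQL against a small in-memory table like this is a cheap way to check the boundary conditions (exactly 3 orders, an order exactly 7 days ago) before trusting the query.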
Explaining Error Messages
Paste an error message and stack trace and ask "What caused this error and how do I fix it?" For common errors, the answer is usually correct and faster than reading documentation or Stack Overflow. For obscure or library-specific errors, the answer is a starting point, not a final answer.
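If you do this often, assembling the prompt can be scripted. The sketch below is one possible way to do it with the standard library's `traceback` module; the `build_error_prompt` helper, the `risky` function, and the context string are hypothetical names and values chosen for the example.

```python
import traceback


def build_error_prompt(exc: BaseException, context: str = "") -> str:
    """Combine the question, the full stack trace, and optional context."""
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    parts = ["What caused this error and how do I fix it?", "", trace.rstrip()]
    if context:
        # Versions and library names help with library-specific errors.
        parts += ["", "Context: " + context]
    return "\n".join(parts)


def risky() -> str:
    return {"name": "ada"}["email"]  # raises KeyError


try:
    risky()
except KeyError as err:
    prompt = build_error_prompt(err, context="standard library only")
```

Including the whole trace rather than only the final line gives the model the call path, which matters most for the obscure errors where its first answer is only a starting point.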
Use Cases Where LLMs Still Struggle
Complex Architectural Decisions
Should you use event sourcing or a standard CRUD model for this feature? Should this be a microservice or a module in a monolith? Should you use a relational database or a document store here?
LLMs can describe the trade-offs of each option, and that description is often useful. But the decision depends on your team's expertise, your current infrastructure, your performance requirements, and a dozen other contextual factors the model cannot assess. Use LLMs to generate a list of trade-offs, then make the decision yourself.
Debugging Subtle Logic Errors
For an error like "TypeError: Cannot read properties of undefined," the LLM will usually find the fix. For a logic bug where the code runs without errors but produces slightly wrong results under specific conditions, LLMs are much less reliable. Subtle logic errors require understanding the full execution context, often across many files, which is more than most models can keep track of in practice.
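A small, hypothetical example of the second kind of bug: the function below never raises, and it returns correct results whenever the total divides evenly, so it can pass casual testing. Real cases are usually harder than this because the wrong value surfaces far from the line that produced it.

```python
def split_bill_buggy(total_cents: int, people: int) -> list[int]:
    # Runs without errors, but silently drops the remainder when the total
    # is not evenly divisible.
    return [total_cents // people] * people


def split_bill(total_cents: int, people: int) -> list[int]:
    # Give one extra cent to the first `remainder` people so the shares
    # always add up to the total.
    share, remainder = divmod(total_cents, people)
    return [share + 1 if i < remainder else share for i in range(people)]


# 1000 cents split 4 ways hides the bug; split 3 ways exposes it.
even_case = sum(split_bill_buggy(1000, 4))    # 1000
uneven_case = sum(split_bill_buggy(1000, 3))  # 999, one cent lost
fixed_case = split_bill(1000, 3)              # [334, 333, 333]
```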
Understanding Large Codebases Without Context
Pasting a single file and asking "Why is this code written this way?" rarely produces a useful answer. The answer depends on the other files, the history of the project, and the constraints that shaped the design decisions. Without that context, the model reasons about the code in isolation.
Tools like Claude Code and Cursor address this by searching or indexing your codebase and giving the model access to related files when you ask a question. That makes large-codebase questions much more tractable.
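To show the underlying idea, here is a deliberately naive sketch of gathering related files by hand. It is not how Claude Code or Cursor work internally; the `gather_context` helper is hypothetical, and plain substring matching stands in for whatever retrieval those tools actually use.

```python
from pathlib import Path


def gather_context(root: Path, symbol: str, max_files: int = 5) -> str:
    """Concatenate the Python files under root that mention symbol."""
    chunks = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if symbol in text:
            # Label each file so the model can tell the sources apart.
            chunks.append(f"# File: {path.relative_to(root)}\n{text}")
        if len(chunks) == max_files:
            break
    return "\n\n".join(chunks)
```

Pasting the result alongside the question "Why is this code written this way?" gives the model the callers and definitions it would otherwise have to guess at, though it still lacks the project history.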