Text-to-SQL is one of the most practically useful applications of LLM prompting. The ability to ask a question in plain English and receive a working SQL query removes the database knowledge barrier for non-technical users and speeds up analytical work for technical ones. But reliability varies dramatically between simple and complex queries, and running unreviewed SQL on a production database is a different category of risk than displaying a bad summary. Here is how to build reliable text-to-SQL workflows.
The Text-to-SQL Pattern
The core pattern requires three elements in the prompt:
1. Database schema. Every table the model might need, with column names, data types, and primary/foreign key relationships. Without this, the model guesses at table and column names and will often guess wrong.
2. Sample data. 3-5 rows per table. This helps the model understand what values actually appear in each column — whether dates are stored as "2026-05-18" or "18/05/2026" or Unix timestamps, whether status fields use "active"/"inactive" or 1/0 or "true"/"false."
3. Natural language question. The question in plain English, with any necessary context about what the user actually wants to know.
A minimal example:
You are a SQL query generator. Generate a PostgreSQL query for the following question.
Database schema:
Table: users
- id (UUID, primary key)
- email (VARCHAR, unique)
- status (VARCHAR: 'active', 'inactive', 'pending')
- created_at (TIMESTAMP)
- organization_id (UUID, foreign key -> organizations.id)
Table: organizations
- id (UUID, primary key)
- name (VARCHAR)
- plan (VARCHAR: 'free', 'pro', 'enterprise')
- created_at (TIMESTAMP)
Sample data (users):
id | email | status | created_at | organization_id
abc-123 | alice@example.com | active | 2026-03-15 09:00:00 | org-001
def-456 | bob@example.com | inactive | 2026-01-10 14:00:00 | org-001
Sample data (organizations):
id | name | plan
org-001 | Acme Corp | pro
org-002 | Beta Inc | enterprise
Question: How many active users does each organization have, sorted by count descending?
Return ONLY the SQL query, no explanation.
Common Failure Modes
Understanding where text-to-SQL fails helps you add the right guardrails.
JOIN confusion with multiple foreign keys. When a table has two or more foreign keys pointing to the same table, the model sometimes JOINs on the wrong one. Example: an invoices table with both billed_to_org_id and issued_by_org_id — the model may JOIN on either when you ask about invoices for a specific organization. Fix: label foreign keys explicitly in your schema description and, if ambiguity is possible, include a note in the prompt ("'billed_to_org_id' is the customer organization that owes the invoice").
Incorrect aggregation. COUNT vs SUM vs AVG errors are common. A user asking "how much revenue did we generate?" might get COUNT(*) instead of SUM(amount). Fix: your sample data should include varied numeric values that make the difference between count and sum obvious from context.
Date arithmetic errors. "Last 30 days," "this quarter," and "year over year" are interpreted inconsistently. The model may use NOW() - INTERVAL '30 days' correctly in PostgreSQL but not in MySQL, or may get the quarter boundary logic wrong. Fix: specify the database dialect explicitly ("Generate a MySQL 8 query") and for complex date logic, provide the exact date range as a comment ("assume today is 2026-05-18 for date calculations").
Case sensitivity in string comparisons. On case-sensitive databases, WHERE status = 'Active' fails when the data contains 'active'. Your sample data should show the exact case used in the real data, and your schema description should note if comparisons are case-sensitive.
NULL handling. WHERE status != 'inactive' excludes rows where status IS NULL in SQL, which is often not what the user intended. Explicit instruction helps: "When filtering by a column value, consider whether NULL values should be included or excluded."
Providing Schema at Scale
For databases with many tables, including the full schema in every prompt is expensive. Two strategies:
Table filtering. First, use the model to identify which tables are relevant to the question. Send only the table names and one-line descriptions, ask the model which tables it needs, then send the full schema for only those tables. This significantly reduces token count for large schemas.
Schema embedding with semantic search. Store each table's schema as a vector embedding. For each incoming question, retrieve the top K most relevant tables by semantic similarity. This scales to hundreds of tables without manual table selection.
Reviewing Generated SQL Before Running It
Never run model-generated SQL directly on a production database without review. This is not a best practice — it is a hard rule. The model occasionally generates queries that are syntactically correct but semantically wrong (counting users when asked for revenue, grouping by the wrong column), and more rarely generates queries that would return very large result sets or lock tables.
Review process for production:
- Syntax check. Most database clients will validate syntax without executing. Use EXPLAIN or a dry-run to check the query plan.
- Logic check. Read the WHERE clause, the JOIN conditions, and the aggregation. Does this match what the user asked?
- Scope check. Does the query access tables it should not have access to? Does it return more data than needed?
- Test on staging. Run on a staging database before production, especially for queries involving UPDATE, INSERT, or DELETE.
For fully automated text-to-SQL pipelines (no human review), restrict access to read-only database users and limit queries to SELECT only. Block any generated query containing INSERT, UPDATE, DELETE, DROP, or CREATE.
When LLM SQL Generation Is Reliable
Simple single-table queries: SELECT, WHERE, ORDER BY, LIMIT. High reliability. The model can read and filter from one table with minimal error rate.
Aggregations on a single table: GROUP BY, COUNT, SUM, AVG with a clear natural language question. Reliable when the sample data makes the intended aggregation obvious.
Two-table JOINs with a single clear relationship: "Show all orders with customer names" where there is one obvious foreign key. High reliability.
When It Is Unreliable
Multi-table JOINs with ambiguous relationships. Three or more tables, multiple foreign keys, or self-joins degrade reliability significantly. Expect to review and correct these manually.
Complex window functions. RANK(), LAG(), LEAD(), and complex PARTITION BY logic are frequently incorrect in generated SQL. These require careful review.
Business logic embedded in query structure. "Revenue excluding refunded orders" where "refunded" requires a JOIN to a status table and a specific status value — the model may get the logic right or may miss the exclusion entirely depending on how clearly it is described.
Database-specific syntax. If you are using a non-mainstream database or a version with specific syntax requirements, the model defaults to standard SQL and may generate queries that do not execute on your database.
A Production-Ready Prompt Template
You are a SQL query generator for a [PostgreSQL/MySQL/SQLite] database.
Instructions:
- Generate a single, correct SQL query that answers the question below.
- Use only the tables and columns defined in the schema.
- Do not add LIMIT unless the question specifies a limit or asks for a top N result.
- For date comparisons, assume today is [CURRENT_DATE].
- Return only the SQL query. No explanation, no markdown code blocks, no comments.
Schema:
[schema here]
Sample data:
[sample rows here]
Question: [user question]
Summary
LLM SQL generation is reliable for simple to moderate queries and unreliable for complex multi-table logic. The text-to-SQL pattern requires a complete schema, sample data, and the database dialect in the prompt. Common failures — JOIN confusion, aggregation errors, date arithmetic, and case sensitivity — are predictable and can be mitigated with targeted prompt instructions. Always review generated SQL before running on production, and restrict permissions to read-only for fully automated pipelines.
Keep Reading
- LLM Output Parsing Guide — parsing and validating structured outputs from text-to-SQL workflows
- Prompt Engineering Complete Guide 2026 — the broader context for domain-specific prompt engineering
- Prompt Testing Methodology Guide — how to build a test suite for text-to-SQL queries
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.