LLMs are genuinely useful for data analysis, but not in the way many people initially expect. They cannot reliably perform arithmetic on large datasets stored in their context. What they can do is write accurate SQL queries, pandas code, and visualization scripts, which real computing infrastructure then executes. The correct mental model: LLMs write the code, computers run it.
What LLMs Can Do With Data
Write SQL Queries From Natural Language
This is one of the highest-value LLM data applications. Given a schema description and a natural language question, GPT-4o and Claude 3.5 Sonnet reliably produce correct SQL for moderately complex queries.
Schema: orders(id, customer_id, amount, created_at, status)
customers(id, name, email, country)
Question: What is the average order value by country for orders in the last 90 days,
only for countries with more than 100 orders?
Generated SQL:
SELECT c.country,
AVG(o.amount) as avg_order_value,
COUNT(*) as order_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= NOW() - INTERVAL '90 days'
AND o.status != 'cancelled'
GROUP BY c.country
HAVING COUNT(*) > 100
ORDER BY avg_order_value DESC;
The LLM writes this. Your actual database executes it against real data. You get correct results without the LLM ever touching the raw data directly.
Generate pandas Code
For Python-based data work, LLMs can write pandas transformation pipelines, cleaning scripts, and analysis code that runs in your actual environment:
# LLM-generated pandas code for calculating customer lifetime value cohorts
import pandas as pd
df["first_purchase_month"] = pd.to_datetime(df["first_purchase_date"]).dt.to_period("M")
ltv_by_cohort = (
df.groupby("first_purchase_month")
.agg(
customer_count=("customer_id", "nunique"),
total_revenue=("amount", "sum"),
avg_ltv=("amount", lambda x: x.sum() / df.loc[x.index, "customer_id"].nunique()),
)
.reset_index()
)
The model generates this code. You execute it in your environment. The calculation runs on your actual data.
Explain What a Dataset Means
Paste a sample of your data (headers + first 20 rows) and ask the model to explain what you are looking at, spot potential data quality issues, suggest analysis directions, or explain what a particular column likely represents. This is high-value for onboarding to unfamiliar datasets.
Create Visualization Code
LLMs write matplotlib, plotly, and seaborn code from a description of what you want to see. "Show me a heatmap of monthly revenue by product category" becomes working Python code in seconds.
Spot Anomalies in Small Tables
For tables small enough to fit in context (under ~100 rows), LLMs can identify patterns, flag outliers, and spot data quality issues. For example: pasting a week of daily metrics and asking "does anything look unusual?"
What LLMs Cannot Do Reliably
Perform Calculations on Large Datasets
If you paste 10,000 rows of data into a context window and ask "what is the total revenue?", the model will attempt to sum the values but will likely be wrong. LLMs are not calculators. They generate plausible-looking numbers based on training, not by actually iterating through values.
This is not an edge case. Arithmetic over large in-context datasets is a known, fundamental weakness of current LLMs. Always use actual computational tools for this.
Remember Data Accurately Across a Long Conversation
If you share a dataset at the start of a conversation and then ask about specific rows 20 turns later, the model may confuse values, hallucinate rows that were not there, or fail to correctly reference data from early in the conversation. Treat each data reference as a fresh lookup, not a persistent in-memory store.
Handle Truly Large Tables in Context
Current context windows (even at 128k or 1M tokens) fill quickly with tabular data. A CSV with 50,000 rows and 20 columns might take hundreds of thousands of tokens. Loading this directly into context is impractical and still does not give you reliable arithmetic.
The Correct Workflow
The right approach is a loop:
- LLM writes the code (SQL, Python, R)
- You run the code in your actual environment
- You share the results back with the LLM
- LLM interprets the results and generates follow-up code if needed
This loop is how professional data analysts already use LLMs effectively. The LLM is a pair programmer for data work, not a data processing engine.
You: "Write pandas code to find customers who have not ordered in 90 days
but had at least 3 orders before that"
LLM: [generates pandas code]
You: [runs code, gets 847 customers]
You: "Now write code to segment these by their total historical spend"
LLM: [generates segmentation code using the same dataframe]
Tools That Do This Well
ChatGPT Advanced Data Analysis (formerly Code Interpreter): uploads a CSV, runs Python in a sandboxed environment, shows results directly. The most accessible implementation of the correct workflow. Good for ad-hoc analysis where you do not need to export the code.
Claude with uploaded CSVs: Claude can analyze uploaded CSV files, though its in-context analysis has the arithmetic limitations described above. Works better when you ask it to generate code you run externally.
LlamaIndex for structured data: the Query Engine in LlamaIndex can convert natural language queries to SQL or pandas against your actual database or dataframe. This is the infrastructure layer for building production LLM data pipelines.
LangChain SQL Agent: connects an LLM to a SQL database, lets it query, observe results, and iterate. Production-viable for building natural language database interfaces.
Practical Data Analysis Applications
Analyst copilot: data analysts write queries faster with LLM assistance. Research shows 20-40% productivity gains for common query writing tasks.
Natural language database interface: non-technical users ask questions in English, the system generates SQL, runs it, and returns results. Reduces dependency on data team for ad-hoc requests.
Data quality auditing: paste schema and sample data, ask the model to suggest validation checks, identify potential integrity issues, and generate cleaning code.
Report generation: run analysis code, share results with LLM, ask it to write a natural language summary of the findings.
Keep Reading
- Function Calling in LLMs — How to give LLMs access to real computation tools
- LLM Embeddings Explained — The right way to search over large datasets semantically
- How Large Language Models Work: Complete Guide — Why LLMs cannot reliably do arithmetic
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.