Tracking raw LLM API spend tells you how much you paid but not whether you got value. The right unit of measurement is cost per meaningful task: cost per email classified, cost per meeting summarized, cost per code review completed. Calculate that number, compare it to the cost of doing the same task without AI (employee time multiplied by hourly rate), and you have a clear ROI signal. When AI cost per task is lower than human cost per task, the spend is justified. When it is not, optimize or cut the feature.
Why "API Spend" Is the Wrong Metric
Teams that track "LLM API spend" know their total bill but cannot answer the question that matters: is this spend generating value at a reasonable cost?
Two teams can both spend $1,000/month on LLM API calls. Team A uses it to process 500,000 customer support classifications that would cost $15,000 in human labor. Team B uses it to generate marketing copy that few people read. Same bill, radically different ROI.
The cost-per-task framework makes this distinction visible.
Defining Your Task Unit
The first step is choosing the right task unit for each LLM-powered feature. The task unit should be:
- Meaningful to the business. Not "API call completed" (internal), but "email classified" (business outcome).
- Countable. You can count the number of tasks completed per day/month.
- Comparable to a human alternative. There should be a non-AI way to do the same task so you can calculate the comparison cost.
Examples of good task units:
| Feature | Task Unit |
|---------|-----------|
| Support ticket classifier | Tickets classified per month |
| Meeting summarizer | Meetings summarized per month |
| Code review assistant | Pull requests reviewed per month |
| Invoice data extractor | Invoices processed per month |
| Content moderator | Items moderated per month |
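Countability is what makes the ratio computable. Here is a minimal sketch that assumes each completed task is logged as an event with a feature name and a completion date. The `TaskEvent` type and its fields are hypothetical and not taken from any particular logging system:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TaskEvent:
    """One completed business task, e.g. one ticket classified (hypothetical schema)."""
    feature: str          # e.g. "support_ticket_classifier"
    completed_on: date


def count_tasks_by_month(events: list[TaskEvent]) -> Counter:
    """Count completed tasks per (feature, "YYYY-MM") pair."""
    return Counter((e.feature, e.completed_on.strftime("%Y-%m")) for e in events)


events = [
    TaskEvent("support_ticket_classifier", date(2024, 5, 2)),
    TaskEvent("support_ticket_classifier", date(2024, 5, 17)),
    TaskEvent("meeting_summarizer", date(2024, 5, 20)),
    TaskEvent("support_ticket_classifier", date(2024, 6, 1)),
]
counts = count_tasks_by_month(events)
print(counts[("support_ticket_classifier", "2024-05")])  # 2
```

Log the event when the business outcome happens (ticket tagged, summary delivered) rather than once per API call. That way retries and multi-call pipelines do not inflate the task count.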
Calculating AI Cost Per Task
The formula:
AI cost per task = (total LLM API cost for feature) / (number of tasks completed)
You need to track LLM costs at the feature level, not just globally. Add metadata tags to your API calls as described in the rate limiting guide, then aggregate by feature.
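```python
# Prices in USD per 1M tokens. Example values for GPT-4o-mini;
# substitute your model's current list prices.
MODEL_INPUT_PRICE = 0.15
MODEL_OUTPUT_PRICE = 0.60


def calculate_cost_per_task(feature_name: str, month: str) -> dict:
    # get_feature_usage() and get_task_count() stand in for queries
    # against your own token usage logs and task counters.
    usage = get_feature_usage(feature_name, month)
    input_cost = (usage["input_tokens"] / 1_000_000) * MODEL_INPUT_PRICE
    output_cost = (usage["output_tokens"] / 1_000_000) * MODEL_OUTPUT_PRICE
    total_cost = input_cost + output_cost

    tasks_completed = get_task_count(feature_name, month)
    # None rather than 0: a feature that completed no tasks is not "free"
    cost_per_task = total_cost / tasks_completed if tasks_completed > 0 else None

    return {
        "feature": feature_name,
        "month": month,
        "total_cost": total_cost,
        "tasks_completed": tasks_completed,
        "cost_per_task": cost_per_task,
    }
```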
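The function above assumes a `get_feature_usage()` helper. Here is one way to back it, a sketch assuming every API call is logged as a record with `feature`, `month`, `input_tokens`, and `output_tokens` fields. This log schema is hypothetical. The token counts can come from the usage figures your provider returns with each response.

```python
from collections import defaultdict


def aggregate_usage(call_log: list[dict]) -> dict:
    """Sum input/output tokens per (feature, month) from tagged call records."""
    totals: dict = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for call in call_log:
        key = (call["feature"], call["month"])
        totals[key]["input_tokens"] += call["input_tokens"]
        totals[key]["output_tokens"] += call["output_tokens"]
    return dict(totals)


call_log = [
    {"feature": "meeting_summarizer", "month": "2024-05", "input_tokens": 8_000, "output_tokens": 500},
    {"feature": "meeting_summarizer", "month": "2024-05", "input_tokens": 7_500, "output_tokens": 450},
    {"feature": "support_ticket_classifier", "month": "2024-05", "input_tokens": 200, "output_tokens": 10},
]
usage = aggregate_usage(call_log)
print(usage[("meeting_summarizer", "2024-05")])
# {'input_tokens': 15500, 'output_tokens': 950}
```

If one feature calls several models, add the model name to the key. Each model has its own per-token prices, so its tokens cannot be summed into one price bucket.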
Calculating Human Cost Per Task
For each feature, estimate the human cost of doing the same task without AI:
Human cost per task = time_to_complete_manually × hourly_rate
Be conservative in this estimate: use task completion times measured from your team rather than guesses. If your support team spends 2 minutes on average classifying a ticket (reading, deciding, tagging), and your average fully loaded employee cost is $50/hour, the human cost per classification is:
2 minutes / 60 minutes × $50 = $1.67 per ticket
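You can compute the human side directly from measured handling times. Here is a small sketch that assumes you have timed samples in seconds. The sample values are made up for illustration.

```python
from statistics import mean


def human_cost_per_task(measured_seconds: list[float], hourly_rate: float) -> float:
    """Average measured handling time (in hours) times the fully loaded hourly rate."""
    if not measured_seconds:
        raise ValueError("need at least one timed sample")
    avg_hours = mean(measured_seconds) / 3600
    return avg_hours * hourly_rate


# Illustrative timings that average exactly 2 minutes (120 s) per ticket
samples = [95, 140, 110, 135]
cost = human_cost_per_task(samples, hourly_rate=50)
print(f"${cost:.2f} per ticket")  # $1.67 per ticket
```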
Now compare to AI cost. Suppose your AI classifier processes 10,000 tickets per month on GPT-4o-mini at its list prices of $0.15 per 1M input tokens and $0.60 per 1M output tokens, with an average of 200 input tokens and 10 output tokens per ticket:
Input cost per ticket: (200 / 1,000,000) × $0.15 = $0.000030
Output cost per ticket: (10 / 1,000,000) × $0.60 = $0.000006
AI cost per ticket: $0.000036
Human cost per ticket: $1.67
ROI multiple: $1.67 / $0.000036 ≈ 46,000x
This is an extreme case (classification is simple and AI is very cheap for it), but it illustrates the framework. The monthly savings at 10,000 tickets: ($1.67 × 10,000) - ($0.000036 × 10,000) = $16,700 - $0.36 = $16,699.64 in saved labor or time that can be reallocated.
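The same arithmetic can be written as a function, so you can rerun it when prices or token counts change. This sketch assumes GPT-4o-mini list prices of $0.15 per 1M input tokens and $0.60 per 1M output tokens. Check current pricing before relying on the numbers. The function name is illustrative.

```python
def ai_cost_per_task(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Per-task API cost, pricing input and output tokens separately (USD per 1M tokens)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


ai_cost = ai_cost_per_task(200, 10, input_price_per_m=0.15, output_price_per_m=0.60)
human_cost = 1.67
tickets_per_month = 10_000

roi_multiple = human_cost / ai_cost
monthly_savings = (human_cost - ai_cost) * tickets_per_month

print(f"AI cost per ticket: ${ai_cost:.6f}")        # $0.000036
print(f"ROI multiple: {roi_multiple:,.0f}x")        # 46,389x
print(f"Monthly savings: ${monthly_savings:,.2f}")  # $16,699.64
```

Price input and output tokens separately, because output tokens usually cost more. For GPT-4o-mini, output tokens are priced at 4x the input rate.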
A More Realistic Example: Meeting Summaries
Meeting summaries are a more representative case where costs and quality both matter.
Setup: Your product summarizes meetings using Claude 3.5 Haiku. Average meeting transcript: 8,000 tokens input, 500 tokens output.
Cost per summary:
Input: (8,000 / 1,000,000) × $0.80 = $0.0064
Output: (500 / 1,000,000) × $4.00 = $0.0020
Total: $0.0084 per summary
Human alternative: a person writing a meeting summary from notes takes 15-30 minutes. At $50/hour: $12.50 to $25.00 per summary.
ROI multiple: 1,500x to 3,000x
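Adding overhead for editing and quality-checking each AI summary (say, 5 minutes at $50/hour, or $4.17) raises the effective AI cost to about $4.18 per summary. Even so, the AI delivers a roughly 3x to 6x cost advantage ($12.50 / $4.18 ≈ 3x; $25.00 / $4.18 ≈ 6x) while freeing up 10-25 minutes per meeting for the employee. Review time, not API spend, dominates the cost here.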
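Review time belongs on the AI side of the ledger. This sketch makes the comparison explicit, assuming the Claude 3.5 Haiku prices used above ($0.80 per 1M input tokens, $4.00 per 1M output tokens), a $50/hour rate, and 5 minutes of human review per summary:

```python
HOURLY_RATE = 50.0   # fully loaded $/hour
REVIEW_MINUTES = 5   # assumed human review time per AI summary


def minutes_to_cost(minutes: float, hourly_rate: float = HOURLY_RATE) -> float:
    return minutes / 60 * hourly_rate


api_cost = (8_000 * 0.80 + 500 * 4.00) / 1_000_000   # $0.0084 per summary
review_cost = minutes_to_cost(REVIEW_MINUTES)          # about $4.17 per summary
effective_ai_cost = api_cost + review_cost

advantages = {}
for manual_minutes in (15, 30):
    human_cost = minutes_to_cost(manual_minutes)
    advantages[manual_minutes] = human_cost / effective_ai_cost
    print(f"{manual_minutes} min manual: ${human_cost:.2f} vs ${effective_ai_cost:.2f}"
          f" -> {advantages[manual_minutes]:.1f}x, "
          f"{manual_minutes - REVIEW_MINUTES} min freed")
```

Under these assumptions, the advantage comes out near 3x for a 15-minute manual summary and near 6x for a 30-minute one. Review time accounts for nearly all of the effective AI cost.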
When the ROI Is Negative
Not all LLM features are cost-justified when you run this calculation. Common cases where AI cost exceeds human cost:
- Complex creative work where human expertise is genuinely rare and valuable (senior engineering design work, strategic planning)
- High-correction features where the human has to review and correct the AI output so heavily that the total time (AI generation + human correction) exceeds pure human work time
- Low-volume features where setup and maintenance costs (engineering time for the feature, prompt engineering, eval) amortize poorly across few tasks
Running the cost-per-task analysis reveals these cases. Features with negative ROI should be cut or deprioritized.
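You can make the high-correction and low-volume cases concrete by folding everything into one effective cost per task. That cost is the API cost, plus human review or correction time, plus fixed monthly costs (engineering, prompt maintenance, evals) spread across the month's tasks. This is a sketch under those assumptions. The parameter names and example numbers are hypothetical.

```python
RATE = 50.0  # fully loaded $/hour, as in the examples above


def effective_ai_cost_per_task(api_cost: float, review_minutes: float,
                               fixed_monthly_cost: float, tasks_per_month: int,
                               hourly_rate: float = RATE) -> float:
    """API cost + human review/correction time + amortized fixed cost, per task."""
    if tasks_per_month <= 0:
        raise ValueError("tasks_per_month must be positive")
    review_cost = review_minutes / 60 * hourly_rate
    return api_cost + review_cost + fixed_monthly_cost / tasks_per_month


human_cost = 20 / 60 * RATE  # a 20-minute manual task, about $16.67

# Heavy correction: 22 minutes fixing output vs 20 minutes doing it by hand
heavy_fix = effective_ai_cost_per_task(0.01, 22, fixed_monthly_cost=0, tasks_per_month=1_000)
# Low volume: $2,000/month of upkeep spread over only 40 tasks
low_volume = effective_ai_cost_per_task(0.01, 5, fixed_monthly_cost=2_000, tasks_per_month=40)
# Same upkeep spread over 2,000 tasks/month
high_volume = effective_ai_cost_per_task(0.01, 5, fixed_monthly_cost=2_000, tasks_per_month=2_000)

for name, cost in [("heavy correction", heavy_fix), ("low volume", low_volume),
                   ("high volume", high_volume)]:
    verdict = "positive" if cost < human_cost else "negative"
    print(f"{name}: ${cost:.2f} vs ${human_cost:.2f} -> ROI {verdict}")
```

The same feature can flip from negative to positive ROI purely through volume. That is why the fixed costs need to be amortized per task rather than left out.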
Building a Cost-Per-Task Dashboard
Track cost per task monthly for every LLM feature and display it in a simple table:
| Feature | Tasks/Month | AI Cost/Task | Human Cost/Task | ROI Multiple | Monthly Savings |
|---------|-------------|--------------|-----------------|--------------|-----------------|
| Support classification | 50,000 | $0.000036 | $1.67 | ~46,000x | $83,498 |
| Meeting summary | 2,000 | $0.008 | $16.67 | ~2,100x | $33,324 |
| Invoice extraction | 5,000 | $0.05 | $3.33 | ~67x | $16,400 |
Review this table quarterly. Features where the ROI multiple is declining (because the human alternative got cheaper or the AI got worse) need attention.
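This sketch shows one way to compute the dashboard rows and the quarterly check. The `FeatureMonth` record is a hypothetical shape for the per-feature numbers gathered earlier. Savings are computed as (human cost - AI cost) × tasks. The 20% decline threshold and the previous-quarter ROI figures are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FeatureMonth:
    feature: str
    tasks: int
    ai_cost_per_task: float
    human_cost_per_task: float

    @property
    def roi_multiple(self) -> float:
        return self.human_cost_per_task / self.ai_cost_per_task

    @property
    def monthly_savings(self) -> float:
        return (self.human_cost_per_task - self.ai_cost_per_task) * self.tasks


rows = [
    FeatureMonth("Support classification", 50_000, 0.000036, 1.67),
    FeatureMonth("Meeting summary", 2_000, 0.008, 16.67),
    FeatureMonth("Invoice extraction", 5_000, 0.05, 3.33),
]

print(f"{'Feature':<24}{'Tasks':>8}{'ROI':>10}{'Savings':>12}")
for r in rows:
    print(f"{r.feature:<24}{r.tasks:>8,}{r.roi_multiple:>9,.0f}x{r.monthly_savings:>12,.0f}")


def declining(current: dict[str, float], previous: dict[str, float],
              threshold: float = 0.8) -> list[str]:
    """Features whose ROI multiple fell below threshold x last quarter's value.

    The default 0.8 (a 20% drop) is an arbitrary illustrative cutoff.
    """
    return [f for f, roi in current.items()
            if f in previous and roi < threshold * previous[f]]


current_roi = {r.feature: r.roi_multiple for r in rows}
previous_roi = {"Support classification": 50_000, "Meeting summary": 2_100,
                "Invoice extraction": 110}  # hypothetical prior-quarter values
print(declining(current_roi, previous_roi))  # ['Invoice extraction']
```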
Keep Reading
- AI Budget for Startups — How much to budget at each stage based on ROI analysis.
- LLM Rate Limiting and Cost Control — How to enforce the budget your ROI analysis sets.
- Cutting LLM API Costs: The Complete Guide — How to improve the AI cost side of the ROI equation.
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.