Few-shot prompting means giving a language model a small number of examples of the input-output pattern you want before presenting your actual input. The model uses those examples to infer what you want and applies the same pattern to your request. It works because LLMs are trained to predict what comes next in a text, which makes them good at continuing patterns: if you show three examples of input X producing output Y, the model infers that, for this conversation, X should produce Y-style output. No weights are updated; the examples only shape the context the model conditions on.
The key finding of the GPT-3 paper (Brown et al., "Language Models are Few-Shot Learners," NeurIPS 2020) is that few-shot prompting approaches the performance of fine-tuned models on many tasks, with zero training cost and the ability to change behavior instantly by changing the examples.
Zero-Shot vs. One-Shot vs. Few-Shot: The Actual Output Difference
Let me show the difference concretely, using the same task across all three.
Task: Extract action items from a meeting transcript segment.
Zero-shot prompt:
Extract action items from this meeting transcript.
"John: We need to update the pricing page by Friday. Sarah, can you handle that? Also, we should follow up with the Acme client about their contract renewal. I'll send them an email this week."
Zero-shot output: "The action items are: update the pricing page by Friday (Sarah), follow up with Acme client about contract renewal (John, this week)."
Acceptable, but nothing pins down the format, so it can change from one run to the next, and free-form prose like this is hard to parse reliably in code.
One-shot prompt (one example provided):
Extract action items from meeting transcripts and return them as a list in this format:
- [ ] Owner: [name] | Task: [description] | Due: [deadline or "not specified"]
Example:
Input: "Mike, please send the onboarding docs to new users by end of day tomorrow. I'll schedule the review meeting for next week."
Output:
- [ ] Owner: Mike | Task: Send onboarding docs to new users | Due: End of day tomorrow
- [ ] Owner: [speaker] | Task: Schedule review meeting | Due: Next week
Now extract from:
"John: We need to update the pricing page by Friday. Sarah, can you handle that? Also, we should follow up with the Acme client about their contract renewal. I'll send them an email this week."
One-shot output:
- [ ] Owner: Sarah | Task: Update pricing page | Due: Friday
- [ ] Owner: John | Task: Follow up with Acme client re: contract renewal | Due: This week
The format is now consistent and parseable. The single example defined the output structure without requiring a long prose description.
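To make "parseable" concrete, here is a minimal sketch of a parser for the checklist format defined in the one-shot prompt. The names `ActionItem`, `ITEM_PATTERN`, and `parse_action_items` are my own illustrative choices, and the sketch assumes the model sticks to the `Owner | Task | Due` layout exactly; lines that deviate are skipped rather than repaired.

```python
import re
from typing import List, NamedTuple


class ActionItem(NamedTuple):
    owner: str
    task: str
    due: str


# Matches lines such as:
# - [ ] Owner: Sarah | Task: Update pricing page | Due: Friday
ITEM_PATTERN = re.compile(
    r"^- \[ \] Owner: (?P<owner>.+?) \| Task: (?P<task>.+?) \| Due: (?P<due>.+)$"
)


def parse_action_items(model_output: str) -> List[ActionItem]:
    """Return one ActionItem per well-formed line and skip everything else."""
    items = []
    for line in model_output.splitlines():
        match = ITEM_PATTERN.match(line.strip())
        if match:
            items.append(
                ActionItem(
                    owner=match["owner"].strip(),
                    task=match["task"].strip(),
                    due=match["due"].strip(),
                )
            )
    return items


# The one-shot output from the example above.
one_shot_output = """\
- [ ] Owner: Sarah | Task: Update pricing page | Due: Friday
- [ ] Owner: John | Task: Follow up with Acme client re: contract renewal | Due: This week
"""

for item in parse_action_items(one_shot_output):
    print(item)
```

Running the same parser on the zero-shot output returns an empty list, which is a cheap way to detect when the model has drifted away from the requested format.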
Few-shot prompt (three examples provided):
Adding two more examples of varied meeting transcript styles reinforces the pattern. The model is now more likely to handle edge cases such as multiple owners on a single task, tasks with no specified owner, and relative vs. absolute deadlines, because the prompt contains an example of each.
The output quality difference between zero-shot and one-shot here is larger than the difference between one-shot and few-shot. The first example does the heavy lifting.
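If you assemble prompts in code, the few-shot version is just the one-shot prompt with a longer example list. The sketch below shows one way to build it. The first example is the one from the one-shot prompt; the second and third are hypothetical examples I invented to illustrate the multiple-owner and no-owner cases, and `build_few_shot_prompt` is an illustrative helper, not part of any library.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Extract action items from meeting transcripts and return them as a list in this format:\n"
    '- [ ] Owner: [name] | Task: [description] | Due: [deadline or "not specified"]'
)

# Each example is an (input transcript, expected output) pair.
EXAMPLES: List[Tuple[str, str]] = [
    (
        # Taken from the one-shot prompt above.
        "Mike, please send the onboarding docs to new users by end of day tomorrow. "
        "I'll schedule the review meeting for next week.",
        "- [ ] Owner: Mike | Task: Send onboarding docs to new users | Due: End of day tomorrow\n"
        "- [ ] Owner: [speaker] | Task: Schedule review meeting | Due: Next week",
    ),
    (
        # Hypothetical: multiple owners, absolute deadline.
        "Priya and Tom, can you two draft the Q3 roadmap by March 14?",
        "- [ ] Owner: Priya, Tom | Task: Draft Q3 roadmap | Due: March 14",
    ),
    (
        # Hypothetical: no owner and no deadline given.
        "Someone should renew the SSL certificate at some point.",
        "- [ ] Owner: not specified | Task: Renew SSL certificate | Due: not specified",
    ),
]


def build_few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], transcript: str
) -> str:
    """Join the instruction, numbered examples, and the real input into one prompt."""
    parts = [instruction, ""]
    for number, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {number}:")
        parts.append(f'Input: "{example_input}"')
        parts.append("Output:")
        parts.append(example_output)
        parts.append("")
    parts.append("Now extract from:")
    parts.append(f'"{transcript}"')
    return "\n".join(parts)


print(build_few_shot_prompt(INSTRUCTION, EXAMPLES, "John: We need to update the pricing page by Friday."))
```

Passing `EXAMPLES[:1]` gives the one-shot prompt and an empty list gives a zero-shot prompt that still carries the format instruction, which makes it easy to compare the three variants on the same input.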
How Many Examples Is "Few"?
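Brown et al. evaluated zero-shot, one-shot, and few-shot settings, with the few-shot setting typically using between 10 and 100 examples, as many as would fit in GPT-3's 2,048-token context window. Across many tasks, performance improved as examples were added, but with diminishing returns. In everyday prompting, a far smaller number usually suffices: gains tend to flatten after 3 to 5 examples for most tasks, and for some tasks 8 to 10 examples help. Beyond 10, performance often plateaus or slightly degrades as the examples start consuming context window space that could be used for the actual task.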
The practical guidance: start with 3 examples. Test with 1, 3, and 5. Pick the minimum that achieves your quality target. More is not always better, and more always costs more tokens.
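The "pick the minimum" step can be written down as a small search. This is a sketch under assumptions: `score_for_k` stands in for whatever evaluation you run (for example, the fraction of test transcripts whose output matches a hand-checked answer), and the numbers in `placeholder_scores` are made-up placeholders to show the mechanics, not measurements.

```python
from typing import Callable, Optional, Sequence


def smallest_sufficient_k(
    candidate_ks: Sequence[int],
    score_for_k: Callable[[int], float],
    target: float,
) -> Optional[int]:
    """Return the smallest example count whose score meets the target, or None."""
    for k in sorted(candidate_ks):
        if score_for_k(k) >= target:
            return k
    return None


# Placeholder scores keyed by number of examples. Replace with your own
# evaluation results; these values are invented for illustration only.
placeholder_scores = {1: 0.80, 3: 0.90, 5: 0.91}

best_k = smallest_sufficient_k([1, 3, 5], placeholder_scores.__getitem__, target=0.90)
print(best_k)  # prints 3 for the placeholder scores above
```

Because the candidates are tried in ascending order, the function stops at the first count that clears the bar, so you never pay for examples that do not buy you anything.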
One important nuance: the quality of examples matters more than the quantity. Three diverse, representative examples usually beat six redundant ones. Each example should show a distinct case or edge condition.
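One rough way to check a candidate pool for redundancy is to pick examples greedily so that each new one overlaps as little as possible with those already chosen. The sketch below uses word-set (Jaccard) overlap as the similarity measure. That is my own simple stand-in, not a method from Brown et al., and it only sees surface wording, so treat it as a first-pass filter before a manual review. The function names and the candidate sentences are hypothetical.

```python
from typing import List


def jaccard_similarity(a: str, b: str) -> float:
    """Word-set overlap between two strings, from 0.0 (disjoint) to 1.0 (identical)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pick_diverse_examples(candidates: List[str], count: int) -> List[str]:
    """Greedily pick `count` candidates, each as dissimilar as possible to those already picked."""
    if count <= 0 or not candidates:
        return []
    chosen = [candidates[0]]  # Seed with the first candidate.
    remaining = list(candidates[1:])
    while remaining and len(chosen) < count:
        # A candidate's redundancy is its highest similarity to anything chosen so far.
        least_redundant = min(
            remaining,
            key=lambda c: max(jaccard_similarity(c, picked) for picked in chosen),
        )
        chosen.append(least_redundant)
        remaining.remove(least_redundant)
    return chosen


# Hypothetical candidate inputs: the first two are near-duplicates.
candidates = [
    "Mike, please send the onboarding docs by tomorrow.",
    "Mike, please send the onboarding slides by tomorrow.",
    "Someone should renew the SSL certificate.",
]

for example in pick_diverse_examples(candidates, 2):
    print(example)
```

With these candidates the near-duplicate second sentence is passed over in favor of the SSL certificate one, which covers a different case (no named owner, no deadline).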