Small Models, Big Performance
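Microsoft's Phi-3 Mini challenges the assumption that bigger always means better. At just 3.8 billion parameters, it beats models more than 10x its size on some benchmarks and trails them on others. The authors attribute this to careful curation of the training data, including a "textbook quality" synthetic dataset approach.
On MT-Bench, a multi-turn instruction-following benchmark, Phi-3 Mini scores 8.38 against 8.30 for Mixtral 8x7B. Mixtral has about 47B total parameters. Because it is a sparse mixture of experts, only about 13B are active per token. Even by that active count, it is several times larger than the 3.8B Phi-3 Mini.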
Why It's Fast Enough for the Edge
Quantized to INT4, Phi-3 Mini's 3.8B parameters take roughly 2GB (3.8B × 4 bits ÷ 8 ≈ 1.9GB). That is small enough to fit the memory constraints of:
- Modern smartphones (iPhone 15 Pro, Pixel 8 Pro)
- WebGPU in Chrome/Edge (4GB GPU budget)
- Raspberry Pi 5 with 8GB RAM
- Single consumer GPU (RTX 3060 12GB)
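The ~2GB figure can be checked with a back-of-the-envelope rule. Weight memory for a dense model is roughly parameters × bits per weight ÷ 8. The sketch below assumes exactly 3.8B parameters. It ignores quantization metadata (scales, zero points), activations, and the KV cache, so real footprints run somewhat higher.

```javascript
// Rough weight-memory estimate for a dense model: params × bits / 8.
// Assumption: exactly 3.8e9 parameters; quantization overhead is ignored.
const PHI3_MINI_PARAMS = 3.8e9;
const GIB = 1024 ** 3;

function weightBytes(params, bitsPerWeight) {
  return (params * bitsPerWeight) / 8;
}

for (const [label, bits] of [["FP16", 16], ["INT8", 8], ["INT4", 4]]) {
  const gib = weightBytes(PHI3_MINI_PARAMS, bits) / GIB;
  console.log(`${label}: ${gib.toFixed(2)} GiB`);
}
```

At INT4 this comes to about 1.77 GiB (1.9 GB in decimal units), consistent with the ~2GB figure. FP16 would need roughly 7 GiB, which is far more than a phone can spare for a single app.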
The Phi-3 technical report describes the training methodology in detail. It relies heavily on filtered web data and synthetically generated "textbook" content.
128k Context
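Phi-3 Mini comes in two context-length variants: 4k tokens (Phi-3-mini-4k-instruct) and 128k tokens (Phi-3-mini-128k-instruct). The 128k variant matches the context window of GPT-4o and Llama 3.1. That is unusual for edge-class models and enables long-document tasks. On device, however, the memory needed to hold a long context, rather than the weights, quickly becomes the limiting factor.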
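Long contexts cost memory beyond the weights. Every token kept in context stores a key vector and a value vector in each layer. The sketch below assumes the published Phi-3 Mini architecture: 32 layers, hidden size 3072, and 32 key/value heads (no grouped-query attention). It also assumes an unquantized fp16 cache.

```javascript
// Sketch: fp16 KV-cache size for Phi-3 Mini at several context lengths.
// Assumed architecture: 32 layers, hidden size 3072, full multi-head attention,
// so K and V each hold 3072 values per token per layer.
const LAYERS = 32;
const HIDDEN = 3072;
const BYTES_PER_VALUE = 2; // fp16 cache
const GIB = 1024 ** 3;

function kvCacheBytes(tokens) {
  // 2 tensors (K and V) × layers × hidden size × bytes per value, per token
  return 2 * LAYERS * HIDDEN * BYTES_PER_VALUE * tokens;
}

for (const tokens of [4096, 32768, 131072]) {
  console.log(`${tokens} tokens: ${(kvCacheBytes(tokens) / GIB).toFixed(1)} GiB`);
}
```

This prints 1.5, 12.0, and 48.0 GiB. At the full 128k window, the fp16 cache alone is far beyond phone or browser budgets. In practice, on-device long-document work means shorter windows or a quantized cache.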
Running in the Browser With WebGPU
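With Transformers.js v3 (the `@huggingface/transformers` package, which added the WebGPU backend), Phi-3 Mini runs entirely client-side. This example loads the 4k-context variant:
```javascript
// WebGPU support arrived in Transformers.js v3 (@huggingface/transformers);
// the older @xenova/transformers v2 package targets WebAssembly (CPU).
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline("text-generation", "Xenova/Phi-3-mini-4k-instruct", {
  device: "webgpu", // run inference on the GPU through WebGPU
  dtype: "q4",      // 4-bit weights; assumes the repo ships a q4 ONNX export
});

const result = await generator("Explain recursion briefly:", {
  max_new_tokens: 200,
  do_sample: true, // without this, decoding is greedy and temperature is ignored
  temperature: 0.7,
});
console.log(result[0].generated_text);
```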
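There are no API calls, no inference server, and no data leaving the browser. The first load downloads ~2GB of weights into the browser cache. Later loads read them from the cache and start much faster, though the model still has to be initialized on the GPU each time.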
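Not every browser or machine exposes WebGPU, so it helps to detect support before loading the model. The sketch below uses the standard WebGPU entry points `navigator.gpu` and `requestAdapter()`. The device names `"webgpu"` and `"wasm"` are assumed to be the values Transformers.js v3 accepts for its `device` option. The `pickDevice` helper is hypothetical.

```javascript
// Sketch: pick a Transformers.js device, preferring WebGPU when available.
// pickDevice is a hypothetical helper; "webgpu"/"wasm" are assumed device names.
async function pickDevice() {
  const gpu = globalThis.navigator?.gpu; // undefined where WebGPU is unsupported
  if (gpu) {
    // requestAdapter() resolves to null when no suitable GPU is available
    const adapter = await gpu.requestAdapter();
    if (adapter) {
      // maxBufferSize caps the size of any single GPU buffer on this adapter
      console.log(`WebGPU max buffer size: ${adapter.limits.maxBufferSize} bytes`);
      return "webgpu";
    }
  }
  return "wasm"; // CPU fallback via WebAssembly
}

pickDevice().then((device) => console.log(`Using device: ${device}`));
```

The resolved value can then be passed as the `device` option to `pipeline()`. Expect the WebAssembly fallback to be much slower for a model of this size.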
Android and iOS Deployment via ONNX
Export to ONNX for mobile inference:
```bash
pip install "optimum[exporters]" onnxruntime
python -m optimum.exporters.onnx --model microsoft/Phi-3-mini-128k-instruct --task text-generation-with-past onnx_output/
```
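This export produces full-precision weights, so quantize them to INT4 before shipping to stay near the ~2GB budget. Alternatively, start from Microsoft's prebuilt ONNX releases such as microsoft/Phi-3-mini-128k-instruct-onnx. The model then runs under ONNX Runtime Mobile in iOS and Android apps. Microsoft's ONNX Runtime GenAI library adds a higher-level generation loop on top of it, handling tokenization, sampling, and the KV cache.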
Ollama for Local Desktop Use
```bash
ollama pull phi3:mini
ollama run phi3:mini "Write a regex to extract email addresses."
```
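Ollama also serves a local REST API, which is the usual way to call the model from an application. The sketch below targets the `/api/generate` endpoint on Ollama's default port 11434. It uses Node 18+'s built-in `fetch` and assumes `phi3:mini` has already been pulled. The helper names `buildGenerateRequest` and `generate` are hypothetical.

```javascript
// Sketch: call a local Ollama server (default port 11434) from Node 18+.
// buildGenerateRequest and generate are hypothetical helper names.
const OLLAMA_URL = "http://localhost:11434/api/generate";

function buildGenerateRequest(model, prompt) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false asks for one JSON object instead of a stream of chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

async function generate(prompt, model = "phi3:mini") {
  const res = await fetch(OLLAMA_URL, buildGenerateRequest(model, prompt));
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text
}
```

With the server running, `generate("Write a regex to extract email addresses.").then(console.log)` prints the model's answer. Streaming (the default when `stream` is omitted) is better for chat UIs that render tokens as they arrive.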
Benchmark Comparison
| Benchmark | Phi-3 Mini (3.8B) | Mixtral 8x7B (47B) | Llama 3 8B |
|-----------|-------------------|--------------------|------------|
| MT-Bench | 8.38 | 8.30 | 8.15 |
| MMLU | 68.8% | 70.5% | 66.6% |
| HumanEval | 60.9% | 45.1% | 60.4% |
| TriviaQA | 64.5% | 73.2% | 67.6% |
Summary
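Phi-3 Mini is a strong choice in the sub-4B class, particularly for instruction following and code generation. On MT-Bench and HumanEval it beats much larger models in the comparison above. Its weaker TriviaQA result reflects how little factual knowledge a 3.8B model can store. Use it for on-device inference where privacy or latency is paramount, or in browser applications that can't rely on an API. The weights are available on Hugging Face as microsoft/Phi-3-mini-4k-instruct and microsoft/Phi-3-mini-128k-instruct.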