Phi-3 Mini: Running a 3.8B Parameter LLM On Your Phone

Phi-3 Mini, at 3.8B parameters, outperforms Mixtral 8x7B on some benchmarks (MT-Bench and HumanEval) and runs in browsers via WebGPU or on Android/iOS via ONNX. Here's how.

Mahmudul Haque Qudrati

CEO & ML Engineer

March 31, 2026

7 min read



128k Context

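Despite its small size, Phi-3 Mini ships in two variants: a 4k-token default (Phi-3-mini-4k-instruct) and Phi-3-mini-128k-instruct, which supports a 128k-token context window, the same as GPT-4o and Llama 3.1. A window that long is unusual for edge-class models and makes long-document tasks possible on device, memory permitting.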
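"Memory permitting" is doing a lot of work there, because the key/value cache grows linearly with context length. A rough sketch of the arithmetic, assuming the Phi-3 Mini configuration from Microsoft's technical report (32 layers, hidden size 3072 split into 32 attention heads of 96 dims, no grouped-query attention) and an fp16 cache. Treat these defaults as assumptions to check against the checkpoint's config.json; quantized or paged caches change the numbers.

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 32,
                   head_dim: int = 96, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for `seq_len` tokens.

    Defaults are ASSUMED Phi-3 Mini values (32 layers, 32 KV heads x 96 dims,
    fp16 = 2 bytes per element); verify them against the model's config.json.
    """
    # Factor 2: every layer stores one key vector and one value vector per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len


GIB = 1024 ** 3

for tokens in (4_096, 32_768, 131_072):
    print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / GIB:5.1f} GiB")
```

Under these assumptions the cache alone goes from about 1.5 GiB at 4k tokens to about 48 GiB at the full 128k window, so on a phone the 128k variant is realistically used with prompts far shorter than its maximum.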

Running in the Browser With WebGPU

Using transformers.js, Phi-3 Mini runs entirely client-side:

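// Transformers.js v3 (npm package @huggingface/transformers) adds WebGPU support;
// the older @xenova/transformers v2 package runs on WASM (CPU) only.
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "Xenova/Phi-3-mini-4k-instruct",
  { device: "webgpu" } // run inference on the GPU through WebGPU
);

const result = await generator("Explain recursion briefly:", {
  max_new_tokens: 200,
  do_sample: true, // temperature only takes effect when sampling is enabled
  temperature: 0.7,
});

console.log(result[0].generated_text);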

No API calls, no server, no data leaving the browser. The first load downloads ~2GB of weights into the browser cache; subsequent loads skip the download and start much faster.
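Where does ~2GB come from? It is roughly what 3.8B parameters weigh at about 4 bits each. A back-of-the-envelope sketch; it ignores metadata, tokenizer files, and tensors that quantization schemes often keep at higher precision, so real downloads can run somewhat larger, and the exact quantization of the browser build is an assumption here.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in bytes, ignoring metadata
    and any layers kept at higher precision."""
    return n_params * bits_per_weight / 8


PHI3_MINI_PARAMS = 3_800_000_000  # 3.8B, as rounded in the article

for label, bits in (("fp16", 16), ("int8", 8), ("int4", 4)):
    print(f"{label}: ~{weight_bytes(PHI3_MINI_PARAMS, bits) / 1e9:.1f} GB")
```

An fp16 copy lands near 7.6 GB, which is why browser and mobile builds of the model are generally shipped quantized.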

Android and iOS Deployment via ONNX

Export to ONNX for mobile inference:

pip install "optimum[exporters]" onnxruntime
python -m optimum.exporters.onnx \
  --model microsoft/Phi-3-mini-128k-instruct \
  --task text-generation-with-past \
  onnx_output/

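The exported model can be run with ONNX Runtime Mobile in iOS and Android apps. For a higher-level generation API, Microsoft's ONNX Runtime GenAI library offers bindings that mobile apps can call from Swift or Kotlin; note that it expects models prepared by its own model builder (Microsoft also publishes pre-converted Phi-3 ONNX checkpoints) rather than a plain Optimum export.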
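The `text-generation-with-past` task matters for mobile: it exports the decoder with past key/value tensors as extra inputs and outputs, so the runtime can reuse attention state instead of re-encoding the whole sequence for every new token. A simplified count of the work that saves; it tallies token positions fed through the network and ignores that attention cost per position also grows with sequence length.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Token positions pushed through the model while generating `new_tokens`
    tokens after a `prompt_len`-token prompt (simplified cost model)."""
    if new_tokens <= 0:
        return 0
    if use_cache:
        # First step encodes the whole prompt; each later step feeds only the newest token.
        return prompt_len + (new_tokens - 1)
    # Without a cache, step i re-encodes the prompt plus the i tokens generated so far.
    return sum(prompt_len + i for i in range(new_tokens))


print(positions_processed(100, 200, use_cache=True))   # 299
print(positions_processed(100, 200, use_cache=False))  # 39900
```

For a 100-token prompt and 200 generated tokens, the cached path touches 299 positions versus 39,900 without a cache, the difference between usable and unusable on a phone CPU.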

Ollama for Local Desktop Use

ollama pull phi3:mini
ollama run phi3:mini "Write a regex to extract email addresses."
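Beyond the CLI, a running Ollama server exposes a local HTTP API, by default on port 11434. A minimal standard-library sketch that calls its `/api/generate` endpoint with streaming turned off, assuming the server is running and `phi3:mini` has already been pulled. `build_generate_request` and `generate` are illustrative helper names, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt: str, model: str = "phi3:mini") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3:mini", timeout: float = 120.0) -> str:
    """Send the prompt to a local Ollama server and return the completion text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object;
    # the generated text is in its "response" field.
    return body["response"]
```

With the server up, `generate("Write a regex to extract email addresses.")` returns the same kind of answer as the `ollama run` command, which makes it easy to wire the model into scripts or local tools.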

Benchmark Comparison

Benchmark	Phi-3 Mini (3.8B)	Mixtral 8x7B (47B)	Llama 3 8B
MT-Bench	8.38	8.30	8.15
MMLU	68.8%	70.5%	66.6%
HumanEval	60.9%	45.1%	60.4%
TriviaQA	64.5%	73.2%	67.6%
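Read row by row, the table is more mixed than "beats Mixtral" suggests: Phi-3 Mini leads on MT-Bench and HumanEval but trails on the knowledge-heavy MMLU and TriviaQA. A small script that tallies head-to-head results; the scores are copied from the table above, not re-measured.

```python
# Scores copied from the benchmark table (higher is better on every row).
SCORES = {
    "MT-Bench":  {"Phi-3 Mini": 8.38, "Mixtral 8x7B": 8.30, "Llama 3 8B": 8.15},
    "MMLU":      {"Phi-3 Mini": 68.8, "Mixtral 8x7B": 70.5, "Llama 3 8B": 66.6},
    "HumanEval": {"Phi-3 Mini": 60.9, "Mixtral 8x7B": 45.1, "Llama 3 8B": 60.4},
    "TriviaQA":  {"Phi-3 Mini": 64.5, "Mixtral 8x7B": 73.2, "Llama 3 8B": 67.6},
}


def wins(model: str, rival: str) -> list[str]:
    """Benchmarks on which `model` scores strictly higher than `rival`."""
    return [bench for bench, row in SCORES.items() if row[model] > row[rival]]


for rival in ("Mixtral 8x7B", "Llama 3 8B"):
    print(f"Phi-3 Mini beats {rival} on: {', '.join(wins('Phi-3 Mini', rival))}")
```

That is two of four against Mixtral 8x7B and three of four against Llama 3 8B, with TriviaQA, the most recall-dependent benchmark here, as the one where the smaller model consistently falls behind.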

Summary

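Phi-3 Mini is one of the strongest models in the sub-4B class for instruction following and code generation; in the comparison above it beats the far larger Mixtral 8x7B on MT-Bench and HumanEval. Use it for on-device inference where privacy or latency is paramount, or in browser applications that can't rely on an API. The weights are available on Hugging Face (for example, microsoft/Phi-3-mini-4k-instruct and microsoft/Phi-3-mini-128k-instruct).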
