What Makes R1 Different
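Most frontier models are trained with supervised fine-tuning (SFT) on human-labeled reasoning chains, then refined with RLHF. DeepSeek took a different path. Its precursor, DeepSeek-R1-Zero, learned to reason through pure reinforcement learning (GRPO, Group Relative Policy Optimization) with no SFT stage at all. R1 itself adds a small cold-start SFT step on curated chain-of-thought data before the same large-scale RL, mainly to fix R1-Zero's readability and language-mixing problems.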
The model learns to reason through trial-and-error against verifiable rewards (math answers, code correctness) rather than imitating human-written chains of thought. The result is a reasoning style that sometimes looks alien but achieves remarkable accuracy on hard problems.
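To make "verifiable rewards" concrete, here is a minimal sketch of the two pieces involved: a rule-based reward that checks a final answer, and the group-relative advantage that GRPO computes by normalizing each sampled completion's reward against the other samples for the same prompt. The function names, the `\boxed{...}` answer convention, and the use of population standard deviation are assumptions for illustration, not DeepSeek's actual training code.

```python
import re
from statistics import mean, pstdev


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the last \\boxed{...} answer
    in the completion equals the expected answer, otherwise 0.0."""
    answers = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return 1.0 if answers and answers[-1].strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt:
    advantage_i = (r_i - mean(r)) / (std(r) + eps).
    Population std is an assumption made for this sketch."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for "integrate x^2 from 0 to 3" (correct answer: 9).
completions = [
    r"x^3/3 evaluated from 0 to 3 is 27/3, so \boxed{9}",
    r"I think the answer is \boxed{6}",
    r"The antiderivative is x^3/3, giving \boxed{9}",
    r"no final answer given",
]
rewards = [exact_match_reward(c, "9") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, so the policy update pushes toward whatever reasoning led to a checkable right answer. No learned reward model or human-written chain of thought is needed for this signal.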
The full research paper is publicly available and worth reading for anyone interested in RL-based training.
Benchmark Results
| Benchmark | DeepSeek R1 | o1-mini | o1 |
|-----------|-------------|---------|-----|
| AIME 2024 | 79.8% | 63.6% | 74.4% |
| MATH-500 | 97.3% | 90.0% | 96.4% |
| Codeforces (percentile) | 96.3 | 93.4 | 96.6 |
| MMLU | 90.8% | 85.2% | 91.8% |
R1 edges out o1 on AIME 2024 and MATH-500 and trails it by a point or less on Codeforces and MMLU, while being MIT licensed and available to run locally.
Architecture: MoE With 37B Active Parameters
R1 uses a 671B-parameter Mixture of Experts architecture, but only 37B parameters are activated for each token. This gives it frontier-level capacity while keeping per-token compute closer to that of a 37B dense model, although all 671B parameters must still be held in memory to serve it.
Distilled variants (Qwen and Llama models fine-tuned with SFT on R1's reasoning traces) are available at 1.5B, 7B, 8B, 14B, 32B, and 70B, making reasoning capability accessible on consumer hardware.
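A quick back-of-the-envelope calculation shows what the 671B/37B split means in practice. The sketch below uses only the two parameter counts quoted above. The bytes-per-parameter values and the "about 2 FLOPs per active parameter per token" rule of thumb are generic assumptions, not figures from DeepSeek.

```python
TOTAL_PARAMS = 671e9   # total parameters, from the text above
ACTIVE_PARAMS = 37e9   # parameters activated per token, from the text above


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights (no KV cache, no activations)."""
    return params * bytes_per_param / 1e9


def approx_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params


print(f"Active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"Weights at 1 byte/param: {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
print(f"Weights at 2 bytes/param: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
print(f"Approx. FLOPs per token: {approx_flops_per_token(ACTIVE_PARAMS):.2e}")
```

Compute scales with the roughly 5.5% of parameters that are active, but memory scales with the full 671B. That is why the full model needs a multi-GPU server even though each token is comparatively cheap to generate.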
API Access
The DeepSeek API is OpenAI-compatible. At R1's launch, the deepseek-reasoner model was priced at $0.55 per million input tokens ($0.14 on cache hits) and $2.19 per million output tokens, a fraction of o1's cost.
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ]
)

# R1 returns both thinking and answer
print(response.choices[0].message.reasoning_content)  # chain of thought
print(response.choices[0].message.content)            # final answer
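For multi-turn conversations, DeepSeek's API guide (at the time of writing) says that reasoning_content should not be sent back in the message history, and that only the final content of each assistant turn belongs there. The helper below is a small sketch of that cleanup step. `strip_reasoning` is a hypothetical name, and the message dictionaries simply mirror the chat format used above.

```python
def strip_reasoning(messages: list[dict]) -> list[dict]:
    """Return a copy of the chat history without any reasoning_content keys,
    so only role/content pairs are sent on the next request."""
    return [
        {key: value for key, value in message.items() if key != "reasoning_content"}
        for message in messages
    ]


# History after one completed turn, plus a follow-up question.
history = [
    {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    {
        "role": "assistant",
        "reasoning_content": "Assume sqrt(2) = p/q in lowest terms...",
        "content": "Suppose sqrt(2) = p/q in lowest terms. Then p^2 = 2q^2...",
    },
    {"role": "user", "content": "Does the same argument work for sqrt(3)?"},
]

next_messages = strip_reasoning(history)
print(next_messages)

# next_messages can then be passed as messages=... to
# client.chat.completions.create(model="deepseek-reasoner", ...)
```

Dropping the chain of thought also keeps the prompt shorter on later turns, since reasoning traces are often much longer than the final answer.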
Running Locally With Ollama
# 7B distilled — runs on consumer GPU (8GB VRAM)
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "What is the integral of x^2 from 0 to 3?"
# 32B distilled — runs on 2× 3090s
ollama pull deepseek-r1:32b
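Unlike the hosted API, a locally run distilled model typically returns its reasoning inline, wrapped in <think>...</think> tags ahead of the final answer. The sketch below separates the two, assuming that tag convention holds for the output you capture. `split_reasoning` is a hypothetical helper, and the sample text is hand-written rather than real model output.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).
    If no <think> block is found, reasoning is empty and the
    whole text is treated as the answer."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hand-written stand-in for captured model output.
raw_output = (
    "<think>\nThe antiderivative of x^2 is x^3/3. "
    "At 3 that is 27/3 = 9, at 0 it is 0.\n</think>\n\n"
    "The integral of x^2 from 0 to 3 is 9."
)

reasoning, answer = split_reasoning(raw_output)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the two parts separate makes it easy to log the reasoning for debugging while showing users only the answer.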
When to Use R1
R1 is best for tasks with verifiable correct answers: math, formal proofs, competitive programming, structured data extraction with strict schema constraints. For open-ended creative or conversational tasks, GPT-4o or Claude 3.5 Sonnet often produce more natural output.
Summary
DeepSeek R1 is a landmark in open-source AI: frontier reasoning capability, an MIT license, and a published training methodology that's already influencing how the industry thinks about RL-based training. Download the weights from Hugging Face or call the API at deepseek.com.