Deepseek V3 and Deepseek R1 are large language models from Deepseek AI, a Chinese company, released under MIT license in late 2024 and early 2025. Deepseek V3 performs comparably to GPT-4o on most benchmarks. Deepseek R1 rivals OpenAI's o1 on reasoning tasks. Both were reportedly trained for dramatically less than equivalent Western frontier models, and both are available as open weights you can run yourself.
The Training Cost Story
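Deepseek AI reported training Deepseek V3 for approximately $5.6 million in compute costs: 2.788M H800 GPU hours at an assumed rental price of $2 per GPU hour. That figure covers only the final training run and, per the report, excludes prior research and ablation experiments. For context, training GPT-4 reportedly cost over $100 million, and frontier model training costs have continued to increase. Even if Deepseek's reported figure understates the true cost by 2-3x, the efficiency gap is extraordinary.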
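The headline number is simple arithmetic. The sketch below reproduces it from the per-stage H800 GPU-hour figures in the Deepseek V3 technical report (2,664K pre-training, 119K context extension, 5K post-training) and the report's assumed $2/hour rental price, then applies the 2-3x underestimate scenario from the text. The variable and function names are ours.

```typescript
// Stage breakdown in thousands of H800 GPU hours, as reported in the
// Deepseek V3 technical report (arXiv:2412.19437).
const gpuHoursK: Record<string, number> = {
  preTraining: 2664,
  contextExtension: 119,
  postTraining: 5,
};

// The report's own assumed H800 rental price, not a measured cost.
const ASSUMED_USD_PER_GPU_HOUR = 2;

// Sum the stages and convert from thousands of hours to hours.
function totalGpuHours(stages: Record<string, number>): number {
  return Object.values(stages).reduce((sum, k) => sum + k * 1000, 0);
}

const hours = totalGpuHours(gpuHoursK);
const costUsd = hours * ASSUMED_USD_PER_GPU_HOUR;
console.log(`${hours} GPU hours x $${ASSUMED_USD_PER_GPU_HOUR}/h = $${(costUsd / 1e6).toFixed(3)}M`);

// The "understated by 2-3x" scenario from the text.
for (const factor of [2, 3]) {
  console.log(`${factor}x understated: $${((costUsd * factor) / 1e6).toFixed(1)}M`);
}
```

It prints `2788000 GPU hours x $2/h = $5.576M`, then $11.2M and $16.7M for the 2x and 3x scenarios, all still far below the reported GPT-4 figure.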
How did they achieve this? A combination of factors:
- Mixture-of-experts architecture (only a subset of parameters active per token, reducing compute per token)
- Multi-head Latent Attention (MLA), a novel attention mechanism that reduces KV cache memory
- FP8 training (lower precision, less memory bandwidth)
- Aggressive pipeline and communication optimization for their H800 GPU cluster
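To put the MLA bullet in numbers, here is a rough per-sequence KV cache estimate using the attention dimensions reported for Deepseek V3 (61 layers, 128 heads of dimension 128, a 512-dimensional compressed KV latent, and a 64-dimensional decoupled RoPE key shared across heads). The standard multi-head attention baseline with the same head layout is hypothetical, and the 2-byte cache entries and 128K-token context are assumptions for illustration.

```typescript
// Attention dimensions reported for Deepseek V3 (arXiv:2412.19437).
const LAYERS = 61;
const HEADS = 128;
const HEAD_DIM = 128;
const KV_LATENT_DIM = 512; // d_c: compressed joint key/value latent
const ROPE_KEY_DIM = 64; // d_h^R: decoupled RoPE key, shared by all heads

// Values cached per token per layer.
const mhaPerLayer = 2 * HEADS * HEAD_DIM; // hypothetical baseline: full K and V for every head
const mlaPerLayer = KV_LATENT_DIM + ROPE_KEY_DIM; // MLA: one latent plus one RoPE key

// KV cache size in GB for one sequence; 2-byte (BF16) entries are an assumption.
function kvCacheGB(valuesPerLayer: number, tokens: number, bytesPerValue = 2): number {
  return (valuesPerLayer * LAYERS * tokens * bytesPerValue) / 1e9;
}

const CONTEXT = 128000; // tokens
console.log(`MHA baseline: ${kvCacheGB(mhaPerLayer, CONTEXT).toFixed(1)} GB`);
console.log(`MLA:          ${kvCacheGB(mlaPerLayer, CONTEXT).toFixed(1)} GB`);
console.log(`reduction:    ${(mhaPerLayer / mlaPerLayer).toFixed(1)}x`);
```

With these assumptions the baseline needs about 511.7 GB for one 128K-token sequence versus about 9.0 GB with MLA, a roughly 57x reduction per token.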
The result: a model that competes with GPT-4o at dramatically lower training cost, and dramatically lower inference cost because of the MoE architecture.
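The MoE saving comes from routing: each token runs through only a few experts. The toy sketch below shows top-k routing for a single token. Per the V3 report, Deepseek V3 scores experts with sigmoid affinities and normalizes the gating weights over the selected experts (8 of 256 routed experts per token, plus a shared expert). This sketch keeps that scoring but omits the shared expert, load balancing, and everything else about a real layer; the expert functions and centroids are hypothetical stand-ins.

```typescript
// An expert is any function from a hidden vector to a hidden vector.
type Expert = (x: number[]) => number[];

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));
const dot = (a: number[], b: number[]): number => a.reduce((s, ai, i) => s + ai * b[i], 0);

// Route one token's hidden vector x to the top-k experts by affinity score,
// then mix their outputs with weights normalized over the selected experts.
function moeForward(
  x: number[],
  centroids: number[][], // one learned vector per expert
  experts: Expert[],
  k: number,
): { output: number[]; chosen: number[] } {
  const scores = centroids.map((c) => sigmoid(dot(x, c)));
  const chosen = scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.index);
  const total = chosen.reduce((s, i) => s + scores[i], 0);

  const output = new Array<number>(x.length).fill(0);
  for (const i of chosen) {
    const y = experts[i](x); // unselected experts never run: that is the compute saving
    const weight = scores[i] / total;
    y.forEach((v, d) => {
      output[d] += weight * v;
    });
  }
  return { output, chosen };
}

// Usage: four toy experts that scale their input, with a call counter.
let expertCalls = 0;
const toyExperts: Expert[] = [1, 2, 3, 4].map((scale) => (x: number[]) => {
  expertCalls++;
  return x.map((v) => v * scale);
});
const centroids = [[1, 0], [0, 1], [-1, 0], [2, 0]];
const { output, chosen } = moeForward([1, 0], centroids, toyExperts, 2);
console.log(chosen, output, expertCalls);
```

With input `[1, 0]`, experts 3 and 0 have the highest affinities, so only those two of the four run and `expertCalls` ends at 2.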
Deepseek V3: The General-Purpose Model
Deepseek V3 is a 671B parameter model (37B active parameters per token due to MoE). Benchmark performance:
- MMLU: 88.5%, matching GPT-4o's 88.7%
- MATH-500: 90.2%, significantly stronger than GPT-4o (74.6% in the same report)
- HumanEval: 89.0%, competitive with leading coding models
- Chinese: significantly stronger than Western models on Chinese-language benchmarks such as C-Eval and C-SimpleQA
These benchmarks are from Deepseek's technical report (arXiv:2412.19437, December 2024), and third-party evaluations have largely corroborated them.
Deepseek R1: The Reasoning Model
Deepseek R1 is a reasoning-optimized model, analogous to OpenAI's o1. It uses chain-of-thought reasoning (visible to the user, unlike o1) and was trained using reinforcement learning on verifiable tasks (math, code).
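Reinforcement learning on verifiable tasks needs a reward that a program can check. The R1 report describes rule-based accuracy rewards (for math, a final answer in a specified format such as a box) and format rewards (reasoning enclosed in `<think>` tags). The sketch below is one minimal interpretation of those two rewards; exact string matching and equal weighting are simplifying assumptions, and it computes rewards only, not the RL update itself.

```typescript
// Accuracy reward helper: return the last \boxed{...} answer, or null.
// Simplification: no nested braces and no math-equivalence checking.
function extractBoxedAnswer(completion: string): string | null {
  const re = /\\boxed\{([^{}]*)\}/g; // created per call so lastIndex starts at 0
  let last: string | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(completion)) !== null) {
    last = m[1].trim();
  }
  return last;
}

// 1 if the boxed answer matches the reference exactly, else 0.
function accuracyReward(completion: string, reference: string): number {
  const answer = extractBoxedAnswer(completion);
  return answer !== null && answer === reference.trim() ? 1 : 0;
}

// 1 if the completion opens with a <think>...</think> block, else 0.
function formatReward(completion: string): number {
  return /^<think>[\s\S]*?<\/think>/.test(completion.trim()) ? 1 : 0;
}

// Equal weighting of the two rewards is an assumption made for this sketch.
function ruleBasedReward(completion: string, reference: string): number {
  return accuracyReward(completion, reference) + formatReward(completion);
}

const good = "<think>2 + 2 = 4</think> The answer is \\boxed{4}.";
const wrong = "<think>guessing</think> \\boxed{5}";
const unformatted = "The answer is \\boxed{4}.";
console.log(ruleBasedReward(good, "4"), ruleBasedReward(wrong, "4"), ruleBasedReward(unformatted, "4"));
```

The three sample completions score 2, 1, and 1: a correct, well-formatted answer earns both rewards, while a wrong answer or a missing `<think>` block loses one.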
Benchmark performance:
- MATH-500: 97.3%, matching o1 (OpenAI-o1-1217 scores 96.4%)
- AIME 2024 (pass@1): 79.8%, competitive with o1's 79.2%
- Codeforces rating: 2029, the 96.3rd percentile of human participants
These figures are from the Deepseek R1 technical report (arXiv:2501.12948, January 2025).
For math and reasoning-intensive tasks, R1 is genuinely competitive with the best models in the world.
How to Use Deepseek
Deepseek API (Cheapest Option)
Deepseek's own API is priced dramatically lower than OpenAI's (the input prices below are cache-miss rates; cached input is cheaper still):
- Deepseek V3: $0.27 per 1M input tokens, $1.10 per 1M output tokens (Deepseek pricing, 2025)
- Deepseek R1: $0.55 per 1M input tokens, $2.19 per 1M output tokens
Compare to GPT-4o at $2.50/$10.00. For equivalent-quality general tasks, Deepseek V3 via the Deepseek API costs roughly 1/9th of GPT-4o.
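The 1/9th figure holds for any input/output mix, because both of V3's prices are about 1/9th of GPT-4o's. The calculator below applies the list prices above to a hypothetical workload of 500M input and 100M output tokens per month; the workload is an assumption, and the prices will drift over time.

```typescript
interface Price {
  inputPerM: number; // USD per 1M input tokens (cache-miss rate for Deepseek)
  outputPerM: number; // USD per 1M output tokens
}

// List prices quoted in the text.
const PRICES: Record<string, Price> = {
  "deepseek-v3": { inputPerM: 0.27, outputPerM: 1.1 },
  "deepseek-r1": { inputPerM: 0.55, outputPerM: 2.19 },
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10.0 },
};

function costUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * price.inputPerM + (outputTokens / 1e6) * price.outputPerM;
}

// Hypothetical monthly workload.
const INPUT_TOKENS = 500e6;
const OUTPUT_TOKENS = 100e6;

for (const [model, price] of Object.entries(PRICES)) {
  console.log(`${model}: $${costUsd(price, INPUT_TOKENS, OUTPUT_TOKENS).toFixed(2)}`);
}
const ratio =
  costUsd(PRICES["gpt-4o"], INPUT_TOKENS, OUTPUT_TOKENS) /
  costUsd(PRICES["deepseek-v3"], INPUT_TOKENS, OUTPUT_TOKENS);
console.log(`GPT-4o / Deepseek V3 cost ratio: ${ratio.toFixed(1)}x`);
```

For this workload it comes to about $245 for V3 versus $2,250 for GPT-4o. For R1, keep in mind that its reasoning tokens are billed as output tokens, so real output counts run well above a non-reasoning model's for the same final answer.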
The Deepseek API is OpenAI-compatible, so you can use the OpenAI SDK by changing the base URL:
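```typescript
// Requires the official `openai` package and an ES module context (for top-level await).
import OpenAI from "openai";

// Point the OpenAI SDK at Deepseek's endpoint instead of OpenAI's.
const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

const response = await client.chat.completions.create({
  model: "deepseek-chat", // Deepseek V3
  messages: [{ role: "user", content: "Your prompt here" }],
});

console.log(response.choices[0].message.content);
```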
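R1 is served on the same endpoint under the model name `deepseek-reasoner`, and Deepseek's API documentation describes returning its chain of thought in a separate `reasoning_content` field next to `content`. The helpers below are a sketch built on that description: they split a reply and build the next turn's history without the reasoning, since the docs say input messages that include `reasoning_content` are rejected. The interface and function names are hypothetical, and the SDK call appears only as a comment because it needs a network connection and an API key.

```typescript
// Assistant message as returned for model "deepseek-reasoner". The
// reasoning_content field is Deepseek-specific, so the OpenAI SDK's
// types do not declare it.
interface ReasonerMessage {
  role: "assistant";
  content: string | null;
  reasoning_content?: string | null;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Separate the visible chain of thought from the final answer.
function splitReply(msg: ReasonerMessage): { reasoning: string; answer: string } {
  return { reasoning: msg.reasoning_content ?? "", answer: msg.content ?? "" };
}

// Build the history for the next turn, carrying forward only the final
// answer and dropping reasoning_content.
function appendAssistantTurn(history: ChatMessage[], msg: ReasonerMessage): ChatMessage[] {
  return [...history, { role: "assistant", content: msg.content ?? "" }];
}

// With the client from the example above (not run here):
//   const r = await client.chat.completions.create({ model: "deepseek-reasoner", messages: history });
//   const msg = r.choices[0].message as unknown as ReasonerMessage;
//   const { reasoning, answer } = splitReply(msg);
//   history = appendAssistantTurn(history, msg);
```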
OpenRouter
OpenRouter aggregates multiple providers and hosts Deepseek V3 and R1. Useful if you want a single API key for multiple models with automatic fallback.
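Because both Deepseek and OpenRouter expose OpenAI-compatible APIs, switching between them is mostly a matter of base URL and model name. The lookup table below sketches that. The Deepseek values `https://api.deepseek.com` and `deepseek-chat` come from the example above; `deepseek-reasoner` for R1, OpenRouter's base URL `https://openrouter.ai/api/v1`, and its `deepseek/...` model slugs are taken from the providers' listings at the time of writing and should be treated as assumptions to verify.

```typescript
type Provider = "deepseek" | "openrouter";
type ModelFamily = "v3" | "r1";

interface Endpoint {
  baseURL: string;
  models: Record<ModelFamily, string>;
}

// Assumed endpoint and model names; check each provider's current model list.
const ENDPOINTS: Record<Provider, Endpoint> = {
  deepseek: {
    baseURL: "https://api.deepseek.com",
    models: { v3: "deepseek-chat", r1: "deepseek-reasoner" },
  },
  openrouter: {
    baseURL: "https://openrouter.ai/api/v1",
    models: { v3: "deepseek/deepseek-chat", r1: "deepseek/deepseek-r1" },
  },
};

// Values to pass as `baseURL` to `new OpenAI({ apiKey, baseURL })`
// and as `model` to `client.chat.completions.create(...)`.
function resolveEndpoint(provider: Provider, family: ModelFamily): { baseURL: string; model: string } {
  const e = ENDPOINTS[provider];
  return { baseURL: e.baseURL, model: e.models[family] };
}

console.log(resolveEndpoint("openrouter", "r1"));
```

Keeping the provider choice in one table means the rest of the application code, including the SDK call itself, stays the same regardless of which host serves the model.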
Ollama (Local)
For local running:
```shell
ollama pull deepseek-v3
ollama run deepseek-v3
```
Note: the full 671B model in FP16 requires approximately 1.3TB of GPU memory, which is impractical for most users; the weights are published in FP8, which halves that. Quantized versions (Q4) reduce this to roughly 400GB. In practice, running the full model locally requires a multi-GPU server or a workstation with several hundred gigabytes of memory. For most users, using the Deepseek API or OpenRouter is more practical.
Distilled versions (such as Deepseek-R1-Distill-Llama-70B and Deepseek-R1-Distill-Qwen-32B) are smaller Llama and Qwen base models fine-tuned on reasoning samples generated by R1. They retain much of R1's reasoning capability in a locally runnable package.
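The memory figures above follow from parameters times bits per weight, divided by 8. The sketch below applies that rule to the full model and to the two distilled models named above. It counts weights only (no KV cache or runtime overhead), uses nominal parameter counts for the distills, and assumes about 4.8 bits per weight as a rough average for a Q4-style quantization, since such formats keep some tensors at higher precision.

```typescript
// Weight memory only: parameters x bits per weight / 8 bits per byte.
// KV cache, activations, and runtime overhead come on top of this.
function weightMemoryGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

// [label, parameter count, assumed bits per weight]
const configs: [string, number, number][] = [
  ["V3/R1 671B, FP16", 671e9, 16],
  ["V3/R1 671B, FP8", 671e9, 8],
  ["V3/R1 671B, ~4.8-bit quant", 671e9, 4.8],
  ["R1-Distill-Qwen-32B, ~4.8-bit quant", 32e9, 4.8],
  ["R1-Distill-Llama-70B, ~4.8-bit quant", 70e9, 4.8],
];

for (const [label, params, bits] of configs) {
  console.log(`${label}: ~${weightMemoryGB(params, bits).toFixed(0)} GB`);
}
```

This gives about 1342 GB at FP16, 671 GB at FP8, and 403 GB at ~4.8 bits for the full model, versus roughly 19 GB and 42 GB for the 32B and 70B distills, which puts the distills within reach of a single workstation.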
The Privacy and Regulatory Controversy
Deepseek is a Chinese company. Their privacy policy states that data is stored on servers in China. For companies with sensitive data, regulatory requirements (HIPAA, GDPR, FedRAMP), or policies against Chinese data jurisdiction, using the Deepseek API directly is not appropriate.
Mitigations:
- Run the open-weight model on your own infrastructure (no data leaves your environment)
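- Use OpenRouter with its provider routing restricted to non-Deepseek hosts (OpenRouter forwards requests to third-party providers, and Deepseek's own API can be one of them)
- Use cloud providers that host Deepseek under their own data terms (for example, Azure AI Foundry and AWS Bedrock both offer Deepseek R1)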
The open weights mean that privacy concerns about the API do not apply to self-hosted deployments.
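If you go through OpenRouter for privacy reasons, its provider-routing options can keep requests away from Deepseek's own API. The sketch below builds such a request body. The `provider` fields (`ignore`, `data_collection`), the provider label `"DeepSeek"`, and the model slug reflect our reading of OpenRouter's documentation at the time of writing and are assumptions to check; the `fetch` call is left as a comment because it needs a network connection and an API key.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Body for OpenRouter's OpenAI-compatible chat completions endpoint.
interface OpenRouterBody {
  model: string;
  messages: ChatMessage[];
  provider: {
    ignore: string[]; // providers that must never receive the request (assumed field)
    data_collection: "allow" | "deny"; // "deny": skip providers that may store prompts (assumed field)
  };
}

function buildRestrictedRequest(prompt: string): OpenRouterBody {
  return {
    model: "deepseek/deepseek-chat", // assumed OpenRouter slug for Deepseek V3
    messages: [{ role: "user", content: prompt }],
    provider: { ignore: ["DeepSeek"], data_collection: "deny" },
  };
}

// Sending it (not run here):
//   await fetch("https://openrouter.ai/api/v1/chat/completions", {
//     method: "POST",
//     headers: {
//       Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
//       "Content-Type": "application/json",
//     },
//     body: JSON.stringify(buildRestrictedRequest("Your prompt here")),
//   });
console.log(JSON.stringify(buildRestrictedRequest("Your prompt here"), null, 2));
```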
When Deepseek Is the Right Choice
Cost-sensitive production at scale: if you are processing high volumes of tokens and GPT-4o quality is sufficient for your use case, Deepseek V3 at 1/9th the price is worth serious evaluation.
Chinese language tasks: Deepseek's reported results on Chinese-language benchmarks (such as C-Eval) are significantly stronger than those of Western models. For Chinese-language applications, it is often the best choice.
Reasoning-heavy tasks: Deepseek R1's performance on MATH and logical reasoning is genuinely competitive with o1. For applications that benefit from chain-of-thought reasoning, R1 is worth benchmarking.
Self-hosted open weights: for teams that want GPT-4o-class performance from fully self-hosted open weights, Deepseek is among the strongest current options.
Keep Reading
- Llama 3.3 Complete Guide — The other major open source option
- LLM Comparison Guide 2026 — How Deepseek fits in the full model landscape
- Cutting LLM API Costs: Complete Guide — Deepseek fits into a broader cost optimization strategy
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.