Is DeepSeek V4 Pro open source?

Yes. DeepSeek V4-Pro ships under the MIT license with open weights on Hugging Face (deepseek-ai/DeepSeek-V4-Pro). You can self-host it or call it through an API provider such as Fireworks or DeepSeek's own API.

How does DeepSeek V4 Pro compare to Claude Opus 4.8?

DeepSeek V4-Pro scores 52 on the AA Intelligence Index vs Opus 4.8 at 61. DeepSeek wins on cost and deployment flexibility. Opus wins on highest-stakes coding, GDPval-AA, and enterprise support SLAs.

What GPU do I need to self-host DeepSeek V4 Pro?

V4-Pro (1.6T MoE, ~49B active) typically needs multi-GPU H100/H200 class hardware for full precision. V4-Flash (~13B active) can run on a single 80GB GPU when quantized.

Can DeepSeek V4 Pro run in an air-gapped environment?

Yes. With MIT-licensed weights you can deploy fully offline. Budget for GPU ops, model updates, and security patching. Most teams below 50M tokens/month will find an API cheaper than self-hosting, even for Kimi-tier models.

Why did Qwen3.7-Max go proprietary?

Alibaba moved Qwen3.7-Max to API-only to monetize frontier agentic capability (35-hour runs, MCP-Atlas 76.4) while keeping smaller tiers like Qwen3.6-35B-A3B open under Apache 2.0.

Open Source AI

DeepSeek V4 Pro and Kimi K2.6 vs Claude Opus 4.8: Open Weights at Frontier Level

MIT vs Modified MIT licenses, AA Index 52-54 vs 61, H100 self-host break-even math, and when open weights beat closed APIs. June 2026 guide.

Mahmudul Haque Qudrati

CEO & ML Engineer

June 2, 2026

11 min read

// tags

#deepseek-v4

License Comparison: What You Can Actually Do

The legal picture is clearer than it was even six months ago, but it still requires a careful read.

Model	License	Commercial Use	Fine-Tuning	Redistribution	Model Output Restrictions
DeepSeek V4-Pro	MIT	Yes, unrestricted	Yes	Yes	None
Kimi K2.6	Modified MIT	Yes, with attribution	Yes	Yes, with attribution	Cannot claim outputs are human-generated
Qwen3.7-Max	Proprietary (closed API)	Via API only	No	No	Standard API ToS
Claude Opus 4	Anthropic ToS	Via API only	No (fine-tuning beta, restricted)	No	Standard API ToS
GPT-4.1	OpenAI ToS	Via API only	Yes (enterprise tier)	No	Standard API ToS

Two things stand out here.

First, DeepSeek V4-Pro is genuinely MIT. That means you can embed it in a product, ship it as part of a self-contained appliance, deploy it behind a firewall without phoning home, and build a competing product on top of it. Beyond keeping the copyright and license notice with any copies you distribute, MIT imposes no restrictions that would concern a commercial software team.

Second, Kimi K2.6's Modified MIT adds two clauses: attribution (you must name the model in documentation or about pages) and a prohibition on claiming model outputs are human-generated without disclosure. The attribution requirement is operationally trivial. The output disclosure clause is consistent with EU AI Act requirements that will apply to most European deployments regardless of license anyway.

The Qwen3.7-Max Detour

Alibaba's Qwen line has an interesting recent history. Qwen2.5 and Qwen3.0 were released with permissive licenses that attracted a large open-source community. Qwen3.7-Max, the highest-capability model in the current generation, shipped as a closed API product with no public weights and no announced timeline for open release.

This matters because Qwen3.7-Max benchmarks at roughly 57 on the AA Index, and many teams had planned migration paths from earlier open Qwen models. The proprietary pivot stranded those plans and served as a concrete example of why "open weights today" does not guarantee "open weights tomorrow." If license stability over a 3-year product roadmap matters to you, Alibaba's recent behavior with Qwen3.7-Max should be in your risk model.

DeepSeek has maintained open releases consistently since V2. Moonshot AI has released all K2-series checkpoints under permissive terms. Neither company has announced any intent to close future releases, but neither has made a binding legal commitment either. This is a reputational track record, not a contractual guarantee.

Self-Hosting Cost Estimates: H100 vs API

This is where the economics get interesting, and where most "open source is free" conversations fall apart without actual numbers.

Hardware Requirements

DeepSeek V4-Pro is a 671-billion-parameter Mixture-of-Experts model. Running it at full precision requires approximately 8x H100 80GB GPUs in inference mode, or 4x H100 80GB with 4-bit GPTQ quantization at acceptable quality loss for most production workloads. Kimi K2.6 is a 235-billion-parameter dense model that fits in 4x H100 80GB at BF16 or 2x H100 80GB at INT8.
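As a rough cross-check on sizing figures like these, weights-only memory is parameter count times bytes per parameter. The sketch below is a lower bound under stated assumptions: every parameter (including all MoE experts) is resident in GPU memory, and KV cache, activations, and framework overhead are ignored. The function name and the precision table are illustrative, not taken from any serving stack.

```python
# Weights-only memory estimate: a lower bound, not a full sizing model.
# Assumption: all parameters are held in GPU memory at one uniform precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Return weights-only memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # Parameter counts are the ones quoted in this section.
    for label, params in [("671B model", 671.0), ("235B model", 235.0)]:
        for precision in ("bf16", "int8", "int4"):
            gb = weight_memory_gb(params, precision)
            print(f"{label} @ {precision}: {gb:,.1f} GB of weights")
```

By this estimate, BF16 weights for a 671B-parameter model come to about 1,342 GB, more than the 640 GB available on an 8x H100 80GB node, so any quoted GPU count is worth re-deriving for the precision you actually plan to serve and checking against the model card.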

Cloud GPU Spot Pricing (June 2026)

Provider	H100 80GB SXM (hourly, spot)	8x H100 cluster (hourly)
Lambda Labs	$2.49	$19.92
RunPod	$2.71	$21.68
CoreWeave	$2.85	$22.80
AWS p4d.24xlarge (on-demand)	$32.77	-- (8x A100, not H100)

Reserved 1-year pricing on Lambda Labs brings the 8x H100 cluster to roughly $14.50 per hour.

API Pricing Comparison

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Opus 4	$15.00	$75.00
Kimi K2.6 (API)	$0.60	$2.50
DeepSeek V4-Pro (API)	$0.27	$1.10
GPT-4.1 (API)	$2.00	$8.00

At Claude Opus 4 output pricing of $75 per million tokens, a team generating 50 million output tokens per month pays $3,750 per month on outputs alone. An 8x H100 spot cluster running continuously at 85 percent utilization costs roughly $12,150 per month. At that scale, the API is cheaper.
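The comparison above is two multiplications, and it helps to have them in a form you can rerun with your own rates. This is a minimal sketch, assuming a 30-day (720-hour) month and treating "85 percent utilization" as the fraction of hours actually billed; both are assumptions, and the function names are illustrative.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month


def cluster_monthly_cost(hourly_rate: float, billed_fraction: float = 1.0) -> float:
    """Monthly cost of a cluster billed at `hourly_rate` dollars per hour."""
    return hourly_rate * HOURS_PER_MONTH * billed_fraction


def api_output_cost(output_tokens_millions: float, price_per_million: float) -> float:
    """Monthly API spend on output tokens only."""
    return output_tokens_millions * price_per_million


if __name__ == "__main__":
    api = api_output_cost(50, 75.00)                             # 50M tokens at $75/M
    cluster = cluster_monthly_cost(19.92, billed_fraction=0.85)  # 8x H100 spot rate
    print(f"API outputs:  ${api:,.0f} per month")
    print(f"8x H100 spot: ${cluster:,.0f} per month")
```

Under these assumptions the cluster comes to about $12,190 per month, in line with the rough figure quoted above, against $3,750 for the API.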

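The break-even point for a self-hosted DeepSeek V4-Pro cluster, assuming 85 percent cluster utilization and 2,000 tokens per second throughput, works out to roughly 180 million output tokens per month when the spend being replaced is priced at Opus-tier output rates. Against DeepSeek's own API at $1.10 per million output tokens, the same cluster cost would take on the order of 11 billion output tokens per month to recover, so the hosted API stays cheaper at any realistic volume. Below the threshold, use the API. Above it, consider self-hosting.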

For Kimi K2.6, the smaller model fits on a 2x H100 cluster at $5.40 per hour spot, or roughly $3,890 per month at continuous operation. Measured against Opus-tier output pricing, the break-even falls around 50 to 60 million output tokens per month. Measured against Kimi's own API at $2.50 per million output tokens, it would be closer to 1.5 billion. The Opus-tier threshold is a much more reachable one for mid-sized product teams.
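Break-even volume is the monthly cluster cost divided by the per-million-token API price it replaces. The sketch below assumes output tokens only and no engineering overhead, and the function names are illustrative. It also estimates how many tokens a cluster can produce in a month, since a break-even above that ceiling can never be reached.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def break_even_tokens_millions(monthly_cluster_cost: float,
                               api_price_per_million: float) -> float:
    """Output volume (millions of tokens per month) at which cluster cost equals API spend."""
    return monthly_cluster_cost / api_price_per_million


def monthly_capacity_millions(tokens_per_second: float, utilization: float) -> float:
    """Upper bound on monthly output (millions of tokens) at a given throughput."""
    return tokens_per_second * utilization * SECONDS_PER_MONTH / 1e6


if __name__ == "__main__":
    # Cluster costs and the $75/M rate are the figures quoted in this section.
    for label, cost in [("8x H100", 12_150.0), ("2x H100", 3_890.0)]:
        volume = break_even_tokens_millions(cost, 75.00)
        print(f"{label}: break-even at {volume:,.0f}M output tokens per month")
    print(f"Capacity at 2,000 tok/s, 85%: {monthly_capacity_millions(2000, 0.85):,.0f}M")
```

With the $12,150 cluster figure and $75 per million output tokens the division gives 162 million tokens per month; with $3,890 it gives about 52 million. The result scales inversely with the API price, so rerun it with the rate you actually pay.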

Hidden Costs of Self-Hosting

The hardware cost is the most visible line item, but not the largest. A realistic self-hosted inference stack adds:

Engineering time: Minimum 0.5 FTE to maintain vLLM or TGI serving stack, monitor GPU health, handle CUDA version conflicts, and manage rolling updates. Fully loaded engineering cost: $6,000 to $10,000 per month.
Inference optimization: Batching strategies, KV cache management, speculative decoding configuration. Expect 2-4 weeks of setup work before reaching production throughput targets.
Monitoring and observability: Prometheus, Grafana, custom alerting for GPU memory fragmentation. Not glamorous, but silently expensive when ignored.
Compliance overhead: If you are self-hosting to satisfy data residency requirements, you also need to budget for SOC 2 or ISO 27001 audit coverage of your inference infrastructure.

A conservative total cost of ownership for a self-hosted DeepSeek V4-Pro deployment at 200 million output tokens per month lands between $18,000 and $24,000 per month including engineering overhead. At that output volume, the equivalent Claude Opus 4 API spend is $15,000 per month on outputs alone, not counting input costs. The self-hosted economics win only if you are also substituting away from Opus-tier pricing for workloads where an AA Index score of 52 is sufficient.
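One way to compare these totals with API price lists is to convert total cost of ownership into an effective price per million tokens. A minimal sketch, assuming the TCO range and volume quoted above; the function name is illustrative.

```python
def effective_cost_per_million(monthly_tco: float, tokens_millions: float) -> float:
    """Effective self-hosted price in dollars per million tokens."""
    if tokens_millions <= 0:
        raise ValueError("tokens_millions must be positive")
    return monthly_tco / tokens_millions


if __name__ == "__main__":
    volume = 200.0  # million output tokens per month
    low = effective_cost_per_million(18_000.0, volume)
    high = effective_cost_per_million(24_000.0, volume)
    print(f"Self-hosted: ${low:.0f} to ${high:.0f} per million tokens")
    print("Opus output rate quoted above: $75 per million tokens")
```

At 200 million tokens the quoted range works out to $90 to $120 per million tokens. Because most of the cost is fixed, the effective price falls as volume rises, which is the whole case for self-hosting at scale.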

When NOT to Self-Host

Self-hosting open weights models is the right call for some teams and the wrong call for many others. Here is where the economics and operational tradeoffs argue against it.

Team size below 20 engineers. The 0.5 FTE maintenance burden represents a meaningful percentage of a small team's capacity. Early-stage startups should stay on managed APIs until the unit economics clearly favor infrastructure investment.

Use cases that genuinely need frontier quality. The 7-point gap between Kimi K2.6 and Claude Opus 4 (9 points for DeepSeek V4-Pro) is real. If you are processing legal documents, writing medical content, or running a product where output quality is the primary differentiator, do not shave costs on the model.

Regulated environments with complex compliance requirements. Self-hosting sounds like a compliance win (data stays on your servers), but it adds infrastructure audit scope. For HIPAA, PCI-DSS, or FedRAMP environments, a managed API with a BAA or equivalent may have a lower total compliance cost than building your own secure inference cluster.

Teams without GPU infrastructure expertise. Running inference at production scale on multi-GPU clusters is operationally distinct from running training workloads. If your ML team's experience is primarily in training and fine-tuning, factor in the learning curve and the tail risk of a production incident during a peak traffic period.

Latency-sensitive applications. Managed API providers have spent years optimizing inference latency. Self-hosted vLLM on a cold cluster may not match API p99 latency without significant tuning investment. Benchmark your specific workload before committing.

Practical Decision Framework

Use this checklist before committing to self-hosted open weights deployment:

Monthly output tokens exceed 50M for smaller models (Kimi K2.6 tier) or 180M for larger models (DeepSeek V4-Pro tier)?
Engineering team can dedicate 0.5 FTE ongoing to inference infrastructure?
Data residency or IP requirements make third-party API use problematic or legally constrained?
AA Index score of 52-54 is sufficient for your specific use case (tested with real prompts, not assumed)?
GPU cloud budget approved and spot instance interruption risk is acceptable or mitigated with reserved capacity?

If you answer yes to all five, self-hosting is worth prototyping. If you answer no to any of the first three, stay on managed APIs and revisit in 12 months.
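The checklist and its decision rule can be written down as a small function, which is handy if you want to record the answers alongside a design review. This is a sketch only: the field names are invented for illustration, and the case where the first three answers are yes but the fourth or fifth is no is not covered by the rule above, so the sketch returns a neutral message for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfHostChecklist:
    volume_above_threshold: bool          # 50M (Kimi tier) or 180M (DeepSeek tier)
    half_fte_available: bool              # 0.5 FTE ongoing for inference infrastructure
    residency_or_ip_constraint: bool      # third-party API use is constrained
    quality_sufficient_on_real_prompts: bool
    gpu_budget_and_spot_risk_ok: bool


def recommend(c: SelfHostChecklist) -> str:
    first_three = (
        c.volume_above_threshold,
        c.half_fte_available,
        c.residency_or_ip_constraint,
    )
    if not all(first_three):
        return "stay on managed APIs and revisit in 12 months"
    if c.quality_sufficient_on_real_prompts and c.gpu_budget_and_spot_risk_ok:
        return "self-hosting is worth prototyping"
    # Assumption: the checklist text does not say what to do here.
    return "resolve items 4 and 5 before deciding"


if __name__ == "__main__":
    print(recommend(SelfHostChecklist(True, True, True, True, True)))
    print(recommend(SelfHostChecklist(True, False, True, True, True)))
```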

Looking Ahead: The Compression of the Quality Gap

The 7-point AA Index gap between Kimi K2.6 and Claude Opus 4 is the smallest it has ever been. Based on DeepSeek's published research trajectory and Moonshot AI's K2-series roadmap, both companies are targeting new releases in Q3 and Q4 2026. Independent projections from the Artificial Analysis team suggest open weights models could reach 58-62 on the index by end of year, which would effectively eliminate the justification for Opus-tier pricing for most structured tasks.

Anthropic's response to this trend is visible in Claude's pricing evolution: the gap between Haiku and Opus pricing has widened, not narrowed, suggesting Anthropic is positioning Haiku-tier for cost-sensitive applications and defending Opus pricing on quality-differentiated enterprise use cases. This is a reasonable strategy, but it also means that for teams doing high-volume structured work (code generation, data extraction, document parsing), the open weights alternative is becoming harder to dismiss.

What This Means for How You Build in 2026

The most important shift is not "which model to use" but "build your architecture to be model-agnostic." Whether you run DeepSeek V4-Pro today or switch to Kimi K3 in six months, a hard dependency on any single provider's API format or capability profile is a liability.

OpenAI-compatible endpoints (which DeepSeek V4-Pro supports) make this easier. LiteLLM and similar routing layers reduce lock-in further. The teams that will have the most flexibility over the next 18 months are the ones building model routing into their infrastructure now, before the switching cost of migrating grows with product complexity.
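To make "model routing" concrete, here is a minimal sketch of the idea in plain Python. It does not use LiteLLM or any provider SDK: a backend is just a callable from prompt to completion, and the stub backends, task names, and class name are all hypothetical. In a real system each backend would wrap an HTTP client for one provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is any callable that maps a prompt to a completion string.
Backend = Callable[[str], str]


@dataclass
class ModelRouter:
    backends: Dict[str, Backend]   # backend name -> callable
    routes: Dict[str, str]         # task type -> backend name
    default: str                   # backend used when a task type has no route

    def complete(self, task_type: str, prompt: str) -> str:
        name = self.routes.get(task_type, self.default)
        return self.backends[name](prompt)


def make_stub(label: str) -> Backend:
    """Stand-in for a real provider client; echoes the prompt with a label."""
    return lambda prompt: f"[{label}] {prompt}"


def build_example_router() -> ModelRouter:
    return ModelRouter(
        backends={
            "open-weights": make_stub("open-weights"),
            "frontier": make_stub("frontier"),
        },
        routes={"extraction": "open-weights", "legal-review": "frontier"},
        default="frontier",
    )


if __name__ == "__main__":
    router = build_example_router()
    print(router.complete("extraction", "Pull the invoice total."))
    print(router.complete("legal-review", "Summarize clause 7."))
```

Because callers only see `complete(task_type, prompt)`, swapping the model behind a task type is a one-line change to the routing table rather than a change to application code.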

For more on how to build an efficient routing layer that balances quality and cost across models, see Post 4: LLM Token Optimization and Model Routing in 2026.

Summary

The open weights vs closed weights debate in 2026 is not a binary. DeepSeek V4-Pro at AA Index 52 and Kimi K2.6 at 54 are not replacements for Claude Opus 4 at 61 in every context. But they are credible alternatives for a large class of structured, English-first workloads, and they carry licenses that give engineering teams genuine deployment flexibility.

The economics favor self-hosting only at meaningful scale (50M+ tokens per month for Kimi-tier, 180M+ for DeepSeek-tier) with engineering overhead factored in. Below those thresholds, open model APIs (not self-hosted) often offer the best cost-quality tradeoff.

Qwen3.7-Max's proprietary pivot is a reminder that "open today" does not mean "open forever." Build model-agnostic infrastructure.

Part of the Pristren AI Sprint series.