
DeepSeek-V4 Million-Context Open-Source Model: How Does It Perform, and Is It Worth Using?

2026/4/29

AI Summary (BLUF)

DeepSeek-V4 is a preview of DeepSeek's next-generation large language model series with a 1M-token context window, delivering leading open-source performance in knowledge, reasoning, and agent capabilities. It comes in Pro and Flash versions.

Introduction

DeepSeek-V4 is a preview of the next-generation large language model series from DeepSeek, featuring a million-token ultra-long context window and achieving leading open-source performance in agent capabilities, world knowledge, and reasoning. The series includes two versions: deepseek-v4-pro (high performance) and deepseek-v4-flash (cost-efficient), both open-sourced and available via API. It supports both non-thinking and thinking modes, providing affordable infrastructure for long-text processing and agent applications.

Key Features

  • Million-token context handling: Natively supports understanding and memory of up to 1M tokens, standard for the official service.

  • Hybrid attention mechanism: The CSA and HCA architectures significantly reduce computation and memory overhead for long contexts.

  • Dual-mode reasoning: Supports non-thinking and thinking modes; the latter can be tuned via the reasoning_effort parameter.

  • Multi-domain expert fusion: Integrates expert capabilities in math, code, agents, and other domains via OPD distillation.

  • Cost-efficient option: The Flash version achieves reasoning performance close to Pro's with far fewer parameters, significantly reducing API costs.

Technical Principles

  • CSA (Compressed Sparse Attention): Compresses the KV of every m tokens into one entry, uses a Lightning Indexer to compute index scores and perform top-k sparse selection, and retains local dependencies via sliding windows and an Attention Sink mechanism.

  • HCA (Heavy Compression Attention): Merges KV entries into a single entry at an even larger compression ratio m', maintaining dense attention without sparse selection to further reduce computation.

  • mHC (Manifold-Constrained Hyper-Connection): Projects the residual mapping matrix onto the doubly stochastic matrix manifold using the Sinkhorn-Knopp algorithm, constraining the spectral norm to at most 1 and stabilizing signal propagation through deep layers.

  • Muon optimizer: Orthogonalizes gradient matrices with a hybrid Newton-Schulz iteration split into a fast-convergence phase and a precise-stability phase, enabling efficient training of large MoE models.

  • FP4 quantization-aware training: Quantizes MoE expert weights and the CSA indexer's QK path to FP4, using FP8 to extend the dynamic range for lossless dequantization, reducing memory and computation overhead.
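The CSA selection step above can be sketched in a few lines. This is a toy illustration only: the mean-pooling compression, dot-product indexer, and tiny dimensions are assumptions for clarity, not DeepSeek's actual implementation.

```python
# Toy sketch of compressed sparse attention's block selection.

def compress_kv(keys, m):
    """Pool every m key vectors into one compressed entry (assumed: mean-pooling)."""
    blocks = [keys[i:i + m] for i in range(0, len(keys), m)]
    return [[sum(col) / len(block) for col in zip(*block)] for block in blocks]

def topk_blocks(query, compressed, k):
    """Score compressed entries against the query (a stand-in for the
    Lightning Indexer) and return the indices of the top-k blocks."""
    scores = [sum(q * c for q, c in zip(query, entry)) for entry in compressed]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Eight 2-d "keys", compression ratio m=2 -> four compressed entries;
# full attention would then run only over the selected blocks' tokens.
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [2, 0], [2, 0], [0, 2], [0, 2]]
compressed = compress_kv(keys, 2)
picked = topk_blocks([1, 0], compressed, 2)
```

In the real mechanism, the sliding window and Attention Sink would additionally keep the most recent blocks and the initial tokens regardless of their index scores, preserving local dependencies.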

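The doubly stochastic projection that mHC relies on can be illustrated with the classic Sinkhorn-Knopp iteration. This is a simplified sketch on a small positive matrix; the actual mHC applies the projection to residual mapping matrices inside the network.

```python
# Sinkhorn-Knopp: alternately normalize rows and columns of a positive
# matrix until it is (approximately) doubly stochastic.

def sinkhorn_knopp(mat, iters=50):
    m = [row[:] for row in mat]
    for _ in range(iters):
        for row in m:                          # normalize each row to sum 1
            s = sum(row)
            for j in range(len(row)):
                row[j] /= s
        for j in range(len(m[0])):             # normalize each column to sum 1
            s = sum(row[j] for row in m)
            for row in m:
                row[j] /= s
    return m

ds = sinkhorn_knopp([[2.0, 1.0], [1.0, 3.0]])
row_sums = [round(sum(r), 6) for r in ds]
col_sums = [round(sum(r[j] for r in ds), 6) for j in range(2)]
```

Doubly stochastic matrices are convex combinations of permutation matrices (Birkhoff's theorem), so their spectral norm is at most 1, which is the property the mHC bullet cites for stable signal propagation in deep stacks.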
Performance

Knowledge Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
SimpleQA-Verified 57.9% Gemini-3.1-Pro: 75.6%
Chinese-SimpleQA 84.4% K2.6: 75.9%, GLM-5.1: 75.0%
MMLU-Pro 87.5% GPT-5.4: ≈87.5%, Gemini-3.1-Pro: >90%
GPQA Diamond 90.1% GPT-5.4: ~90%, Gemini-3.1-Pro: >92%

Reasoning & Coding

Metric DeepSeek-V4-Pro Competitor Performance
HMMT 2026 Feb 95.2% K2.6: ~92%, GLM-5.1: ~90%, GPT-5.4: ~96%
IMOAnswerBench 89.8% GPT-5.4: ~91%, Opus-4.6: ~88%
Codeforces Rating 3206 GPT-5.4: 3168
Apex Shortlist 90.2% GPT-5.4: 78.1%, Opus-4.6: 85.9%
LiveCodeBench 93.5% All competitors: <90%

Agent Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
SWE Verified 80.6% Opus-4.6: 80.8%
SWE Pro 55.4%
SWE Multilingual 76.2%
Terminal Bench 2.0 67.9% K2.6: 66.7%, GLM-5.1: 63.5%, Opus-4.6: 65.4%
MCPAtlas Public 73.6%
Toolathlon 51.8%

Long-Context Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
MRCR 1M 83.5% Gemini-3.1-Pro: 76.3%
CorpusQA 1M 62.0% Gemini-3.1-Pro: 53.8%

Efficiency (vs. V3.2)

Metric DeepSeek-V4-Pro DeepSeek-V4-Flash
FLOPs per token (1M context) 27% of V3.2 10% of V3.2
Total KV cache (1M context) 10% of V3.2 7% of V3.2
Expert weight storage FP4, with a theoretical further 1/3 efficiency gain

How to Use DeepSeek-V4

  • Web/App: Visit the official DeepSeek website or app and select Expert mode (Pro) or Fast mode (Flash).

  • API call: Set the model parameter to deepseek-v4-pro or deepseek-v4-flash; keep base_url unchanged.

  • Thinking mode: For complex agent scenarios, enable thinking mode and set reasoning_effort: max.

  • Local deployment: Download the open-source weights from Hugging Face or ModelScope for self-deployment.
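A minimal sketch of assembling a request along the lines above. The field names assume an OpenAI-compatible request body, as with earlier DeepSeek endpoints; the model names and the reasoning_effort knob are taken from this article, and the helper itself is illustrative, not an official SDK.

```python
# Build a chat-completion request body (assumed OpenAI-compatible schema).

def build_request(model, prompt, reasoning_effort=None):
    body = {
        "model": model,  # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        # Thinking-mode intensity knob described in the article.
        body["reasoning_effort"] = reasoning_effort
    return body

req = build_request("deepseek-v4-flash", "Summarize this contract.",
                    reasoning_effort="max")
```

Switching an existing integration would then amount to changing the model string while keeping base_url and the rest of the request untouched.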

Key Information and Usage Requirements

  • Model specs: Pro: 1.6T total / 49B active parameters; Flash: 284B total / 13B active parameters; pre-trained on 33T and 32T tokens respectively.

  • Context length: Both versions support 1M tokens. The legacy deepseek-chat and deepseek-reasoner endpoints will be retired on 2026-07-24.

  • API pricing (per million tokens): Pro: 1 RMB for cached input, 12 RMB for uncached input, 24 RMB for output; Flash: 0.2 RMB cached, 1 RMB uncached, 2 RMB output.

  • Capacity constraints: The Pro version currently has limited serving throughput; prices are expected to drop substantially once Ascend 950 supernodes become widely available in the second half of the year.
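The pricing above is easy to turn into a back-of-the-envelope estimate. The per-million-token prices are taken from the article; the helper function is an illustration, not an official calculator.

```python
# Estimate a call's cost in RMB from the listed per-million-token prices.

PRICES = {  # model: (cached input, uncached input, output), RMB per 1M tokens
    "deepseek-v4-pro":   (1.0, 12.0, 24.0),
    "deepseek-v4-flash": (0.2, 1.0, 2.0),
}

def cost_rmb(model, cached_in, uncached_in, out):
    cached, uncached, output = PRICES[model]
    return (cached_in * cached + uncached_in * uncached + out * output) / 1_000_000

# Example: 800K cache-hit input, 200K fresh input, 50K output tokens.
flash_cost = cost_rmb("deepseek-v4-flash", 800_000, 200_000, 50_000)
pro_cost = cost_rmb("deepseek-v4-pro", 800_000, 200_000, 50_000)
```

For this example call, Flash comes to about 0.46 RMB versus 4.4 RMB on Pro, illustrating the article's point that Flash can cost roughly a tenth as much for input-heavy workloads.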

Core Advantages

  • Democratized million-token context: A 1M-token ultra-long context is standard in the official service, breaking the quadratic computation bottleneck of traditional attention and making long-text tasks and test-time scaling genuinely practical.

  • Extreme long-context efficiency: With the hybrid CSA+HCA architecture, V4-Pro's per-token inference FLOPs at 1M context are only 27% of V3.2's and its KV cache only 10%; the Flash version goes as low as 10% and 7% respectively.

  • A new open-source performance benchmark: V4-Pro-Max leads previous open-source models across knowledge, reasoning, and competitive-coding evaluations; in internal agent-coding evaluations it outperforms Claude Sonnet 4.5 and approaches the delivery quality of Opus 4.6 in non-thinking mode.

  • Flexible dual-version coverage: Pro (1.6T/49B) targets top-tier performance, while Flash (284B/13B) achieves near-Pro reasoning with a fraction of the active parameters at an API price as low as 1/12 of Pro's, covering a range of budgets.

  • Native agent enhancement: Specifically optimized for mainstream agent frameworks such as Claude Code and OpenClaw; supports coherent reasoning retention across user message boundaries and excels in agent benchmarks such as SWE and Terminal Bench.


Competitor Comparison

Dimension DeepSeek-V4-Pro Claude Opus 4.6 Kimi K2.6
Model positioning Open-source high-performance MoE Closed-source top-tier generalist Open-source agent intelligence
Open-source status Fully open-source Closed-source API Open weights / open API
Total parameters 1.6T Not disclosed Not disclosed
Active parameters 49B Not disclosed Not disclosed
Context length 1M tokens 200K tokens 1M tokens
Core architecture CSA+HCA hybrid attention Traditional Transformer MoE + long context
MMLU-Pro 87.5 89.1 87.1
SimpleQA 57.9 46.2 36.9
Codeforces 3206
SWE Verified 80.6 80.8 80.2
Terminal Bench 67.9 65.4 66.7
MRCR 1M 83.5 92.9
API input price 12 RMB/M tokens ~150 RMB/M tokens ~60 RMB/M tokens
Long-context efficiency KV cache at 10% of V3.2 Standard KV cache Efficient, details not disclosed

Application Scenarios

  • Long-document analysis: Full-text understanding and cross-chapter reasoning over million-character papers, reports, and legal contracts.

  • Agent coding: Complex code generation, refactoring, and debugging in frameworks such as Claude Code and OpenClaw.

  • Multi-turn tool calling: Retains the complete reasoning history in agent workflows, enabling coherent thinking across user message boundaries.

  • Knowledge-intensive Q&A: Substantially leads open-source models in world-knowledge benchmarks, suiting education, research, and professional consulting.

  • Office tasks: Excels at Chinese writing, information analysis, and document generation and editing.


This article is compiled from official technical materials released by DeepSeek; all data are current as of April 2026.

FAQ

What is DeepSeek-V4's context length?

DeepSeek-V4 natively supports an ultra-long context of one million (1M) tokens, handling large-scale text for long-document analysis and agent applications.

Which two reasoning modes does DeepSeek-V4 support?

It supports a non-thinking mode and a thinking mode; the thinking mode's intensity can be tuned via the reasoning_effort parameter to balance performance and efficiency.

What is the difference between the Pro and Flash versions?

Pro has more parameters and higher performance; Flash has fewer parameters but reasoning performance close to Pro's, with significantly lower API costs, making it suited to cost-sensitive scenarios.

