
DeepSeek-V4 Million-Context Open-Source Model: How Does It Perform, and Is It Worth Using?

2026/4/29

AI Summary (BLUF)

DeepSeek-V4 is a preview of DeepSeek's next-generation large language model series with a 1M-token context window, delivering leading open-source performance in knowledge, reasoning, and agent capabilities. It comes in Pro and Flash versions.

Introduction

DeepSeek-V4 is a preview of the next-generation large language model series from DeepSeek, featuring a million-token ultra-long context window and achieving leading open-source performance in agent capabilities, world knowledge, and reasoning. The series includes two versions: deepseek-v4-pro (high performance) and deepseek-v4-flash (cost-efficient), both open-sourced and available via API. It supports both non-thinking and thinking modes, providing affordable infrastructure for long-text processing and agent applications.

Key Features

  • Million-token context handling: Natively supports understanding and memory of up to 1M tokens, standard for the official service.

  • Hybrid attention mechanism: The CSA and HCA architectures significantly reduce computation and memory overhead for long contexts.

  • Dual-mode reasoning: Supports non-thinking and thinking modes; the latter can be tuned via the reasoning_effort parameter.

  • Multi-domain expert fusion: Integrates expert capabilities in math, code, agents, and other domains via OPD distillation.

  • Cost-efficient option: The Flash version achieves reasoning performance close to Pro's with far fewer parameters, significantly reducing API costs.

Technical Principles

  • CSA (Compressed Sparse Attention): Compresses the KV of every m tokens into one entry, uses a Lightning Indexer to compute index scores and perform top-k sparse selection, and retains local dependencies via sliding windows and an Attention Sink mechanism.

  • HCA (Heavy Compression Attention): Merges KV entries into a single entry at an even larger compression ratio m', maintaining dense attention without sparse selection to further reduce computation.

  • mHC (Manifold-Constrained Hyper-Connection): Projects the residual mapping matrix onto the doubly stochastic matrix manifold using the Sinkhorn-Knopp algorithm, constraining the spectral norm to at most 1 and stabilizing signal propagation through deep layers.

  • Muon optimizer: Orthogonalizes gradient matrices with a hybrid Newton-Schulz iteration split into a fast-convergence phase and a precise-stability phase, enabling efficient training of large MoE models.

  • FP4 quantization-aware training: Quantizes MoE expert weights and the CSA indexer's QK path to FP4, using FP8 to extend the dynamic range for lossless dequantization, reducing memory and computation overhead.
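The CSA selection step above can be sketched in a few lines. This is a toy illustration only: the mean-pooling compression, dot-product indexer, and tiny dimensions are assumptions for clarity, not DeepSeek's actual implementation.

```python
# Toy sketch of compressed sparse attention's block selection.

def compress_kv(keys, m):
    """Pool every m key vectors into one compressed entry (assumed: mean-pooling)."""
    blocks = [keys[i:i + m] for i in range(0, len(keys), m)]
    return [[sum(col) / len(block) for col in zip(*block)] for block in blocks]

def topk_blocks(query, compressed, k):
    """Score compressed entries against the query (a stand-in for the
    Lightning Indexer) and return the indices of the top-k blocks."""
    scores = [sum(q * c for q, c in zip(query, entry)) for entry in compressed]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Eight 2-d "keys", compression ratio m=2 -> four compressed entries;
# full attention would then run only over the selected blocks' tokens.
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [2, 0], [2, 0], [0, 2], [0, 2]]
compressed = compress_kv(keys, 2)
picked = topk_blocks([1, 0], compressed, 2)
```

In the real mechanism, the sliding window and Attention Sink would additionally keep the most recent blocks and the initial tokens regardless of their index scores, preserving local dependencies.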

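The doubly stochastic projection that mHC relies on can be illustrated with the classic Sinkhorn-Knopp iteration. This is a simplified sketch on a small positive matrix; the actual mHC applies the projection to residual mapping matrices inside the network.

```python
# Sinkhorn-Knopp: alternately normalize rows and columns of a positive
# matrix until it is (approximately) doubly stochastic.

def sinkhorn_knopp(mat, iters=50):
    m = [row[:] for row in mat]
    for _ in range(iters):
        for row in m:                          # normalize each row to sum 1
            s = sum(row)
            for j in range(len(row)):
                row[j] /= s
        for j in range(len(m[0])):             # normalize each column to sum 1
            s = sum(row[j] for row in m)
            for row in m:
                row[j] /= s
    return m

ds = sinkhorn_knopp([[2.0, 1.0], [1.0, 3.0]])
row_sums = [round(sum(r), 6) for r in ds]
col_sums = [round(sum(r[j] for r in ds), 6) for j in range(2)]
```

Doubly stochastic matrices are convex combinations of permutation matrices (Birkhoff's theorem), so their spectral norm is at most 1, which is the property the mHC bullet cites for stable signal propagation in deep stacks.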
Performance

Knowledge Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
SimpleQA-Verified 57.9% Gemini-3.1-Pro: 75.6%
Chinese-SimpleQA 84.4% K2.6: 75.9%, GLM-5.1: 75.0%
MMLU-Pro 87.5% GPT-5.4: ≈87.5%, Gemini-3.1-Pro: >90%
GPQA Diamond 90.1% GPT-5.4: ~90%, Gemini-3.1-Pro: >92%

Reasoning & Coding

Metric DeepSeek-V4-Pro Competitor Performance
HMMT 2026 Feb 95.2% K2.6: ~92%, GLM-5.1: ~90%, GPT-5.4: ~96%
IMOAnswerBench 89.8% GPT-5.4: ~91%, Opus-4.6: ~88%
Codeforces Rating 3206 GPT-5.4: 3168
Apex Shortlist 90.2% GPT-5.4: 78.1%, Opus-4.6: 85.9%
LiveCodeBench 93.5% All competitors: <90%

Agent Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
SWE Verified 80.6% Opus-4.6: 80.8%
SWE Pro 55.4%
SWE Multilingual 76.2%
Terminal Bench 2.0 67.9% K2.6: 66.7%, GLM-5.1: 63.5%, Opus-4.6: 65.4%
MCPAtlas Public 73.6%
Toolathlon 51.8%

Long-Context Capabilities

Metric DeepSeek-V4-Pro Competitor Performance
MRCR 1M 83.5% Gemini-3.1-Pro: 76.3%
CorpusQA 1M 62.0% Gemini-3.1-Pro: 53.8%

Efficiency (vs. V3.2)

Metric DeepSeek-V4-Pro DeepSeek-V4-Flash
FLOPs per token (1M context) 27% of V3.2 10% of V3.2
Total KV cache (1M context) 10% of V3.2 7% of V3.2
Expert weight storage FP4, with a theoretical further 1/3 efficiency gain

How to Use DeepSeek-V4

  • Web/App: Visit the official DeepSeek website or app and select Expert mode (Pro) or Fast mode (Flash).

  • API call: Set the model parameter to deepseek-v4-pro or deepseek-v4-flash; keep base_url unchanged.

  • Thinking mode: For complex agent scenarios, enable thinking mode and set reasoning_effort: max.

  • Local deployment: Download the open-source weights from Hugging Face or ModelScope for self-deployment.
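A minimal sketch of assembling a request along the lines above. The field names assume an OpenAI-compatible request body, as with earlier DeepSeek endpoints; the model names and the reasoning_effort knob are taken from this article, and the helper itself is illustrative, not an official SDK.

```python
# Build a chat-completion request body (assumed OpenAI-compatible schema).

def build_request(model, prompt, reasoning_effort=None):
    body = {
        "model": model,  # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        # Thinking-mode intensity knob described in the article.
        body["reasoning_effort"] = reasoning_effort
    return body

req = build_request("deepseek-v4-flash", "Summarize this contract.",
                    reasoning_effort="max")
```

Switching an existing integration would then amount to changing the model string while keeping base_url and the rest of the request untouched.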

Key Information and Usage Requirements

  • Model specs: Pro: 1.6T total / 49B active parameters; Flash: 284B total / 13B active parameters; pre-trained on 33T and 32T tokens respectively.

  • Context length: Both versions support 1M tokens. The legacy deepseek-chat and deepseek-reasoner endpoints will be retired on 2026-07-24.

  • API pricing (per million tokens): Pro: 1 RMB for cached input, 12 RMB for uncached input, 24 RMB for output; Flash: 0.2 RMB cached, 1 RMB uncached, 2 RMB output.

  • Capacity constraints: The Pro version currently has limited serving throughput; prices are expected to drop substantially once Ascend 950 supernodes become widely available in the second half of the year.
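The pricing above is easy to turn into a back-of-the-envelope estimate. The per-million-token prices are taken from the article; the helper function is an illustration, not an official calculator.

```python
# Estimate a call's cost in RMB from the listed per-million-token prices.

PRICES = {  # model: (cached input, uncached input, output), RMB per 1M tokens
    "deepseek-v4-pro":   (1.0, 12.0, 24.0),
    "deepseek-v4-flash": (0.2, 1.0, 2.0),
}

def cost_rmb(model, cached_in, uncached_in, out):
    cached, uncached, output = PRICES[model]
    return (cached_in * cached + uncached_in * uncached + out * output) / 1_000_000

# Example: 800K cache-hit input, 200K fresh input, 50K output tokens.
flash_cost = cost_rmb("deepseek-v4-flash", 800_000, 200_000, 50_000)
pro_cost = cost_rmb("deepseek-v4-pro", 800_000, 200_000, 50_000)
```

For this example call, Flash comes to about 0.46 RMB versus 4.4 RMB on Pro, illustrating the article's point that Flash can cost roughly a tenth as much for input-heavy workloads.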

Core Advantages

  • Democratized million-token context: A 1M-token ultra-long context is standard in the official service, breaking the quadratic computation bottleneck of traditional attention and making long-text tasks and test-time scaling genuinely practical.

  • Extreme long-context efficiency: With the hybrid CSA+HCA architecture, V4-Pro's per-token inference FLOPs at 1M context are only 27% of V3.2's and its KV cache only 10%; the Flash version goes as low as 10% and 7% respectively.

  • A new open-source performance benchmark: V4-Pro-Max leads previous open-source models across knowledge, reasoning, and competitive-coding evaluations; in internal agent-coding evaluations it outperforms Claude Sonnet 4.5 and approaches the delivery quality of Opus 4.6 in non-thinking mode.

  • Flexible dual-version coverage: Pro (1.6T/49B) targets top-tier performance, while Flash (284B/13B) achieves near-Pro reasoning with a fraction of the active parameters at an API price as low as 1/12 of Pro's, covering a range of budgets.

  • Native agent enhancement: Specifically optimized for mainstream agent frameworks such as Claude Code and OpenClaw; supports coherent reasoning retention across user message boundaries and excels in agent benchmarks such as SWE and Terminal Bench.


Competitor Comparison

Dimension DeepSeek-V4-Pro Claude Opus 4.6 Kimi K2.6
Model positioning Open-source high-performance MoE Closed-source top-tier generalist Open-source agent intelligence
Open-source status Fully open-source Closed-source API Open weights / open API
Total parameters 1.6T Not disclosed Not disclosed
Active parameters 49B Not disclosed Not disclosed
Context length 1M tokens 200K tokens 1M tokens
Core architecture CSA+HCA hybrid attention Traditional Transformer MoE + long context
MMLU-Pro 87.5 89.1 87.1
SimpleQA 57.9 46.2 36.9
Codeforces 3206
SWE Verified 80.6 80.8 80.2
Terminal Bench 67.9 65.4 66.7
MRCR 1M 83.5 92.9
API input price 12 RMB/M tokens ~150 RMB/M tokens ~60 RMB/M tokens
Long-context efficiency KV cache at 10% of V3.2 Standard KV cache Efficient, details not disclosed

Application Scenarios

  • Long-document analysis: Full-text understanding and cross-chapter reasoning over million-character papers, reports, and legal contracts.

  • Agent coding: Complex code generation, refactoring, and debugging in frameworks such as Claude Code and OpenClaw.

  • Multi-turn tool calling: Retains the complete reasoning history in agent workflows, enabling coherent thinking across user message boundaries.

  • Knowledge-intensive Q&A: Substantially leads open-source models in world-knowledge benchmarks, suiting education, research, and professional consulting.

  • Office tasks: Excels at Chinese writing, information analysis, and document generation and editing.


This article is compiled from official technical materials released by DeepSeek; all data are current as of April 2026.

FAQ

What is DeepSeek-V4's context length?

DeepSeek-V4 natively supports an ultra-long context of one million (1M) tokens, handling large-scale text for long-document analysis and agent applications.

Which two reasoning modes does DeepSeek-V4 support?

It supports a non-thinking mode and a thinking mode; the thinking mode's intensity can be tuned via the reasoning_effort parameter to balance performance and efficiency.

What is the difference between the Pro and Flash versions?

Pro has more parameters and higher performance; Flash has fewer parameters but reasoning performance close to Pro's, with significantly lower API costs, making it suited to cost-sensitive scenarios.

