DeepSeek-V3 Is Here: Performance on Par with GPT-4o, 3x Faster Generation!
DeepSeek-V3 is officially released: this 671B-parameter MoE model matches GPT-4o and Claude-3.5-Sonnet across a range of benchmarks and generates text 3x faster; the API service has been updated in step, with a 45-day promotional pricing period.
Today, the first release in our all-new DeepSeek-V3 model series goes live and is simultaneously open-sourced! Visit chat.deepseek.com to chat with the latest V3 model. The API service has been updated in step, and no changes to your interface configuration are required. Note that the current version of DeepSeek-V3 does not yet support multimodal input or output.
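Since the interface configuration is unchanged, existing integrations keep working as-is. As a minimal sketch of a call, assuming the OpenAI-compatible endpoint at https://api.deepseek.com and the `deepseek-chat` model name (verify both against the official API documentation):

```python
# Minimal sketch: calling DeepSeek-V3 through an OpenAI-compatible API.
# The endpoint and model name below are assumptions; check the official docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # key issued on the DeepSeek platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed to route to the latest V3 model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce DeepSeek-V3 in one sentence."},
    ],
)
print(response.choices[0].message.content)
```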
Performance: Neck and Neck with Top Closed-Source Models
DeepSeek-V3 is our in-house MoE (Mixture-of-Experts) model with 671B total parameters, 37B of which are activated per token, pretrained on 14.8 trillion tokens.
Paper link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
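The ratio of activated to total parameters is what makes MoE inference economical relative to model capacity: a router sends each token to only a few experts, so most weights sit idle on any given token. The sketch below shows a generic top-k-routed MoE layer to illustrate the mechanism; it is not DeepSeek-V3's actual architecture (which adds fine-grained and shared experts, among other refinements), and all sizes are illustrative:

```python
# Generic top-k MoE layer (illustrative; not DeepSeek-V3's actual design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=512, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):          # naive dispatch loop for clarity
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

moe = TopKMoE()
total = sum(p.numel() for p in moe.parameters())
# Each token only touches the router plus top_k of num_experts expert MLPs,
# which is why a model like DeepSeek-V3 can hold 671B parameters while
# activating only 37B per token.
print(f"total parameters: {total:,}")
```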
Encouragingly, DeepSeek-V3 not only outperforms strong open-source models such as Qwen2.5-72B and Llama-3.1-405B across many benchmarks, but also matches the world's leading closed-source models, GPT-4o and Claude-3.5-Sonnet.
Capability Breakdown by Domain
Encyclopedic knowledge: On knowledge-oriented benchmarks (MMLU, MMLU-Pro, GPQA, SimpleQA), DeepSeek-V3 improves significantly over its predecessor DeepSeek-V2.5 and approaches the current best-performing model, Claude-3.5-Sonnet-1022.
Long-context processing: On long-text benchmarks including DROP, FRAMES, and LongBench v2, DeepSeek-V3's average performance surpasses all other models.
Code: In algorithmic coding (Codeforces), DeepSeek-V3 leads all non-o1-style models by a wide margin; in engineering-oriented coding (SWE-Bench Verified), it approaches Claude-3.5-Sonnet-1022.
Mathematical reasoning: On the American math competitions (AIME 2024, MATH) and the Chinese national high school math competition (CNMO 2024), DeepSeek-V3 substantially outperforms all open- and closed-source models.
Chinese: On education-oriented benchmarks such as C-Eval and on pronoun-disambiguation evaluations, DeepSeek-V3 performs on par with Qwen2.5-72B, while leading on the factual-knowledge benchmark C-SimpleQA.
Generation Speed: A 3x Leap
Through algorithmic and engineering breakthroughs, DeepSeek-V3's token generation speed has jumped from 20 TPS to 60 TPS, a 3x improvement over the V2.5 model, delivering a far smoother user experience.
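As a rough sanity check of throughput on your own requests, you can time a streaming response and divide the number of streamed chunks by elapsed seconds. A minimal sketch, under the same endpoint and model-name assumptions as above (each chunk carries only approximately one token, and measured TPS also depends on server load and network):

```python
# Rough TPS estimate via streaming (endpoint and model name are assumptions).
import time
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

start, chunks = time.time(), 0
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a 300-word story."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # each content chunk carries roughly one token
elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tokens/sec")
```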
API Service: Stronger Performance, Updated Pricing
With the stronger, faster DeepSeek-V3 now live, the pricing of our model API service has been adjusted accordingly:
- Per million input tokens: ¥0.5 (cache hit) / ¥2 (cache miss)
- Per million output tokens: ¥8
Limited-Time Offer: 45 Days of Promotional Pricing
To thank our users, we are offering a 45-day promotional pricing period for the new model: from today until February 8, 2025, the DeepSeek-V3 API will keep the familiar discounted rates:
- Per million input tokens: ¥0.1 (cache hit) / ¥1 (cache miss)
- Per million output tokens: ¥2
Both existing users and new users who register during this period can enjoy these promotional prices!
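To make the rates concrete, the sketch below prices one hypothetical request under both the standard and the promotional tiers. The rates come from the lists above; the request sizes are made up for illustration:

```python
# Cost of one request under DeepSeek-V3 API pricing (CNY per million tokens).
STANDARD = {"hit": 0.5, "miss": 2.0, "out": 8.0}  # after the promo period
PROMO    = {"hit": 0.1, "miss": 1.0, "out": 2.0}  # until February 8, 2025

def cost(rates, hit_tokens, miss_tokens, out_tokens):
    per_m = 1_000_000
    return (hit_tokens * rates["hit"]
            + miss_tokens * rates["miss"]
            + out_tokens * rates["out"]) / per_m

# Hypothetical request: 30K cached input, 20K uncached input, 5K output tokens.
for name, rates in (("standard", STANDARD), ("promo", PROMO)):
    print(f"{name}: ¥{cost(rates, 30_000, 20_000, 5_000):.4f}")
# standard: 0.015 + 0.040 + 0.040 = ¥0.0950
# promo:    0.003 + 0.020 + 0.010 = ¥0.0330
```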
Open Source: Easier Local Deployment
DeepSeek-V3 is trained in FP8, and we have open-sourced the native FP8 weights. Thanks to strong support from the open-source community:
- SGLang and LMDeploy provide day-one support for native FP8 inference with the V3 model
- TensorRT-LLM and MindIE support BF16 inference
- To make community adaptation and broader applications easier, we also provide a script for converting the weights from FP8 to BF16 (the core idea is sketched below)
Model weights and more local-deployment details: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
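The heart of an FP8-to-BF16 conversion is dequantization: multiply each block of FP8 weights by its stored inverse scale, then cast to BF16. Below is a minimal sketch of that idea; the per-block scale layout, tensor shapes, and block size are assumptions for illustration, so refer to the official conversion script in the repository for the real checkpoint format:

```python
# Sketch: dequantizing block-scaled FP8 weights to BF16 (layout is assumed).
import torch

def fp8_block_to_bf16(weight_fp8: torch.Tensor,
                      scale_inv: torch.Tensor,
                      block: int = 128) -> torch.Tensor:
    """weight_fp8: (M, N) FP8 tensor; scale_inv: (M//block, N//block) scales."""
    w = weight_fp8.to(torch.float32)
    # Broadcast each per-block inverse scale over its (block x block) tile.
    s = scale_inv.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    return (w * s[: w.shape[0], : w.shape[1]]).to(torch.bfloat16)

# Toy usage with random data standing in for a real checkpoint tensor.
w8 = torch.randn(256, 256).to(torch.float8_e4m3fn)  # requires PyTorch >= 2.1
s = torch.rand(2, 2) + 0.5
w16 = fp8_block_to_bf16(w8, s)
print(w16.dtype, w16.shape)  # torch.bfloat16 torch.Size([256, 256])
```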
Looking Ahead: Continuous Innovation Toward AGI
"Pursuing inclusive AGI with an open-source spirit and long-termism" has always been DeepSeek's firm conviction. We are excited to share this milestone in model pretraining with the community, and we are delighted to see the capability gap between open-source and closed-source models narrowing further.
This is only a new beginning! Going forward, we will build richer capabilities such as deep reasoning and multimodality on top of the DeepSeek-V3 base model, and we will keep sharing our latest findings with the community. Let's work together toward a bright future for AI!
Data Analysis
| Feature / Capability | DeepSeek-V3 |
|---|---|
| Model type & scale | MoE (Mixture-of-Experts), 671B total parameters, 37B activated |
| Pretraining data | 14.8 trillion tokens |
| Overall performance | On par with GPT-4o and Claude-3.5-Sonnet |
| Encyclopedic knowledge | Approaches the current best model, Claude-3.5-Sonnet-1022 |
| Long-context processing | Best average performance on DROP, FRAMES, and LongBench v2 |
| Coding (algorithmic) | Leads all non-o1-style models on Codeforces |
| Coding (engineering) | Approaches Claude-3.5-Sonnet-1022 (SWE-Bench Verified) |
| Mathematical reasoning | Substantially outperforms all open- and closed-source models on AIME 2024, MATH, and CNMO 2024 |
| Chinese (overall) | Comparable to Qwen2.5-72B; ahead on C-SimpleQA |
| Generation speed (TPS) | 60 TPS (3x over V2.5) |
| Multimodal support | Not supported in the current version |
| API standard price (input) | ¥0.5/M tokens (cache hit) / ¥2/M tokens (cache miss) |
| API standard price (output) | ¥8/M tokens |
| API promotional price (input) | ¥0.1/M tokens (cache hit) / ¥1/M tokens (cache miss) (until Feb 8, 2025) |
| API promotional price (output) | ¥2/M tokens (until Feb 8, 2025) |
| Open-source weight format | Native FP8 |
| Inference framework support | SGLang, LMDeploy (FP8 inference); TensorRT-LLM, MindIE (BF16 inference) |