DeepSeek-V2.5震撼发布:通用对话与代码能力完美融合的开源AI新标杆
DeepSeek正式发布V2.5版本,成功融合通用对话与专业代码能力,在安全性和实用性方面实现双重优化,现已在HuggingFace平台开源。
今天,我们正式宣布DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.的发布——这是DeepSeek-V2-ChatAn AI model specialized in general conversational dialogue capabilities.和DeepSeek-Coder-V2An AI model specialized in code processing and generation tasks.两大模型的完美融合成果。这款全新的开源模型不仅保留了原有Chat模型的通用对话能力和Coder模型的强大代码处理能力,还在人类偏好对齐方面实现了显著优化。
全能型AI模型的诞生
DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.在写作任务、指令跟随等多个关键领域都实现了大幅提升,为用户带来更简洁、智能、高效的使用体验。模型现已在网页端及API全面上线,API接口保持向前兼容,用户通过deepseek-coder或deepseek-chat均可访问这一全新模型。
核心功能保持不变
- Function Calling:完整的函数调用能力
- FIM补全:增强的代码补全功能
- Json Output:标准化的JSON输出格式
模型升级历程
DeepSeek团队一直致力于模型的持续改进和优化。回顾升级历程:
- 6月份重大升级:用Coder V2的Base模型替换原有Chat的Base模型,显著提升代码生成和推理能力,发布DeepSeek-V2-Chat-0628June 2024 version of DeepSeek-V2-Chat with Coder V2 base model integration.版本
- 7月份对齐优化:DeepSeek-Coder-V2An AI model specialized in code processing and generation tasks.在原有Base模型基础上,通过对齐优化大幅提升通用能力,推出0724版本
- 最终融合:成功将Chat和Coder两个模型合并,推出全新的DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.版本
重要提示:由于本次模型版本变动较大,如果在某些场景中出现效果变差的情况,建议重新调整System Prompt和Temperature参数,以获得最佳性能。
通用能力评测表现
基准测试结果
我们使用业界通用的测试集对DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.进行全面测评。在中文和英文四个核心测试集上,DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.的表现均优于之前的DeepSeek-V2-0628以及DeepSeek-Coder-V2-0724July 2024 version of DeepSeek-Coder-V2 with enhanced general capabilities through alignment optimization.版本。
竞品对比优势
在我们内部的中文评测中,DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.与GPT-4o miniA smaller version of GPT-4o used as a competitive benchmark in evaluations.、ChatGPT-4o-latestThe latest version of ChatGPT based on GPT-4o architecture used for competitive comparison.的对战胜率(裁判为GPT-4oAn AI language model developed by OpenAI, known for its advanced natural language processing capabilities.)相较于DeepSeek-V2-0628均有明显提升。评测涵盖创作、问答等通用能力领域,用户体验将得到实质性改善。
安全能力优化
安全性与实用性之间的平衡一直是DeepSeek迭代开发的重点关注领域。在DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.版本中,我们对模型安全问题的边界做了更加清晰的划分:
| 模型版本 | 安全综合得分* | 安全外溢比例** |
|---|---|---|
| DeepSeek-V2-0628 | 74.4% | 11.3% |
| DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3. | 82.6% | 4.6% |
*基于内部测试集合的得分,分数越高代表模型的整体安全性越高
**基于内部测试集合的得分,比例越低代表模型的安全策略对于正常问题的影响越小
关键改进:
- 强化模型对各种越狱攻击的安全性
- 减少安全策略过度泛化到正常问题的倾向
代码能力保持领先
在代码处理方面,DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.完整保留了DeepSeek-Coder-V2-0724July 2024 version of DeepSeek-Coder-V2 with enhanced general capabilities through alignment optimization.的强大能力:
基准测试表现
- HumanEval PythonA benchmark test suite for evaluating Python code generation capabilities of AI models.:显著改进
- LiveCodeBenchA benchmark test for evaluating AI performance in coding, where Gemini has achieved leading results.(2024年1月-9月):显著改进
- HumanEval MultilingualA benchmark test suite for evaluating code generation capabilities across multiple programming languages.:DeepSeek-Coder-V2-0724July 2024 version of DeepSeek-Coder-V2 with enhanced general capabilities through alignment optimization.略胜一筹
- AiderA benchmark test where Gemini AI has achieved leading performance results.测试:DeepSeek-Coder-V2-0724July 2024 version of DeepSeek-Coder-V2 with enhanced general capabilities through alignment optimization.略胜一筹
- SWE-verifiedA testing framework for evaluating software engineering capabilities of AI models.测试:两个版本表现均较低,需要进一步优化
实际应用优化
- FIM补全任务:内部评测集DS-FIM-EvalDeepSeek's internal evaluation dataset for measuring Fill-in-the-Middle code completion performance.评分提升5.1%,带来更好的插件补全体验
- 代码常见场景:针对实际使用场景进行优化
- 主观评测:在DS-Arena-CodeDeepSeek's internal arena-style evaluation platform for subjective code generation assessments.中,对战竞品的胜率(GPT-4oAn AI language model developed by OpenAI, known for its advanced natural language processing capabilities.为裁判)取得显著提升
开源承诺
秉承持久的开源精神,DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3.现已开源至HuggingFaceA platform for sharing and collaborating on machine learning models and datasets.平台:
开源地址:https://huggingface.co/deepseek-ai/DeepSeek-V2.5
DeepSeek团队将继续致力于推动开源AI生态的发展,为开发者和研究者提供更强大、更易用的AI工具。
Data Analysis
| 模型版本 | 安全综合得分* | 安全外溢比例** |
|---|---|---|
| DeepSeek-V2-0628 | 74.4% | 11.3% |
| DeepSeek-V2.5An upgraded version of the DeepSeek-V2 series, released in September 2024, positioned between V2 and V3. | 82.6% | 4.6% |
| *基于内部测试集合的得分,分数越高代表模型的整体安全性越高 | ||
| **基于内部测试集合的得分,比例越低代表模型的安全策略对于正常问题的影响越小 |
版权与免责声明:本文仅用于信息分享与交流,不构成任何形式的法律、投资、医疗或其他专业建议,也不构成对任何结果的承诺或保证。
文中提及的商标、品牌、Logo、产品名称及相关图片/素材,其权利归各自合法权利人所有。本站内容可能基于公开资料整理,亦可能使用 AI 辅助生成或润色;我们尽力确保准确与合规,但不保证完整性、时效性与适用性,请读者自行甄别并以官方信息为准。
若本文内容或素材涉嫌侵权、隐私不当或存在错误,请相关权利人/当事人联系本站,我们将及时核实并采取删除、修正或下架等处理措施。 也请勿在评论或联系信息中提交身份证号、手机号、住址等个人敏感信息。