
Latest Articles

Was DeepSeek Distilled from GPT? A 2025 Analysis of Knowledge Distillation Techniques

Knowledge distillation is a model-training technique in which a smaller student model learns from a larger teacher model, improving efficiency while largely preserving performance. This article analyzes whether DeepSeek models were distilled from GPT, examining three approaches: data distillation, logits distillation, and feature distillation (see the sketch below this entry).
DeepSeek · 2026/2/16
Read more →
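Of the three methods the article examines, logits distillation is the most compact to show concretely. A minimal PyTorch sketch of temperature-scaled logits distillation (the temperature, loss weighting, and function name are illustrative choices, not a claim about DeepSeek's or OpenAI's actual training recipes):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Classic logits distillation: blend a soft KL term against the
    teacher's distribution with the usual hard-label cross-entropy."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 keeps the soft-term gradient magnitude comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

Data distillation, by contrast, needs no access to the teacher's logits at all: the student is simply fine-tuned on teacher-generated outputs, which is why it is the variant most often raised in the DeepSeek-from-GPT debate.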
Qwen3 Officially Released: A New Benchmark for Open-Source LLMs, with Dual Thinking Modes Leading a New Wave of AI

Qwen3 is the latest open-source large language model series, featuring dual thinking modes (deliberate reasoning vs. fast response), support for 119 languages, and enhanced agent capabilities. The lineup spans dense and MoE architectures from 0.6B to 235B parameters, all released under the Apache 2.0 license (a sketch of toggling the thinking mode follows below).
AI Models · 2026/1/24
Read more →
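The dual-mode behavior is exposed through the chat template. A minimal sketch using the Hugging Face transformers interface (the `enable_thinking` flag and model id follow Qwen's published model cards; verify them against the card for the checkpoint you use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # smallest dense model in the series
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize MoE routing in two sentences."}]

# enable_thinking=True selects the deliberate step-by-step reasoning mode;
# False selects the fast-response mode.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```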
DeepSeek V4 Preview: Code Commits Reveal Architectural Innovation and a Leap in Coding Ability for the Next-Generation Model

DeepSeek is reportedly developing a new flagship AI model, DeepSeek V4, with stronger coding capabilities, expected to launch around Chinese New Year in mid-February. Recent GitHub commits reveal a new model identifier, "MODEL1", with distinct technical features, including a new KV cache layout, sparsity handling, and FP8 decoding support, pointing to targeted memory and compute optimizations (a toy FP8 illustration follows below). The model may also incorporate recent research on optimized residual connections and biologically inspired AI memory modules.
DeepSeek · 2026/1/24
Read more →
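For context on what "FP8 decoding support" buys, here is a toy per-tensor FP8 (e4m3) round-trip in PyTorch (≥ 2.1). This only illustrates the storage/precision trade-off, not DeepSeek's kernel; 448 is e4m3's maximum normal value:

```python
import torch

def fp8_roundtrip(x: torch.Tensor) -> torch.Tensor:
    """Per-tensor FP8 (e4m3) quantize/dequantize: 1 byte per element
    instead of 2 for BF16, at the cost of mantissa precision."""
    scale = x.abs().max().clamp(min=1e-12) / 448.0  # map max |x| near e4m3's limit
    q = (x / scale).to(torch.float8_e4m3fn)         # quantize to 8-bit storage
    return q.to(torch.float32) * scale              # dequantize for comparison

x = torch.randn(4, 8)
print(f"max round-trip error: {(x - fp8_roundtrip(x)).abs().max():.4f}")
```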
DeepSeek Releases FlashMLA: An Efficient MLA Decoding Kernel Optimized for Hopper GPUs, Delivering a Major Boost in AI Inference Performance

FlashMLA is an efficient MLA decoding kernel optimized for NVIDIA Hopper GPUs, reaching up to 3000 GB/s of memory bandwidth in memory-bound configurations and 580 TFLOPS in compute-bound configurations, while cutting KV cache requirements by 93.3% for faster, more cost-effective AI inference (the arithmetic behind that reduction is sketched below).
DeepSeek · 2026/1/23
Read more →
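The 93.3% figure is DeepSeek's own comparison against an earlier baseline, but the mechanism is easy to see: MLA caches one small shared latent per token instead of full per-head K and V. A back-of-the-envelope sketch with assumed dimensions (not the exact published configs):

```python
# Per-token, per-layer KV-cache elements (illustrative dimensions).
n_heads, head_dim = 128, 128   # assumed standard multi-head attention shape
d_latent, d_rope = 512, 64     # assumed MLA compressed latent + decoupled RoPE key

mha = 2 * n_heads * head_dim   # separate K and V for every head -> 32768
mla = d_latent + d_rope        # one shared latent plus the RoPE key -> 576

print(f"MLA cache is {mla / mha:.1%} of the MHA cache per token per layer")
```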
DeepSeek FlashMLA Code Analysis: Unveiling the Unreleased MODEL1 Efficient-Inference Architecture

DeepSeek's FlashMLA repository reveals two distinct model architectures: V3.2, tuned for maximum performance and precision, and MODEL1, designed for efficiency and deployability, with a lower memory footprint and specialized long-sequence handling.
DeepSeek · 2026/1/23
Read more →
FlashMLA: DeepSeek's Open-Source Efficient MLA Decoding Kernel, Optimized for NVIDIA Hopper GPUs

FlashMLA is an open-source, high-performance Multi-head Latent Attention (MLA) decoding kernel optimized for NVIDIA Hopper-architecture GPUs, designed to handle variable-length sequences efficiently. It improves memory and compute efficiency through optimized KV caching and BF16 data-format support, achieving up to 3000 GB/s memory bandwidth and 580 TFLOPS of compute on H800 SXM5 GPUs. FlashMLA is well suited to large language model (LLM) inference and other NLP workloads that require efficient decoding (a usage sketch follows below).
DeepSeek · 2026/1/23
Read more →
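A usage sketch of the kernel's Python interface, following the call pattern published in the FlashMLA repository (argument and variable names are abbreviated from that README; check the repo for exact signatures, and note that running it requires a Hopper GPU with the flash_mla package installed):

```python
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

# cache_seqlens: per-request KV lengths; s_q: query tokens per request;
# h_q / h_kv: query / KV head counts; dv: value head dimension.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv
)

for layer in range(num_layers):
    # q: [batch, s_q, h_q, head_dim]; kvcache: paged KV blocks addressed
    # through block_table; returns the attention output and log-sum-exp.
    o, lse = flash_mla_with_kvcache(
        q[layer], kvcache[layer], block_table, cache_seqlens, dv,
        tile_scheduler_metadata, num_splits, causal=True,
    )
```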