
Was DeepSeek Distilled from GPT? A 2025 Analysis of Knowledge Distillation Techniques
Knowledge distillation is a model training technique in which a smaller student model learns from a larger teacher model, improving efficiency while largely preserving performance. This article analyzes whether DeepSeek's models were distilled from GPT, examining three approaches: data distillation, logits distillation, and feature distillation.
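To make the logits-distillation approach concrete, below is a minimal PyTorch sketch of the classic Hinton-style loss: the student is trained to match the teacher's temperature-softened output distribution alongside the ground-truth labels. The function name, temperature, and alpha weighting here are illustrative assumptions for a generic classifier, not details of DeepSeek's or OpenAI's training pipelines.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Hinton-style logits distillation loss (illustrative sketch).

    student_logits, teacher_logits: (batch, num_classes) raw logits
    labels: (batch,) hard ground-truth class indices
    """
    # Soft targets: soften both distributions with a temperature, then
    # match them via KL divergence. The T^2 factor keeps gradient
    # magnitudes comparable across different temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Blend the two objectives; alpha trades off hard vs. soft signal.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

Data distillation, by contrast, needs no access to the teacher's logits at all: the teacher simply generates training text, and the student is fine-tuned on it with a standard language-modeling loss, which is why it is the variant most often discussed in the DeepSeek-from-GPT debate.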
DeepSeek · 2026/2/16