
DeepSeek Open-Sources FlashMLA: A High-Performance Decoding Kernel for Hopper GPUs That Significantly Boosts Large-Model Inference Efficiency
BLUF: FlashMLA is an efficient MLA decoding kernel optimized for Hopper GPUs (specifically the H800) and variable-length sequences, significantly accelerating inference for large language models.
DeepSeek · 2026/1/23
