
AirLLM: Running 70B Models on a 4GB GPU Without Quantization
AirLLM is a lightweight inference framework for large language models that enables 70B-parameter models to run on a single 4GB GPU without quantization, distillation, or pruning.
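The key idea behind fitting a 70B model into 4GB is layered inference: only one transformer layer's weights are resident in GPU memory at a time, streamed from disk, so peak memory is roughly 1/N of the full model rather than the whole parameter set. The sketch below illustrates this idea in plain NumPy; the function and names are illustrative assumptions, not AirLLM's actual API.

```python
import numpy as np

def layered_forward(x, layer_weights_on_disk):
    """Forward pass that keeps only one layer's weights 'resident' at a time.

    Illustrative sketch of layered inference, not AirLLM's real code:
    each weight matrix stands in for one transformer layer streamed
    from disk, applied, then freed before the next layer loads.
    """
    peak_resident = 0
    for w in layer_weights_on_disk:      # stream one layer's weights in
        peak_resident = max(peak_resident, w.size)  # only this layer is "on GPU"
        x = np.tanh(x @ w)               # stand-in for the layer's computation
        del w                            # free the layer before loading the next
    return x, peak_resident

rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)) for _ in range(8)]  # toy 8-layer model
x = rng.standard_normal((1, 16))

out, peak = layered_forward(x, layers)
total_params = sum(w.size for w in layers)
# Peak residency is one layer (256 floats) vs. 2048 floats for the whole model.
```

The same trade-off applies at real scale: streaming layers from disk cuts peak memory by a factor of the layer count, at the cost of disk-to-GPU transfer time on every forward pass.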
LLMs · 2026/1/24

