
What Is the GLM General Language Model? A 2026 Technical Overview and Application Guide

2026/3/14
AI Summary (BLUF)

GLM (General Language Model) is an autoregressive blank-filling language model developed by THUDM, supporting both English and Chinese tasks with models up to 10B parameters, including specialized Chinese versions and ChatGLM-6B for dialogue.

GLM (General Language Model) is a general language model pretrained with an autoregressive blank-filling objective, and it can be finetuned on a wide range of natural language understanding and generation tasks.

For a detailed description of GLM, please refer to the paper:

GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022)
Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)

News: We release ChatGLM-6B, an open pre-trained language model with 6 billion parameters optimized for Chinese QA and dialogue based on the GLM framework.

Core Concepts and Model Architecture

The core innovation of GLM lies in its autoregressive blank-filling pre-training objective. Unlike traditional unidirectional autoregressive models (e.g., GPT) or encoder-only models (e.g., BERT), GLM randomly replaces parts of the input text with [MASK] tokens and then predicts these masked spans in an autoregressive manner (left to right). This approach elegantly unifies understanding and generation tasks.

To achieve this, GLM introduces 2D positional encoding and span shuffling techniques. The 2D positional encoding can represent both a token's position in the original text and its position within a masked span. Span shuffling allows the model to predict masked spans in any arbitrary order, enhancing the model's flexibility.
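As a toy sketch of the construction described above (this is an illustration only, not THUDM's implementation; the real tokenizer uses dedicated start/end special tokens and sampled span lengths), the corrupted "Part A" text, the shuffled "Part B" spans, and their 2D positions can be built like this:

```python
import random

def blank_infill_example(tokens, spans):
    """Toy GLM-style blank infilling.
    Part A is the corrupted text with one [MASK] per span.
    Part B holds the masked spans in shuffled order; each Part-B
    token carries 2D positions: pos1 = index of its [MASK] in
    Part A, pos2 = position inside the span."""
    part_a, mask_at, i = [], {}, 0
    ends = dict(spans)  # span start -> span end (exclusive)
    while i < len(tokens):
        if i in ends:
            mask_at[i] = len(part_a)  # remember where this span's [MASK] sits
            part_a.append("[MASK]")
            i = ends[i]
        else:
            part_a.append(tokens[i])
            i += 1
    order = list(spans)
    random.shuffle(order)  # span shuffling: predict spans in arbitrary order
    part_b, pos1, pos2 = [], [], []
    for s, e in order:
        piece = ["[S]"] + tokens[s:e]  # [S] marks the start of each span's generation
        part_b += piece
        pos1 += [mask_at[s]] * len(piece)       # shared Part-A position
        pos2 += list(range(1, len(piece) + 1))  # intra-span position
    return part_a, part_b, pos1, pos2
```

For the input `["x1", "x2", "x3", "x4", "x5", "x6"]` with the span `(1, 3)` masked, Part A becomes `["x1", "[MASK]", "x4", "x5", "x6"]` and Part B `["[S]", "x2", "x3"]`, with every Part-B token pointing back at position 1 (the `[MASK]`) in its first positional dimension.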

Pretrained Models Overview

The pretrained models used in the paper can be downloaded from OneDrive or Tsinghua-Cloud.

The following table lists the main pretrained models:

| Name | Params | Language | Corpus | Objective | File | Config |
|---|---|---|---|---|---|---|
| GLM-Base | 110M | English | Wiki+Book | Token | glm-base-blank.tar.bz2 | model_blocklm_base.sh |
| GLM-Large | 335M | English | Wiki+Book | Token | glm-large-blank.tar.bz2 | model_blocklm_large.sh |
| GLM-Large-Chinese | 335M | Chinese | WuDaoCorpora | Token+Sent+Doc | glm-large-chinese.tar.bz2 | model_blocklm_large_chinese.sh |
| GLM-Doc | 335M | English | Wiki+Book | Token+Doc | glm-large-generation.tar.bz2 | model_blocklm_large_generation.sh |
| GLM-410M | 410M | English | Wiki+Book | Token+Doc | glm-1.25-generation.tar.bz2 | model_blocklm_1.25_generation.sh |
| GLM-515M | 515M | English | Wiki+Book | Token+Doc | glm-1.5-generation.tar.bz2 | model_blocklm_1.5_generation.sh |
| GLM-RoBERTa | 335M | English | RoBERTa | Token | glm-roberta-large-blank.tar.bz2 | model_blocklm_roberta_large.sh |
| GLM-2B | 2B | English | Pile | Token+Sent+Doc | glm-2b.tar.bz2 | model_blocklm_2B.sh |
| GLM-10B | 10B | English | Pile | Token+Sent+Doc | Download | model_blocklm_10B.sh |
| GLM-10B-Chinese | 10B | Chinese | WuDaoCorpora | Token+Sent+Doc | Download | model_blocklm_10B_chinese.sh |

Note: After downloading the file, unzip it into a local folder and set CHECKPOINT_PATH in the corresponding scripts to the folder path.
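As a minimal sketch of that step (the archive name and destination path below are placeholders, not fixed names from the project), unpacking a downloaded checkpoint and getting the path to export as CHECKPOINT_PATH could look like:

```python
import os
import tarfile

def extract_checkpoint(archive_path, dest_dir):
    """Unpack a downloaded GLM checkpoint archive (a .tar.bz2 file)
    into dest_dir and return the absolute path to use as CHECKPOINT_PATH."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir)
    return os.path.abspath(dest_dir)

# e.g. path = extract_checkpoint("glm-large-blank.tar.bz2", "checkpoints/glm-large")
# then set CHECKPOINT_PATH=<that path> in the corresponding script.
```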

Performance

GLM has demonstrated strong performance across multiple benchmarks, proving its effectiveness as a general-purpose backbone model.

SuperGLUE Benchmark

Results on the SuperGLUE dev set with single-model, single-task finetuning are as follows (GLM-10B performs excellently):

| Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
|---|---|---|---|---|---|---|---|---|
| GLM-10B | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
| DeBERTa-XXLarge-v2 | 97.0 | - | 93.5 | - | - | 87.8/63.6 | 88.3 | 94.1/93.7 |

Sequence-to-Sequence Generation

On text summarization tasks, GLM-10B also achieves competitive results.

CNN/Daily Mail (test set, no additional data used):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 44.7 | 21.4 | 41.4 |
| T5-11B | 43.5 | 21.6 | 40.7 |
| PEGASUS-Large | 44.2 | 21.5 | 41.4 |
| BART-Large | 44.2 | 21.3 | 40.9 |

XSum (test set, no additional data used):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 48.9 | 25.7 | 40.4 |
| PEGASUS-Large | 47.2 | 24.6 | 39.3 |
| BART-Large | 45.1 | 22.3 | 37.3 |
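For context on the metric in these tables, ROUGE-1 measures unigram overlap between a generated summary and a reference. A simplified sketch (the official ROUGE scorer adds details such as stemming rules and multi-reference aggregation, which this toy version omits):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap between a candidate
    and a single reference, both given as token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# rouge1_f("the cat sat".split(), "the cat was sitting".split())
# matches "the" and "cat": precision 2/3, recall 2/4 -> F1 = 4/7
```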

Language Modeling

In zero-shot language modeling evaluation, GLM-10B also performs well.

| Model | LAMBADA (accuracy) | Wikitext103 (perplexity) |
|---|---|---|
| GLM-10B (bidirectional) | 72.35 | 11.33 |
| GLM-10B (unidirectional) | 67.18 | 12.22 |
| GPT-2 | 52.66 | 17.48 |
| Megatron-LM (8.3B) | 66.51 | 10.81 |
| Turing-NLG | 67.98 | 10.21 |
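As a reminder of what the Wikitext103 column reports: perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. A minimal numeric sketch:

```python
import math

def perplexity(token_probs):
    """Perplexity from the per-token probabilities a model assigns
    to the ground-truth tokens: exp(mean negative log-likelihood)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns probability 1/4 to every token has perplexity 4:
# perplexity([0.25, 0.25, 0.25, 0.25]) -> 4.0
```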

Quick Start

Using the Hugging Face Hub

You can access GLM models via the Hugging Face Hub. Please install transformers>=4.23.1; the available GLM checkpoints are listed on the Hub.

Generation Example

The following code shows how to use GLM-10B for blank-filling generation:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda()  # fp16 weights on GPU
model.eval()

# Inference: fill in the [MASK] span autoregressively
inputs = tokenizer("Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.", return_tensors="pt")
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=512)
inputs = inputs.to('cuda')
outputs = model.generate(**inputs, max_length=512, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))

# Training: build inputs with targets and compute the loss
inputs = tokenizer(
    ["Tsinghua University is located in [MASK].", "One minus one equals zero, is it correct? Answer: [MASK]"],
    return_tensors="pt", padding=True)
inputs = tokenizer.build_inputs_for_generation(inputs, targets=["Beijing", "No"], max_gen_length=8, padding=False)
inputs = inputs.to('cuda')
outputs = model(**inputs)
loss = outputs.loss
logits = outputs.logits

Classification Example

The following code shows how to use GLM for multiple-choice tasks:

from transformers import AutoTokenizer, AutoModelForMultipleChoice
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForMultipleChoice.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda()
model.eval()

inputs = tokenizer(["Tsinghua University is located in [MASK].",
                    "One minus one equals zero, is it correct? Answer: [MASK]"], return_tensors="pt", padding=True)
choices = [["Beijing", "Shanghai"], ["Yes", "No"]]
inputs = tokenizer.build_inputs_for_multiple_choice(inputs, choices)
inputs = inputs.to('cuda')
outputs = model(**inputs)
logits = outputs.logits

Tip: You can also convert the finetuned checkpoints with scripts/convert_glm_checkpoint_to_transformers.py.

Manual Installation and Environment Setup

  1. Clone the repo:

     git clone https://github.com/THUDM/GLM
     cd GLM

  2. Install dependencies:

     Please first install PyTorch (we use 1.7.0) and apex, and then install the other dependencies with pip install -r requirements.txt.


  3. Model parallelism (optional, for large models):

     If you encounter a CUDA out of memory error, meaning your GPU memory is limited, you can try model parallelism to divide the parameters across multiple GPUs. Taking two-way model parallelism as an example, first run change_mp.py to split the checkpoint:

     python change_mp.py path_to_the_checkpoint 2

     Then update the checkpoint path in the model config file (e.g. config_tasks/model_blocklm_10B.sh) and change MP_SIZE in the script (e.g. scripts/ds_finetune_superglue.sh) to 2.
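To illustrate schematically what model parallelism buys you (a NumPy toy, not GLM's actual Megatron-style implementation): a large weight matrix is split column-wise across devices, each device multiplies the input by its slice, and the partial outputs are concatenated to reproduce the full result.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # input activations (batch x hidden)
w = rng.standard_normal((8, 6))  # full weight matrix of one linear layer

# Two-way column parallelism: each "GPU" holds half the columns of w,
# so each device stores and computes only half the layer.
w0, w1 = np.split(w, 2, axis=1)
y0 = x @ w0                           # computed on device 0
y1 = x @ w1                           # computed on device 1
y = np.concatenate([y0, y1], axis=1)  # gather the partial outputs

assert np.allclose(y, x @ w)  # identical to the single-device result
```

The memory for the layer's parameters is halved per device, at the cost of the communication step that gathers the partial outputs.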

Frequently Asked Questions (FAQ)

What is GLM's core innovation?

GLM's core innovation is its autoregressive blank-filling pretraining objective: it randomly masks text spans and predicts them autoregressively, unifying understanding and generation tasks, and it introduces 2D positional encoding and span shuffling for added flexibility.

Which Chinese-specific GLM models are available?

GLM provides dedicated Chinese versions, including GLM-Large-Chinese (335M parameters) and GLM-10B-Chinese (10B parameters), both trained on WuDaoCorpora and supporting Chinese natural language tasks.

How do I get started with GLM?

You can use GLM models directly through the Hugging Face Hub, or download the pretrained checkpoints from the Tsinghua-Cloud/OneDrive links provided with the paper. The recently open-sourced ChatGLM-6B (6 billion parameters) is optimized specifically for Chinese QA and dialogue.

