Unlocking the Reasoning Capabilities of Large Language Models: A Deep Dive into Chain-of-Thought (CoT) Technology
This article provides a comprehensive analysis of Chain-of-Thought (CoT) prompting techniques that enhance reasoning capabilities in large language models. It covers the evolution from basic CoT to advanced methods such as Zero-shot-CoT, Self-Consistency, Least-to-Most prompting, and Fine-tune-CoT, and discusses their applications, limitations, and impact on AI development.
Background: From Prompt Learning to Chain-of-Thought
The wave of prompt learning surged in 2021, but as early as 2020, OpenAI's paper Language Models are Few-Shot Learners had already shown how prompting can be used to elicit stronger reasoning from large models. The paper introduced three prompting settings: Zero-shot, One-shot, and Few-shot.
Figure 1: Comparison of different prompting methods and fine-tuning
- Few-Shot (FS): The model is given a small number of examples at inference time, with no weight updates allowed. Its main advantage is that it greatly reduces the need for task-specific data and mitigates the risk of learning an overly narrow distribution from a fine-tuning dataset. Its main drawback is that results have so far lagged well behind state-of-the-art fine-tuned models.
- One-Shot (1S): The same as Few-Shot, but with exactly one example. One-Shot is distinguished from Few-Shot and Zero-Shot because it most closely matches how certain tasks are communicated to humans.
- Zero-Shot (0S): No examples are allowed; only a natural-language instruction describing the task is given. This setting offers the greatest convenience and robustness and the best chance of avoiding spurious correlations, but it is also the most challenging.
However, even Few-Shot prompting remains of limited help on problems that require logical reasoning, such as simple arithmetic word problems. This is the gap that Chain-of-Thought (CoT) prompting naturally arose to fill.
The "Emergent" Chain-of-Thought in Large Models
In the large-model field, "emergence" refers to the sudden jump in performance, and the appearance of striking, unexpected capabilities such as logical reasoning, once a model scales past a certain threshold. Powerful logical reasoning is one of the core abilities that "emerges" in large language models, and the key to that reasoning ability is Chain-of-Thought technology.
1. The Seminal Work: Chain-of-Thought Prompting
1.1 Proposers and Core Idea
The concept of Chain-of-Thought was proposed by Chinese scientist Jason Wei and colleagues in early 2022. The core idea is to show a large language model a few examples that spell out the intermediate reasoning process, guiding the model to lay out its own reasoning steps when answering, which leads to more accurate results.
1.2 Technical Details and Effectiveness
In implementation, CoT modifies the "target" of each exemplar in the prompt. The source (the input) stays unchanged, but the target is changed from the final answer alone (answer, a) to a rationale plus the answer (rationale, r), i.e., r + a. The model is thereby guided to generate the reasoning process, not just the answer.
Take a math problem as an example:
Standard Prompting: Q: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?" A: "11"
CoT Prompting: Q: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?" A: "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."
This method decomposes a multi-step reasoning problem into intermediate steps, allocates more computation to it, and makes the model's decision process more interpretable. Experiments show that on mathematical reasoning datasets such as GSM8K, CoT can improve the performance of large models such as PaLM 540B by over 300%, even surpassing the best supervised results of the time.
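To make the r + a target format concrete, here is a minimal sketch in Python of how a few-shot CoT prompt can be assembled. The exemplar reuses the tennis-ball example above; `call_llm` is a hypothetical stand-in for any text-completion API, not a real library call.

```python
# Minimal sketch of few-shot CoT prompt construction. The source (question)
# is unchanged; the exemplar's target is rationale + answer (r + a).
# `call_llm` is a hypothetical stand-in for any text-completion API.

COT_EXEMPLARS = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend rationale-bearing exemplars, then pose the new question."""
    return COT_EXEMPLARS + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
# completion = call_llm(prompt)  # the model now emits a rationale before its answer
```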
2. Zero-shot Chain-of-Thought: Zero-shot-CoT
Zero-shot Chain-of-Thought is a follow-up development of CoT. Researchers found that simply appending the phrase "Let's think step by step" to a question leads large language models to generate a reasoning chain on their own, improving answer accuracy.
Technically, Zero-shot-CoT is a two-stage pipeline (sketched in code below):
- Generate the reasoning chain: prompt the model with the question plus "Let's think step by step" so that it produces its thinking process.
- Extract the answer: concatenate the original question with the reasoning chain generated in the first stage, followed by a prompt such as "The answer is", to elicit the final answer.
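A minimal sketch of this two-stage pipeline, assuming a hypothetical `complete()` wrapper around an LLM completion API:

```python
# Minimal sketch of the two-stage Zero-shot-CoT pipeline.
# `complete` is a hypothetical wrapper around any LLM completion API.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit the reasoning chain with the trigger phrase.
    reasoning = complete(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: feed the question plus the generated reasoning back,
    # with an answer-extraction prompt appended.
    answer = complete(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
    return answer.strip()
```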
Experiments show that Zero-shot-CoT significantly improves performance on reasoning tasks, for example raising GPT-3's accuracy on the MultiArith arithmetic benchmark from about 17% to 78%.
3. Self-Consistency: Improving Performance via Majority Voting
Self-Consistency is an important refinement of CoT. Instead of generating a single reasoning path, it exploits the model's sampling randomness (e.g., by raising the temperature) to generate multiple distinct reasoning paths, then takes the most frequent answer across those paths as the final answer.
The approach is akin to "two heads are better than one": aggregating several possibly imperfect reasoning processes yields a more reliable result. Experiments show that Self-Consistency delivers consistent, stable gains over plain CoT across a range of reasoning tasks.
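A minimal sketch of the voting loop, assuming a hypothetical `sample()` function for stochastic decoding and a naive answer parser:

```python
# Minimal sketch of Self-Consistency: sample several reasoning paths at a
# non-zero temperature and majority-vote over the parsed final answers.
# `sample` is a hypothetical stochastic-decoding call to an LLM.
from collections import Counter

def sample(prompt: str, temperature: float) -> str:
    raise NotImplementedError("plug in a real LLM client here")

def parse_answer(path: str) -> str:
    # Naive extraction: take whatever follows the last "The answer is".
    return path.rsplit("The answer is", 1)[-1].strip(" .\n")

def self_consistency(cot_prompt: str, n_paths: int = 10) -> str:
    answers = [parse_answer(sample(cot_prompt, temperature=0.7))
               for _ in range(n_paths)]
    # The most frequent answer across the sampled paths wins.
    return Counter(answers).most_common(1)[0][0]
```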
4. Least-to-Most Prompting
Least-to-Most (LtM) prompting develops Chain-of-Thought prompting further, targeting complex problems. The strategy has two steps:
- Decomposition: break the original problem into a series of sequentially dependent sub-problems.
- Sequential solving: solve the sub-problems one by one, feeding each solved sub-problem and its answer back in as context for the next, until the final problem is solved.
This mirrors the human "divide and conquer" strategy for complex problems, and it composes with CoT (i.e., CoT can be applied when solving each sub-problem). On complex tasks such as SCAN (mapping instructions to action sequences), LtM combined with the Codex model raises accuracy from 16% to nearly 100%.
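Reusing the same hypothetical `complete()` wrapper as in the earlier sketches, the two-stage LtM loop might look like this; the decomposition prompt wording is an illustrative assumption:

```python
# Minimal sketch of Least-to-Most prompting: decompose, then solve
# sub-problems in order, carrying solved pairs forward as context.
# `complete` is a hypothetical wrapper around any LLM completion API.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

def least_to_most(problem: str) -> str:
    # Stage 1: decomposition into sequentially dependent sub-questions.
    plan = complete(
        f"Q: {problem}\n"
        "To solve this, list the sub-questions we must answer first, one per line:"
    )
    sub_questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Stage 2: solve each sub-question, appending solved Q/A pairs to the
    # context so later steps can build on earlier answers.
    context = f"Problem: {problem}\n"
    answer = ""
    for sub_q in sub_questions:
        answer = complete(f"{context}Q: {sub_q}\nA:")
        context += f"Q: {sub_q}\nA: {answer}\n"
    return answer  # the last sub-answer resolves the original problem
```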
5. Combining Instruction Tuning with CoT: Flan-PaLM/T5
Google's Flan line of work explored how large-scale instruction tuning can dramatically improve a model's generalization. One key innovation of Flan-PaLM/T5 was to fold CoT data into the tuning mixture.
The core of the method includes:
- Task unification: casting more than 1,800 NLP tasks into the same text-in, text-out format.
- Prompt templates: designing four prompt templates, crossing whether the task requires reasoning (CoT) with whether it provides exemplars (few-shot); see the sketch after this list.
- Mixed training: mixing a small number (e.g., nine) of CoT tasks into the much larger pool of fine-tuning tasks.
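As an illustration of how a single training example could be cast under the four template combinations, here is a minimal sketch; the field names and trigger phrases are assumptions for illustration, not Flan's exact templates.

```python
# Minimal sketch of the four Flan-style template combinations:
# (CoT or not) x (few-shot exemplars or not). Field names and trigger
# phrases are illustrative assumptions, not Flan's exact wording.

def format_example(question: str, answer: str, rationale: str = "",
                   exemplars: str = "",
                   use_cot: bool = False, use_fewshot: bool = False) -> dict:
    prompt = exemplars if use_fewshot else ""
    prompt += f"Q: {question}\nA:"
    if use_cot:
        prompt += " Let's think step by step."
        target = f"{rationale} The answer is {answer}."  # target = r + a
    else:
        target = answer                                   # target = a
    return {"input": prompt, "target": target}  # unified text-to-text format

# The four combinations of the two flags yield the four templates.
example = format_example(
    "How many legs do 3 spiders have?", "24",
    rationale="Each spider has 8 legs. 3 x 8 = 24.",
    use_cot=True, use_fewshot=False,
)
```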
The results show that even though CoT tasks make up only a tiny fraction of the tuning mixture, they significantly improve the model's overall reasoning and generalization across all tasks, including non-CoT ones, without hurting performance on non-reasoning tasks. This demonstrates how well CoT composes with the instruction-tuning framework.
6. Knowledge Distillation: Enabling Reasoning in Smaller Models
A major limitation of CoT is its reliance on very large models (tens to hundreds of billions of parameters). To endow smaller, more deployable models with reasoning ability, researchers proposed the Fine-tune-CoT method.
The core pipeline uses a large model with strong Zero-shot-CoT ability (e.g., ChatGPT) as a "teacher" to generate answers with reasoning steps for a large pool of questions; the resulting (question, reasoning chain, answer) triples are then used to fine-tune a small "student" model. This transfers reasoning ability to small models without any manually annotated reasoning data.
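A minimal sketch of the teacher-side data generation, again reusing the hypothetical `complete()` wrapper; note that in practice, generated samples whose answers disagree with the gold labels are typically filtered out before fine-tuning.

```python
# Minimal sketch of Fine-tune-CoT data generation: a "teacher" LLM produces
# Zero-shot-CoT rationales that become (question, rationale, answer) records
# for fine-tuning a small "student" model. `complete` is a hypothetical
# wrapper around the teacher model's completion API.
import json

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in the teacher model's client here")

def build_distillation_set(questions: list[str], out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        for q in questions:
            rationale = complete(f"Q: {q}\nA: Let's think step by step.")
            answer = complete(
                f"Q: {q}\nA: Let's think step by step. {rationale}\n"
                "Therefore, the answer is"
            )
            # Student target mirrors the r + a format: rationale, then answer.
            record = {"question": q,
                      "rationale": rationale.strip(),
                      "answer": answer.strip()}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```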
7. Limitations of Chain-of-Thought
Despite its power, Chain-of-Thought still has clear limitations:
- Scale dependence: CoT's "emergent" ability depends heavily on model scale (typically tens of billions of parameters or more), limiting its use in resource-constrained settings.
- Domain limits: CoT's gains are so far concentrated in specific domains such as mathematics and commonsense reasoning; its benefit on other tasks (e.g., machine translation) remains to be verified.
- Not true understanding: even with CoT, models still make elementary calculation errors (e.g., asserting 6*13=68), suggesting the reasoning is more "pattern imitation" than genuine grasp of mathematical or logical principles.
- Error propagation: in multi-step methods like LtM, an error in an early sub-step causes later steps to fail.
Summary and Outlook
Chain-of-Thought and its derivative techniques (Zero-shot-CoT, Self-Consistency, LtM) significantly improve large language models' performance on complex tasks such as mathematics and commonsense reasoning by explicitly guiding them to lay out intermediate reasoning steps. CoT is not only an effective capability booster but also makes the model's decision process more transparent and interpretable.
Future directions include:
- Exploring techniques for effective reasoning in smaller models (e.g., more efficient knowledge distillation).
- Extending CoT principles to more modalities and task types.
- Deepening the study of how CoT works, connected to model-interpretability research.
- Overcoming current limitations, such as greater robustness to errors and deeper understanding rather than surface imitation.
Chain-of-Thought reveals where large language models' capabilities currently end: they excel at pattern association and generation powered by massive data and compute, but still struggle in areas demanding precise, abstract, deep logical understanding. Using tools like CoT to pair the precise logic of the human mind with the generative power of AI is the effective strategy for the age of intelligence.