How Does System Prompt Learning (SPL) Enable LLMs to Autonomously Optimize Strategies? A 2026 Deep Dive
System Prompt Learning (SPL) enables Large Language Models to autonomously learn and refine problem-solving strategies through experience, creating a transparent, human-readable database of effective approaches that improves performance on specific task types over time.
Introduction
I built a system that lets LLMs automatically learn and improve problem-solving strategies over time, inspired by Andrej Karpathy's idea of a "third paradigm" for LLM learning. The basic idea: instead of using static system prompts, the LLM builds up a database of strategies that actually work for different problem types. When you give it a new problem, it selects the most relevant strategies, applies them, then evaluates how well they worked and refines them.
For example, after seeing enough word problems, it learned this strategy: 1) Read carefully and identify unknowns, 2) Define variables with units, 3) Write equations, 4) Solve step-by-step, 5) Verify the answer. All strategies are stored as human-readable JSON that you can inspect and edit.
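Since the article says strategies are stored as human-readable JSON, here is a plausible sketch of what one stored record might look like. The field names (`strategy_id`, `problem_type`, `success_count`, `total_attempts`) are assumptions for illustration; the plugin's actual schema may differ.

```python
import json

# Hypothetical shape of one learned strategy record; field names are
# assumptions, not the plugin's confirmed schema.
strategy = {
    "strategy_id": "word_problems_001",
    "problem_type": "word_problem",
    "strategy": (
        "1) Read carefully and identify unknowns. "
        "2) Define variables with units. "
        "3) Write equations. "
        "4) Solve step-by-step. "
        "5) Verify the answer."
    ),
    "success_count": 12,
    "total_attempts": 15,
}

# Because the store is plain JSON, you can dump, inspect, and hand-edit it.
print(json.dumps(strategy, indent=2))
```

The point of the plain-JSON design is exactly this inspectability: nothing the system learns is hidden in weights.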
Core Concept and Implementation
The implementation is an open-source plugin for optillm (our inference optimization proxy). It works with any OpenAI-compatible API - you just add "spl-" to your model name. It has two modes: inference-only (uses existing strategies) and learning mode (creates and refines strategies).
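A minimal sketch of what calling the plugin might look like from the client side. The base URL, port, API key, and the `gpt-4o-mini` model name are assumptions for a locally running optillm proxy; only the "spl-" prefix convention comes from the article.

```python
def spl_model(base_model: str) -> str:
    # The "spl-" prefix is what routes a request through the SPL plugin
    # once it reaches the optillm proxy.
    return f"spl-{base_model}"

def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    # base_url assumes an optillm proxy running locally; adjust to your setup.
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI(base_url=base_url, api_key="optillm")
    resp = client.chat.completions.create(
        model=spl_model("gpt-4o-mini"),
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

Because the change is purely a model-name prefix, existing OpenAI-client code needs no other modification to opt in.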
What's interesting is that it bridges the gap between the sophisticated system prompts that production AI uses and the basic prompts most of us work with. Your model literally gets better at the types of problems you throw at it.
I built it because I noticed that ChatGPT, Claude, and similar products have incredibly detailed system prompts with problem-solving frameworks, while most developers use basic prompts and miss out on those performance gains.
Technical Details and Findings
I tested it on math benchmarks and saw decent improvements - 8.6% better on Arena Hard, 6.67% on AIME24. After 500 queries, the system had created 129 strategies and refined 97 of them.
The system maintains two separate limits: a storage limit (max 10 strategies per problem type in the database) and an inference limit (max 3 strategies applied per query). This keeps the database manageable while ensuring the system prompt doesn't get too long.
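The storage limit could be enforced with a simple eviction rule. This is a minimal sketch under assumed data structures (a dict of per-type lists, with `success_count`/`total_attempts` fields); the real plugin's bookkeeping and eviction policy may differ.

```python
MAX_STORED_PER_TYPE = 10  # storage limit from the article

def success_rate(s: dict) -> float:
    # Guard against division by zero for brand-new strategies.
    return s["success_count"] / max(s["total_attempts"], 1)

def add_strategy(db: dict, problem_type: str, strategy: dict) -> None:
    """Add a strategy; evict the weakest one if the per-type cap is exceeded."""
    bucket = db.setdefault(problem_type, [])
    bucket.append(strategy)
    if len(bucket) > MAX_STORED_PER_TYPE:
        # Assumed policy: drop the strategy with the lowest success rate.
        bucket.remove(min(bucket, key=success_rate))
```

Separating the storage cap (10) from the inference cap (3) means the database can hold more candidates than any single prompt ever sees.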
One interesting finding was that strategies only get used for inference once they have at least 5 attempts and a 40% success rate. This prevents the system from applying unproven strategies to new problems.
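Combining the eligibility rule with the per-query cap, strategy selection could look like the following sketch. Thresholds come from the article; field names, the "strongest first" ordering, and the function shape are assumptions.

```python
MIN_ATTEMPTS = 5            # a strategy needs at least 5 attempts...
MIN_SUCCESS_RATE = 0.40     # ...and a 40% success rate to be applied
MAX_APPLIED_PER_QUERY = 3   # inference limit per query

def select_for_inference(strategies: list) -> list:
    """Return at most 3 proven strategies, highest success rate first."""
    def rate(s):
        return s["success_count"] / s["total_attempts"]
    proven = [
        s for s in strategies
        if s["total_attempts"] >= MIN_ATTEMPTS and rate(s) >= MIN_SUCCESS_RATE
    ]
    return sorted(proven, key=rate, reverse=True)[:MAX_APPLIED_PER_QUERY]
```

The attempt floor matters as much as the rate floor: a strategy that went 1-for-1 looks perfect but is statistically meaningless, so it stays quarantined until it has enough history.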
The approach works particularly well with reasoning models like DeepSeek-R1 and QwQ - the learned strategies seem to guide their thinking process effectively.
Example of a Learned Strategy
This was the strategy discovered by optillm for solving word problems:
Refined Strategy for Solving Word Problems:
- Understand:
- Read the problem carefully (multiple times).
- Identify the question (what are you trying to find?).
- List all given information (facts, numbers, units).
- Clarify ambiguous terms/units.
- Organize Information & Identify Unknowns:
- Choose an organization method: (e.g., table, diagram, list, drawing).
- Clearly identify the unknowns (what you need to solve for).
- Plan and Translate:
- Define all variables with units (e.g., p = number of pennies, c = number of compartments).
- Identify relationships between knowns and unknowns.
- Convert units if necessary.
- Write equations or expressions, including units, that relate the knowns and unknowns.
- Ensure units are consistent throughout the equations.
- Outline the solution steps.
- Solve:
- Show work step-by-step.
- Track units throughout calculations.
- Calculate accurately.
- Solve for the unknowns.
- Evaluate and Verify:
- Check if the answer is reasonable.
- Verify the answer.
- Summarize:
- State the answer with units.
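In learning mode, each application of a strategy like the one above feeds back into its statistics. A minimal sketch of that bookkeeping, under assumed field names; in the actual plugin the success judgment is itself produced by the LLM's self-evaluation, so the boolean here stands in for that step.

```python
def record_outcome(strategy: dict, succeeded: bool) -> None:
    """Fold one observed outcome back into a strategy's statistics."""
    strategy["total_attempts"] = strategy.get("total_attempts", 0) + 1
    if succeeded:
        strategy["success_count"] = strategy.get("success_count", 0) + 1
```

Over many queries these counters are what promote a strategy past the eligibility threshold, or let a weak one fade out of use.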
Future Directions and Community Questions
The approach is inspired by Andrej Karpathy's tweet about a "third paradigm" for LLM learning beyond just pretraining and fine-tuning. The strategies are completely transparent - you can see exactly what the system learned and why it's making certain decisions. No black box learning.
I'm especially curious about:
- How this might work with different model families
- Whether the community sees value in sharing strategy databases between users
- Ideas for extending beyond text-based reasoning to multimodal problems
Next I'm thinking about meta-learning - having the system learn how to create better strategies more efficiently. Also exploring collaborative strategy sharing.
Conclusion
System Prompt Learning presents a novel approach to enhancing LLM capabilities through experience-driven strategy refinement. By creating a dynamic, transparent, and self-improving library of problem-solving techniques, it moves beyond static prompting towards adaptive intelligence. The open-source nature of the project invites further experimentation and collaboration to explore the full potential of this "third paradigm" in LLM learning.
Project Resources:
- GitHub Repository: https://github.com/codelion/optillm/tree/main/optillm/plugins/spl
- Inspiration Source: Andrej Karpathy's Tweet on the "Third Paradigm"
FAQ
How exactly does System Prompt Learning (SPL) make a large language model smarter?
SPL lets the model learn and refine problem-solving strategies from its own experience, building a transparent, human-readable strategy database. When a new problem arrives, it selects and applies the most relevant strategies, then evaluates how well they worked and keeps refining them, steadily improving performance on specific task types.
How do I use the SPL plugin? Does it require complex configuration?
It is an open-source plugin that works with any OpenAI-compatible API. Usage is simple: just add "spl-" to your model name. It offers two modes: inference-only (uses existing strategies) and learning mode (creates and refines strategies).
How well does SPL perform in practice? Are there concrete numbers?
On math benchmarks, SPL produced solid gains: 8.6% on Arena Hard and 6.67% on AIME24. After 500 queries, the system had created 129 strategies and refined 97 of them, demonstrating its capacity for continual learning.