
How to Systematically Learn AI Engineering: The Most Complete Resource Guide for 2026 (from ML Theory to RAG)

2026/4/24

AI Summary (BLUF)

This document compiles the most helpful resources for understanding AI engineering, covering ML theory, foundation models, evaluation, prompt engineering, RAG, finetuning, dataset engineering, inference optimization, and AI engineering architecture.

ML Theory Fundamentals

While you don't need an ML background to start building with foundation models, a rough understanding of how AI works under the hood is useful to prevent misuse. Familiarity with ML theory will make you much more effective.

  1. [Lecture notes] Stanford CS 231N: a longtime favorite introductory course on neural networks.
    • [Videos] I'd recommend watching lectures 1 to 7 of the 2017 course recordings. They cover the fundamentals that haven't changed.
    • [Videos] Andrej Karpathy's Neural Networks: Zero to Hero is more hands-on; he shows how to implement several models from scratch.
  2. [Book] Machine Learning: A Probabilistic Perspective (Kevin P. Murphy, 2012)
    Foundational and comprehensive, though a bit intense. This used to be many of my friends' go-to book when preparing for theory interviews for research positions.
  3. Aman's Math Primers
    Good notes covering basic differential calculus and probability concepts.
  4. I also made a list of resources for MLOps, which includes a section on ML + engineering fundamentals.
  5. I wrote a brief 1,500-word note on how an ML model learns, covering concepts like the objective function and the learning procedure.
  6. AI Engineering also covers the important concepts immediately relevant to this discussion:
    • Transformer architecture (Chapter 2)
    • Embedding (Chapter 3)
    • Backpropagation and trainable parameters (Chapter 7)
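To make the "objective function and learning procedure" concepts above concrete, here is a minimal gradient-descent sketch. It is a generic illustration, not code from any of the listed resources: a one-parameter linear model fit by repeatedly stepping against the gradient of a mean-squared-error objective.

```python
# Minimal sketch of how an ML model "learns": pick an objective
# (mean squared error) and repeatedly nudge a parameter against its
# gradient. Fits w in y = w * x to data generated with w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def loss(w):
    # Objective function: mean squared error over the dataset.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    # d(loss)/dw, derived analytically for this one-parameter model.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0          # initial parameter guess
lr = 0.01        # learning rate
for _ in range(500):   # learning procedure: plain gradient descent
    w -= lr * grad(w)

print(round(w, 3))  # converges toward 2.0
```

Everything a training run does at scale (backpropagation, optimizers, batches) is elaboration on this loop: define a differentiable objective, compute gradients, update parameters.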

Chapter 1. Planning Applications with Foundation Models

  1. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models (OpenAI, 2023)
    OpenAI (2023) has excellent research on how exposed different occupations are to AI. They defined a task as exposed if AI and AI-powered software can reduce the time needed to complete it by at least 50%. An occupation with 80% exposure means that 80% of that occupation's tasks are considered exposed. According to the study, occupations with 100% or near-100% exposure include interpreters and translators, tax preparers, web designers, and writers. Some of them are shown in Figure 1-5. Not surprisingly, occupations with no exposure to AI include cooks, stonemasons, and athletes. This study gives a good idea of what use cases AI is good for.
  2. Applied LLMs (Yan et al., 2024)
    Eugene Yan and co. shared their learnings from one year of deploying LLM applications. Many helpful tips!
  3. Musings on Building a Generative AI Product (Juan Pablo Bottaro and Karthik Ramgopal, LinkedIn, 2024)
    One of the best reports I've read on deploying LLM applications: what worked and what didn't. They discussed structured outputs, latency vs. throughput tradeoffs, the challenges of evaluation (they spent most of their time creating annotation guidelines), and the last-mile challenge of building gen AI applications.
  4. Apple's human interface guidelines for designing ML applications
    Outlines how to think about the roles of AI and humans in your application, which influences interface decisions.
  5. LocalLlama subreddit: useful to check from time to time to see what people are up to.
  6. State of AI Report (updated yearly): very comprehensive. Useful to skim through to see what you've missed.
  7. 16 Changes to the Way Enterprises Are Building and Buying Generative AI (Andreessen Horowitz, 2024)
  8. "Like Having a Really Bad PA": The Gulf between User Expectation and Experience of Conversational Agents (Luger and Sellen, 2016)
    A solid, ahead-of-its-time paper on user experience with conversational agents. It makes a great case for the value of dialogue interfaces and what's needed to make them useful, featuring in-depth interviews with 14 users. "It has been argued that the true value of dialogue interface systems over direct manipulation (GUI) can be found where task complexity is greatest."
  9. Stanford Webinar - How AI is Changing Coding and Education, Andrew Ng & Mehran Sahami (2024)
    A great discussion that shows how Stanford's CS department thinks about what CS education will look like in the future. My favorite quote: "CS is about systematic thinking, not writing code."
  10. Professional artists: how much has AI art affected your career? - 1 year later : r/ArtistLounge
    Many people share their experiences of how AI has impacted their work. E.g.:
    "From time to time, I am sitting in meetings where managers dream of replacing coders, writers and visual artists with AI. I hate those meetings and try to avoid them, but I still get involved from time to time. All my life, I loved coding & art. But nowadays, I often feel this weird sadness in my heart."
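The exposure metric from the GPTs-are-GPTs study can be sketched in a few lines. Only the definition comes from the paper (a task is exposed if AI cuts its completion time by at least 50%); the task names and timings below are entirely hypothetical.

```python
# Illustrating the exposure definition from the OpenAI (2023) study:
# a task counts as "exposed" if AI can cut its completion time by >= 50%.
# An occupation's exposure is the share of its tasks that are exposed.
# Task timings (minutes without AI, minutes with AI) are hypothetical.
tasks = {
    "draft report":   (60, 15),    # 75% time saved -> exposed
    "client call":    (30, 28),    # ~7% saved -> not exposed
    "summarize docs": (45, 10),    # ~78% saved -> exposed
    "site visit":     (120, 115),  # ~4% saved -> not exposed
}

def is_exposed(minutes_before, minutes_after):
    return (minutes_before - minutes_after) / minutes_before >= 0.5

exposure = sum(is_exposed(b, a) for b, a in tasks.values()) / len(tasks)
print(f"{exposure:.0%}")  # 2 of 4 tasks exposed -> 50%
```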

Chapter 2. Understanding Foundation Models

Training large models

Papers detailing the training process of important models are gold mines. I'd recommend reading all of them. But if you can only pick 3, I'd recommend Gopher, InstructGPT, and Llama 3.

| Paper | Organization | Year | Key Contribution |
| --- | --- | --- | --- |
| [GPT-2] Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | Demonstrated zero-shot task transfer at scale |
| [GPT-3] Language Models are Few-Shot Learners | OpenAI | 2020 | Introduced in-context learning with 175B parameters |
| [Gopher] Scaling Language Models: Methods, Analysis & Insights from Training Gopher | DeepMind | 2021 | Systematic analysis of scaling at 280B parameters |
| [InstructGPT] Training language models to follow instructions with human feedback | OpenAI | 2022 | Pioneered RLHF for instruction following |
| [Chinchilla] Training Compute-Optimal Large Language Models | DeepMind | 2022 | Established the Chinchilla scaling law |
| Qwen Technical Report | Alibaba | 2023 | Open-source bilingual model development |
| Qwen2 Technical Report | Alibaba | 2024 | Improved architecture and training methodology |
| Constitutional AI: Harmlessness from AI Feedback | Anthropic | 2022 | Introduced self-supervised safety training |
| LLaMA: Open and Efficient Foundation Language Models | Meta | 2023 | Efficient training with smaller models |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Meta | 2023 | Open-source chat-optimized models |
| The Llama 3 Herd of Models | Meta | 2024 | Best paper on synthetic data generation and verification |
| Yi: Open Foundation Models by 01.AI | 01.AI | 2024 | Bilingual foundation model with competitive performance |

Scaling laws

| Resource | Year | Key Insight |
| --- | --- | --- |
| From bare metal to high-performance training: Infrastructure scripts and best practices (imbue) | 2024 | Practical scaling with 4,092 H100 GPUs across 511 computers |
| Scaling Laws for Neural Language Models (OpenAI) | 2020 | Early scaling law; up to 1B non-embedding params and 1B tokens |
| Training Compute-Optimal Large Language Models (Hoffmann et al.) | 2022 | Chinchilla scaling law: the most well-known scaling-law paper |
| Scaling Data-Constrained Language Models (Muennighoff et al.) | 2023 | Training with up to 4 epochs of repeated data yields negligible loss change |
| Scaling Instruction-Finetuned Language Models (Chung et al.) | 2022 | Importance of diversity in instruction data |
| Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws (Sardana et al.) | 2023 | Extended scaling laws to include inference costs |
| AI models are devouring energy. Tools to reduce consumption are here (MIT Lincoln Laboratory) | 2023 | Energy consumption analysis and mitigation strategies |
| Will we run out of data? Limits of LLM scaling based on human-generated data (Villalobos et al.) | 2022 | Data scarcity projections for continued scaling |
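As a worked example of the Chinchilla result in the table above: the rule of thumb commonly distilled from Hoffmann et al. (2022) is roughly 20 training tokens per model parameter for a compute-optimal run. The exact fitted coefficients in the paper vary with the fitting approach; 20x is the popular shorthand, used here only as an approximation.

```python
# Rough Chinchilla rule of thumb (Hoffmann et al., 2022): for a
# compute-optimal training run, use about 20 tokens per parameter.
# This is the common shorthand, not the paper's exact fitted law.
def chinchilla_optimal_tokens(n_params):
    return 20 * n_params

for n in (1e9, 7e9, 70e9):
    print(f"{n / 1e9:>4.0f}B params -> "
          f"~{chinchilla_optimal_tokens(n) / 1e12:.2f}T tokens")
# e.g. a 70B-parameter model would want on the order of 1.4T tokens
```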

Fun stuff

| Resource | Description |
| --- | --- |
| Evaluating feature steering: A case study in mitigating social biases (Anthropic, 2024) | Focused on 29 features related to social biases; feature steering can influence specific biases but may cause unexpected off-target effects |
| Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Anthropic, 2024) | Interpretability research extracting features from production models |
| GitHub - ianand/spreadsheets-are-all-you-need | Implements the GPT-2 forward pass entirely in Excel using standard spreadsheet functions |
| BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) | Helpful visualization of multi-head attention in action |
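The spreadsheet GPT-2 implementation and BertViz both center on the attention operation. As a reference point, here is a minimal single-head scaled dot-product attention forward pass, a generic NumPy sketch rather than code from either project.

```python
import numpy as np

# Minimal single-head scaled dot-product attention: the core operation
# that spreadsheets-are-all-you-need builds in Excel and BertViz visualizes.
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) token-pair similarities
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, head dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, attn = attention(Q, K, V)
print(out.shape)             # (4, 8): one mixed vector per token
```

The attention matrices BertViz renders are exactly the `attn` rows here: each row is a probability distribution over which tokens the current token attends to.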

Sampling

| Resource | Year | Focus |
| --- | --- | --- |
| A Guide to Structured Generation Using Constrained Decoding (Aidan Cooper) | 2024 | In-depth tutorial on generating structured outputs |
| Fast JSON Decoding for Local LLMs with Compressed Finite State Machine (LMSYS) | 2024 | Efficient structured-output decoding for local models |
| How fast can grammar-structured generation be? (Brandon T. Willard) | 2024 | Performance analysis of grammar-constrained generation |

I also wrote a post on sampling for text generation (2024).

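To ground the sampling resources above, here is a minimal sketch of temperature and top-k sampling over next-token logits. The logit values are made up, and production implementations operate on tensors rather than Python lists; only the mechanics are the point.

```python
import math
import random

# Minimal temperature + top-k sampling over next-token logits.
def sample(logits, temperature=1.0, top_k=None, rng=random):
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution.
    scaled = [l / temperature for l in logits]
    # Top-k keeps only the k highest-scoring token indices.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = order if top_k is None else order[:top_k]
    # Softmax over the kept tokens (max-subtracted for stability), then draw.
    m = max(scaled[i] for i in keep)
    probs = [math.exp(scaled[i] - m) for i in keep]
    r, acc = rng.random() * sum(probs), 0.0
    for i, p in zip(keep, probs):
        acc += p
        if r <= acc:
            return i
    return keep[-1]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocab
token = sample(logits, temperature=0.7, top_k=2)
print(token)  # always 0 or 1: top_k=2 restricts sampling to the two best
```

Constrained decoding (the structured-generation resources above) goes one step further: before sampling, it masks out every token that would violate the target grammar, so only schema-valid continuations remain.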

Context length and context efficiency

| Resource | Year | Key Contribution |
| --- | --- | --- |
| Everything About Long Context Fine-tuning (Wenbo Pan) | 2024 | Comprehensive guide on fine-tuning for long context |
| Data Engineering for Scaling Language Models to 128K Context (Yu et al.) | 2024 | Data preparation strategies for extended context windows |
| The Secret Sauce behind 100K context window in LLMs (Galina Alperovich) | 2023 | Collection of tricks for achieving long context windows |
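A quick back-of-envelope calculation shows why long context needs the tricks these resources describe: attention scores grow quadratically with sequence length, while the KV cache grows linearly. The model dimensions below are hypothetical (loosely Llama-7B-like), and the score-matrix figure assumes all layers' matrices were materialized at once, which real kernels like FlashAttention avoid; it is only meant to illustrate the scaling.

```python
# Back-of-envelope memory scaling with context length for a hypothetical
# 32-layer, 32-head, head-dim-128 model in fp16 (2 bytes per value).
def attention_matrix_bytes(seq_len, n_heads=32, n_layers=32, bytes_per=2):
    # One (seq x seq) score matrix per head per layer, if all were
    # materialized simultaneously (an upper bound, not how kernels work).
    return seq_len * seq_len * n_heads * n_layers * bytes_per

def kv_cache_bytes(seq_len, n_heads=32, head_dim=128, n_layers=32, bytes_per=2):
    # Keys and values cached for every token, head, and layer.
    return 2 * seq_len * n_heads * head_dim * n_layers * bytes_per

for n in (4_096, 131_072):  # 4K vs 128K context
    print(f"{n:>7} tokens: scores {attention_matrix_bytes(n) / 2**30:,.0f} GiB, "
          f"KV cache {kv_cache_bytes(n) / 2**30:.1f} GiB")
# Growing context 32x (4K -> 128K) grows score memory 1024x (quadratic)
# but KV-cache memory only 32x (linear).
```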

This resource guide continues with subsequent chapters covering Evaluation Methodology, Prompt Engineering, RAG and Agents, Finetuning, Dataset Engineering, Inference Optimization, and AI Engineering Architecture. For the complete list, please refer to the original source or the full book "AI Engineering".

FAQ

Which core technical areas does the book AI Engineering cover?

The book covers ML theory, foundation models, evaluation, prompt engineering, RAG, finetuning, dataset engineering, inference optimization, and architecture, with accompanying papers, case studies, and tool resources.

Do you need to master ML theory before learning AI engineering?

Not necessarily, but understanding ML fundamentals helps prevent misuse. Recommended resources include Stanford's CS 231N, Karpathy's Neural Networks: Zero to Hero, and Machine Learning: A Probabilistic Perspective.

How can you assess AI's potential impact on a specific occupation?

See OpenAI's 2023 study: a task is considered exposed if AI can reduce the time needed to complete it by at least 50%. Occupations such as interpreters and tax preparers have exposure close to 100%.
