How to Use Large Language Models: A 2026 Beginner's Tutorial (with API Calls)
AI Summary (BLUF)
This article is a beginner-oriented introduction to large language models (LLMs). It explains their core principles (the Transformer architecture and the self-attention mechanism), the basics of prompt engineering, and how to call LLM APIs (OpenAI, DeepSeek) from Python. It emphasizes the statistical nature of LLMs, their limitations, and practical techniques for interacting with them effectively.
Key Insight
Large language models are not only the brain of AI Agents but also a technology that reshapes how humans interact with computers. This article starts from first principles: it examines the statistical, probability-based core of LLMs and the self-attention mechanism of the Transformer architecture, then walks through the practical points of prompt engineering and API calls. Understanding these fundamentals is the starting point for building capable and controllable AI Agents.
Introduction to Large Language Models
A Large Language Model (LLM) is the brain of an AI Agent, and understanding it is the foundation for building intelligent Agents.
A large language model can converse with you, write articles, and write code because, given the text you provide (the prompt), it predicts the most plausible continuation one token at a time. A token is a word, part of a word, or a single character.
Simply put, a large language model is a deep learning model trained on massive amounts of text, which lets it interpret and generate human language. By analyzing large quantities of text, much of it from the internet, an LLM learns the statistical patterns of language. When it receives input, it generates a plausible continuation based on those patterns.
You can picture a large language model as an extremely hardworking student with an excellent memory:
- Learning phase (training): It reads a very large share of publicly available text, including books, articles, web pages, and code (the data volume can reach trillions of words). The goal of this process is not rote memorization but learning an extremely complex set of language patterns.
- Application phase (inference): When you ask a question or give an instruction, it applies the learned patterns to generate a response that fits the context, one token after another.
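The two phases can be illustrated with a toy model. The sketch below is not how a real LLM is built: it only counts which word follows which in a tiny made-up corpus (training) and then repeatedly appends the most frequent follower (inference). The corpus and function names are invented for this illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Training phase: count which word follows which word."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, max_words=5):
    """Inference phase: repeatedly append the most frequent next word."""
    output = [start]
    for _ in range(max_words):
        followers = counts.get(output[-1])
        if not followers:  # no known continuation, stop early
            break
        next_word, _count = followers.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)

# Tiny made-up corpus; real models train on trillions of tokens.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)
print(generate(model, "the", max_words=4))
```

A real LLM replaces the count table with a neural network holding billions of parameters, but the loop of "predict the next token, append it, repeat" has the same shape.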
The "large" in the name refers mainly to two things:
- Large parameter count: The model contains tens of billions to over a trillion adjustable parameters, which encode the language knowledge it has learned.
- Large training data: The volume of text used for training is enormous and covers much of the best publicly available information on the internet.
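Limitations of LLMs
Despite their power, LLMs have clear limitations:
| Aspect | Description | Limitation |
|---|---|---|
| Knowledge cutoff | Training data ends at a cutoff date | Cannot know about information that appeared after training |
| Mathematical calculation | Can handle simple calculations | Error-prone on complex calculations |
| Real-time information | Requires external tools | Cannot access real-time data on its own |
| Factual accuracy | May generate incorrect information | Output needs fact-checking |
| Long-text processing | Context length is limited | Information is lost when the text exceeds the limit |
| Logical consistency | May contradict itself | Requires careful prompt design and verification |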
Important Reminder: LLMs are not omniscient; they are fundamentally statistical pattern-matching systems. Understanding their limitations is the key to utilizing their capabilities effectively.
Core Working Principle: A Brief Look at the Transformer Architecture
The capabilities of LLMs rest on their underlying core technology, the Transformer architecture. You do not need to dig into the mathematics, but it helps to understand the core idea.
Imagine you are writing an article about the Solar System:
- Read through the materials: You first read many related books and web pages.
- Grasp the key points: You notice that words such as sun, planet, orbit, and gravity appear frequently and are related to one another.
- Organize the language: Depending on what you want to convey (for example, an introduction to Mars), you selectively use what you read about Mars's size, color, and position, and arrange it into coherent sentences.
The Transformer works in a similar way. Its core process has three stages:
- Input processing: Your text is split into tokens (words, word pieces, or characters), and each token is converted into numbers (a vector) that the computer can work with.
- Understanding context (the core step): The self-attention mechanism lets the model weigh the importance of every other token in the input when it processes each token. This computation runs in parallel across all tokens, which makes it fast on modern hardware.
- Generation and iteration: Based on its representation of all the tokens, the model computes a probability distribution over the next token and selects one, typically a high-probability token. The selected token is output and appended to the input, and the process repeats until the response is complete.
Self-attention is the key innovation of the Transformer. Take the sentence "Apple's phone, its battery is very large" as an example. When the model processes the word "its," self-attention helps it determine that "its" is strongly related to "Apple" and "phone."
This ability to process tokens in parallel while capturing context across the whole input is what lets Transformer-based LLMs far outperform earlier approaches such as RNNs on language tasks.
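To make the weighting idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is deliberately simplified: each token vector acts as its own query, key, and value, whereas real Transformers first apply learned weight matrices and use many attention heads. The 2-dimensional embeddings are hypothetical numbers chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings, invented for this illustration.
tokens = ["Apple", "phone", "its", "battery"]
vectors = [[1.0, 0.2], [0.9, 0.4], [0.8, 0.3], [0.1, 1.0]]
outputs, weights = self_attention(vectors)
for token, w in zip(tokens, weights[tokens.index("its")]):
    print(f"its -> {token}: {w:.2f}")
```

Because each token's scores depend only on the input vectors, the loop over queries could run for all tokens at once, which is the parallelism the text refers to.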
How to Interact with LLMs: An Introduction to Prompt Engineering
A **prompt** is the input you give to an LLM. It tells the model what you want, much like giving instructions to an assistant: the clearer the instructions, the better the result. The quality of the prompt strongly influences the quality of the response.
A good prompt typically consists of the following four parts:
- Context: Set the role and background.
- Instruction: State the task objective clearly.
- Examples (optional): Provide examples and counter-examples.
- Format requirements: Specify the output format and length.
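One way to keep these four parts explicit is a small template function. The sketch below is only an illustration: the function name, the section labels, and the sample values are assumptions, not a standard format.

```python
def build_prompt(context, instruction, output_format, examples=None):
    """Join the four prompt parts into one string, skipping empty examples."""
    sections = [f"Context: {context}", f"Instruction: {instruction}"]
    if examples:
        lines = "\n".join(f"- {example}" for example in examples)
        sections.append(f"Examples:\n{lines}")
    sections.append(f"Format: {output_format}")
    return "\n\n".join(sections)

prompt = build_prompt(
    context="You are an experienced Python programming tutor.",
    instruction="Explain what a list comprehension is to a beginner.",
    output_format="At most 150 words, followed by one short code example.",
    examples=["squares = [n * n for n in range(5)]"],
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to notice when one of them, most often the format requirement, has been left out.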
Basic Principles
- Be specific and concrete: Avoid vague wording. Instead of "write something about dogs," say "In lively, vivid language, write a short introduction of about 100 words on the personality traits of Golden Retrievers for children aged 6-8."
- Provide context: Tell the model who is involved (its role and your background) and what the goal is. For example: "You are an experienced Python programming tutor. Please explain what a list comprehension is to a beginner who has just finished learning basic syntax, and provide a simple example."
- Specify the format: If you need output in a particular format, say so explicitly, for example "Please summarize the following points into three bullet points" or "Please output in JSON format."
- Think step by step (Chain-of-Thought): For complex problems, guide the model to reason step by step in the prompt, for example: "Please analyze this problem step by step. First list the known conditions, then derive the intermediate steps, and finally give the conclusion." This approach can noticeably improve accuracy on complex reasoning tasks.
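These principles map naturally onto the `messages` list used by chat-style APIs. The following sketch assumes the common system/user role convention; the helper name, its parameters, and the sample task are hypothetical.

```python
import json

def build_messages(role_description, task, step_by_step=False, json_keys=None):
    """Build a chat `messages` list that applies the principles above."""
    user_parts = [task]
    if step_by_step:
        # Chain-of-Thought instruction, paraphrased from the text.
        user_parts.append(
            "Analyze this step by step: list the known conditions, "
            "derive the intermediate steps, then give the conclusion."
        )
    if json_keys:
        # Explicit format requirement.
        user_parts.append(
            "Output a JSON object with exactly these keys: "
            + json.dumps(json_keys)
        )
    return [
        {"role": "system", "content": role_description},
        {"role": "user", "content": "\n".join(user_parts)},
    ]

messages = build_messages(
    role_description="You are a patient math tutor for middle-school students.",
    task="A train travels 180 km in 2.5 hours. What is its average speed?",
    step_by_step=True,
    json_keys=["steps", "answer"],
)
print(json.dumps(messages, indent=2))
```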
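Common Application Scenarios for LLMs
| Scenario | Examples | Notes |
|---|---|---|
| Content creation and editing | Writing emails, reports, and blog posts; continuing stories; polishing copy; translating text in different styles | Produces drafts quickly and offers inspiration and alternative phrasings |
| Information retrieval and summarization | Reading long documents quickly and extracting the key points; question answering over a knowledge base | Captures the intent of a question better than traditional search, and can summarize and combine information |
| Programming assistance | Explaining code, generating snippets, debugging errors, refactoring, writing test cases | Acts as an always-available programming partner and can greatly improve development efficiency |
| Conversation and customer service | Chatbots, personalized tutors, role play | Provides human-like, context-aware interaction |
| Logical reasoning and analysis | Solving math problems, basic logical reasoning, analyzing data trends, making plans | Shows surprisingly strong reasoning within limited domains |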
API Calls and Parameter Settings
To build an AI Agent, you need to learn how to call an LLM through an API. This section uses mainstream platforms that are compatible with the OpenAI API as examples and introduces the basic calling method and practical configuration.
Basic API Call
1. Install the Required Library

    pip install openai

Then register an account on the relevant platform's official website and generate an API key on its API Keys page.
2. Core Configuration Parameter Comparison
When you switch between platforms, the core configuration usually involves only three parameters. The following table compares how they differ when you access each service through the standard OpenAI SDK:
| Platform / service | base_url (base address) | api_key (key) | model (model name) |
|---|---|---|---|
| OpenAI | https://api.openai.com/v1 | Obtain from the OpenAI website | gpt-4o, gpt-4-turbo, etc. |
| DeepSeek | https://api.deepseek.com | Obtain from the DeepSeek website | deepseek-v4-flash (non-thinking mode) or deepseek-v4-pro (thinking mode) |
| Alibaba Cloud Bailian (Tongyi Qianwen) | https://dashscope.aliyuncs.com/compatible-mode/v1 | Obtain from the Alibaba Cloud Bailian console | qwen-max, qwen-plus, etc. |
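The table above can be captured as a small lookup so that switching platforms becomes a one-word change. This is a sketch under assumptions: the base URLs and model names are copied from the table, while the dictionary layout, the function name, and the environment variable names are choices made for this illustration.

```python
import os

# Values copied from the table above. The environment variable names are
# assumptions chosen for this sketch; use whatever names you prefer.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "api_key_env": "OPENAI_API_KEY",
        "model": "gpt-4o",
    },
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "api_key_env": "DEEPSEEK_API_KEY",
        "model": "deepseek-v4-flash",
    },
    "bailian": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "api_key_env": "DASHSCOPE_API_KEY",
        "model": "qwen-plus",
    },
}

def client_settings(provider):
    """Return (client keyword arguments, default model) for a provider."""
    config = PROVIDERS[provider]
    api_key = os.environ.get(config["api_key_env"])
    if not api_key:
        raise RuntimeError(
            f"Set the {config['api_key_env']} environment variable first"
        )
    kwargs = {"api_key": api_key, "base_url": config["base_url"]}
    return kwargs, config["model"]
```

With the `openai` package installed, `kwargs, model = client_settings("deepseek")` followed by `OpenAI(**kwargs)` would produce a client for that platform, and `model` can be passed to the request.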
Code Example (Using DeepSeek)
The following code shows how to call a DeepSeek model through the OpenAI-compatible SDK. To target another platform, change `base_url`, `api_key`, and `model`.

    import os
    from openai import OpenAI

    # Initialize the client. The API key is read from an environment variable
    # so that it never appears in source code.
    client = OpenAI(
        api_key=os.environ.get("DEEPSEEK_API_KEY"),
        base_url="https://api.deepseek.com",  # DeepSeek API endpoint
    )

    # Call the chat completions API
    try:
        response = client.chat.completions.create(
            model="deepseek-v4-flash",  # model to use
            messages=[
                {"role": "system", "content": "You are a helpful assistant"},
                {"role": "user", "content": "Hello"},
            ],
            stream=False,  # return the full reply at once (non-streaming)
        )
        # Print the reply text
        print("Reply:", response.choices[0].message.content)
    except Exception as e:
        print("Request failed:", str(e))
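The example above sets `stream=False`. For chat interfaces it is common to stream the reply instead, so text appears as it is generated. The sketch below shows one way to consume a streamed response with the same `chat.completions.create()` call. So that it runs without an API key, it is exercised here with a hypothetical stand-in client; with a real `OpenAI` client the function is used the same way.

```python
from types import SimpleNamespace

def stream_reply(client, model, messages):
    """Request a streamed reply, print pieces as they arrive, return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask the server to send the reply in chunks
    )
    pieces = []
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # the delta content can be None or empty
            print(text, end="", flush=True)
            pieces.append(text)
    print()
    return "".join(pieces)

# Offline stand-in for a real client (an assumption for this demo only).
def _fake_create(model, messages, stream):
    for text in ["Hel", "lo", None, "!"]:
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
reply = stream_reply(
    fake_client, "deepseek-v4-flash", [{"role": "user", "content": "Hello"}]
)
```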
Note: OpenAI's official Python library also provides a newer `responses.create()` method. The example above uses `chat.completions.create()`, which is more widely supported and works with OpenAI as well as third-party platforms such as DeepSeek and Alibaba Cloud Bailian.
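For comparison, the two call styles can sit behind one helper. This is a hedged sketch: `responses.create()` with `model` and `input`, and the `output_text` attribute, belong to OpenAI's newer interface, and a third-party platform may not implement it. The helper name and the stand-in client are hypothetical and exist only so the sketch runs offline.

```python
from types import SimpleNamespace

def ask(client, model, prompt, use_responses=False):
    """Send one user prompt and return the reply text."""
    if use_responses:
        # Newer OpenAI interface; other platforms may not implement it.
        response = client.responses.create(model=model, input=prompt)
        return response.output_text
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Offline stand-in that echoes the prompt (an assumption for this demo only).
def _fake_responses(model, input):
    return SimpleNamespace(output_text=f"echo: {input}")

def _fake_chat(model, messages):
    message = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(
    responses=SimpleNamespace(create=_fake_responses),
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_chat)),
)
chat_reply = ask(fake_client, "gpt-4o", "Hello")
responses_reply = ask(fake_client, "gpt-4o", "Hello", use_responses=True)
print(chat_reply, "|", responses_reply)
```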
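Frequently Asked Questions (FAQ)
How do large language models "understand" human language?
A large language model learns the statistical patterns of language by training on massive amounts of text. Using the Transformer's self-attention mechanism, it predicts the most plausible next token for a given input, one token at a time. At its core this is probabilistic pattern matching, not semantic understanding in the human sense.
Which parameters matter most when calling an LLM API?
Key parameters include temperature (controls randomness), max_tokens (limits the length of the output), top_p (nucleus sampling), and frequency_penalty (discourages repetition). Adjust them per task to balance creativity against accuracy.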
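To show what temperature and top_p do to the next-token distribution, here is a plain-Python sketch of the usual textbook definitions. Actual providers may implement sampling differently, and the scores below are hypothetical numbers invented for illustration.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (must be > 0)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total reaches top_p."""
    kept, cumulative = {}, 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for tok, p in ranked:
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token after temperature scaling and top-p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical scores for the next token, invented for this illustration.
logits = {"cat": 2.0, "dog": 1.0, "car": 0.0, "sky": -1.0}
sharp = apply_temperature(logits, 0.5)
flat = apply_temperature(logits, 2.0)
print(f"P(cat) at T=0.5: {sharp['cat']:.2f}, at T=2.0: {flat['cat']:.2f}")
kept = list(top_p_filter(apply_temperature(logits, 1.0), 0.8))
print("Tokens kept with top_p=0.8:", kept)
print("Sampled:", sample_next_token(logits, temperature=0.7, top_p=0.9))
```

A lower temperature concentrates probability on the top token, which suits factual tasks, while a higher temperature spreads it out, which suits creative ones.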
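How do I write effective prompts to get better results?
A prompt should state the role, task, format, and constraints, and use examples or step-by-step instructions while avoiding vague wording. Follow the basic principles: be clear and specific, provide context, and limit the scope of the output.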