
How to Improve LLM Testing Efficiency with Promptfoo: The Complete 2026 Guide to Automated Evaluation

2026/3/14
AI Summary (BLUF)

Promptfoo is a comprehensive LLM testing tool that automates prompt, model, and RAG system evaluation, significantly improving testing efficiency through features like automated assertions, model comparison, and security testing.

Are you still struggling with inefficient testing of your LLM (large language model) applications? Manually comparing different prompts and model outputs is slow and labor-intensive, and it makes validating application quality at scale difficult. Promptfoo, a purpose-built LLM testing tool, helps you automatically evaluate prompts, models, and RAG (Retrieval-Augmented Generation) systems, significantly improving testing efficiency. This article takes you from installation to advanced usage so you can master Promptfoo end to end.

Installation

Promptfoo supports multiple installation methods to meet the needs of different users. Choose any of the following methods based on your environment:

Global Installation via npm

npm install -g promptfoo

Temporary Use via npx

npx promptfoo@latest

Installation via Homebrew (for macOS Users)

brew install promptfoo

After installation, verify that it succeeded with the following command:

promptfoo --version

A successful installation will display the version number, such as 0.114.7. Official installation documentation: site/docs/installation.md

Getting Started

After installation, quickly create your first test project using the initialization command:

Using an Example Project

npx promptfoo@latest init --example getting-started

Interactive Configuration Creation

If you need a custom configuration, you can run the initialization command without the example parameter:

npx promptfoo@latest init

This command will guide you through an interactive configuration process to create a testing environment suitable for you.

After initialization, a promptfooconfig.yaml configuration file and related test resources will be generated in the current directory. Getting Started Guide: site/docs/getting-started.md
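
For reference, a minimal promptfooconfig.yaml combines the three sections described below: prompts, providers, and tests (the model name here is illustrative):

```yaml
description: My first eval
prompts:
  - 'Convert this English to {{language}}: {{input}}'
providers:
  - openai:gpt-5-mini
tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: contains
        value: bonjour
```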

Configuration Deep Dive

The core configuration file for Promptfoo is promptfooconfig.yaml. Through this file, you can define all aspects of your tests. A complete configuration consists of three main parts: prompts, model providers, and test cases.

Defining Prompts

In the configuration file, use the prompts field to define the prompt templates to be tested. You can define them inline or reference external files:

prompts:
  - 'Convert this English to {{language}}: {{input}}'
  - 'Translate to {{language}}: {{input}}'
  # Reference an external prompt file
  - file://prompts.txt

Variables are defined in prompts using double curly braces {{variable_name}}, which are dynamically replaced with values from test cases during execution. Prompt configuration details: site/docs/configuration/prompts
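
For the file:// reference above, a prompt file is plain text; by convention promptfoo treats a line of `---` as a separator when one file holds several prompts (check the prompts docs for details — the file contents here are illustrative):

```text
Translate the following into {{language}}: {{input}}
---
You are a professional translator. Render this text in {{language}}: {{input}}
```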

Configuring Model Providers

The providers field specifies the AI models to test. Promptfoo supports over 50 model providers, including mainstream APIs such as OpenAI, Anthropic, and Google, as well as local models such as those served by Ollama:

providers:
  - openai:gpt-5
  - openai:gpt-5-mini
  - anthropic:messages:claude-sonnet-4-20250514
  - vertex:gemini-2.5-pro
  # Custom local model
  - ollama:llama3.1
  # Reference a custom provider script
  - file://path/to/custom/provider.py

Most models require authentication information such as API keys, typically configured via environment variables:

export OPENAI_API_KEY=sk-your-key
export ANTHROPIC_API_KEY=your-key

List of supported models: site/docs/providers
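
Beyond the shorthand strings above, a provider entry can carry per-model options by switching to an object with id and config keys; a sketch with illustrative parameter values:

```yaml
providers:
  - id: openai:gpt-5-mini
    config:
      temperature: 0.2
      max_tokens: 256
```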

Creating Test Cases

The tests field defines test input variables and expected results. Each test case contains variable values and optional assertion conditions:

tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: contains
        value: bonjour
  - vars:
      language: Spanish
      input: Where is the library?
    assert:
      - type: contains
        value: biblioteca

Test case configuration: site/docs/configuration/guide
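
Under the hood, each test case is produced by substituting its vars into the prompt template's double-brace placeholders. Promptfoo uses Nunjucks-style templates; this sketch only illustrates the basic substitution, not promptfoo's actual template engine:

```javascript
// Minimal stand-in for {{variable}} substitution. Unknown placeholders
// are left untouched rather than replaced with an empty string.
function renderPrompt(template, vars) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    name in vars ? String(vars[name]) : match
  );
}

console.log(
  renderPrompt("Convert this English to {{language}}: {{input}}", {
    language: "French",
    input: "Hello world",
  })
); // Convert this English to French: Hello world
```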

Running Tests

After configuration is complete, use the eval command to execute the tests:

npx promptfoo@latest eval

Command Line Output

During test execution, real-time progress and result summaries are displayed in the terminal:

Command Line Test Results

Generating HTML Reports

Add the -o parameter to generate a detailed HTML report:

npx promptfoo@latest eval -o output.html

Viewing the Web Interface

After a run completes, use the view command to open the interactive web interface and inspect the results:

npx promptfoo@latest view

The web interface provides rich result display and comparison features, supporting filtering and sorting results by various dimensions:

Web Interface Test Results

Advanced Features

Automatically Evaluating Output Quality

Through the assertions feature, you can automatically evaluate whether model outputs meet expectations. Promptfoo supports multiple assertion types:

Content Checking

assert:
  - type: contains
    value: expected substring
  - type: not-contains
    value: forbidden content
  - type: equals
    value: exact match
  - type: starts-with
    value: beginning of response
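
Conceptually, these deterministic checks reduce to simple string predicates; the sketch below illustrates their semantics (it is not promptfoo's internal implementation):

```javascript
// Illustrative semantics of the deterministic string assertions.
function checkAssertion(type, value, output) {
  switch (type) {
    case "contains":     return output.includes(value);
    case "not-contains": return !output.includes(value);
    case "equals":       return output === value;
    case "starts-with":  return output.startsWith(value);
    default: throw new Error(`unknown assertion type: ${type}`);
  }
}

console.log(checkAssertion("contains", "bonjour", "bonjour tout le monde")); // true
```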

LLM Scoring

Use another LLM as a judge to evaluate outputs based on custom scoring criteria:

assert:
  - type: llm-rubric
    value: "Scoring Criteria: Is the answer clear, accurate, and concise? 10-point scale."
    threshold: 8 # Minimum score requirement

Custom JavaScript Evaluation

Write JavaScript functions for complex custom evaluations:

assert:
  - type: javascript
    value: |
      // Calculate output length score, lower score for longer outputs
      Math.max(0, Math.min(1, 1 - (output.length - 100) / 900));

Assertion configuration details: site/docs/configuration/expected-outputs
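
The JavaScript assertion above returns a score in [0, 1]. As a standalone function, the same formula scores outputs of 100 characters or fewer at 1 and decays linearly to 0 at 1000 characters:

```javascript
// Length-based score: 1.0 up to 100 chars, linearly down to 0.0 at 1000 chars.
function lengthScore(output) {
  return Math.max(0, Math.min(1, 1 - (output.length - 100) / 900));
}

console.log(lengthScore("x".repeat(100)));  // 1
console.log(lengthScore("x".repeat(550)));  // 0.5
console.log(lengthScore("x".repeat(1000))); // 0
```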

Model Comparison Analysis

Promptfoo can test multiple models at the same time, making side-by-side comparison straightforward. The following example configuration compares GPT-5 and Claude:

prompts:
  - "Summarize this text in 50 words: {{input}}"
providers:
  - openai:gpt-5
  - anthropic:messages:claude-sonnet-4-20250514
tests:
  - vars:
      input: "Artificial intelligence (AI) is a branch of computer science dedicated to creating systems that can simulate human intelligence. These systems can learn, reason, adapt, and perform tasks that normally require human intelligence."
  - vars:
      input: "Climate change refers to long-term changes in the Earth's climate system, including shifts in global average temperature, precipitation patterns, and the frequency of extreme weather events. Greenhouse gas emissions, driven mainly by human activity, are the primary driver of current climate change."

After running the tests, you can visually compare the output quality of different models through the web interface:

Model comparison example. Model comparison tutorial: examples/openai-model-comparison

Security Testing (Red Teaming)

Promptfoo also provides powerful red teaming functionality to help you discover security vulnerabilities in LLM applications:

npx promptfoo@latest redteam

Red team testing applies a variety of attack strategies that try to coax the model into generating inappropriate content, surfacing potential security risks. A detailed risk report is generated when the run completes:

Security risk report. Red team testing guide: site/docs/red-team
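
Red team runs can also be driven declaratively from promptfooconfig.yaml under a redteam key. The sketch below is illustrative only; the available plugin and strategy names vary by version, so confirm them against the red team guide:

```yaml
redteam:
  purpose: 'Customer support assistant for a retail site'
  plugins:
    - harmful
    - pii
  strategies:
    - jailbreak
    - prompt-injection
```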

Practical Examples

Translation Application Testing

The following is a complete translation application testing configuration example, comparing the performance of different prompts and models in multilingual translation tasks:

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Translation App Test - Comparing Translation Quality Across Prompts and Models

prompts:
  - 'Translate the following to {{language}}: {{input}}'
  - 'In {{language}}, this would be: {{input}}'
  - 'Convert the text to {{language}}, keeping the original meaning: {{input}}'

providers:
  - openai:gpt-5-mini
  - anthropic:messages:

## FAQ

### Which installation methods does Promptfoo support?

Promptfoo supports global installation via npm, temporary use via npx, and installation via Homebrew (for macOS users). After installing, verify the version with `promptfoo --version`.

### How do I get started with Promptfoo quickly?

Run `npx promptfoo@latest init --example getting-started` to create an example project, or run `npx promptfoo@latest init` for interactive configuration; either generates a promptfooconfig.yaml configuration file.

### How are prompts defined in Promptfoo's core configuration file?

In the prompts field of promptfooconfig.yaml, you can define templates inline (e.g. 'Convert this English to {{language}}: {{input}}') or reference external files, using the {{variable_name}} syntax.
