间接提示注入攻击如何远程利用LLM集成应用？（附Bing Chat实测案例）

摘要

Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4A large multimodal AI model developed by OpenAI, known for advanced reasoning and language understanding. powered Chat and code-completion engines, and synthetic applications built on GPT-4A large multimodal AI model developed by OpenAI, known for advanced reasoning and language understanding.. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

大语言模型正日益广泛地集成到各类应用中。近期LLMs的功能可以通过自然语言提示词灵活地调节。这使得它们容易受到有针对性的对抗性提示攻击，例如，提示注入攻击使攻击者能够覆盖原始指令和已部署的控制措施。迄今为止，普遍假设是用户直接向LLM提供提示。但如果提示并非来自用户呢？我们认为，LLM集成应用模糊了数据和指令之间的界限。我们揭示了新的攻击向量——间接提示注入，它使攻击者能够通过将提示词策略性地注入到可能被检索的数据中，从而远程（无需直接接口）利用LLM集成应用。我们从计算机安全的角度推导出一个全面的分类法，以系统性地研究其影响和漏洞，包括数据窃取、蠕虫化、信息生态系统污染以及其他新颖的安全风险。我们证明了这些攻击在实际系统中的可行性，包括Bing的GPT-4A large multimodal AI model developed by OpenAI, known for advanced reasoning and language understanding.驱动的聊天和代码补全引擎，以及基于GPT-4A large multimodal AI model developed by OpenAI, known for advanced reasoning and language understanding.构建的合成应用。我们展示了处理被检索的提示词如何充当任意代码执行、操纵应用程序功能，并控制其他API的调用方式和时机。尽管对LLMs的集成和依赖日益加深，但目前对这些新兴威胁的有效缓解措施仍然缺乏。通过提高对这些漏洞的认识并提供其影响的关键见解，我们旨在促进这些强大模型的安全和负责任部署，并开发保护用户和系统免受潜在攻击的稳健防御措施。

引言：从直接提示注入到间接攻击

传统的提示注入攻击模型假设攻击者能够直接与LLM的输入接口交互。然而，在LLM集成应用中，模型处理的文本通常来自外部数据源（如网页、数据库、文档）。这创造了一个新的攻击面：攻击者可以将恶意提示词预先植入这些数据源中。当应用程序检索并处理这些被污染的数据时，内嵌的恶意指令就会被LLM执行，从而实现远程、间接的攻击。

The traditional prompt injection attack model assumes the attacker can directly interact with the LLM's input interface. However, in LLM-integrated applications, the text processed by the model often originates from external data sources (such as web pages, databases, documents). This creates a new attack surface: attackers can pre-plant malicious prompts into these data sources. When the application retrieves and processes this contaminated data, the embedded malicious instructions are executed by the LLM, enabling remote, indirect attacks.

核心攻击向量与影响分类

我们从计算机安全角度，对间接提示注入攻击的影响进行了系统化分类。主要风险类别如下：

From a computer security perspective, we have systematically categorized the impacts of indirect prompt injection attacks. The main risk categories are as follows:

1. 数据泄露与权限提升

攻击者通过注入的提示词诱导LLM泄露其系统提示、内部指令、敏感用户数据或应用程序的配置信息。在某些场景下，这可能进一步导致权限提升，使攻击者获得超出预期的系统访问或控制能力。

Attackers use injected prompts to induce the LLM to leak its system prompts, internal instructions, sensitive user data, or application configuration information. In some scenarios, this may further lead to privilege escalation, granting attackers system access or control capabilities beyond expectations.

2. 应用功能劫持与逻辑绕过

恶意提示可以覆盖或修改应用程序的原始工作流程和业务逻辑。例如，一个旨在总结网页内容的AI助手，可能被注入的提示词劫持，转而执行攻击者指定的操作，如发送欺诈邮件或发布不当内容。

Malicious prompts can override or modify the application's original workflow and business logic. For instance, an AI assistant designed to summarize web content could be hijacked by injected prompts to perform attacker-specified actions, such as sending phishing emails or posting inappropriate content.

3. 持久化攻击与“AI蠕虫”

通过将自复制或传播指令注入到LLM可能读取并写入的数据存储中（如知识库、协作文档），攻击可以实现在系统内部或跨系统的自动传播，类似于传统网络蠕虫，形成“AI蠕虫”。

By injecting self-replicating or propagation instructions into data stores that the LLM may read and write to (such as knowledge bases, collaborative documents), attacks can achieve automatic propagation within a system or across systems, similar to traditional network worms, forming "AI worms."

4. 生态系统污染与供应链攻击

攻击者污染广泛使用的公共数据源（如开源代码库、维基页面、新闻摘要）。任何依赖这些数据源的LLM应用在检索信息时，都可能被动执行恶意指令，造成大规模、难以追溯的影响。

Attackers contaminate widely used public data sources (such as open-source code repositories, wiki pages, news summaries). Any LLM application relying on these data sources may passively execute malicious instructions when retrieving information, causing large-scale, hard-to-trace impacts.

攻击演示与可行性分析

研究团队在真实和合成环境中验证了间接提示注入攻击的可行性。下表对比了在不同类型应用上演示的攻击效果及关键指标：

The research team validated the feasibility of indirect prompt injection attacks in both real and synthetic environments. The table below compares the demonstrated attack effects and key metrics across different types of applications:


目标应用/环境	攻击注入点	核心攻击效果	关键指标/严重性
Bing Chat (GPT-4A large multimodal AI model developed by OpenAI, known for advanced reasoning and language understanding.驱动)	攻击者控制的网页内容	劫持对话，诱导用户访问恶意网站或泄露会话历史。	高影响面：影响所有通过Bing Chat访问该网页的用户。
代码补全引擎	代码注释或文档字符串	在生成的代码中插入后门、漏洞或恶意依赖。	高隐蔽性：恶意指令混迹于正常开发上下文。
合成AI客服应用	知识库文章	覆盖客服流程，执行数据导出、发送内部消息等未授权操作。	权限突破：将只读知识查询转化为特权操作。
智能文档分析工具	PDF/Word文档内容	将文档处理请求重定向至攻击者服务器，导致数据渗出。	供应链风险：通过受信文档分发攻击载荷。

当前缓解措施的局限性

目前常见的防御策略在面对间接提示注入时存在显著不足：

Current common defense strategies show significant shortcomings when facing indirect prompt injection:

输入过滤与清洗：难以区分合法内容与精心构造的恶意提示，尤其是当后者与上下文语义融合时。
- Input Filtering & Sanitization: It is difficult to distinguish legitimate content from carefully crafted malicious prompts, especially when the latter is semantically blended with the context.
输出监控与后处理：攻击可能在LLM输出最终结果前就已达成目标（如调用内部API），后处理无法干预。
- Output Monitoring & Post-processing: The attack may achieve its goal (e.g., calling internal APIs) before the LLM produces its final output, which post-processing cannot intervene in.
传统WAF/防火墙：此类攻击发生在应用层逻辑内部，流量本身看似正常，传统边界安全设备难以检测。
- Traditional WAF/Firewalls: These attacks occur within the application-layer logic; the traffic itself appears normal, making it difficult for traditional perimeter security devices to detect.
“系统提示词”加固：攻击者注入的提示词可能在检索的上下文中获得更高优先级，覆盖或绕过系统预设的防护指令。
- "System Prompt" Hardening: Injected prompts from attackers may gain higher priority within the retrieved context, overriding or bypassing system-preset guardrail instructions.

防御建议与未来方向

构建针对间接提示注入的稳健防御是一个跨学科的挑战，需要从架构、算法和流程多方面入手：

Building robust defenses against indirect prompt injection is an interdisciplinary challenge requiring a multi-faceted approach encompassing architecture, algorithms, and processes:

架构隔离与权限最小化
- 在应用设计中严格区分“可信指令通道”（如系统提示）与“不可信数据通道”（如检索内容）。为LLM访问外部API或数据实施严格的、基于上下文的权限控制。
- Architectural Isolation & Principle of Least Privilege
  - Strictly separate the "trusted instruction channel" (e.g., system prompts) from the "untrusted data channel" (e.g., retrieved content) in application design. Enforce strict, context-based permission controls for LLM access to external APIs or data.
上下文感知的威胁检测
- 开发能够分析完整交互链（用户查询 + 检索到的数据 + LLM响应 + 触发的动作）的检测模型，识别其中不协调或恶意的指令流模式。
- Context-Aware Threat Detection
  - Develop detection models capable of analyzing the complete interaction chain (user query + retrieved data + LLM response + triggered actions) to identify incongruent or malicious patterns in the instruction flow.
数据源信誉与验证
- 为应用程序引入的数据建立信誉评分机制。对来自低信誉或未知来源的数据进行额外审查、沙箱处理或限制其可触发的操作类型。
- Data Source Reputation & Verification
  - Implement a reputation scoring mechanism for data ingested by the application. Apply additional scrutiny, sandboxing, or restrictions on the types of operations that can be triggered by data from low-reputation or unknown sources.
红队测试与安全基准
- 将间接提示注入场景纳入LLM应用的安全测试标准。建立公开的基准测试集，以持续评估和提升模型及应用层面的安全性。
- Red Teaming & Security Benchmarks
  - Incorporate indirect prompt injection scenarios into the security testing standards for LLM applications. Establish public benchmark datasets to continuously evaluate and improve security at both the model and application levels.

结论

间接提示注入攻击揭示了LLM集成应用基础架构中一个深刻且普遍的安全盲区。它表明，当强大的、可编程的AI模型与动态的外部数据源相结合时，传统的安全边界已经失效。应对这一挑战不仅需要技术上的创新，更需要开发者、安全研究者和政策制定者转变思维模式，将安全性作为LLM原生应用设计的核心支柱。随着AI更深地嵌入社会关键职能，主动理解和缓解此类系统性风险已变得至关重要。

Indirect prompt injection attacks reveal a profound and pervasive security blind spot in the foundational architecture of LLM-integrated applications. It demonstrates that when powerful, programmable AI models are coupled with dynamic external data sources, traditional security boundaries become ineffective. Addressing this challenge requires not only technological innovation but also a paradigm shift among developers, security researchers, and policymakers to treat security as a core pillar of LLM-native application design. As AI becomes more deeply embedded in society's critical functions, proactively understanding and mitigating such systemic risks has become paramount.

引用与参考

论文原文: Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
DOI: https://doi.org/10.48550/arXiv.2302.12173

常见问题（FAQ）

什么是间接提示注入攻击？它与传统攻击有何不同？

间接提示注入是攻击者将恶意提示词预先植入外部数据源（如网页、数据库），当LLM应用检索并处理这些数据时执行攻击。与传统直接提示注入不同，它无需直接接口即可远程实施。

间接提示注入攻击主要有哪些危害类型？

主要危害包括：1) 数据泄露与权限提升；2) 应用功能劫持与逻辑绕过；3) 持久化攻击与“AI蠕虫”；4) 生态系统污染与供应链攻击。这些攻击已在Bing Chat等实际系统中验证可行。

如何防范LLM集成应用中的间接提示注入攻击？

目前有效防护措施仍缺乏。研究建议提高安全意识，开发稳健的防御机制，并在部署LLM时采取更严格的数据源验证和处理流程，以阻断恶意提示的执行路径。

AI Summary (BLUF)

摘要