How to Use OpenAPI Instead of MCP to Integrate Tools with an LLM (with a Scala Implementation)
Beyond MCP: A Simplified Approach Using OpenAPI as LLM Tools
As the demand for integrating Large Language Models (LLMs) with external tools grows, the Model Context Protocol (MCP) is being adopted by more and more people as a convenient integration method. However, I find MCP's architecture unintuitive and overly complex. Therefore, in this article, I will explore how to leverage existing OpenAPI servers as tools for LLMs, instead of rewriting functionalities within a completely new protocol. This could potentially become a simpler standard, requiring only the implementation of an additional authentication flow (if needed).
For those unfamiliar with OpenAPI, it is a formal way to describe HTTP APIs. You may have heard of Swagger, which is essentially the same thing. Many HTTP frameworks support it, allowing you to generate structured documentation in JSON or YAML format and view it in tools like the Swagger Editor. Because of this structured documentation, it is perfectly suited to be fed into an LLM as tool definitions.
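To make this concrete, here is a minimal OpenAPI document of the kind a framework might generate — a single `GET /weather` operation. The server and field names here are illustrative, not from any real API:

```yaml
openapi: "3.0.3"
info:
  title: Weather API          # illustrative example, not a real server
  version: "1.0.0"
paths:
  /weather:
    get:
      summary: Get the current weather for a city
      parameters:
        - name: city
          in: query
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Current weather conditions
          content:
            application/json:
              schema:
                type: object
                properties:
                  temperatureC:
                    type: number
                  description:
                    type: string
```

Because every path, parameter, and response schema is machine-readable, a document like this can be pasted into an LLM's context verbatim and used as a tool definition.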
The final result is in the repository ai-tool-proto-experiment. It is a single-file Scala script with less than 300 lines of code. It does not use any LLM SDK, only simple HTTP calls to LLM providers. It also does not use any advanced APIs; only the chat completion API with structured output is required.
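Concretely, a "no SDK" chat completion with structured output can be a single HTTP POST. The sketch below assumes an OpenAI-compatible endpoint and uses the JDK's built-in `HttpClient` to stay dependency-free; `buildBody` and `complete` are hypothetical names, and `buildBody` naively interpolates strings without JSON escaping, purely for illustration:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Build a chat completion request body with a structured-output constraint.
// NOTE: no JSON escaping is done here; a real script should build this from
// proper data structures. Illustrative sketch only.
def buildBody(model: String, systemPrompt: String, userMessage: String, schema: String): String =
  s"""{
     |  "model": "$model",
     |  "messages": [
     |    {"role": "system", "content": "$systemPrompt"},
     |    {"role": "user", "content": "$userMessage"}
     |  ],
     |  "response_format": {
     |    "type": "json_schema",
     |    "json_schema": {"name": "chat_response", "schema": $schema}
     |  }
     |}""".stripMargin

// POST the request to an OpenAI-compatible /chat/completions endpoint.
def complete(apiBase: String, apiKey: String, body: String): String = {
  val request = HttpRequest.newBuilder(URI.create(apiBase + "/chat/completions"))
    .header("Authorization", s"Bearer $apiKey")
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build()
  HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body()
}
```

Everything the approach needs — plain chat completion plus structured output — fits in this one call shape, which is why no provider SDK is required.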
Goals and Non-Goals
The tool server is only one part of MCP. It includes more features like prompts and resources. Personally, I don't see much benefit in cramming so many use cases into a single protocol.
For example, prompts are essentially just a server API from which you can retrieve all pre-defined prompts. This is very easy to implement with any protocol, and there is really no need to combine it with the LLM tool protocol.
Therefore, in this article, we will only explore how to integrate other services as tools for LLMs, without concerning ourselves with other parts of MCP like prompts and resources.
MCP also does not address other issues, such as security. This post summarizes many security problems in MCP, and I don't think there is an easy way around that even if we use existing protocols like OpenAPI. So the goal of this experiment is only to use trusted OpenAPI servers, without worrying about attacks like tool shadowing. With that said, authentication is still necessary for the OpenAPI server, which is a protection for the server rather than the client. MCP only recently added authentication to its specification. As you will see later, the authentication workflow I tried here is much simpler and more generic.
Finally, using as few LLM-specific APIs as possible is also a goal, making it easier to port the implementation to other LLM providers.
The Implementation
Using something like OpenAPI is not a new idea. I've seen multiple people mention it in places like HackerNews. During my implementation, I also found that Open WebUI, a tool I self-host and use daily, has added support for using OpenAPI servers as tools. Nevertheless, I still wanted to experiment with my own implementation, because I want to keep it as simple as possible and to learn more about the capabilities of such an approach.
In the experiment, I tried both a simple open-source weather OpenAPI server and my own project RSS Brain. I'll try to explain how it is implemented and discuss an experiment result at the end.
Define the Tool Calling Structure
Many LLM providers support tool-calling APIs. We will avoid using those APIs to keep things simpler and make the solution more generally available for other LLMs, including self-hosted ones. Therefore, we define our own JSON schema that we want the LLM to follow, feed it as part of the system prompt, and use the structured output API to enforce that the LLM's response adheres to this JSON schema. I said at the beginning that I wanted to use as few features as possible, but structured output is important enough that I make an exception for it in addition to basic chat completion. Fortunately, many other LLMs, including local ones like Ollama, also support this feature.
Here are the response structures we expect, presented in the form of Scala class definitions:
```scala
case class ToolParam(
  httpRequestEndpoint: String,
  httpRequestPath: String,
  httpRequestHeaders: Option[Map[String, String]],
  httpRequestMethod: String,
  httpPostBody: Option[String],
)

case class ChatResponse(
  callTool: Option[ToolParam] = None,
  toUser: Option[String] = None,
)
```
The LLM should either respond to the user directly using the `toUser` field or instruct the agent to call an HTTP API using the `callTool` field. You can see the `ToolParam` definition is quite generic: it can essentially perform any HTTP call.
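The dispatch of such a `ToolParam` is not shown here, but a minimal sketch of how an agent might execute one looks like this. The JDK's `HttpClient` is used to keep the sketch dependency-free (the actual script uses the requests library), and `buildUrl`/`callTool` are hypothetical names, not from the repository:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Same shape as the case class above, repeated so this sketch compiles standalone.
case class ToolParam(
  httpRequestEndpoint: String,
  httpRequestPath: String,
  httpRequestHeaders: Option[Map[String, String]],
  httpRequestMethod: String,
  httpPostBody: Option[String],
)

// Join endpoint and path without doubling the slash.
def buildUrl(p: ToolParam): String =
  p.httpRequestEndpoint.stripSuffix("/") + "/" + p.httpRequestPath.stripPrefix("/")

// Execute the HTTP call the LLM asked for; the response body (or error text)
// is what gets fed back into the conversation.
def callTool(client: HttpClient, p: ToolParam): String = {
  val builder = HttpRequest.newBuilder(URI.create(buildUrl(p)))
  p.httpRequestHeaders.getOrElse(Map.empty).foreach { case (k, v) => builder.header(k, v) }
  val request = p.httpRequestMethod.toUpperCase match {
    case "POST" => builder.POST(HttpRequest.BodyPublishers.ofString(p.httpPostBody.getOrElse(""))).build()
    case _      => builder.GET().build()
  }
  client.send(request, HttpResponse.BodyHandlers.ofString()).body()
}
```

Because the parameter shape is just "any HTTP request", this one dispatcher covers every operation in every OpenAPI document the LLM has seen.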
For OpenAI, its structured output API only accepts a subset of JSON schema definitions. So instead of converting the structure to a JSON schema with a single line of Scala code, I need to manually write the OpenAI-compatible one.
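As a sketch, the hand-written schema might look something like the following. The field names mirror the case classes above, but the details (nullable union types, `additionalProperties: false`, every property listed in `required`) are a best-effort reconstruction of an OpenAI-friendly schema, not the repository's exact string:

```scala
// Hand-written JSON schema for ChatResponse, in the restricted subset OpenAI's
// structured output accepts. Reconstruction for illustration; details may differ.
val chatResponseSchemaStr: String =
  """{
    |  "type": "object",
    |  "properties": {
    |    "callTool": {
    |      "type": ["object", "null"],
    |      "properties": {
    |        "httpRequestEndpoint": { "type": "string" },
    |        "httpRequestPath": { "type": "string" },
    |        "httpRequestHeaders": {
    |          "type": ["object", "null"],
    |          "additionalProperties": { "type": "string" }
    |        },
    |        "httpRequestMethod": { "type": "string" },
    |        "httpPostBody": { "type": ["string", "null"] }
    |      },
    |      "required": ["httpRequestEndpoint", "httpRequestPath", "httpRequestHeaders",
    |                   "httpRequestMethod", "httpPostBody"],
    |      "additionalProperties": false
    |    },
    |    "toUser": { "type": ["string", "null"] }
    |  },
    |  "required": ["callTool", "toUser"],
    |  "additionalProperties": false
    |}""".stripMargin
```

Optionality is expressed by making a field nullable while still listing it in `required`, which is the pattern OpenAI's restricted schema subset expects.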
I also found that OpenAI models, at least gpt-4o-latest, often fail to generate responses that meet the structural requirements even when structured output is enabled. You still need to include the JSON schema in the system prompt to get the best chance of success.
Overall, here is the system prompt to enable the system to use the tools:
```scala
val systemPrompt: String = {
  val timeStr = ZonedDateTime.now().format(DateTimeFormatter.ISO_ZONED_DATE_TIME)
  s"""You are a helpful assistant.
     |
     |The current time is $timeStr.
     |
     |You have many tools to use by sending a http request to some API servers. Your response must be Json that
     |follows the Json schema definition:
     |
     |$chatResponseSchemaStr
     |
     |Either request a call to one of the APIs with `callTool` field, or
     |respond to the user directly with `toUser` field if there is no need to request any tool or you need more
     |information from the user.
     |
     |Each tool has an optional authUrl that you can ask the user to open in the browser. If you get authentication
     |related errors when calling a tool, ask the user to open the authUrl in browser and copy the instruction back,
     |then use the instruction to try authentication again.
     |
     |Important:
     |
     |* Respond only the JSON body. Never quote the response in something like json```...```.
     |* Never respond to user directly without using the `toUser` field with a JSON response.
     |* Only one of `callTool` and `toUser` field should be filled.
     |* Always include the `http` or `https` part for the `httpRequestEndpoint` field.
     |
     |""".stripMargin
}
```
You can see there are some extra points at the end, which are cases I found the model often hiccups on.
Feed the Tool Information Into LLM
Since OpenAPI can generate structured documentation for the API server, either in JSON or YAML format, we can feed the document directly to the LLM. In addition to the documentation endpoint, we also need to provide the endpoint of the API server, as well as an optional `authUrl` we will discuss later. Here is the definition of the tool in Scala classes, along with the prompts:
```scala
case class ToolDef(
  httpEndpoint: String,
  openAPIPath: String,
  authUrl: Option[String] = None,
) {
  def prompt: String = {
    val authUrlPrompt = authUrl.map(url => s"Tool login URL: $url\n").getOrElse("")
    s"""----
       |Tool server endpoint: $httpEndpoint
       |
       |$authUrlPrompt
       |Tool's OpenAPI definition:
       |$openAPIDef
       |
       |----
       |
       |""".stripMargin
  }

  private def openAPIDef: String = {
    requests.get(httpEndpoint + openAPIPath).text()
  }
}
```
After the system prompt, the tools prompt is sent as the first chat message to the LLM with a role of `developer`. I find it works better than putting it into the system prompt, perhaps because the tool definition can sometimes be too long:
```scala
val tools = Seq(
  ToolDef(httpEndpoint = "https://grpc-gateway.rssbrain.com", openAPIPath = "/swagger.json",
    authUrl = Some("http://app.rssbrain.com/login?redirect_url=/llm_auth")),
)
val toolsPrompt = tools.map(_.prompt).mkString("\n")

val req = ChatRequest(
  messages = Seq(
    ChatMessage(role = "system", content = systemPrompt),
    ChatMessage(role = "developer", content =
      s"""
         |
         |Here are the OpenAPI definition of the tools:
         |
         |$toolsPrompt
         |
         |""".stripMargin),
  ),
)

loop(req, None, waitForUser = true)
```
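The `loop` function itself is not shown above. Sketched with the LLM call and the HTTP execution injected as plain functions — the names and signature here are hypothetical, not the repository's — its control flow is roughly:

```scala
import scala.annotation.tailrec

// Simplified stand-in for the response type above: the tool call is kept as an
// opaque String payload so the control flow stands alone.
case class ChatResponse(callTool: Option[String] = None, toUser: Option[String] = None)

// Hypothetical sketch of the agent loop: ask the LLM, execute any requested tool
// call, feed the result back into the context, and stop once the LLM answers the
// user (or a step limit is hit).
@tailrec
def loop(
    history: List[String],
    askLLM: List[String] => ChatResponse, // chat completion with structured output
    runTool: String => String,            // performs the HTTP call, returns body or error text
    stepsLeft: Int = 10,
): Option[String] = {
  if (stepsLeft <= 0) None
  else askLLM(history) match {
    case ChatResponse(Some(toolParam), _) =>
      val result = runTool(toolParam) // tool output (or error) goes back to the LLM
      loop(history :+ s"tool result: $result", askLLM, runTool, stepsLeft - 1)
    case ChatResponse(_, toUser) =>
      toUser // final answer for the user
  }
}
```

Feeding errors back as ordinary tool results is what lets the authentication flow in the next section work: the LLM sees the failure text and decides to ask the user to visit the `authUrl`.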
Authentication
As you can see from the system prompt above:
Each tool has an optional authUrl that you can ask the user to open in the browser. If you get authentication
related errors when calling a tool, ask the user to open the authUrl in browser and copy the instruction back,
then use the instruction to try authentication again.
We actually take advantage of the flexibility of LLMs for our authentication flow: we define an `authUrl` for a tool server, which the user can open in the browser. The URL will handle the necessary authentication flow and then return a natural language description of the credentials and how to use them for authentication with the APIs.
Ideally, the natural language instruction should be passed to the client in a secure way, for example, through a callback to a local URL served by the client. But for the simplicity of the experiment, I just ask the user to copy the instruction back into the conversation.
So here is what it looks like in the example below:
User input: Get all my RSS folders
Calling tool https://grpc-gateway.rssbrain.com//rss.FolderAPI/GetMyFolders ...
The user asks for the RSS folders, so the LLM responds with a `callTool` action. When the agent tries to call the HTTP API, it returns an error about authentication. We feed the result back to the LLM, and then it responds to the user:
Assistant: It seems like your request for fetching RSS folders requires authentication. Please log in to your RSS Brain account and provide the token to proceed. You can open [this login page](http://app.rssbrain.com/login?redirect_url=/llm_auth) to login and obtain the necessary token.
You can see the LLM is asking the user to open a URL in the browser. When the user opens this URL in the browser, the service will prompt the user to log in and redirect the user to a page with natural language instructions for the LLM. The user copies the instruction back to the chat:
User input: Use `token` param in the APIs to do authentication. Your current token is `XXXXX`.
After adding the user input to the conversation, the LLM now knows how to fill in the `callTool` parameters with the necessary authentication information, and the call finally succeeds:
Calling tool https://grpc-gateway.rssbrain.com//rss.FolderAPI/GetMyFolders ...
Assistant: Here are your RSS folders:
...
This authentication flow makes the approach very flexible: the tool server can implement essentially any kind of authentication method, as long as it provides a URL that returns authentication instructions along with the credentials. You can even build a third-party authentication server in front of an OpenAPI server that does not offer this workflow, so that you can integrate any service that requires authentication.