How to Cut an AI Coding Agent's Token Consumption by 65%: A Deep Dive into the Vexp Graph-RAG Engine

2026/3/23
AI Summary (BLUF)

Vexp is a local-first graph-RAG context engine that reduces AI agent token usage by 65-70% through semantic code indexing and hybrid search, enabling more efficient coding assistance.

Introduction

I've been building vexp for the past few months to solve a problem that kept bugging me: AI coding agents waste most of their context window reading code they don't need.

The Core Problem: Context Window Inefficiency

When you ask Claude Code or Cursor to fix a bug, they typically grep around, cat a bunch of files, and dump thousands of lines into the context. Most of it is irrelevant. You burn tokens, hit context limits, and the agent loses focus on what matters.

What Vexp Does: A Semantic Graph Approach

Vexp is a local-first context engine that builds a semantic graph of your codebase (AST + call graph + import graph + change coupling from git history), then uses a hybrid search — keyword matching (FTS5 BM25), TF-IDF cosine similarity, and graph centrality — to return only the code that's actually relevant to the current task.

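To make the hybrid ranking concrete, here is a minimal, self-contained sketch of the idea, not Vexp's actual implementation: Vexp uses FTS5 BM25 for the keyword signal, while this toy substitutes a plain TF-IDF cosine, and it reduces "graph centrality" to simple degree centrality over a one-edge call graph. All symbol names, weights, and corpus data below are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus: fully-qualified symbol -> bag of identifier/doc tokens.
docs = {
    "checkout.process_payment": "checkout payment auth token validate charge",
    "auth.verify_token": "auth token verify jwt expiry signature",
    "utils.format_date": "date format locale timezone string",
}

# Toy call-graph edges (caller -> callee), used for a degree-centrality signal.
edges = [("checkout.process_payment", "auth.verify_token")]

def build_index(corpus):
    """TF-IDF vectors plus the IDF table, so queries can reuse the IDF."""
    tokenized = {name: text.split() for name, text in corpus.items()}
    n = len(tokenized)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = {
        name: {t: (c / len(toks)) * idf[t] for t, c in Counter(toks).items()}
        for name, toks in tokenized.items()
    }
    return vecs, idf

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def degree_centrality(name):
    """Fraction of graph edges touching this symbol -- a crude graph signal."""
    return sum(1 for s, d in edges if name in (s, d)) / max(len(edges), 1)

def hybrid_score(query, w_text=0.7, w_graph=0.3):
    """Weighted blend of a text signal and a graph signal (weights invented)."""
    vecs, idf = build_index(docs)
    qvec = {t: idf.get(t, 0.0) for t in query.lower().split()}
    scores = {
        name: w_text * cosine(qvec, vec) + w_graph * degree_centrality(name)
        for name, vec in vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranked = hybrid_score("fix auth token bug")
# Symbols on the auth/checkout path outrank the unrelated date formatter.
```

The point of the blend is that a symbol can rank highly either because its text matches the query or because it sits at a well-connected spot in the graph; real pivot nodes usually score on both.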

The Core Idea: Graph-RAG Applied to Code

The core idea is Graph-RAG applied to code, in three stages:

  1. Index — tree-sitter parses every file into an AST, extracts symbols (functions, classes, types), builds edges (calls, imports, type references). Everything stored in a single SQLite file (.vexp/index.db).

  2. Traverse — when the agent asks "fix the auth bug in the checkout flow", vexp combines text search with graph traversal to find the right pivot nodes, then walks the dependency graph to include callers, importers, and related files.

  3. Capsule — pivot files are returned in full, supporting files as skeletons (signatures + type defs only, 70-90% token reduction). The result is a compact "context capsule" that gives the agent everything it needs in ~2k-4k tokens instead of 15-20k.


Session Memory: Context That Evolves with Your Code

The latest addition is session memory linked to the code graph. Every tool call is auto-captured as a compact observation. When the agent starts a new session, relevant memories from previous sessions are auto-surfaced inside the context capsule. If you refactor a function that a memory references, the memory is automatically flagged as stale. Think of it as a knowledge base that degrades gracefully as the code evolves.

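The staleness mechanic can be sketched as pinning each memory to a content fingerprint of the code it references, then re-checking fingerprints on re-index. This is a toy model of the idea, not Vexp's storage format; all names and data are invented.

```python
import hashlib

def fingerprint(code: str) -> str:
    """Content hash of the symbol a memory refers to."""
    return hashlib.sha256(code.encode()).hexdigest()[:12]

class MemoryStore:
    """Toy session-memory store: observations pinned to symbol fingerprints."""

    def __init__(self):
        self.memories = []  # each: {"note", "symbol", "fingerprint", "stale"}

    def record(self, note, symbol, code):
        self.memories.append({
            "note": note, "symbol": symbol,
            "fingerprint": fingerprint(code), "stale": False,
        })

    def reindex(self, current_code):
        """After a re-index, flag memories whose referenced code changed."""
        for m in self.memories:
            code = current_code.get(m["symbol"])
            if code is None or fingerprint(code) != m["fingerprint"]:
                m["stale"] = True

    def surface(self):
        """Fresh memories to inject into the next context capsule."""
        return [m["note"] for m in self.memories if not m["stale"]]

store = MemoryStore()
store.record("verify_token rejects expired JWTs with a 401",
             "auth.verify_token", "def verify_token(tok): ...")
store.record("format_date uses the user locale",
             "utils.format_date", "def format_date(d): ...")

# Simulate a refactor of verify_token, then a re-index.
store.reindex({
    "auth.verify_token": "def verify_token(tok, leeway=30): ...",
    "utils.format_date": "def format_date(d): ...",
})
print(store.surface())  # only the memory whose code is unchanged survives
```

The key property this models is graceful degradation: a stale memory is flagged rather than silently trusted, so the agent never acts on notes about code that no longer exists in that form.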

Technical Architecture

How it works technically:

  • Rust daemon (vexp-core) handles indexing, graph storage, and query execution

  • TypeScript MCP server (vexp-mcp) exposes 10 tools via the Model Context Protocol

  • VS Code extension (vexp-vscode) manages the daemon lifecycle and auto-configures AI agents

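For readers unfamiliar with MCP: it is built on JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The envelope below follows the MCP spec's request shape, but the tool name `vexp_context` and its arguments are hypothetical, chosen just to show what a call into a context engine might look like.

```python
import json

# A JSON-RPC 2.0 request as an MCP client would send to invoke a tool.
# The tool name "vexp_context" and its arguments are invented for this
# example; the envelope (method "tools/call", params.name / params.arguments)
# follows the Model Context Protocol spec.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "vexp_context",
        "arguments": {"task": "fix the auth bug in the checkout flow"},
    },
}

wire = json.dumps(request)
decoded = json.loads(wire)
print(wire)
```

Because the transport is just JSON-RPC, any of the supported agents can call the same tools without agent-specific glue; the VS Code extension only has to point each agent at the server.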

Supported Ecosystem

  • Supports 12 agents: Claude Code, Cursor, Windsurf, GitHub Copilot, Continue.dev, Augment, Zed, Codex, Opencode, Kilo Code, Kiro, Antigravity

  • 12 languages: TypeScript, JavaScript, Python, Go, Rust, Java, C#, C, C++, Ruby, Bash

Key Design Principles

The index is git-native: .vexp/index.db is committed to your repo, so teammates get it without re-indexing. It's also local-first: everything runs on your machine, and the index is just a SQLite file on disk. No telemetry by default (opt-in only, and even then it's just aggregate stats like token-savings percentage). No code content is ever transmitted anywhere.

Getting Started

Try it. Install the VS Code extension: https://marketplace.visualstudio.com/items?itemName=Vexp.vex...

The free tier (Starter) gives you up to 2,000 nodes and 1 repo — enough for most side projects and small-to-medium codebases. Open your project, vexp indexes automatically, and your agent starts getting better context on the next task. No account, no API key, no setup. Docs: https://vexp.dev/docs

Call for Feedback

I'd love to hear feedback, especially from people working on large codebases (50k+ lines) where context management is a real bottleneck. Happy to answer any questions about the architecture or the graph-RAG approach.

FAQ

How does Vexp help AI coding assistants save tokens?

Vexp builds a semantic graph of the codebase and uses hybrid search to return only the code relevant to the current task, cutting context token usage from 15-20k down to 2-4k, a 65-70% saving.

What does Vexp's session memory do?

Session memory automatically captures tool calls as compact observations, surfaces relevant history in new sessions, and flags a memory as stale when the code it references is refactored, forming a knowledge base that degrades gracefully as the code evolves.

Which programming languages and AI agents does Vexp support?

It supports 12 programming languages, including TypeScript, Python, and Go, and is compatible with 12 AI agents, including Claude Code, Cursor, and GitHub Copilot, integrating through the VS Code extension and the MCP protocol.
