How Does OpenViking Manage AI Agent Context? Hierarchical Abstraction and Recursive Retrieval, Explained
OpenViking is an open-source context database for AI Agents that organizes context like a file system, using hierarchical abstraction (L0/L1/L2) and recursive retrieval to reduce token costs and improve task completion rates in long-running, multi-step agent scenarios.
OpenViking is worth analyzing, not because it's currently trending, but because it tackles one of the hardest nuts to crack in Agent systems: context management.
Models have advanced rapidly in recent years, but the problems with a truly continuous-working Agent often don't lie with the model itself, but in areas like these:
- Where to store user preferences
- How to retain historical tasks
- How to organize project documentation and code
- How to handle temporary information within the current session
- After completing a task, what should be solidified into long-term memory
Traditional RAG is more like a "text fragment retriever": documents are chunked, vectorized, and then a few fragments are selected from a pool of chunks and stuffed into the prompt. This method works fine for FAQ or single-turn Q&A, but when the scenario shifts to long tasks and multi-step execution, problems become very specific:
- Directory structure and context boundaries are shattered.
- Retrieved content may be locally relevant but globally off-target.
- To be safe, systems tend to stuff too much text into the model, driving up token costs.
- When problems arise, they are hard to debug because it's difficult to explain how the retrieval chain went astray.
OpenViking aims to address precisely this class of problems. Instead of continuing down the path of "building a stronger vector database," it changes the abstraction: organizing the Agent's context like a file system.
Its official definition is "open-source context database for AI Agents." The most noteworthy part of this definition is the latter half—it doesn't treat context as a pile of isolated text, but rather as a space with directories, hierarchy, and recursive navigation.
Let's start with an overview diagram:
I prefer to view OpenViking as a context operation layer. It connects to Agents or applications above, and to embedding services, model services, and local storage below. In the middle, it handles three things:
- Unified namespace
- Hierarchical context processing
- Delivering the right content to the model via paths
Core Concept: Stop Treating Context as Just a Chunk Pool
The key change in OpenViking isn't just a renamed retrieval interface; it's a change in the context abstraction.
In traditional RAG, a piece of knowledge typically follows this path:
- Document
- Chunk splitting
- Embedding
- Global similarity retrieval
- TopK prompt assembly
In OpenViking, the process is closer to this:
- Resource enters the viking:// namespace
- Resource is organized into directories and files
- Each directory level generates summaries and overviews
- Queries first locate directories, then drill down layer by layer
- Finally, only the necessary details are read
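The two shapes can be contrasted in a small sketch. Everything here is invented for illustration (the word-overlap relevance function, the dict-based tree); it is not OpenViking's API, only the difference in strategy:

```python
def relevance(query, summary):
    # Toy similarity: word overlap between query and a summary.
    q = set(query.lower().split())
    return len(q & set(summary.lower().split()))

def flat_rag(query, chunks, k=2):
    """Traditional RAG shape: one global similarity search over all chunks."""
    ranked = sorted(chunks, key=lambda c: relevance(query, c["summary"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def layered_retrieve(query, node):
    """Hierarchical shape: locate a directory first, then drill down.

    Directories carry only summaries; file content is read only at the leaf.
    """
    if "children" not in node:          # a file: read its content here
        return [node["text"]]
    # Pick the child whose summary best matches the query, then recurse.
    best = max(node["children"], key=lambda c: relevance(query, c["summary"]))
    return layered_retrieve(query, best)
```

The point of the second function is that no file body is touched until the walk bottoms out, which is exactly the "shrink first, read later" behavior described above.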
The difference between these two approaches might seem like just "an extra layer of directories," but it's significant.
Many Agent tasks aren't about "finding the three most similar text snippets" but rather:
- First, find the correct project space
- Then, find the correct module or data directory
- Then, decide which files to read
- Finally, hand a small amount of detail to the model
Humans handle complex project materials in essentially the same way: locate first, then read. OpenViking simply systematizes this process.
Architectural Design: What Components Make Up OpenViking?
Based on the official description, README, and repository structure, OpenViking's architecture can be broken down into four layers.
Namespace Layer: viking://
The first layer is unified addressing. In official examples, all context is placed under the viking:// protocol, roughly like this:
viking://
├── resources/
│ └── my_project/
├── user/
│ └── memories/
└── agent/
├── skills/
├── memories/
└── instructions/
This design seems simple but is crucial. It solves at least three engineering problems:
- Resources, memories, and skills are no longer scattered across different backends; addresses are unified.
- Directory relationships are preserved and not immediately shattered into parentless fragments.
- Subsequent recursive retrieval finally has a real path to follow.
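As a small illustration of unified addressing, a viking:// address can be split into its top-level space and sub-path with standard URL parsing. This helper is hypothetical, not part of OpenViking's client:

```python
from urllib.parse import urlsplit

def split_viking_uri(uri):
    """Split a viking:// address into (top-level space, remaining path parts).

    Illustrative only; OpenViking's real client may expose addressing
    differently.
    """
    parts = urlsplit(uri)
    if parts.scheme != "viking":
        raise ValueError(f"not a viking URI: {uri}")
    # urlsplit puts the first segment in netloc, the rest in path.
    segments = [s for s in (parts.netloc + parts.path).split("/") if s]
    return segments[0], segments[1:]
```

With stable addresses like this, "where does a memory live" becomes a path question rather than a backend question.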
Representation Layer: L0 / L1 / L2
This is the most frequently mentioned set of concepts in OpenViking's public materials:
- L0: A one-sentence summary, roughly 100 tokens according to official figures.
- L1: A structured overview, roughly 2k tokens.
- L2: Full content, loaded only when truly needed.
This isn't just a simple "summary + original text" two-layer structure; it's more like a resolution-switching mechanism.
- When the system initially judges relevance, looking at L0 is sufficient.
- During planning or decision-making stages, L1 is consulted.
- Only when implementation details, code, or full context are needed is L2 read.
What's truly interesting is that the examples in the README show that not only files generate .abstract and .overview, but directories themselves also have their own summaries and overviews. This turns directories from static containers into retrievable nodes.
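A minimal sketch of this resolution switching, assuming a simple three-field node; the class, field, and stage names are invented, not OpenViking's data model:

```python
from dataclasses import dataclass

@dataclass
class ContextNode:
    """A node (file or directory) carrying three resolutions of itself."""
    abstract: str   # L0: one-sentence summary (~100 tokens)
    overview: str   # L1: structured overview (~2k tokens)
    content: str    # L2: full content, intended to be read rarely

def representation_for(node, stage):
    """Pick the cheapest representation that suits the current stage."""
    return {
        "relevance_check": node.abstract,   # is this node worth a look at all?
        "planning": node.overview,          # enough structure to plan against
        "execution": node.content,          # full detail only when acting
    }[stage]
```

The token savings come from the fact that most nodes a query touches never leave the "relevance_check" stage.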
Retrieval Layer: Recursive Retrieval Engine
This is where OpenViking is most unique. It doesn't start by fishing for TopK in the global corpus. Instead, it first finds high-scoring directories, then continues searching within them. If there are finer subdirectories, it drills down further.
This chain can be summarized in the following steps:
- Analyze query intent and expand retrieval conditions.
- Locate highly relevant directories at the L0/L1 layer.
- Continue retrieving subdirectories and files after entering a directory.
- Continue filtering within the smaller candidate set.
- Load L2 details only in the final stage.
The corresponding logic can be illustrated with a short piece of pseudocode:
def recursive_retrieve(query, node):
    # Expand the query into one or more retrieval intents.
    intents = expand_query_intent(query)
    # Score this node's children by their L0/L1 summaries, not full text.
    candidates = semantic_search(node.children_summaries(), intents)
    hits = []
    for child in top_k(candidates):
        if child.is_directory:
            # Drill into promising directories instead of reading them whole.
            hits.extend(recursive_retrieve(query, child))
        else:
            hits.append(child)
    # Re-rank files and load L2 details only for the survivors.
    return rerank_and_load_details_on_demand(hits)
This isn't source code, just a more digestible representation of the public logic in the README. The focus isn't on the code but on the underlying strategy: First shrink the search space, then read the details.
Session and Memory Layer
Many retrieval systems are good at "searching" but not as good at "what to do after use." Agent systems are different; they must handle write-back issues:
- Should user preferences be solidified into long-term memory?
- Should newly generated documents enter project resources?
- Should execution experience learned from this task be reused by the Agent?
OpenViking's public documentation mentions automatic session management, memory compression, and long-term memory extraction. In other words, it's not just a retrieval store; it also attempts to manage the backflow of memory.
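A rough sketch of what such write-back routing could look like at session end; the item kinds, destination paths, and routing rules here are entirely invented for illustration, since the article does not document OpenViking's actual policy:

```python
def end_of_session_writeback(session_items):
    """Route session artifacts to long-lived spaces at session end.

    Hypothetical routing: preferences go to user memories, generated
    documents to project resources, execution experience to agent
    memories; everything else stays session-local and is dropped.
    """
    destinations = {"user/memories": [], "resources": [], "agent/memories": []}
    routes = {
        "preference": "user/memories",
        "document": "resources",
        "experience": "agent/memories",
    }
    for item in session_items:
        target = routes.get(item["kind"])
        if target is not None:
            destinations[target].append(item["text"])
    return destinations
```

Whatever the real rules are, the design question is the same: which session-scoped facts earn a permanent path in the namespace.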
Engineering Form: Not a Single-Language Project
From the repository structure, it's clear OpenViking doesn't cram everything into a single Python package:
- openviking: core Python package
- crates/ov_cli, openviking_cli: Rust CLI
- AGFS: the README mentions it requires Go 1.22+ to build
- src and native extensions: performance-sensitive parts
- bot: access layer for Agents/Bots
This division is pragmatic. Python handles ecosystem integration and development efficiency, Rust is suitable for CLI, Go handles some system capabilities, and underlying extensions address performance hotspots.
Workflow: How a Query Travels Through OpenViking
The following diagram clarifies the retrieval path:
Breaking it down for a real query, the process is roughly as follows.
Resource Ingestion
The first step is usually to bring external materials into viking://resources/. In the official CLI example, you can directly add a GitHub repository:
openviking-server
ov add-resource https://github.com/volcengine/OpenViking --wait
ov tree viking://resources/volcengine -L 2
Several things typically happen behind the scenes:
- Fetching or parsing external resources
- Mapping content to directories and files
- Generating L0/L1 representations for directories and files
- Establishing vector representations for subsequent retrieval
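The ingestion steps above can be sketched against an in-memory mapping of paths to text. Summarization is faked with truncation, using the L0/L1 token budgets quoted earlier as crude character counts; a real pipeline would call a model for summaries and an embedder for vectors:

```python
def ingest_resource(files):
    """Map {relative_path: text} into a viking://resources/ index.

    A sketch of the ingestion steps only: real L0/L1 generation and
    embedding are model calls, stubbed out here with slicing.
    """
    index = {}
    for path, text in files.items():
        index[f"viking://resources/{path}"] = {
            "abstract": text[:100],    # stand-in for an L0 summary
            "overview": text[:2000],   # stand-in for an L1 overview
            "content": text,           # L2: the full text
        }
    return index
```

Even this toy version shows the invariant that matters: every ingested file gets a stable address plus cheap representations before any query ever arrives.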
Query Entry into the System
When an Agent poses a question, like "How does this project's context loading mechanism work?", OpenViking won't immediately push a large block of text to the model. It first performs coarse positioning at the summary layer to determine:
- Is the relevant information more likely in resources/, user/, or agent/?
- Which directory is most worth exploring first?
- Which directories can be directly excluded?
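A toy version of this coarse positioning, scoring top-level spaces by word overlap with their summaries; the overlap scoring is a stand-in for the semantic matching OpenViking would actually use:

```python
def coarse_locate(query, spaces, threshold=1):
    """Split top-level spaces into 'explore' (ranked) and 'exclude'.

    spaces: {name: L0-style summary}. Word overlap is an illustrative
    stand-in for real semantic scoring.
    """
    q = set(query.lower().split())
    scored = {name: len(q & set(summary.lower().split()))
              for name, summary in spaces.items()}
    explore = sorted((n for n, s in scored.items() if s >= threshold),
                     key=lambda n: scored[n], reverse=True)
    exclude = [n for n, s in scored.items() if s < threshold]
    return explore, exclude
```

The excluded spaces are the real win: their subtrees never get visited, so their files never cost a token.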
Further Refinement Within Directories
Once a directory is hit, the search space shrinks from the global corpus to a local subtree. The next steps aren't "repeat the global search" but rather continue examining within the directory:
- Subdirectory summaries
- File summaries
- Relevance of local candidates
If information is still insufficient, it continues drilling down.
On-Demand L2 Reading
Only when the system determines "this file is really worth reading" is the full content retrieved. This lazy loading is crucial because it directly impacts token cost.
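Lazy loading of this kind can be sketched with a property that defers the expensive read until first access. This is a generic pattern, not OpenViking's implementation:

```python
class LazyFile:
    """Keep L0 cheap and always available; pay for L2 only on first access."""

    def __init__(self, abstract, loader):
        self.abstract = abstract   # L0 summary, held in memory
        self._loader = loader      # callable that fetches full content (L2)
        self._content = None

    @property
    def content(self):
        if self._content is None:          # first access pays the cost
            self._content = self._loader()
        return self._content               # later accesses are free
```

Under this pattern, a file that is scored but never chosen contributes only its summary to the token bill.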
A typical problem with
FAQ
How does OpenViking reduce an AI Agent's token costs?
Through hierarchical abstraction (L0/L1/L2) and recursive retrieval, OpenViking avoids stuffing large amounts of irrelevant text into the prompt. It organizes context like a file system, letting the Agent fetch only what it needs, which significantly reduces token usage.
What is the fundamental difference between OpenViking and traditional RAG for context management?
Traditional RAG chunks documents and runs global similarity retrieval, which tends to destroy directory structure and context boundaries. OpenViking instead treats context as a space with directories and hierarchy (the viking:// namespace), supporting path-based recursive navigation and layered processing.
What exactly do OpenViking's abstraction layers (L0/L1/L2) refer to?
This is OpenViking's core design. Each layer is a different resolution of the same context: L0 is a one-sentence summary, L1 is a structured overview, and L2 is the full content. This structure supports retrieval that starts from directory-level positioning and drills down layer by layer.