
How Does BlockRank Retrieve 500 Documents in Under a Second? Exploiting LLM Attention Sparsity for Efficiency
AI Insight
This paper introduces BlockRank, a method that exploits attention sparsity in LLMs for in-context ranking, reducing complexity from quadratic to linear and enabling efficient retrieval of up to 500 documents within a second.
Source: AI大模型 · 2026/4/24
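To make the quadratic-to-linear claim concrete, here is a minimal back-of-the-envelope sketch comparing the attention cost of full self-attention with a block-local sparse pattern. This is purely illustrative: the block size, per-document token count, and the specific sparsity pattern are assumptions for the example, not details taken from the BlockRank paper.

```python
def full_attention_cost(n: int) -> int:
    # Dense self-attention: each of n tokens attends to all n tokens -> O(n^2)
    return n * n

def block_sparse_cost(n: int, block: int = 64) -> int:
    # Hypothetical block-local pattern: each token attends only to the
    # `block` tokens in its own block -> O(n * block), linear in n
    return n * block

# Assumed workload: 500 candidate documents, 128 tokens each (illustrative)
n_tokens = 500 * 128

dense = full_attention_cost(n_tokens)
sparse = block_sparse_cost(n_tokens)
print(f"dense:  {dense:,}")
print(f"sparse: {sparse:,}")
print(f"speedup factor: {dense // sparse}x")
```

Under these assumed numbers, the block-local pattern cuts the attention score computations by a factor of n/block, which is what lets the context scale to hundreds of documents without the quadratic blow-up.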