Is llmware the Right Framework for Building Local, Private LLM Applications? (With a 300+ Model Catalog)
AI Summary (BLUF)
llmware is a unified Python framework for building knowledge-based, local, private, and secure LLM applications, featuring a model catalog with 300+ models and an integrated RAG pipeline optimized for AI PC and edge deployment.
llmware is a unified framework designed for building knowledge-based, local, private, and secure Large Language Model (LLM) applications.
Framework Overview
llmware is optimized for AI PC, local laptop, edge, and self-hosted deployment across a wide range of Windows, Mac, and Linux platforms. It integrates multiple inference technologies, including GGUF, OpenVINO, ONNXRuntime, ONNXRuntime-QNN (Qualcomm), WindowsLocalFoundry, and PyTorch, behind a high-level interface that makes it easy for developers to select and leverage the optimal inference technology for the target platform.
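Conceptually, that high-level interface is a dispatch over backends keyed by the model's packaged format. A minimal, library-free sketch of the idea (the function and routing logic here are illustrative, not llmware internals):

```python
# Illustrative only - llmware performs this kind of routing internally when a model is loaded
SUPPORTED_BACKENDS = ["gguf", "openvino", "onnxruntime", "onnxruntime-qnn", "pytorch"]

def select_backend(model_format: str, available: list) -> str:
    """Pick the backend matching the model's packaged format, falling back to PyTorch."""
    fmt = model_format.lower()
    if fmt in SUPPORTED_BACKENDS and fmt in available:
        return fmt
    return "pytorch"
```

The point of the abstraction is that application code never sees this choice: the same `load_model` call works regardless of which runtime serves the model.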
llmware consists of two main components:
- Model Catalog with 300+ models - Provides pre-packaged models in quantized, optimized formats that take full advantage of on-device GPU and NPU capabilities. Supports the major open-source model families, plus 50+ llmware fine-tuned SLIM, Bling, Dragon, and Industry-Bert models specialized for key tasks in enterprise process automation. Leading cloud models from OpenAI, Anthropic, and Google are also supported.
- RAG (Retrieval-Augmented Generation) Pipeline - Integrates components covering the full lifecycle of connecting knowledge sources to generative AI models, with extensive document parsing and ingestion capabilities and the ability to create scalable knowledge bases.
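On the ingestion side, the pipeline reduces to parse -> text chunk -> embed. A minimal, library-free sketch of fixed-size text chunking with overlap (the parameter names and defaults are illustrative, not llmware's):

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50):
    """Split text into fixed-size character chunks, overlapping so that
    context spanning a chunk boundary is not lost."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Real parsers chunk on token or sentence boundaries rather than raw characters, but the overlap idea is the same.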
By combining these two components, llmware offers a comprehensive set of tools for rapidly building knowledge-based enterprise LLM applications.
Our vision is to make AI sustainable, accurate, and cost-effective, accomplishing tasks with the smallest possible computational footprint.
Virtually all of our examples and models can be run directly on-device - get started right away on your laptop.
Join us on Discord | Watch YouTube Tutorials | Explore our Model Families on Huggingface
🎯 Core Features
Writing code with llmware is based on a few core concepts:
Model Catalog
Access all models in a unified way, with lookup that is equally simple regardless of the underlying implementation.
```python
# 300+ models in the catalog, including 50+ RAG-optimized BLING, DRAGON and Industry-BERT models
# Full support for GGUF, OpenVINO, ONNXRuntime, HuggingFace, Sentence Transformers and major API-based models
# Easy to extend with custom models - see the examples

from llmware.models import ModelCatalog
from llmware.prompts import Prompt

# all models are accessed through the ModelCatalog
models = ModelCatalog().list_all_models()

# to use any model in the catalog, call "load_model" with the model_name parameter
my_model = ModelCatalog().load_model("llmware/bling-phi-3-gguf")

# call the model with: inference
output = my_model.inference("what is the future of AI?", add_context="Here is the article to read")

# call the model with: stream
for token in my_model.stream("What is the future of AI?"):
    print(token, end="")

# to integrate a model into a Prompt
prompter = Prompt().load_model("llmware/bling-tiny-llama-v0")
response = prompter.prompt_main("what is the future of AI?", context="Insert Sources of information")
```
Library
Ingest, organize, and index a collection of knowledge at scale - parse, text chunk, and embed.
```python
from llmware.library import Library

# to parse and text chunk a set of documents (pdf, pptx, docx, xlsx, txt, csv, md, json/jsonl, wav, png, jpg, html)

# step 1 - create a library, which is the 'knowledge-base container' construct
#  - libraries have both text collection (DB) resources and file resources (e.g., llmware_data/accounts/{library_name})
#  - embeddings and queries are run against a library
lib = Library().create_new_library("my_library")

# step 2 - add_files is the universal ingestion function - point it at a local folder with mixed file types
#  - files are routed by extension to the correct parser, parsed, text chunked, and indexed in the text collection DB
lib.add_files("/folder/path/to/my/files")

# to install an embedding on a library - pick an embedding model and a vector DB
lib.install_new_embedding(embedding_model_name="mini-lm-sbert", vector_db="milvus", batch_size=500)

# to add a second embedding to the same library (mix-and-match models + vector DBs)
lib.install_new_embedding(embedding_model_name="industry-bert-sec", vector_db="chromadb", batch_size=100)

# easy to create multiple libraries for different projects and groups
finance_lib = Library().create_new_library("finance_q4_2023")
finance_lib.add_files("/finance_folder/")

hr_lib = Library().create_new_library("hr_policies")
hr_lib.add_files("/hr_folder/")

# pull the library card with key metadata - documents, text chunks, images, tables, embedding record
lib_card = Library().get_library_card("my_library")

# see all libraries
all_my_libs = Library().get_all_library_cards()
```
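The extension-based routing that `add_files` performs can be pictured as a simple lookup table over the supported file types. An illustrative sketch (the table and parser names are hypothetical, not llmware internals):

```python
from pathlib import Path

# hypothetical routing table mapping supported extensions to parser families
PARSER_ROUTES = {
    ".pdf": "pdf_parser",
    ".pptx": "office_parser", ".docx": "office_parser", ".xlsx": "office_parser",
    ".txt": "text_parser", ".csv": "text_parser", ".md": "text_parser",
    ".json": "text_parser", ".jsonl": "text_parser",
    ".wav": "voice_parser",
    ".png": "image_parser", ".jpg": "image_parser",
    ".html": "web_parser",
}

def route_file(path: str) -> str:
    """Return the parser family for a file, defaulting to plain text."""
    return PARSER_ROUTES.get(Path(path).suffix.lower(), "text_parser")
```

This is why a single folder of mixed file types can be ingested with one call: each file is dispatched by extension before parsing and chunking.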
Query
Query libraries with a mix of text, semantic, hybrid, metadata, and custom filters.
```python
from llmware.retrieval import Query
from llmware.library import Library

# step 1 - load the previously created library
lib = Library().load_library("my_library")

# step 2 - create a query object and pass the library
q = Query(lib)

# step 3 - run many different kinds of queries (many more options in the examples)

# basic text query
results1 = q.text_query("text query", result_count=20, exact_mode=False)

# semantic query
results2 = q.semantic_query("semantic query", result_count=10)

# text query restricted to selected documents in the library, with "exact" matching of the query
results3 = q.text_query_with_document_filter("new query", {"file_name": "selected file name"}, exact_mode=True)

# to apply a specific embedding (if there are multiple on the library), pass the names when creating the query object
q2 = Query(lib, embedding_model_name="mini_lm_sbert", vector_db="milvus")
results4 = q2.semantic_query("new semantic query")
```
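A hybrid query of the kind listed above typically blends a lexical match score with a vector-similarity score. A library-free sketch of that blending (the scoring functions and the `alpha` weight are illustrative, not llmware's retrieval internals):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def keyword_score(query, passage):
    """Fraction of query terms that appear in the passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def hybrid_rank(query, passages, embeddings, query_emb, alpha=0.5):
    """Rank passages by a weighted blend of lexical and semantic scores."""
    scored = [(alpha * keyword_score(query, p) + (1 - alpha) * cosine(query_emb, e), p)
              for p, e in zip(passages, embeddings)]
    return [p for _, p in sorted(scored, reverse=True)]
```

Tuning `alpha` toward 1.0 favors exact term matches; toward 0.0 it favors semantic similarity from the embedding model.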
Prompt with Sources
The easiest way to combine knowledge retrieval with LLM inference.
```python
from llmware.prompts import Prompt
from llmware.retrieval import Query
from llmware.library import Library

# build a prompt
prompter = Prompt().load_model("llmware/bling-tiny-llama-v0")

# add a file -> the file is parsed, text chunked, filtered by the query, and packaged as model-ready context,
# batched to fit the model's context window if needed
source = prompter.add_source_document("/folder/to/one/doc/", "filename", query="fast query")

# run an inference with the packaged source as context
responses = prompter.prompt_with_source("my query to the model")
```
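The batching step described in the comment - packing retrieved chunks so each batch fits the model's context window - can be sketched as a greedy loop. In this illustrative version a character budget stands in for the token budget a real implementation would use:

```python
def batch_for_context(chunks, max_chars=2000):
    """Greedily pack text chunks into batches that each stay under the context budget."""
    batches, current = [], ""
    for chunk in chunks:
        # flush the current batch when adding the next chunk would overflow the budget
        if current and len(current) + len(chunk) > max_chars:
            batches.append(current)
            current = ""
        current += chunk
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent to the model as one context-grounded inference, with the responses aggregated afterward.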
## Frequently Asked Questions (FAQ)
### How does llmware protect data privacy and security in financial AI applications?
llmware is a local-first, private, unified framework that can be deployed on an AI PC or edge device, so data never needs to be uploaded to the cloud - an architecture that inherently protects the privacy and security of financial data.
### Which ready-made models are available when using llmware in finance?
The framework provides a catalog of 300+ models, covering major open-source families and 50+ fine-tuned models specialized for enterprise workflows (such as SLIM and Bling). Cloud models such as OpenAI's are also supported, making model selection for financial tasks straightforward.
### How can llmware be used to build a Q&A system over an internal financial knowledge base?
Using its integrated RAG pipeline, you can parse and ingest financial documents of all kinds (such as reports and contracts), build a local knowledge base, and run queries and prompts grounded in accurate knowledge sources to generate reliable answers.