How do you automatically generate descriptions and keywords for images with a local AI tool?

LLMII is a tool that uses local AI to generate metadata tags for images and index them. It does not rely on any cloud service or external database.

The tool runs a visual language model on your computer to create captions and keywords for the images in a directory tree. The generated information is then written to each image file's metadata or to a sidecar file. You can then index, search, and organize the images by their descriptions using any tool that can read the common metadata fields. The files themselves are not bound to any location or tied to any software, and the indexer can be run on the same collection any number of times to add more metadata or process new images.

The first time you launch the tool, you are presented with a list of AI models to choose from. Once the model weights are downloaded and the dependencies are installed, the tool and all of its components run locally.

Screenshot of the LLMII tool interface

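Key Features

Image Analysis: Uses a local AI model to generate a keyword list and a descriptive caption for each image.
Metadata Enhancement: Can automatically write the generated tags to image metadata.
Local Processing: All processing is done on your local machine.
Multi-Format Support: Supports a wide range of image formats, including all major camera raw files.
User-Friendly GUI: Includes a graphical user interface and an installer. All AI functionality relies on a single executable, KoboldCpp.
Image View: View images after they are processed and step through the generated metadata.
Simple Model Selection: Choose a state-of-the-art model and it is downloaded and configured automatically.
Completely Automatic Backend Configuration: The AI backend (KoboldCpp) is downloaded automatically and configured with optimal settings.
GPU Acceleration: Uses Apple Metal, Nvidia CUDA, or AMD (Vulkan) hardware, if available, to greatly speed up inference.
Cross-Platform: Supports Windows, macOS ARM, and Linux.
Stop and Start Capability: Stop and start at any time without reprocessing all files.
One or Two Step Processing: Generate keywords and a simple caption in one step, or keywords and a detailed caption in two steps.
Highly Configurable: You control all of the settings.
Multiple Models: Choose a different model and run the tool again to add more keywords.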

Important Information

A discrete graphics processor in your machine is recommended.

This tool operates directly on image file metadata. It writes to one or more of the following fields:

MWG:Keyword
MWG:Description
XMP:Identifier
XMP:Status

The "Status" and "Identifier" fields track the processing state of each image. The "Description" field holds the image caption, and the "Subject" or "Keyword" field holds the keywords.

Because of the Identifier tag, you can manage your files, add new ones, and run the tool as many times as you like without reprocessing files that it has already keyworded.
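The skip behaviour described above can be pictured as a small decision function. This is only a sketch under stated assumptions: the dictionary keys, the status string `failed`, and the two flags (which mirror the "Reprocess everything" and "Reprocess failures" options described later in this document) are illustrative, not the values or logic LLMII actually uses.

```python
from typing import Mapping, Optional

# Hypothetical status string; the value LLMII really stores in
# XMP:Status is not documented here.
STATUS_FAILED = "failed"


def needs_processing(
    metadata: Mapping[str, Optional[str]],
    reprocess_all: bool = False,
    reprocess_failures: bool = False,
) -> bool:
    """Decide whether an image should be sent to the model.

    `metadata` is assumed to map "Identifier" and "Status" to the
    values read from the image file or its sidecar file.
    """
    if reprocess_all:
        return True
    # No identifier means the tool has never tagged this file.
    if not metadata.get("Identifier"):
        return True
    # Already tagged: only retry when failures were explicitly requested.
    if reprocess_failures and metadata.get("Status") == STATUS_FAILED:
        return True
    return False
```

With this shape, newly added files are always picked up, while files that already carry an identifier are left alone unless one of the reprocess flags is set.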

Installation

Prerequisites

Python 3.8 or higher.
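If you are unsure which interpreter your system picks up, a quick check like the following can confirm that it meets the stated minimum. It is a minimal sketch; the function name is made up for illustration and is not part of LLMII.

```python
import sys

# Minimum version taken from the prerequisite above.
MIN_VERSION = (3, 8)


def python_is_supported(version=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if python_is_supported() else "too old"
    print(f"Python {found}: {verdict}")
```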

Windows Installation

Clone the repository or download the ZIP file and extract it.
Install Python for Windows.
Run llmii-windows.bat.

macOS Installation (including ARM)

Clone the repository or download the ZIP file and extract it.
Install Python 3.7 or higher if it is not already installed. You can use Homebrew:
```
brew install python
```
Install ExifTool:
```
brew install exiftool
```
Run the script:
```
./llmii.sh
```
If KoboldCpp fails to run, open a terminal in the folder that contains the 'resources' folder and run:
```
xattr -cr ./resources/koboldcpp-mac-arm64
chmod +x ./resources/koboldcpp-mac-arm64
```

Linux Installation

Clone the repository or download the ZIP file and extract it.
Install Python 3.8 or higher if it is not already installed. Use your distribution's package manager, for example on Ubuntu:
```
sudo apt-get update
sudo apt-get install python3 python3-pip
```
Install ExifTool. On Ubuntu:
```
sudo apt-get install libimage-exiftool-perl
```
Run the script:
```
./llmii.sh
```
If KoboldCpp fails to run, open a terminal in the folder that contains the 'resources' folder and run:
```
chmod +x ./resources/koboldcpp-linux-x64
```

Usage

Launch the LLMII GUI:
- Windows: run llmii-windows.bat
- macOS/Linux: run ./llmii.sh
Ensure that KoboldCpp is running. Wait until you see the following message in the KoboldCpp window:
```
Please connect to custom endpoint at http://localhost:5001
```
Configure the indexing settings in the GUI.
Click "Run Image Indexer" to start processing.
Monitor the progress in the output area of the GUI.
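Instead of watching the KoboldCpp window for the message above, you could also probe the endpoint from a script. The sketch below assumes the backend exposes KoboldCpp's `/api/v1/model` route at the default URL; the function names are illustrative and this is not how LLMII itself checks readiness.

```python
import json
import urllib.error
import urllib.request

# Default endpoint taken from the startup message above.
DEFAULT_API_URL = "http://localhost:5001"


def model_endpoint(api_url: str = DEFAULT_API_URL) -> str:
    """Build the URL of the model-info route (assumed: /api/v1/model)."""
    return api_url.rstrip("/") + "/api/v1/model"


def backend_is_ready(api_url: str = DEFAULT_API_URL, timeout: float = 2.0) -> bool:
    """Return True if the backend answers with valid JSON, else False."""
    try:
        with urllib.request.urlopen(model_endpoint(api_url), timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed reply.
        return False


if __name__ == "__main__":
    print("backend ready" if backend_is_ready() else "backend not reachable yet")
```

A check like this returns False while the model is still loading, so it can be polled until the backend comes up.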

Multi-Language Support

It is sometimes possible to get keywords or captions in a language other than English. To do so, modify the instruction and uncheck some keyword options:

In the GUI, click "Settings" and then "Edit Instructions":
- Replace ENGLISH in the instructions with the name of a different language.
In "Keyword Corrections":
- Uncheck "Depluralize keywords".
- Uncheck "Only Latin characters".

Configuration

API Settings

API URL: URL of the LLM API server. The default is http://localhost:5001.

API Password: Password for API authentication, if required. Leave blank if no authentication is needed.

Instruction Settings

System Instruction: The instruction given to guide the LLM's behavior.

Edit Instruction: Opens a dialog for editing the detailed instructions for image analysis.

Directory Settings

Skip Folders: List the folders that you want to skip entirely. Separate entries with a new line or a semicolon. You do not need the full path; the subdirectory name alone is enough.
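One way to picture the Skip Folders setting is as a set of directory names checked against each component of an image's path. The following is a minimal sketch of that idea; the helper names, the exact-match rule, and the case sensitivity are assumptions rather than LLMII's actual behavior.

```python
import re
from pathlib import PurePath


def parse_skip_folders(raw: str) -> set:
    """Split the setting on newlines or semicolons and drop empty entries."""
    parts = re.split(r"[;\n]", raw)
    return {part.strip() for part in parts if part.strip()}


def is_skipped(image_path: str, skip_names: set) -> bool:
    """True if any directory in the path (not the file name) is listed."""
    return any(part in skip_names for part in PurePath(image_path).parent.parts)


skip = parse_skip_folders("raw; thumbnails\n.cache")
print(is_skipped("photos/raw/img.jpg", skip))   # a directory named "raw" is in the path
print(is_skipped("photos/2024/raw.jpg", skip))  # only the file name contains "raw"
```

Matching on names rather than full paths is what lets a single entry such as `thumbnails` exclude every folder of that name anywhere in the tree.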

Caption Options

Caption Instruction: Specific instructions for generating a detailed image caption.

Separate caption query: Send separate queries for the caption and the keywords. This takes twice as long. Uses the Caption instruction and the Keyword instruction.

Combined caption query: Generate the caption and keywords in one query. Uses the main instruction. This is the recommended setting.

No caption query: Skip caption generation entirely and create keywords only. Uses the Keyword instruction.

Generation Options

GenTokens: Maximum number of tokens to generate in a response. These are tokens, not words. Fewer tokens mean faster processing per generation but can lead to more retries, because the model may be cut off mid-generation. More is not necessarily better, though. The recommended setting is 100 to 200 for separate captions or keywords, and 200 to 300 for a combined caption and keywords.

Image Options

Dimension length: The maximum length, in pixels, of the image's horizontal or vertical dimension. A higher setting does not necessarily produce better generations. The recommended setting is between 392 and 896.
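To make the effect of Dimension length concrete, here is a sketch of how an image's size could be reduced so that neither side exceeds the limit while the aspect ratio is kept. It illustrates the arithmetic only; how LLMII rounds, and whether it ever enlarges small images, is not stated in the source.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longer side equals max_side.

    Images that already fit are returned unchanged (an assumption).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 896))  # (896, 672)
print(fit_within(640, 480, 896))    # (640, 480)
```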

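Sampler Options

Samplers affect which tokens the AI can choose from each time it generates a new token.


Parameter	Description	Range	Recommended setting
Temperature	Randomness of the model's output.	0.0 - 2.0	0.1 - 0.5
top_p	Chooses from the smallest set of tokens whose cumulative probability exceeds p. Off = 1.0	0.0 - 1.0	0.92 - 1
top_k	Limits the choice to the k most likely tokens. Off = 0	Integer	20 - 100
min_p	Blocks tokens with a probability lower than p. Off = 0.0	0.0 - 1.0	0.02 - 0.05
rep_pen	Discourages repetition. Can cause erratic behavior. Off = 1.0	≥ 1.0	1.0 - 1.02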
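As a rough illustration of how the generation and sampler settings fit together, the sketch below assembles a request body using field names from KoboldCpp's native generate API (`max_length`, `temperature`, `top_p`, `top_k`, `min_p`, `rep_pen`). Treat it as an assumption-laden example: the function name and default values are chosen here from the recommended ranges, and LLMII may build its requests differently or use another endpoint.

```python
import json


def build_generation_payload(
    prompt: str,
    gen_tokens: int = 250,     # GenTokens, combined caption and keywords
    temperature: float = 0.2,
    top_p: float = 1.0,        # 1.0 switches top_p off
    top_k: int = 40,
    min_p: float = 0.03,
    rep_pen: float = 1.01,
) -> dict:
    """Validate settings against the ranges in the table and build a body."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if top_k < 0:
        raise ValueError("top_k must be 0 (off) or a positive integer")
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be between 0.0 and 1.0")
    if rep_pen < 1.0:
        raise ValueError("rep_pen must be at least 1.0")
    return {
        "prompt": prompt,
        "max_length": gen_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
        "rep_pen": rep_pen,
    }


if __name__ == "__main__":
    print(json.dumps(build_generation_payload("Describe the image."), indent=2))
```

Validating before sending keeps an out-of-range value from silently producing erratic generations.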

File Options

Don't go in subdirectories: Process only the images in the main directory and do not look inside any subdirectories.

Reprocess everything: Process all images, even if they already have metadata. If you check this and leave Don't clear existing keywords unchecked, the tool removes all existing keywords from any previously processed files and replaces them with the new generations.

Reprocess failures: Reprocess images that were marked as failed in previous runs.

Fix any orphans: Deprecated setting. When a file is processed, metadata is added to it so that the tool knows it has been processed and what the state of the last processing was. Images that appear to have valid metadata written by the tool but lack the status markers are called orphans. This option adds the status marker to orphans without regenerating their metadata. If it is unchecked, files produced by versions of the tool that predate the removal of the JSON database are processed again as new files. If it is checked, images containing bad metadata that looks valid to the tool are marked as successful. Use this option only if you used a version of the tool from before March 2025 and are running it on those files again.

No backups: Do not create a backup of the existing metadata before modifying it.
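The interaction between "Reprocess everything" and "Don't clear existing keywords" comes down to whether new keywords replace the old ones or are added to them. A minimal sketch of that choice follows; the function name is hypothetical, and the order-preserving, case-sensitive de-duplication is an assumption rather than documented LLMII behavior.

```python
def merge_keywords(existing: list, generated: list, clear_existing: bool) -> list:
    """Combine keyword lists when a file is reprocessed.

    clear_existing=True models leaving "Don't clear existing keywords"
    unchecked: the old keywords are discarded and replaced.
    """
    source = generated if clear_existing else [*existing, *generated]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(source))


print(merge_keywords(["dog", "park"], ["park", "ball"], clear_existing=False))
# ['dog', 'park', 'ball']
print(merge_keywords(["dog", "park"], ["park", "ball"], clear_existing=True))
# ['park', 'ball']
```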

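Frequently Asked Questions (FAQ)

Does LLMII need an internet connection or depend on cloud services?

Not for processing. LLMII is a local tool: the AI model and all processing run on your computer, with no reliance on cloud services or external databases, so your images stay private. A connection is needed only for the initial download of the model weights, the backend, and the dependencies.

Which operating systems does LLMII support? Is installation complicated?

It supports Windows, macOS (including ARM), and Linux. Installation is simple: download and extract the files, then run the script for your system. The tool configures the AI backend and dependencies automatically.

How does the metadata generated by LLMII help me manage my images?

The tool embeds the generated descriptions and keywords directly in each image file's metadata. You can use any software that reads the common metadata fields to index, search, and organize your image library based on this information.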
