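How Does Gemini Document Processor Generate Thai Summaries? A 2025 Guide to the Latest AI Tools

Project Overview

Gemini Document Processor is a document processing tool that uses Google's Gemini AI models to generate Thai-language summaries of PDF and EPUB files. It can also extract images and integrates with the Obsidian note-taking app, giving users a complete workflow from document processing to knowledge management.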

Key Features

Core Capabilities

AI-Powered Summarization: Uses Google's Gemini models (gemini-2.0-flash, gemini-2.5-flash-preview, gemini-1.5-pro).
Multiple Document Formats: Processes both PDF and EPUB files.
Thai-Focused Summaries: Optimized for producing comprehensive Thai-language summaries.

Advanced Processing

Smart Chunking: Splits documents into manageable chunks to improve AI processing performance and quality.
Image Extraction: Extracts images from documents and filters them by configurable size thresholds.
Robust Error Handling: Retries failed requests and automatically falls back to a backup model when the primary model fails.
Timeout Management: Configurable timeouts for both API calls and chunk processing.
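
The retry and fallback behavior described above could look roughly like the following Python sketch. It is not the project's code: `summarize_with_fallback`, the `call_model` callable, and the retry count and backoff values are hypothetical, and only the model names come from this README. In a real client, `call_model` would wrap the Gemini request and enforce the configured API timeout.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence

# Model names from this README, ordered primary -> backup.
DEFAULT_MODELS = ("gemini-2.0-flash", "gemini-2.5-flash-preview", "gemini-1.5-pro")


def summarize_with_fallback(
    chunk_text: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, text) -> summary
    models: Sequence[str] = DEFAULT_MODELS,
    retries_per_model: int = 3,
    base_delay: float = 1.0,
) -> tuple[str, str]:
    """Try each model in order, retrying with exponential backoff.

    Returns (model_name, summary) from the first successful call; raises
    RuntimeError listing every error if all models exhaust their retries.
    """
    errors: list[str] = []
    for model in models:
        for attempt in range(1, retries_per_model + 1):
            try:
                return model, call_model(model, chunk_text)
            except Exception as exc:  # a real client should catch narrower errors
                errors.append(f"{model} attempt {attempt}: {exc!r}")
                if attempt < retries_per_model:
                    # Wait base_delay, then 2x, 4x, ... before retrying this model.
                    time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("All models failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    def flaky(model: str, text: str) -> str:
        # Simulate the primary model timing out on every attempt.
        if model == "gemini-2.0-flash":
            raise TimeoutError("simulated timeout")
        return f"summary of {len(text)} characters"

    print(summarize_with_fallback("example chunk", flaky, base_delay=0))
```

Putting `gemini-1.5-pro` last matches the "backup option" role this README gives it; the backoff keeps a transient error from turning into a burst of rapid retries.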

User Experience

Web Interface: A clean, tabbed web application for document processing.
Real-time Progress Tracking: Live updates during processing.
Job Status Monitoring: Tracks failed chunks and lets you retry problematic sections.
Parallel Processing: Multi-threaded image extraction for improved performance.
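
Multi-threaded image handling could be sketched as below, assuming the images have already been extracted into memory. `ExtractedImage`, `SizeThresholds`, the threshold defaults, and the function names are illustrative assumptions rather than the tool's API; the 1-16 worker limit mirrors the range listed under Advanced Settings.

```python
from __future__ import annotations

import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass(frozen=True)
class ExtractedImage:
    """Hypothetical in-memory record for one image pulled from a PDF/EPUB."""
    name: str
    width: int
    height: int
    data: bytes


@dataclass(frozen=True)
class SizeThresholds:
    # Illustrative defaults, not the tool's real settings.
    min_width: int = 100
    min_height: int = 100
    min_bytes: int = 2048


def keep_image(img: ExtractedImage, t: SizeThresholds) -> bool:
    """Keep only images meeting every threshold (drops icons, bullets, rules)."""
    return (img.width >= t.min_width and img.height >= t.min_height
            and len(img.data) >= t.min_bytes)


def save_if_large_enough(img: ExtractedImage, out_dir: Path, t: SizeThresholds) -> Path | None:
    """Write the image to out_dir if it passes the filter; otherwise skip it."""
    if not keep_image(img, t):
        return None
    path = out_dir / img.name
    path.write_bytes(img.data)
    return path


def save_images_parallel(images: Iterable[ExtractedImage], out_dir: Path,
                         t: SizeThresholds = SizeThresholds(), workers: int = 4) -> list[Path]:
    """Filter and save images on a thread pool; results keep input order."""
    if not 1 <= workers <= 16:
        raise ValueError("workers must be between 1 and 16")
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda img: save_if_large_enough(img, out_dir, t), images)
        return [p for p in results if p is not None]


if __name__ == "__main__":
    imgs = [
        ExtractedImage("figure1.png", 800, 600, b"\x00" * 5000),
        ExtractedImage("bullet.png", 16, 16, b"\x00" * 200),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_images_parallel(imgs, Path(tmp), workers=2)
        print([p.name for p in saved])  # ['figure1.png']
```

Threads fit this step because writing files is I/O-bound, and `ThreadPoolExecutor.map` returns results in input order, so saved images keep their document order.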

Obsidian Integration

Direct Export: Creates Markdown files directly in your Obsidian vault.
Metadata Support: Adds YAML frontmatter with tags and other metadata.
Customizable Tags: Define your own Obsidian tags for processed documents.
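
A minimal sketch of building such frontmatter follows, assuming key names like `author`, `cover`, and `rating` (this README does not list the tool's actual keys). `build_frontmatter` is hypothetical; it avoids a YAML dependency by quoting values with `json.dumps`, whose double-quoted strings are also valid YAML.

```python
import json
from datetime import date
from typing import Optional, Sequence


def build_frontmatter(
    title: str,
    tags: Sequence[str],
    author: Optional[str] = None,
    cover_url: Optional[str] = None,
    rating: Optional[int] = None,
    created: Optional[date] = None,
) -> str:
    """Return a YAML frontmatter block for an Obsidian note (hypothetical keys)."""

    def q(value: str) -> str:
        # JSON strings are valid YAML double-quoted scalars, so values
        # containing ':' or '#' (or Thai text) are quoted safely.
        return json.dumps(value, ensure_ascii=False)

    lines = ["---", f"title: {q(title)}"]
    if author:
        lines.append(f"author: {q(author)}")
    if cover_url:
        lines.append(f"cover: {q(cover_url)}")
    if rating is not None:
        lines.append(f"rating: {rating}")
    lines.append(f"created: {(created or date.today()).isoformat()}")
    # Obsidian tags cannot contain spaces: "book summary" -> "book-summary", "#thai" -> "thai".
    clean_tags = [c for c in (t.strip().lstrip("#").replace(" ", "-") for t in tags) if c]
    if clean_tags:
        lines.append("tags:")
        lines.extend(f"  - {q(t)}" for t in clean_tags)
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_frontmatter("สรุปหนังสือ", ["book summary", "#thai"],
                            author="Jane Doe", rating=4, created=date(2025, 1, 1)))
```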

Installation

Clone this repository:

```
git clone https://github.com/kidpeterpan/gemini-document-processor.git
cd gemini-document-processor
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Get a Google Gemini API key from Google AI Studio.

Usage

Starting the Web Interface

Run the web server:

```
python document_gui.py
```

Then open your web browser and navigate to: http://127.0.0.1:8081/

Web Interface Features

The interface is organized into three tabs:

Basic Settings:
- Upload PDF or EPUB files.
- Select a Gemini model: gemini-2.0-flash (faster), gemini-2.5-flash-preview (more accurate), or gemini-1.5-pro (backup option).
- Adjust the chunk size (pages per processing unit).
- Enter your Gemini API key.
- Toggle image extraction.

Obsidian Integration:
- Enable automatic export to Obsidian.
- Verify and set the Obsidian vault path.
- Configure tags, author, cover URL, and review rating.
- Automatic path validation.

Advanced Settings:
- Configure timeouts: chunk processing timeout (60-1800 seconds) and API request timeout (30-300 seconds).
- Set the number of retry attempts for API calls.
- Configure image size thresholds.
- Select the image format (PNG/JPG).
- Adjust the number of worker threads (1-16).
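
The ranges above lend themselves to upfront validation before a job starts. The dataclass below is a hypothetical sketch: the field names and default values are assumptions, while the numeric ranges (60-1800 s, 30-300 s, 1-16 workers) and the PNG/JPG choice come from the list above.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple


@dataclass
class AdvancedSettings:
    """Hypothetical mirror of the Advanced Settings tab; defaults are illustrative."""

    chunk_timeout_s: int = 300
    api_timeout_s: int = 60
    retry_attempts: int = 3
    image_format: str = "png"
    workers: int = 4

    # (field name, minimum, maximum), using the ranges listed in this README.
    RANGES: ClassVar[Tuple[Tuple[str, int, int], ...]] = (
        ("chunk_timeout_s", 60, 1800),
        ("api_timeout_s", 30, 300),
        ("workers", 1, 16),
    )

    def __post_init__(self) -> None:
        for name, low, high in self.RANGES:
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
        if self.retry_attempts < 0:
            raise ValueError("retry_attempts must be non-negative")
        # Accept "JPEG"/"jpeg" as aliases for jpg and normalize case.
        fmt = self.image_format.lower()
        fmt = "jpg" if fmt == "jpeg" else fmt
        if fmt not in ("png", "jpg"):
            raise ValueError(f"image_format must be PNG or JPG, got {self.image_format!r}")
        self.image_format = fmt
```

For example, `AdvancedSettings(chunk_timeout_s=900, image_format="JPG")` is accepted, while `AdvancedSettings(workers=32)` raises `ValueError` before any API call is made.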

Job Status and Monitoring

Real-time Progress: View detailed progress during processing.
Log Viewer: See all processing events as they happen.
Failed Chunks: Identify and retry problematic sections.
Result Management: Download or view generated summaries.
Obsidian Export: Track files exported to your Obsidian vault.
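
Failed-chunk tracking and retry could be modeled with a small state object such as the hypothetical `JobStatus` below. Its fields and methods are assumptions for illustration, not the tool's internals; `process` stands in for whatever function summarizes a single chunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobStatus:
    """Hypothetical per-job state: summaries, failed chunks, and an event log."""

    total_chunks: int
    summaries: dict[int, str] = field(default_factory=dict)
    failed_chunks: dict[int, str] = field(default_factory=dict)  # index -> last error
    log: list[str] = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Fraction of chunks attempted so far, successful or not."""
        done = len(self.summaries) + len(self.failed_chunks)
        return done / self.total_chunks if self.total_chunks else 1.0

    def run_chunk(self, index: int, process: Callable[[int], str]) -> None:
        """Process one chunk, recording either its summary or its error."""
        try:
            self.summaries[index] = process(index)
            self.failed_chunks.pop(index, None)
            self.log.append(f"chunk {index}: ok")
        except Exception as exc:
            self.failed_chunks[index] = str(exc)
            self.log.append(f"chunk {index}: failed ({exc})")

    def retry_failed(self, process: Callable[[int], str]) -> None:
        """Re-run only the chunks that failed; iterate over a copy of the keys."""
        for index in list(self.failed_chunks):
            self.run_chunk(index, process)


if __name__ == "__main__":
    job = JobStatus(total_chunks=3)
    for i in range(3):
        # Simulate chunk 1 failing on the first pass.
        job.run_chunk(i, lambda n: f"summary {n}" if n != 1 else 1 / 0)
    print(job.failed_chunks)
    job.retry_failed(lambda n: f"summary {n} (retried)")
    print(job.failed_chunks, job.progress)
```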

How It Works

Document Loading: The application loads a PDF or EPUB file and extracts its text.
Chunking: The content is divided into manageable chunks (by page for PDFs, by chapter for EPUBs).
Image Extraction: Images are extracted, filtered by size, and saved separately.
AI Processing: Each chunk is sent to the Gemini API with timeout handling and retries.
Error Recovery: Failed chunks are tracked and can be retried with more robust settings.
Summary Creation: The results are compiled into a well-formatted Markdown document.
Integration: Summary and images are saved locally and (optionally) to Obsidian.
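
The chunking and compilation steps for a PDF might look like this sketch, assuming the page texts have already been extracted (for example with pypdf) into a list of strings. `chunk_pages` and `compile_markdown` are hypothetical names; the EPUB path would group by chapter instead of by page count.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def chunk_pages(pages: Sequence[str], chunk_size: int) -> Iterator[tuple[int, int, str]]:
    """Group page texts into chunks of `chunk_size` pages.

    Yields (first_page, last_page, text) with 1-based page numbers so each
    summary section can cite its page range.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(pages), chunk_size):
        group = pages[start:start + chunk_size]
        yield start + 1, start + len(group), "\n\n".join(group)


def compile_markdown(title: str, chunk_summaries: Sequence[tuple[int, int, str]]) -> str:
    """Join per-chunk summaries into one Markdown document, in page order."""
    parts = [f"# {title}", ""]
    for first, last, summary in sorted(chunk_summaries):
        label = f"Page {first}" if first == last else f"Pages {first}-{last}"
        parts += [f"## {label}", "", summary.strip(), ""]
    return "\n".join(parts)


if __name__ == "__main__":
    pages = [f"Text of page {n}" for n in range(1, 6)]
    chunks = list(chunk_pages(pages, chunk_size=2))
    # Stand-in for the per-chunk Gemini call.
    summaries = [(first, last, f"Summary of pages {first}-{last}") for first, last, _ in chunks]
    print(compile_markdown("Example Book", summaries))
```

Sorting by first page lets chunks finish out of order (for instance after a retry) without scrambling the final document.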

Troubleshooting

Common Issues

API Errors: Check your API key and internet connection.
Processing Timeouts: Increase the chunk and API timeout values in Advanced Settings.
Failed Chunks: Use the "Retry Failed Chunks" button on the job status page.
Obsidian Integration: Ensure your Obsidian vault path is correct and contains a .obsidian folder.
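
The vault check in the last item can be expressed as a short function. `validate_vault` is a hypothetical helper that applies exactly the rule stated above: the path must be an existing directory that contains a `.obsidian` folder.

```python
import tempfile
from pathlib import Path


def validate_vault(path_str: str) -> Path:
    """Return the resolved vault path, or raise ValueError explaining the problem."""
    vault = Path(path_str).expanduser()
    if not vault.is_dir():
        raise ValueError(f"{vault} does not exist or is not a directory")
    if not (vault / ".obsidian").is_dir():
        raise ValueError(f"{vault} has no .obsidian folder; point to the vault root")
    return vault.resolve()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            validate_vault(tmp)
        except ValueError as err:
            print("rejected:", err)
        (Path(tmp) / ".obsidian").mkdir()
        print("accepted:", validate_vault(tmp))
```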

Error Logs

For detailed error information, check the application logs in your terminal or command prompt.

Project Structure

document_gui.py - Web interface and job management.
document_processor.py - Core document processing logic.
epub_processor.py - EPUB-specific processing.
templates/ - HTML templates for the web interface.
uploads/ - Temporary storage for uploaded files and processing results.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project uses the following technologies:

Google Generative AI API
Flask
PyPDF
ebooklib
Bootstrap (for the web interface)