GEO

Latest Articles

DeepSeek-OCR: An LLM-Centric Revolution in Visual Text Compression

DeepSeek-OCR introduces an LLM-centric approach to OCR that integrates vision processing directly into the language model. Flexible resolution support and prompt engineering give it strong performance on complex documents.
DeepSeek | 2026/1/22

Generative Adversarial Networks: A Deep Learning Revolution Driven by Game Theory

Generative Adversarial Networks (GANs) apply game theory to deep learning: a generator network and a discriminator network compete, and that competition produces realistic synthetic data. This addresses limitations of traditional machine learning such as data scarcity and the need for feature engineering.
AI Large Models | 2026/1/21

VoxCPM: A Tokenizer-Free TTS System for Zero-Shot Voice Cloning and Context-Aware Generation

VoxCPM is a tokenizer-free TTS system from OpenBMB that models speech in a continuous space. This enables context-aware generation and zero-shot voice cloning at near-human quality, and the model runs efficiently on consumer hardware.
AI Large Models | 2026/1/21

LEANN AI Framework: The World's Smallest Vector Index for a Local RAG Revolution

LEANN is a vector database framework that brings RAG to local devices. Graph-based selective recomputation cuts storage by 97% while maintaining search accuracy, and because everything runs locally, data stays fully private.
AI Large Models | 2026/1/21

VoxCPM Open-Source Speech Generation Model: Human-Like Speech Synthesis with 0.5B Parameters

VoxCPM is a 0.5B-parameter open-source speech generation model that achieves human-like voice synthesis with SOTA performance, deploys efficiently on consumer hardware, and has topped Hugging Face's trending list.
AI Large Models | 2026/1/21

VoxCPM: Breaking the Speech Synthesis Bottleneck with Hierarchical Semantic-Acoustic Modeling for a Leap in Zero-Shot Performance

VoxCPM is a tokenizer-free TTS model that resolves the trade-off between discrete tokens and continuous signals through hierarchical semantic-acoustic modeling. Trained on a 1.8M-hour bilingual corpus, it achieves state-of-the-art zero-shot performance.
AI Large Models | 2026/1/21