How to Quickly Upload High-Throughput Sequencing Data Through the New NCBI GEO Submission Interface: Updated 2026 Workflow
AI Summary (BLUF)
NCBI has announced a new submission interface for GEO that enables web-based metadata upload, immediate validation and error reporting with how-to-fix instructions, and faster processing.
Introduction
Gene Expression Omnibus (GEO) is a public functional genomics data repository maintained by NCBI. It relies on community submissions to archive transcriptomic, epigenomic, and other high-throughput sequencing studies. Recent updates to GEO have introduced a new submission interface, comprehensive human RNA-seq count matrices, and enhanced visualization capabilities through the Genome Data Viewer (GDV). This blog post highlights these key developments and provides practical guidance for submitters and data users.
Key Updates to GEO Submission Interface
What’s New?
The redesigned submission interface for high-throughput sequence data to GEO brings several improvements:
- A web interface for uploading GEO metadata
- Metadata validated immediately for format and completeness
- Errors reported instantly with how-to-fix instructions
- Faster submission processing
These changes reduce the time and effort required to deposit data, especially for users unfamiliar with command-line tools. The validation step catches common mistakes early, and the guided error correction helps ensure that submissions meet GEO’s standards.
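To make the idea of a completeness check concrete, here is a minimal local sketch of the kind of validation described above. It is not GEO's validator: the required field names and the message wording are assumptions chosen for illustration, and the real rules are defined by GEO's own metadata template.

```python
# Hypothetical required fields; GEO's real template defines its own set.
REQUIRED_FIELDS = ("title", "organism", "molecule", "instrument model")


def check_sample(row_number, sample):
    """Return human-readable problems for one sample record (a dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = str(sample.get(field) or "")
        if not value.strip():
            problems.append(
                f"row {row_number}: '{field}' is empty; fill it in before uploading"
            )
    return problems


def check_metadata(samples):
    """Check every sample and collect all problems instead of stopping at the first."""
    problems = []
    for row_number, sample in enumerate(samples, start=1):
        problems.extend(check_sample(row_number, sample))
    return problems


# Made-up sample records; the second one is missing its organism.
samples = [
    {"title": "liver_rep1", "organism": "Homo sapiens",
     "molecule": "total RNA", "instrument model": "Illumina NovaSeq 6000"},
    {"title": "liver_rep2", "organism": "",
     "molecule": "total RNA", "instrument model": "Illumina NovaSeq 6000"},
]

for problem in check_metadata(samples):
    print(problem)
```

Collecting every problem in one pass, rather than failing on the first, mirrors the "report errors with fix instructions" behavior the interface advertises and can save a round trip before the real upload.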
Comparison: Old vs. New Submission Process
| Feature | Previous Approach | New Approach |
|---|---|---|
| Metadata entry | Manual spreadsheet or command-line | Web-based form with guided fields |
| Validation | Delayed until manual review | Real-time format and completeness check |
| Error feedback | Email notification after hours/days | Instant error report with fix instructions |
| Processing speed | Several days typical | Faster turnaround expected |
| User guidance | Limited documentation | Integrated help within submission interface |
Exploring Human RNA-Seq Count Matrices
Accessing Consistent Gene Expression Data
As of early 2023, GEO provides precomputed gene expression count matrices for all human RNA-seq studies in the repository, covering over half a million samples across thousands of experimental studies. These counts are generated using a standardized pipeline, enabling cross-study comparisons without the need for re-processing raw data.
How to Search for RNA-Seq Counts
To find studies with available count matrices, simply search GEO Datasets using the filter "rnaseq counts"[Filter]. This returns all human RNA-seq studies for which precomputed counts have been generated.
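The same filter can also be used programmatically. The sketch below builds a query URL for NCBI's E-utilities ESearch endpoint against the GEO DataSets database (`db=gds`) and parses a reply in the ESearch JSON shape. The helper names (`build_counts_query`, `parse_ids`), the extra search term, and the sample reply are illustrative assumptions; no live request is made here.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_counts_query(extra_terms="", retmax=20):
    """Build an ESearch URL for GEO DataSets restricted by the counts filter."""
    term = '"rnaseq counts"[Filter]'
    if extra_terms:
        term = f"{term} AND {extra_terms}"
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_ids(response_text):
    """Extract the total hit count and the returned UIDs from an ESearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"]), result["idlist"]


url = build_counts_query("liver")
print(url)

# A made-up reply in the ESearch JSON shape, used instead of a live request.
sample_reply = '{"esearchresult": {"count": "2", "idlist": ["200000001", "200000002"]}}'
print(parse_ids(sample_reply))
```

To run it against the live service, fetch the URL with any HTTP client and pass the response body to `parse_ids`, keeping NCBI's usage guidelines on request rates in mind.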
Using the Count Data
| Purpose | Recommended Action |
|---|---|
| Download | Access the count matrix files from the GEO record’s supplementary data section or via the dedicated rnaseqcounts link |
| Visualize | Use GEO’s built-in visualization tools or export to third-party applications (e.g., R, Python) |
| Integrate | Combine multiple studies by mapping gene identifiers and normalizing count values |
For developers and bioinformaticians, the count matrices are available in a tab-separated format, ready for downstream analysis such as differential expression, clustering, or pathway enrichment.
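As a rough illustration of that downstream step, the following sketch reads a tab-separated count matrix and scales each sample to counts per million (CPM), a simple normalization often applied before comparing samples. The file layout (gene IDs in the first column, one column per sample), the sample names, and the values are assumptions made up for the example; check the header of the file you actually download.

```python
import csv
import io

# Made-up example in the assumed layout: gene IDs first, one column per sample.
EXAMPLE_TSV = (
    "GeneID\tGSM0000001\tGSM0000002\n"
    "1\t10\t0\n"
    "2\t30\t50\n"
    "9\t60\t150\n"
)


def read_counts(handle):
    """Read a tab-separated count matrix into {sample: {gene_id: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {sample: {} for sample in samples}
    for row in reader:
        gene_id = row[0]
        for sample, value in zip(samples, row[1:]):
            counts[sample][gene_id] = int(value)
    return counts


def counts_per_million(sample_counts):
    """Scale one sample's raw counts so they sum to one million."""
    total = sum(sample_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in sample_counts}
    return {gene: value * 1_000_000 / total for gene, value in sample_counts.items()}


counts = read_counts(io.StringIO(EXAMPLE_TSV))
cpm = {sample: counts_per_million(values) for sample, values in counts.items()}
print(cpm["GSM0000001"]["2"])
```

For a gzip-compressed download, `gzip.open(path, "rt")` from the standard library returns a text handle that can be passed to `read_counts` in place of the `io.StringIO` object. CPM is only one of several normalization choices; differential expression tools usually expect the raw counts instead.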
Visualizing GEO Data in NCBI’s Genome Data Viewer
Adding External Data Tracks
NCBI’s Genome Data Viewer (GDV) allows users to browse aligned sequencing data from GEO, SRA, and dbGaP as custom tracks. This feature is particularly useful for exploring epigenomic or transcriptomic data in a genomic context.
Step-by-Step Guide
- Open GDV for the desired genome assembly (e.g., GRCh38).
- Click the “Tracks” button on the toolbar, then select “Configure Tracks”.
- Go to the “Find Tracks” tab in the pop-up Configure panel.
- Search for tracks using keywords (e.g., study accession, tissue type, experimental factor). Spaces act as AND operators; wildcards are accepted.
- Select and add the desired tracks to your browser view.
- Configure display settings (e.g., color, scale, track height) as needed.
Note: The “Find Tracks” tab searches across GEO, SRA, and dbGaP repositories simultaneously, making it easy to locate relevant datasets without leaving the browser.
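The search rule in the steps above (spaces act as AND, wildcards are accepted) can be illustrated with a small sketch. This is not how GDV implements its search; it only shows how such a rule behaves, under the assumption that each term is matched against whole words of a track description. The track descriptions are hypothetical.

```python
from fnmatch import fnmatch


def matches(query, track_description):
    """True when every space-separated term matches some word of the description."""
    words = track_description.lower().split()
    for term in query.lower().split():
        if not any(fnmatch(word, term) for word in words):
            return False
    return True


# Hypothetical track descriptions; real GDV track metadata will look different.
tracks = [
    "GSE12345 liver H3K27ac ChIP-seq",
    "GSE12345 kidney H3K27ac ChIP-seq",
    "GSE67890 liver RNA-seq",
]

print([t for t in tracks if matches("GSE12345 liver", t)])
print([t for t in tracks if matches("h3k27* liver", t)])
```

Adding a term can only narrow the result set, so combining a study accession with a tissue or assay keyword is a quick way to cut a long track list down to the few tracks of interest.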
Practical Example
If you are studying a specific gene locus and want to visualize ChIP-seq peaks from a GEO study, simply search for the study accession (e.g., GSE12345) in the Find Tracks tab. The aligned reads or signal tracks will appear and can be overlaid with gene annotations, sequence conservation, or other public tracks.
Conclusion
NCBI’s Gene Expression Omnibus continues to evolve to meet the needs of the functional genomics community. The new submission interface streamlines data deposition, precomputed RNA-seq count matrices enable reproducible cross-study analyses, and GDV integration provides intuitive visualization of aligned sequencing data. Together, these enhancements make GEO a more powerful and user-friendly resource for researchers worldwide.
For more information, visit the GEO home page or explore the RNA-seq counts documentation.
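Frequently Asked Questions (FAQ)
What does the new GEO submission interface improve?
The new interface supports web-based metadata upload, validates format and completeness immediately, reports errors instantly with how-to-fix instructions, and processes submissions faster, with no command-line tools required.
How do I download the human RNA-seq count matrices from GEO?
Download the count matrix files from the supplementary data section of the GEO record or through the "rnaseqcounts" link. The files are tab-separated and ready for downstream analysis.
How do I quickly find RNA-seq studies in GEO that have count matrices?
Search GEO Datasets with the filter "rnaseq counts"[Filter] to return all human RNA-seq studies with precomputed counts.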