What Is NCBI GEO? A Comprehensive 2024 Guide to the Gene Expression Omnibus | Geoz.com.cn
The Gene Expression Omnibus (GEO) is a comprehensive public repository for functional genomics data. It accepts raw and processed data from microarray and next-generation sequencing technologies and offers analysis tools such as GEO2R.
Introduction
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) stands as a cornerstone of modern biological research. Established over two decades ago, it has evolved from a repository primarily for microarray-based gene expression data into a comprehensive, international public archive for a vast array of high-throughput functional genomics data. This blog post synthesizes key information from the foundational and update literature on GEO, providing a clear overview of its purpose, evolution, core features, and utility for the research community.
What is NCBI GEO?
The Gene Expression Omnibus (GEO) is an international public repository that archives and freely distributes high-throughput gene expression and functional genomics data sets. These datasets are generated using technologies such as microarrays and next-generation sequencing (NGS). Researchers typically submit data to GEO to comply with journal and funder mandates that require public data sharing. The repository stores raw data files, processed data files, and descriptive metadata, making all information indexed, searchable, and freely downloadable.
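Because every record is freely downloadable, it can help to see how a download location might be derived from an accession number. The sketch below is illustrative rather than official. It assumes GEO's usual accession prefixes (GSE for series, GSM for samples, GPL for platforms, GDS for curated datasets) and a public FTP layout in which records are grouped into directories by replacing the last three digits of the accession with `nnn`. The function name `geo_record_dir` is hypothetical, and the layout should be checked against GEO's own download documentation before relying on it.

```python
import re

# Assumed root of GEO's public download area (served over HTTPS as well as FTP).
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo"

# Accession prefix -> top-level directory under the assumed root.
_TOP_LEVEL_DIR = {
    "GSE": "series",
    "GSM": "samples",
    "GPL": "platforms",
    "GDS": "datasets",
}


def geo_record_dir(accession: str) -> str:
    """Return the directory URL assumed to hold files for a GEO accession.

    Hypothetical helper: it only builds a string and never contacts NCBI.
    """
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    prefix, digits = match.groups()
    # Group directory: drop the last three digits and append 'nnn',
    # so GSE1234 -> GSE1nnn and GPL96 -> GPLnnn.
    group = f"{prefix}{digits[:-3]}nnn"
    return f"{GEO_FTP_ROOT}/{_TOP_LEVEL_DIR[prefix]}/{group}/{prefix}{digits}/"


if __name__ == "__main__":
    print(geo_record_dir("GSE1234"))
    print(geo_record_dir("gpl96"))
```

Under the same assumption, a series directory typically contains subdirectories such as `soft/`, `miniml/`, `matrix/`, and `suppl/` for the different file formats and supplementary raw files.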
Core Purpose and Evolution
Originally created in 2000 to serve as a global resource for gene expression data, GEO's scope has expanded significantly. It now hosts data from diverse applications beyond gene expression, including:
- Genome methylation
- Chromatin structure
- Genome-protein interactions
- Copy number variation
This adaptability is due to GEO's flexible database infrastructure, which has allowed it to evolve alongside rapidly changing genomic technologies, seamlessly accommodating the shift from microarray-dominated studies to those based on NGS.
Key Features and Content
As of recent updates, GEO represents a massive aggregation of scientific data:
- Scale: It archives over 200,000 studies comprising more than 6.5 million samples.
- Data Standards: GEO supports community-derived reporting standards such as MIAME (Minimum Information About a Microarray Experiment), so that submitted data are well annotated and reusable.
- Data Types: The repository handles fully annotated raw data, processed data, and metadata.
- Accessibility: All data are freely accessible through the GEO website: https://www.ncbi.nlm.nih.gov/geo/.
Tools for Data Exploration and Analysis
A major strength of GEO is its suite of web-based tools designed to help users find, analyze, and visualize data without requiring extensive bioinformatics expertise or downloading massive datasets.
Primary Tools and Utilities
- GEO2R: An interactive web tool that lets users compare groups of samples within a dataset to identify differentially expressed genes. Recent additions include interactive graphical plots that help assess dataset quality.
- Search and Browse: A powerful search engine supports complex queries across studies, samples, and platforms. Data can be explored from both experiment-centric and gene-centric perspectives.
- Data Visualization: Tools include gene expression profile charts, dataset clustering diagrams, and genome browser tracks, which make the data easier to interpret.
- Consistently Computed Matrices: A significant recent development is the provision of uniformly processed gene expression count matrices for thousands of RNA-seq studies, which greatly improves reproducibility and simplifies re-analysis.
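The search described above can also be driven programmatically. The following is a minimal sketch, assuming the NCBI E-utilities `esearch` endpoint queried against the `gds` (GEO DataSets) database with a JSON response. The helper names `build_geo_search_url`, `parse_esearch_ids`, and `search_geo` are hypothetical, and the example query string is only an illustration; NCBI's usage guidelines (request rate limits, optional API key) should be consulted before running this against the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for the GEO DataSets ('gds') database."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_esearch_ids(payload: str) -> list[str]:
    """Extract the list of record UIDs from an esearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return list(result.get("idlist", []))


def search_geo(term: str, retmax: int = 20) -> list[str]:
    """Run the search over the network and return matching UIDs."""
    with urlopen(build_geo_search_url(term, retmax), timeout=30) as response:
        return parse_esearch_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Illustrative query only; results depend on the live database.
    for uid in search_geo("breast cancer AND Homo sapiens[Organism]", retmax=5):
        print(uid)
```

The returned values are database UIDs rather than GSE or GSM accessions; a follow-up `esummary` call is the usual way to resolve them to accession numbers and titles.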
Conclusion
For over 23 years, NCBI GEO has been an indispensable resource for the functional genomics community. It successfully fulfills the dual role of a mandated data archive and an active discovery platform. By continuously adapting to technological shifts—from microarrays to sequencing—and enhancing its user-friendly analysis tools, GEO empowers researchers worldwide to explore vast amounts of public data, validate findings, and generate new hypotheses. Its commitment to free access, data standards, and community utility solidifies its position as a foundational pillar of open science in genomics.