The number of publicly available gene expression datasets deposited in the Gene Expression Omnibus (GEO) is growing at an accelerating rate. Such datasets hold great value for knowledge discovery, particularly when integrated. Although numerous software platforms and tools have been developed to enable reanalysis and integration of individual GEO datasets, or groups of them, large-scale reuse of those datasets is impeded by minimal requirements for standardized metadata at both the study and sample levels and for uniform processing of the data across studies. Here, we review methodologies developed to facilitate the systematic curation and processing of publicly available gene expression datasets from GEO. We identify trends in advanced metadata curation and summarize approaches for reprocessing the data within the entire GEO repository.

Original language: English
Pages (from-to): 103-110
Number of pages: 8
Journal: Biophysical Reviews
Issue number: 1
State: Published - 7 Feb 2019


  • Computational data curation
  • FAIR principles
  • GEO
  • Gene Expression Omnibus
  • Natural language processing


Title: Mining data and metadata from the gene expression omnibus
