IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets

Sadjad Fakouri Baygi, Yashwant Kumar, Dinesh Kumar Barupal

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large, population-scale studies (n > 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package, to generate such data matrices specifically for organic compounds. The IDSL.IPA pipeline (1) identifies potential ¹²C and ¹³C ion pairs in individual mass spectra; (2) detects and characterizes chromatographic peaks using a new sensitive and versatile approach that performs mass correction, peak smoothing, baseline development for local noise measurement, and peak quality determination; (3) corrects retention time and cross-references peaks from multiple samples using a dynamic retention index marker approach; (4) annotates peaks using a reference database of m/z and retention time; and (5) accelerates data processing through parallel computation of the peak detection and alignment steps for larger studies. This pipeline has been successfully evaluated on studies ranging from 200 to 1600 samples. By specifically isolating high-quality, reliable signals pertaining to carbon-containing compounds in untargeted LC/HRMS data sets from larger studies, IDSL.IPA opens new opportunities for discovering biological insights in population-scale metabolomics and exposomics projects. The package is available in the R CRAN repository.
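Step (1) of the pipeline, pairing ¹²C and ¹³C ions within a single mass spectrum, can be illustrated with a minimal Python sketch. This is not the IDSL.IPA implementation (the package itself is written in R): the function name `find_isotope_pairs`, the mass tolerance, and the intensity-ratio cutoff are all hypothetical choices made for illustration. The only fixed quantity is the ¹³C/¹²C mass difference of about 1.00335 Da.

```python
from bisect import bisect_left, bisect_right

# Mass difference between 13C and 12C in Da.
C13_C12_DELTA = 1.0033548


def find_isotope_pairs(mz, intensity, mass_tol=0.005, max_ratio=1.0):
    """Return (i, j) index pairs where peak j lies one 13C spacing above peak i.

    mz must be sorted in ascending order. mass_tol (Da) and max_ratio are
    illustrative assumptions, not IDSL.IPA defaults.
    """
    pairs = []
    for i, m in enumerate(mz):
        target = m + C13_C12_DELTA
        # Binary search for all peaks inside the tolerance window.
        lo = bisect_left(mz, target - mass_tol)
        hi = bisect_right(mz, target + mass_tol)
        for j in range(lo, hi):
            # Assumed filter: the 13C peak must be present and must not be
            # more intense than max_ratio times the 12C peak.
            if 0 < intensity[j] <= max_ratio * intensity[i]:
                pairs.append((i, j))
    return pairs


# Toy centroided spectrum (made-up values, not real data).
mz = [100.0000, 101.00335, 150.0000, 151.5000, 200.0000, 201.0034]
intensity = [1000.0, 60.0, 500.0, 400.0, 300.0, 900.0]
pairs = find_isotope_pairs(mz, intensity)
```

In this toy spectrum the peak at m/z 201.0034 has the right spacing from m/z 200.0000 but is rejected by the assumed intensity filter, which shows how a ratio constraint can discard coincidental mass matches.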

Original language: English
Pages (from-to): 1485-1494
Number of pages: 10
Journal: Journal of Proteome Research
Issue number: 6
State: Published - 3 Jun 2022


  • ¹²C/¹³C isotope pairs
  • chromatography analysis
  • mass spectrometry
  • metabolomics
  • peak-picking
  • retention time correction
  • untargeted analysis

