TY - JOUR
T1 - IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets
AU - Fakouri Baygi, Sadjad
AU - Kumar, Yashwant
AU - Barupal, Dinesh Kumar
N1 - Publisher Copyright: © 2022 American Chemical Society. All rights reserved.
PY - 2022/6/3
Y1 - 2022/6/3
N2 - Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large, population-scale studies (n > 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package (https://ipa.idsl.me), to generate such data matrices specifically for organic compounds. The IDSL.IPA pipeline incorporates (1) identifying potential 12C and 13C ion pairs in individual mass spectra; (2) detecting and characterizing chromatographic peaks using a new sensitive and versatile approach to perform mass correction, peak smoothing, baseline development for local noise measurement, and peak quality determination; (3) correcting retention time and cross-referencing peaks from multiple samples by a dynamic retention index marker approach; (4) annotating peaks using a reference database of m/z and retention time; and (5) accelerating data processing for larger studies by running the peak detection and alignment steps in parallel. This pipeline has been successfully evaluated for studies ranging from 200 to 1600 samples. By specifically isolating high-quality, reliable signals pertaining to carbon-containing compounds in untargeted LC/HRMS data sets from larger studies, IDSL.IPA opens new opportunities for discovering biological insights in population-scale metabolomics and exposomics projects. The package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.IPA.
AB - Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large, population-scale studies (n > 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package (https://ipa.idsl.me), to generate such data matrices specifically for organic compounds. The IDSL.IPA pipeline incorporates (1) identifying potential 12C and 13C ion pairs in individual mass spectra; (2) detecting and characterizing chromatographic peaks using a new sensitive and versatile approach to perform mass correction, peak smoothing, baseline development for local noise measurement, and peak quality determination; (3) correcting retention time and cross-referencing peaks from multiple samples by a dynamic retention index marker approach; (4) annotating peaks using a reference database of m/z and retention time; and (5) accelerating data processing for larger studies by running the peak detection and alignment steps in parallel. This pipeline has been successfully evaluated for studies ranging from 200 to 1600 samples. By specifically isolating high-quality, reliable signals pertaining to carbon-containing compounds in untargeted LC/HRMS data sets from larger studies, IDSL.IPA opens new opportunities for discovering biological insights in population-scale metabolomics and exposomics projects. The package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.IPA.
KW - 12C/13C isotope pairs
KW - chromatography analysis
KW - mass spectrometry
KW - metabolomics
KW - peak-picking
KW - retention time correction
KW - untargeted analysis
UR - http://www.scopus.com/inward/record.url?scp=85131270990&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.2c00120
DO - 10.1021/acs.jproteome.2c00120
M3 - Article
C2 - 35579321
AN - SCOPUS:85131270990
SN - 1535-3893
VL - 21
SP - 1485
EP - 1494
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 6
ER -