TY - JOUR
T1 - Unsupervised dimensionality reduction for exposome research
AU - Kalia, Vrinda
AU - Walker, Douglas I.
AU - Krasnodemski, Katherine M.
AU - Jones, Dean P.
AU - Miller, Gary W.
AU - Kioumourtzoglou, Marianthi Anna
N1 - Funding Information: NIH P30 ES009089, P30 ES023515, R01 ES028805, P30 ES019776, U2C ES030859, RC2DK118619. Publisher Copyright: © 2020 Elsevier B.V.
PY - 2020/6
Y1 - 2020/6
N2 - Understanding the effect of the environment on human health has benefited from progress made in measuring the exposome. High-resolution mass spectrometry (HRMS) has made it possible to measure small molecules across a large dynamic range, allowing researchers to study the role of low abundance environmental toxicants in causing human disease. HRMS data have a high dimensional structure (number of predictors >> number of observations), generating information on the abundance of many chemical features (predictors) which may be highly correlated. Unsupervised dimension reduction techniques can allow dimensionality reduction of the various features into components that capture the essence of the variability in the exposome data set. We illustrate and discuss the relevance of three different unsupervised dimension reduction techniques: principal component analysis, factor analysis, and non-negative matrix factorization. We focus on the utility of each method in understanding the relationship between the exposome and a disease outcome and describe their strengths and limitations. Although the utility of these methods is context specific, it remains important to focus on the interpretability of results from each method.
KW - Dimensionality reduction
KW - Exposome
KW - Factor analysis
KW - High-resolution mass spectrometry
KW - Non-negative matrix factorization
KW - Principal components analysis
UR - http://www.scopus.com/inward/record.url?scp=85087015564&partnerID=8YFLogxK
U2 - 10.1016/j.coesh.2020.05.001
DO - 10.1016/j.coesh.2020.05.001
M3 - Review article
AN - SCOPUS:85087015564
SN - 2468-5844
VL - 15
SP - 32
EP - 38
JO - Current Opinion in Environmental Science and Health
JF - Current Opinion in Environmental Science and Health
ER -