TY - JOUR
T1 - Reprint of "Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction"
AU - Rouillard, Andrew D.
AU - Wang, Zichen
AU - Ma'ayan, Avi
N1 - Funding Information:
This work was supported in part by grants from the NIH: U54HL127624, U54CA189201, R01GM098316 and T32HL007824.
Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.
PY - 2015/12
Y1 - 2015/12
AB - With advances in genomics, transcriptomics, metabolomics and proteomics, and more expansive electronic clinical record monitoring, as well as advances in computation, we have entered the Big Data era in biomedical research. Data gathering is growing rapidly, while only a small fraction of this data is converted to useful knowledge or reused in future studies. An often-overlooked concept that can help improve this situation is data abstraction: to fuse and reuse biomedical datasets from diverse resources, data abstraction is frequently required. Here we summarize some of the major Big Data biomedical research resources for genomics, proteomics and phenotype data, collected from mammalian cells, tissues and organisms. We then suggest simple data abstraction methods for fusing this diverse but related data. Finally, we demonstrate examples of the potential utility of such data integration efforts, while warning about the inherent biases that exist within such data.
KW - Bioinformatics
KW - Data integration
KW - Network biology
KW - Systems biology
KW - Systems pharmacology
UR - http://www.scopus.com/inward/record.url?scp=84939780008&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2015.08.005
DO - 10.1016/j.compbiolchem.2015.08.005
M3 - Review article
AN - SCOPUS:84939780008
SN - 1476-9271
VL - 59
SP - 123
EP - 138
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
ER -