TY - JOUR
T1 - A novel Markov Blanket-based repeated-fishing strategy for capturing phenotype-related biomarkers in big omics data
AU - Li, Hongkai
AU - Yuan, Zhongshang
AU - Ji, Jiadong
AU - Xu, Jing
AU - Zhang, Tao
AU - Zhang, Xiaoshuai
AU - Xue, Fuzhong
N1 - Publisher Copyright: © 2016 Li et al.
PY - 2016/3/9
Y1 - 2016/3/9
N2 - Background: We propose a novel Markov Blanket-based repeated-fishing strategy (MBRFS) in an attempt to increase the power of the existing Markov Blanket method (DASSO-MB) while maintaining its advantages in omics data analysis. Results: Both simulations and real data analyses were conducted to assess its performance in comparison with other methods, including the χ² test with Bonferroni and B-H adjustment, the least absolute shrinkage and selection operator (LASSO), and DASSO-MB. A series of simulation studies showed that the true discovery rate (TDR) of the proposed MBRFS was always close to zero under the null hypothesis (odds ratio = 1 for each SNP), with excellent stability, in all three scenarios: independent phenotype-related SNPs without linkage disequilibrium (LD) around them, correlated phenotype-related SNPs without LD around them, and phenotype-related SNPs with strong LD around them. As expected, under different odds ratios and minor allele frequencies (MAFs), MBRFS always had the best performance in capturing the true phenotype-related biomarkers, with a higher Matthews correlation coefficient (MCC), in all three scenarios. More importantly, because the proposed MBRFS uses the repeated-fishing strategy, it still captures more phenotype-related SNPs with minor effects when phenotype-related SNPs are non-significant under the χ² test after Bonferroni multiple correction. Analyses of various real omics data, including GWAS, DNA methylation, gene expression, and metabolite data, indicated that the proposed MBRFS always detected relatively reasonable biomarkers. Conclusions: Our proposed MBRFS can exactly capture the true phenotype-related biomarkers with a reduced false negative rate when the phenotype-related biomarkers are independent or correlated, as well as when phenotype-related biomarkers are associated with non-phenotype-related ones.
AB - Background: We propose a novel Markov Blanket-based repeated-fishing strategy (MBRFS) in an attempt to increase the power of the existing Markov Blanket method (DASSO-MB) while maintaining its advantages in omics data analysis. Results: Both simulations and real data analyses were conducted to assess its performance in comparison with other methods, including the χ² test with Bonferroni and B-H adjustment, the least absolute shrinkage and selection operator (LASSO), and DASSO-MB. A series of simulation studies showed that the true discovery rate (TDR) of the proposed MBRFS was always close to zero under the null hypothesis (odds ratio = 1 for each SNP), with excellent stability, in all three scenarios: independent phenotype-related SNPs without linkage disequilibrium (LD) around them, correlated phenotype-related SNPs without LD around them, and phenotype-related SNPs with strong LD around them. As expected, under different odds ratios and minor allele frequencies (MAFs), MBRFS always had the best performance in capturing the true phenotype-related biomarkers, with a higher Matthews correlation coefficient (MCC), in all three scenarios. More importantly, because the proposed MBRFS uses the repeated-fishing strategy, it still captures more phenotype-related SNPs with minor effects when phenotype-related SNPs are non-significant under the χ² test after Bonferroni multiple correction. Analyses of various real omics data, including GWAS, DNA methylation, gene expression, and metabolite data, indicated that the proposed MBRFS always detected relatively reasonable biomarkers. Conclusions: Our proposed MBRFS can exactly capture the true phenotype-related biomarkers with a reduced false negative rate when the phenotype-related biomarkers are independent or correlated, as well as when phenotype-related biomarkers are associated with non-phenotype-related ones.
KW - Big omics data
KW - Markov Blanket-based repeated-fishing strategy (MBRFS)
KW - Phenotype-related biomarkers
UR - https://www.scopus.com/pages/publications/84960119739
U2 - 10.1186/s12863-016-0358-5
DO - 10.1186/s12863-016-0358-5
M3 - Article
C2 - 26957081
AN - SCOPUS:84960119739
SN - 1471-2156
VL - 17
JO - BMC Genetics
JF - BMC Genetics
IS - 1
M1 - 51
ER -