TY - JOUR
T1 - RFMix
T2 - A discriminative modeling approach for rapid and robust local-ancestry inference
AU - Maples, Brian K.
AU - Gravel, Simon
AU - Kenny, Eimear E.
AU - Bustamante, Carlos D.
N1 - Funding Information:
We thank Andres Moreno for providing data and Fouad Zakharia and Suyash Shringarpure for helpful comments. This work was supported by National Science Foundation (NSF) Graduate Research Fellowship grant DGE-1147470, National Library of Medicine training grant LM007033, National Human Genome Research Institute grant 2R01HG003229, and NSF Division of Mathematical Sciences grant 1201234. C.D.B. consults for Personalis, Inc., Ancestry.com, Invitae (formerly Locus Development), and the 23andMe.com project “Roots into the Future.” None of these entities played any role in the design of the research or interpretation of results presented here.
PY - 2013/8/8
Y1 - 2013/8/8
AB - Local-ancestry inference is an important step in the genetic analysis of fully sequenced human genomes. Current methods can only detect continental-level ancestry (i.e., European versus African versus Asian) accurately even when using millions of markers. Here, we present RFMix, a powerful discriminative modeling approach that is faster (∼30×) and more accurate than existing methods. We accomplish this by using a conditional random field parameterized by random forests trained on reference panels. RFMix is capable of learning from the admixed samples themselves to boost performance and autocorrect phasing errors. RFMix shows high sensitivity and specificity in simulated Hispanics/Latinos and African Americans and admixed Europeans, Africans, and Asians. Finally, we demonstrate that African Americans in HapMap contain modest (but nonzero) levels of Native American ancestry (∼0.4%).
UR - https://www.scopus.com/pages/publications/84881665614
U2 - 10.1016/j.ajhg.2013.06.020
DO - 10.1016/j.ajhg.2013.06.020
M3 - Article
C2 - 23910464
AN - SCOPUS:84881665614
SN - 0002-9297
VL - 93
SP - 278
EP - 288
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 2
ER -