TY - JOUR
T1 - A Novel Framework for Characterizing Genomic Haplotype Diversity in the Human Immunoglobulin Heavy Chain Locus
AU - Rodriguez, Oscar L.
AU - Gibson, William S.
AU - Parks, Tom
AU - Emery, Matthew
AU - Powell, James
AU - Strahl, Maya
AU - Deikus, Gintaras
AU - Auckland, Kathryn
AU - Eichler, Evan E.
AU - Marasco, Wayne A.
AU - Sebra, Robert
AU - Sharp, Andrew J.
AU - Smith, Melissa L.
AU - Bashir, Ali
AU - Watson, Corey T.
N1 - Funding Information:
This study makes use of a sample from an individual recruited by the Pacific Islands Rheumatic Heart Disease Genetics Network. This work was supported in part through the computational resources and staff expertise provided by the Scientific Computing at the Icahn School of Medicine at Mount Sinai. This manuscript has been released as a pre-print at bioRxiv (76). This work was supported, in part, by grants from the U.S. National Institutes of Health R21AI142590 (to CTW and WAM), NIH R24AI138963 (to CTW and MLS), NIH R21AI117407 (to AB and AJS), NIH 1F31NS108797 (to OLR), NIH HG010169 (to EEE), British Heart Foundation PG/14/26/30509 (to TP), and Medical Research Council UK Fellowship G1100449 (to TP). EEE is an investigator of the Howard Hughes Medical Institute.
Publisher Copyright:
© 2020 Rodriguez, Gibson, Parks, Emery, Powell, Strahl, Deikus, Auckland, Eichler, Marasco, Sebra, Sharp, Smith, Bashir and Watson.
PY - 2020/9/23
Y1 - 2020/9/23
N2 - An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.
AB - An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.
KW - B cell receptor
KW - antibody
KW - immunoglobulin heavy chain locus
KW - long-read sequencing
KW - single nucleotide variation
KW - structural variation
UR - http://www.scopus.com/inward/record.url?scp=85092488310&partnerID=8YFLogxK
U2 - 10.3389/fimmu.2020.02136
DO - 10.3389/fimmu.2020.02136
M3 - Article
C2 - 33072076
AN - SCOPUS:85092488310
SN - 1664-3224
VL - 11
JO - Frontiers in Immunology
JF - Frontiers in Immunology
M1 - 2136
ER -