TY - JOUR
T1 - Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation
AU - Watson, Corey T.
AU - Steinberg, Karyn M.
AU - Huddleston, John
AU - Warren, Rene L.
AU - Malig, Maika
AU - Schein, Jacqueline
AU - Willsey, A. Jeremy
AU - Joy, Jeffrey B.
AU - Scott, Jamie K.
AU - Graves, Tina A.
AU - Wilson, Richard K.
AU - Holt, Robert A.
AU - Eichler, Evan E.
AU - Breden, Felix
N1 - Funding Information:
We are grateful to T. Brown for assistance with manuscript preparation. We are grateful to Marie-Paule Lefranc and to the IMGT Nomenclature Committee for their help in defining IGHV genes and alleles and for providing the IMGT standardized rules for descriptions of CNVs. C.T.W. was supported in part by a President’s Research Stipend and graduate fellowship awarded by Simon Fraser University. K.M.S. was supported by a Ruth L. Kirschstein National Research Service Award (NRSA) training grant to the University of Washington (T32HG00035) and an individual NRSA Fellowship (F32GM097807). This work was supported by the US National Institutes of Health (grants HG002385 and HG004120 to E.E.E.) and a National Science and Engineering Research Council of Canada grant to F.B. E.E.E. is an Investigator of the Howard Hughes Medical Institute. E.E.E. is on the scientific advisory boards for Pacific Biosciences, SynapDx, and DNAnexus.
PY - 2013/4/4
Y1 - 2013/4/4
N2 - The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte-derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (F_ST = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.
UR - http://www.scopus.com/inward/record.url?scp=84875948859&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2013.03.004
DO - 10.1016/j.ajhg.2013.03.004
M3 - Article
C2 - 23541343
AN - SCOPUS:84875948859
SN - 0002-9297
VL - 92
SP - 530
EP - 546
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 4
ER -