SNP subset selection for genetic association studies

M. C. Byng, J. C. Whittaker, A. P. Cuthbert, C. G. Mathew, Cathryn M. Lewis

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


Association studies for disease susceptibility genes rely on the high density of SNPs within candidate genes. However, the linkage disequilibrium between SNPs implies that not all SNPs identified in the candidate region need be genotyped. Here we develop several approaches to SNP subset selection, which can substantially reduce the number of SNPs to be genotyped in an association study. We apply clustering algorithms to pairwise linkage disequilibrium measures, with SNP subsets determined for different cut-off values of Δ using nearest and furthest neighbour clusters. Alternatively, SNP subsets may be determined by the proportion of haplotypes they identify. We also show how power calculations, based on the average power to identify a SNP as the disease susceptibility mutation using haplotype-based or logistic-regression-based statistical analyses, can be used to choose SNP subsets. All these methods provide a ranking of subsets of a specific size, but do not provide criteria for the overall choice of SNP subset size. We develop such criteria by incorporating power calculations into a decision analysis, where the choice of SNP subset size depends on the genotyping costs and the perceived benefits of identifying association. These methods are illustrated using eleven SNPs in the MMP2 gene.
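The clustering approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes phased haplotypes coded 0/1, takes |Δ| to be the absolute allelic correlation between two SNPs, and merges clusters while their nearest-neighbour (single linkage) or furthest-neighbour (complete linkage) |Δ| stays at or above the cut-off. The function names `abs_delta`, `cluster_snps` and `pick_subset`, and the rule of keeping the first SNP in each cluster, are hypothetical choices made for illustration.

```python
from itertools import combinations


def abs_delta(a, b):
    """Absolute allelic correlation |Delta| between two biallelic SNPs coded 0/1."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = (pa * (1 - pa) * pb * (1 - pb)) ** 0.5
    # A monomorphic SNP has no defined correlation; treat it as unlinked.
    return abs(pab - pa * pb) / denom if denom > 0 else 0.0


def cluster_snps(haplotypes, cutoff, linkage="furthest"):
    """Agglomerative clustering of SNPs on pairwise |Delta|.

    haplotypes: list of rows, one per haplotype, each a list of 0/1 alleles.
    linkage: "nearest" (single linkage) or "furthest" (complete linkage).
    Merging stops once no pair of clusters has linkage |Delta| >= cutoff.
    """
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    ld = {(i, j): abs_delta(cols[i], cols[j])
          for i, j in combinations(range(n_snps), 2)}

    def sim(i, j):
        return ld[(min(i, j), max(i, j))]

    # Nearest neighbour uses the strongest pairwise LD between two clusters,
    # furthest neighbour uses the weakest.
    agg = max if linkage == "nearest" else min

    clusters = [[j] for j in range(n_snps)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            s = agg(sim(i, j) for i in clusters[x] for j in clusters[y])
            if s > best:
                best, pair = s, (x, y)
        if best < cutoff:
            break
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


def pick_subset(clusters):
    """Hypothetical selection rule: genotype the first SNP of each cluster."""
    return [c[0] for c in clusters]


# Toy data (invented for illustration): SNPs 0 and 1 are in complete LD,
# as are SNPs 2 and 3; the two pairs are uncorrelated with each other.
toy = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
clusters = cluster_snps(toy, cutoff=0.8, linkage="furthest")
subset = pick_subset(clusters)
```

Raising the cut-off yields more, smaller clusters and therefore a larger SNP subset; with furthest-neighbour linkage every pair of SNPs within a cluster has |Δ| at or above the cut-off, whereas nearest-neighbour linkage only guarantees a chain of such pairs.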

Original language: English
Pages (from-to): 543-556
Number of pages: 14
Journal: Annals of Human Genetics
Issue number: 6
State: Published - Nov 2003
Externally published: Yes

