Statistical methods for finding associations between genes and common disease

Taane Clark supervised by Robert Griffiths (Oxford) and Maria De Iorio (London)

Large-scale association studies hold promise for discovering the genetic basis of common human diseases. Such studies involve large numbers of individuals and large numbers of genetic markers, such as single nucleotide polymorphisms (SNPs). The potential size of the data and of the resulting model space requires efficient methodology to unravel associations between disease outcomes and SNPs in dense genetic maps. As a result of evolutionary processes, the human genome has a block-like structure, with haplotype blocks consisting of SNPs in high linkage disequilibrium (LD), that is, SNPs that are highly correlated with each other. We developed methods for the analysis of association studies that exploit this block structure and incorporate population genetic measures into model building.
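To make the notion of "highly correlated" SNPs concrete, the following minimal sketch computes one common LD measure, the squared correlation r^2, between two biallelic SNPs from phased haplotypes. The 0/1 allele coding, the function name `r_squared` and the toy data are illustrative assumptions, not taken from the thesis, and r^2 is only one of several LD measures that could be used.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between biallelic SNPs i and j.

    haplotypes: list of equal-length tuples of 0/1 alleles, one tuple per
    chromosome (hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at SNP j
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j                             # LD coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Toy haplotypes (made up): SNPs 0 and 1 always co-occur, SNP 2 is independent of SNP 0.
toy_haplotypes = [
    (0, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
]

ld_01 = r_squared(toy_haplotypes, 0, 1)  # complete LD
ld_02 = r_squared(toy_haplotypes, 0, 2)  # no LD
```

In this toy example SNPs 0 and 1 would fall in the same haplotype block, whereas SNP 2 would not.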

We developed two methods, both of which construct logic trees, that is, Boolean expressions with SNPs as leaves, that may be associated with either a continuous or a binary outcome. The first method assumes little or no recombination and uses a perfect phylogeny to represent the evolutionary relationships between SNPs in the haplotype blocks.
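As an illustration of what a logic tree with SNPs as leaves looks like, the sketch below evaluates a Boolean expression on binary SNP indicators. The nested-tuple representation, the function name `evaluate` and the 0/1 coding (for example, carrier versus non-carrier of the minor allele) are assumptions made for this example only; they are not the data structures used in the thesis.

```python
def evaluate(tree, snps):
    """Evaluate a logic tree on one individual's binary SNP indicators.

    tree: nested tuples of the form ("snp", index), ("not", subtree),
          ("and", left, right) or ("or", left, right).
    snps: sequence of 0/1 indicators, one per SNP.
    """
    op = tree[0]
    if op == "snp":
        return bool(snps[tree[1]])
    if op == "not":
        return not evaluate(tree[1], snps)
    if op == "and":
        return evaluate(tree[1], snps) and evaluate(tree[2], snps)
    if op == "or":
        return evaluate(tree[1], snps) or evaluate(tree[2], snps)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical tree: (SNP0 AND NOT SNP2) OR SNP1
example_tree = ("or",
                ("and", ("snp", 0), ("not", ("snp", 2))),
                ("snp", 1))

# Made-up individuals, each a vector of three SNP indicators.
individuals = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]
predictor = [evaluate(example_tree, ind) for ind in individuals]
```

The resulting Boolean vector can then enter a regression model as a single covariate for a continuous or binary outcome.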

The first method extends the logic regression technique of Ruczinski et al. (2003) to a Bayesian framework and constrains the model space to that of a perfect phylogeny. The second method allows for recombination and uses a genetic algorithm to build logic trees in which SNPs within a leaf are in high LD and SNPs in different leaves are in low LD. As both methods sit within (Bayesian) regression frameworks, non-genetic factors, as well as their interactions with SNPs, may be incorporated. The methods were applied successfully to candidate gene data for hypertension.
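The perfect phylogeny assumption used by the first method can be checked on data with the standard four-gamete test: a set of biallelic SNPs is compatible with a perfect phylogeny (no recombination, no recurrent mutation) exactly when no pair of SNPs shows all four allele combinations. The sketch below is a generic version of that test under an assumed 0/1 haplotype coding; the function name and toy data are illustrative and do not reproduce the thesis implementation.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on phased 0/1 haplotypes.

    Returns False as soon as some pair of SNPs exhibits all four gametes
    (0,0), (0,1), (1,0) and (1,1); otherwise returns True.
    """
    n_snps = len(haplotypes[0])
    for i, j in combinations(range(n_snps), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# Made-up haplotypes over two SNPs.
compatible = [(0, 0), (0, 1), (1, 1)]
incompatible = compatible + [(1, 0)]  # the fourth gamete breaks compatibility

ok = admits_perfect_phylogeny(compatible)
not_ok = admits_perfect_phylogeny(incompatible)
```

A block that fails this test would fall outside the model space of the first method and is the kind of case the second, recombination-aware method is intended to handle.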

Funding: NHS South East Region Research Training Fellowship in Medical Statistics