Design and Analysis of Two-stage Association Studies

Michael Boehnke; Andrew Skol; Laura Scott; Gonçalo Abecasis; Jun Li; Robert C. Thompson; Fan Meng; Weihua Guan; Devin Absher; Huda Akil; Stanley Watson; Margit Burmeister; Richard M. Myers
World Congress of Psychiatric Genetics. 2006.

Abstract

Data on human genetic variants from the International HapMap Project and the precipitous drop in genotyping costs have made genome-wide association studies a practical approach to studying the genetics of complex diseases. Such studies require genotyping hundreds of thousands of genetic markers on hundreds or thousands of subjects. I will discuss optimal design and analysis of two-stage association studies in which a subset of the samples is genotyped on all genetic markers in stage 1, and the remaining samples are genotyped on only the most interesting markers in stage 2. Consistent with Satagopan et al., we find that two-stage designs can maintain nearly the same power to detect association as the corresponding one-stage design in which all samples are genotyped for all markers. We find that joint analysis of stage 1 and stage 2 samples is nearly always more powerful than replication-based analysis, despite the need to account for a much larger number of tests. I will address how optimal study design is affected by the proportion of samples genotyped in stage 1, the proportion of markers followed up in stage 2, the per-genotype cost ratio between stages 1 and 2, and etiologic heterogeneity between stages 1 and 2. I will also describe practical issues that arise in the design of a two-stage genome-wide association study of bipolar disorder and in the analysis of stage 1 data on 476 bipolar I cases and 470 ethnically matched controls generously provided by NIMH Bipolar Disorder Genetics Initiative investigators and Dr. Pablo Gejman.
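
To make the cost trade-off among the design parameters concrete, the following is a minimal sketch, not the authors' method. It assumes a simple linear cost model in which every genotype in a given stage costs the same amount. The function name `two_stage_relative_cost` and its parameter names are hypothetical, and the numbers in the usage example are illustrative only, not taken from the study.

```python
def two_stage_relative_cost(frac_samples_stage1, frac_markers_stage2, cost_ratio=1.0):
    """Genotyping cost of a two-stage design relative to the one-stage design.

    Assumed (hypothetical) linear cost model:
      - stage 1 genotypes a fraction of the samples on all markers;
      - stage 2 genotypes the remaining samples on a fraction of the markers;
      - cost_ratio is the per-genotype cost in stage 2 divided by that in stage 1.
    A one-stage design (all samples, all markers, stage 1 price) has cost 1.0.
    """
    if not 0.0 <= frac_samples_stage1 <= 1.0:
        raise ValueError("frac_samples_stage1 must lie in [0, 1]")
    if not 0.0 <= frac_markers_stage2 <= 1.0:
        raise ValueError("frac_markers_stage2 must lie in [0, 1]")
    if cost_ratio <= 0.0:
        raise ValueError("cost_ratio must be positive")

    stage1_cost = frac_samples_stage1
    stage2_cost = (1.0 - frac_samples_stage1) * frac_markers_stage2 * cost_ratio
    return stage1_cost + stage2_cost


if __name__ == "__main__":
    # Illustrative values: 30% of samples in stage 1, 1% of markers followed up,
    # stage 2 genotypes ten times as expensive per genotype as stage 1.
    print(round(two_stage_relative_cost(0.3, 0.01, 10.0), 4))
```

Under this assumed model, the relative cost is driven mainly by the stage 1 sample fraction whenever few markers are followed up, which is why the per-genotype cost ratio and the follow-up proportion have to be considered together when choosing a design.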