Finding common susceptibility variants for complex disease: past, present and future

Kalliope Panoutsopoulou and Eleftheria Zeggini

The identification of complex disease susceptibility loci has been accelerated considerably by advances in high-throughput genotyping technologies, improved insight into correlation patterns of common variants and the availability of large-scale sample sets. Linkage scans and small-scale candidate gene studies have now given way to genome-wide association scans. In this review, we summarize insights gained from the past, highlight practical issues relating to the design and analysis of current state-of-the-art GWA studies and look into future trends in the field of human complex trait genetics.

Keywords: association study; complex disease; single nucleotide polymorphism; genome-wide association scan; meta-analysis; sequencing

Corresponding author. Eleftheria Zeggini, Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK. Tel: +44 1223 496868; Fax: +44 1223 496826; E-mail: eleftheria@sanger.ac.uk

Kalliope Panoutsopoulou is a postdoctoral research fellow at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK, working towards the identification of genetic variants conferring susceptibility to osteoarthritis.

Eleftheria Zeggini is an investigator in Human Genetics at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK, where she leads the Applied Statistical Genetics team. Her research focus is on design, analysis and interpretation issues in large-scale complex disease association and resequencing studies.

© 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Common complex diseases have traditionally been ascribed to complicated networks of genetic and environmental factors. The search for genetic susceptibility loci has been much more straightforward for Mendelian disorders than for multifactorial traits, where numerous variants of modest or small effect sizes contribute to the genetic background of disease. The common disease–common variant and multiple rare variant hypotheses had been proposed as distinct scenarios and polarized the field of complex disease genetics for some time. However, emerging evidence indicates that the genetic aetiology of complex traits is likely to be based on a combination of multiple rare and common susceptibility loci.

The field of human complex trait genetics has undergone major transformation over the past decade. Researchers have gradually moved from family-based approaches for investigating linkage to association studies offering (and, lately, delivering) the promise of robust complex disease locus identification. The journey has witnessed study design trends come and go, with valuable lessons learnt from each such era. Rapid technological developments, coupled with the availability of larger sample sizes and a better understanding of human genome sequence variation, continue to facilitate progress in the field. In this review, we aim to distil lessons from the past few years in the field of complex disease genetics, describe the present state-of-the-art for finding common susceptibility loci and look into emerging themes for the near future.

PAST

Genetic association studies have, over the last decade, evolved from genome-wide linkage scans to candidate gene approaches, to gene-centric designs aiming to capture the majority of common variation and, ultimately, to genome-wide association (GWA) scans. Several factors have influenced this trajectory, including our understanding of human genome sequence variation, and ongoing development of genotyping technologies (moving from low- to medium- to high-throughput approaches).

Family-based linkage studies prevailed in the literature for several years as they constituted the only means of targeting variation genome-wide at the time. Linkage studies tended to lead to the identification of numerous peaks that were rarely reproduced in independent studies. For example, in type 2 diabetes (T2D), although more than 40 linkage scans have been performed, the overall picture has been one of multiple modest signals, few of which show evidence of replication [1, 2]. Linkage signals typically encompass several megabases of sequence and the resulting localization resolution is low [although this improved marginally when single nucleotide polymorphism (SNP)-based linkage scans were introduced] [3, 4]. Consortia formed for the meta-analysis of linkage scans of particular phenotypes served to distil the number of statistically believable linkage peaks [2] and promising signals were traditionally followed up by fine-mapping experiments [5]. Very few such endeavours have led to the identification of causal disease susceptibility variants [6, 7]. This is perhaps not surprising, as linkage disequilibrium (LD) mapping efforts under linkage peaks tended to make use of SNPs with common minor allele frequencies (MAFs), whereas linkage signals were more likely to reflect more penetrant effects of rare variants. Moreover, because of the relatively small number of families and microsatellite markers used, most of these studies may have been underpowered to detect many of the effects that association approaches have thus far discovered.

The field shifted towards association studies, exemplified over the last decade by the candidate gene study. Candidate gene studies focused on a few, if not just a single, variant(s) within a biologically plausible candidate gene. They were typically carried out in a few hundred disease cases and controls, or in a few hundred nuclear families consisting of affected offspring and unaffected parents. The latter approach (transmission disequilibrium test) [8] reached high popularity levels in the nineties due to its property of being robust to population stratification. Although several notable exceptions exist (for example [9, 10] from the field of T2D), candidate gene studies on the whole did not deliver many robustly replicating disease susceptibility loci. This irreproducibility of results could be ascribed to a combination of several contributing factors: low power (as a result of small sample sizes) to detect what we now recognize as modest or small effects; limited understanding of disease aetiopathogenesis leading to inappropriate selection of candidate loci; low thresholds for declaring significance and over-interpretation of results; and inadequate capture of variation across the genes of interest.

The International HapMap Project [11] greatly increased our understanding of correlation patterns (LD) between common variants across the genome. This enabled the selection of maximally informative, non-redundant sets of markers across genes or regions of interest. A wide variety of haplotype-based and pairwise tagging methods were developed [12–15]. Tag SNP studies continue to be carried out; they employ information from relevant HapMap populations to select SNPs capturing the majority of common variation across targeted loci. These markers are then genotyped and analysed in the datasets of interest, and inferences about their proxy variants are made on the basis of the association patterns observed.

Advances in high-throughput, high-accuracy genotyping platforms marked a new era for association studies, enabling the concurrent examination of hundreds of thousands of SNPs. Sufficient power in GWA studies was facilitated by the availability of large-scale sample collections. Over the last few years, GWA scans have succeeded in detecting and establishing complex trait associations, and have started to provide valuable insights into disease aetiopathogenesis.

PRESENT

GWA studies undoubtedly constitute the present state-of-the-art in efforts to elucidate the genetic aetiology of complex phenotypes. Several commercial products offering the potential to simultaneously assay hundreds of thousands of SNPs genome-wide are available from companies such as Affymetrix and Illumina. These have varying SNP content and density, and have been designed using diverse marker selection strategies (Table 1). For example, arrays with an exon-centric SNP content, such as the Illumina Human-1, reflect strategies focusing on potentially functional variants. LD-based platforms contain tag sets of SNPs selected to maximize the amount of common variation captured on the basis of HapMap data. Affymetrix platforms comprise quasi-randomly distributed SNPs or a combination of random and tag SNPs. In recognition of their potential role in complex disease susceptibility, copy number variants (CNVs) are also increasingly featured.

Table 1: Overview of marker content and array design across commercially available platforms and coverage of common variation (MAF > 0.05) based on HapMap phase II data

Platform                  Number of markers    Array design        Coverage in CEU (%)  Coverage in JPT + CHB (%)  Coverage in YRI (%)  Source
Illumina Human-1          More than 109 000    Gene                26                   28                         12                   [16]
Illumina HumanHap300      317 511              Tag                 75                   63                         28                   [16]
Affymetrix SNP Array 5.0  500 568              Random              65                   66                         41                   [16]
Illumina HumanHap550      555 352              Tag                 87                   83                         50                   [17]
Illumina Human610         620 901              Tag, CNV            89                   86                         58                   [18]
Illumina HumanHap650Y     660 917              Tag                 87                   84                         60                   [17]
Affymetrix SNP Array 6.0  More than 1 800 000  Random + Tag, CNV   83                   84                         62                   [17]
Illumina Human1M          1 199 187            Tag, CNV            93                   92                         68                   [17]

CEU: Utah residents with ancestry from northern and western Europe. JPT: Japanese from Tokyo, Japan. CHB: Han Chinese from Beijing, China. YRI: Yoruba from Ibadan, Nigeria. CNV: copy number variation.

Table 1 summarizes the extent to which different platforms capture common (MAF > 0.05) variation based on published evaluations in the three different HapMap phase II populations [11]. Coverage in European- and East Asian-descent populations is very high and has substantially improved with next generation chips. Information capture in African-descent populations is lower, reflecting higher recombination rates and lower levels of inter-marker correlation. However, it has been shown theoretically that coverage of all common variation based on HapMap has been overestimated and that larger sample sizes and denser marker sets are required for more accurate estimation of tagging SNP efficacy [19, 20]. Overestimation of previously reported coverage estimates has also been empirically confirmed by the analysis of sequence-derived variation data from 76 genes in HapMap samples [21]. Although variation capture is an important consideration in GWA study design, it is not the sole determinant of power.

The statistical power of a GWA study to detect variants associated with disease is a function of sample size, the susceptibility locus effect magnitude, risk allele frequency of the queried SNP and its correlation with the causal variant. Although the allelic architecture of complex traits has not been fully characterized yet, recent GWA scans and follow-up studies have highlighted that common susceptibility loci are likely to have modest or small effect sizes [allelic odds ratios (ORs) between 1.1 and 1.5]. In a genome-wide setting, the large number of tests performed requires stringent thresholds for declaring statistical genome-wide significance ($P = 5 \times 10^{-8}$) [22, 23], necessitating large-scale sample sizes. For example, in order to achieve 90% power to detect a risk allele with 0.20 frequency and an allelic OR of 1.2 (at the genome-wide significance level), more than 6000 affected individuals and twice as many controls would be required (Figure 1). To achieve the same power to detect similar effects at lower frequency variants (frequency of 0.05 or less), a GWA study would need upwards of 20 000 cases (Figure 1).

Figure 1: Number of affected individuals required (given a case/control ratio of 1:2) in order to achieve 10, 50 and 90% power to detect an effect at $\alpha = 5 \times 10^{-8}$ for variants with modest to low effect sizes (allelic odds ratios 1.10, 1.15 and 1.20) and varying risk allele frequencies: (a) 0.05, (b) 0.20, (c) 0.50 and (d) 0.90. Calculations assume complete LD between the causal and genotyped variant.

Along with sample size considerations, GWA studies have also given rise to several logistical challenges: for example, issues relating to automated but accurate genotype calling, programmatic data handling and parsing, genotype quality control (QC) standards and analytical considerations that did not previously apply to smaller scale studies.

Genotype calling is the process by which hybridization intensities on genome-wide chips are translated into genotypes. Typically, intensities are normalized and transformed into coordinates which yield distinct genotype clouds. As high call rate and accuracy of genotype calling are important factors in safeguarding QC standards in GWA scans, a variety of genotype calling algorithms have been developed and continue to evolve [24–27]. The possible adverse effects of inaccurate genotype calling in downstream analyses have been recognized for a while [28]. Therefore, inspection of intensity plots for interesting association signals is an essential aspect of genotype QC.

Genotype QC is an extremely important step in GWA studies, as it can dramatically reduce the number of false positive associations. The field has converged on an essential set of QC checks; Figure 2 summarizes the sample- and SNP-based QC steps that are typically employed.

Figure 2: Flowchart of the main quality control steps in a GWA study.

SNP call rate is a good indicator of genotype probe performance. Removing SNPs with a high proportion of missing genotypes is essential to control for false positives, as spurious associations can arise due to non-random missingness. Checking for gross departure from Hardy–Weinberg equilibrium (HWE) could help in identifying SNPs with genotyping errors (e.g. excess of heterozygotes).
As clustering algorithms tend to perform less well for SNPs with low-frequency alleles, it is current practice in GWA studies to exclude rare SNPs from single point analyses (these are underpowered to detect effects anyway). Genotype calling algorithms have the potential to make incorrect calls. Therefore, inspecting intensity plots, though not feasible on a genome-wide scale, is necessary for SNPs with interesting association signals.

Sample call rate is a good indicator of hybridization performance; high rates of missingness usually indicate low DNA quality or problematic arrays. Discrepancies in gender assignment (SNP data versus phenotype data) can help identify sample mix-ups. Excess genome-wide heterozygosity may indicate possible contamination leading to a larger proportion of heterozygous genotypes. Accidentally duplicated and related individuals in large-scale studies can be identified through identity-by-descent estimation given identity-by-state information in a relatively large homogeneous sample [29]. Typically, the sample with the lowest call rate from each pair of related individuals is removed. Finally, ethnic outliers can be detected and either removed or accounted for in downstream analyses.

Population stratification can be a major confounding factor in GWA studies, both for case/control designs and population-based quantitative analyses. If undetected, it can lead to false positive associations due to differences in allele frequency between the different populations [30]. To guard against it, most GWA scans attempt to match cases and controls for broad ethnic background from the outset and then rely on statistical approaches to detect population substructure and correct for it [29, 31, 32]. Genomic control ($\lambda$) is an estimate of the degree of inflation of the test statistics genome-wide and can serve as a crude correction factor [31]. Principal component analysis [32] and multidimensional scaling [29] are methods employed to identify individuals of different ethnic origin, visualized on a two-dimensional projection onto axes of genetic variation. Inferred principal components can be included as covariates in association analyses.

Directly typed SNPs in GWA studies are typically analysed by single-point methods, most frequently under the additive or multiplicative model. General models are less frequently tested as they increase dimensionality; dominant and recessive models are equally parsimonious but generally less powerful than the additive model. Multimarker tests (such as sliding haplotype window analyses) are less feasible at the genome-wide scale. However, imputation approaches have recently been developed to take into account information from multiple surrounding markers in order to infer genotypes at untyped loci [33]. Imputation therefore currently allows testing for association at >2.5 million markers genome-wide, thus maximizing information output from GWA studies, and additionally serves as an ideal tool for the combination of data from GWA scans that have been carried out on different platforms. The analysis of imputed data necessitates taking into account uncertainty by analysing the full genotype probability distribution appropriately.

The sheer number of SNPs tested for association with disease raises important statistical considerations about type I error and statistical significance levels. To account for the inflation in false positives, a variety of approaches, such as the conservative Bonferroni correction and the less stringent control of the false discovery rate [34], have been proposed. Obtaining empirical P-values after hundreds of thousands or millions of permutations is an alternative but prohibitively computer-intensive way to assess statistical significance. To overcome the multiple testing problem, stringent genome-wide significance thresholds have been proposed: adjustment for 1–2 million independent tests at common variants genome-wide has resulted in the aforementioned generally accepted significance threshold of $P = 5 \times 10^{-8}$ [22, 23]. In practice, most GWA studies prioritize signals for follow-up on the basis of their relative statistical strength for association and on evidence accrued from bioinformatics approaches. Replication in independent datasets (of the same variant, in the same direction, under the same model) constitutes the gold standard in genetic association studies of any scale.

T2D serves as a prime example of the success of the GWA scan approach. Over the past 2 years, multiple GWA scans have been published, greatly accelerating progress in identifying novel susceptibility variants for the disease [24, 35–42]. This first wave of studies collectively raised the number of established T2D loci to 11.

Approaches aiming to identify complex trait susceptibility loci have recently also extended to the meta-analysis of diverse scans carried out for the same phenotype. This move in the field has been brought about by the realization that effect sizes for common variants are becoming increasingly small. As Figure 1 attests, sample size is one of the most important factors in boosting power for an association study. Synergy across research groups, leading to the synthesis of GWA scan results, can greatly increase sample size and, hence, power to detect small individual effects. Several design and analytical challenges are associated with GWA scan meta-analysis (reviewed in [43]). These collaborative efforts have recently started to successfully extend the list of robustly replicating associations with complex traits [44–48]. For example, the Diabetes Genetics Initiative, Finland–United States Investigation of NIDDM and Wellcome Trust Case Control Consortium T2D scans undertook a three-way meta-analysis, which led to the identification of 6 novel susceptibility loci [44].

FUTURE

The first wave of GWA studies and meta-analyses conducted indicates that only a small amount of the genetic variance underlying the heritable component of common complex traits has been identified. For example, in the case of T2D, the loci identified so far account for <4% of the estimated heritability (reviewed in [49]). This reflects the fact that current studies involving thousands of individuals are still underpowered to discover most of the common genetic variants with the very modest to low effect sizes that are likely to exist. It is anticipated that sample sizes of many tens of thousands or even hundreds of thousands will be required to fulfil this purpose. The identification of further common variants with small effect sizes may not have immediate consequences in disease prediction and prognosis, but will hopefully continue to provide novel insights into implicated biological pathways, pointing to new targets for therapy. Therefore, the future is poised to continue in the same trend of large-scale consortia being formed to facilitate the accumulation of data and the combination of expertise, in order to make the next generation of GWA scan meta-analyses possible. These will in turn start to enable the investigation of gene–gene and gene–environment interactions, currently hindered by low power.

The associated SNPs uncovered by GWA scans are unlikely to be the functional polymorphisms. One of the major challenges that the field of complex disease genetics faces over the next few years is how best to explore information in association regions delineated by recombination hotspots, typically spanning several kilobases, in order to identify the truly causal variants. Deep resequencing in samples of interest and subsequent large-scale follow-up of interesting markers through fine-mapping is an emerging study design paradigm, enabled by next generation sequencing technologies. However, several study design issues remain unclear, including the choice of resequencing and fine-mapping samples and their ethnicity, sample size, spectrum of typed marker allele frequency and analytical approach. It is generally recognized that the benefits of fine-mapping will be finite, particularly in regions of very strong LD, and that functional studies will be necessary in order to pinpoint the truly causal variant. The availability of global gene expression profiles coupled with genotype data from the same samples can also serve as a valuable resource, as associated variants might display strong cis associations with expression of a nearby gene whose expression levels are causally linked with the underlying phenotype or disease trait [50].

The future of genetic association studies is poised to have an increasing focus on CNVs; this will be facilitated by ongoing efforts to provide a catalogue of structural variants (e.g. the CNV project [51]). Along with rare variants, CNVs could account for some of the missing complex trait heritability. For example, schizophrenia studies have uncovered CNV associations [52, 53] in a disease where GWA studies have not returned significant evidence for robust common SNP associations (reviewed in [54]).

Current studies are focused on common variants, which invariably have small effects. However, the field is now starting to recognize the role of rare variants, which can have larger effect sizes, in complex disease susceptibility. The analysis of lower frequency polymorphisms necessitates larger sample sizes and tailored analytical approaches in order to increase power [55]. The 1000 genomes project [56] will improve our understanding of variation at the lower end of the frequency spectrum and is expected to enhance information capture and interpretation in genetic association studies.

There is little doubt that large-scale sequencing studies will constitute the way forward for characterizing the allelic architecture of complex disease. Several challenges with respect to the design, analysis and interpretation of such studies continue to emerge and will undoubtedly keep researchers busy for the foreseeable future. The landscape of human complex disease genetics has witnessed major changes over the past 10 years, and is poised to change even more dramatically in the near future.

Key Points

The genetic aetiology of complex traits is likely to be based on a combination of multiple common and rare susceptibility loci.
Genome-wide linkage scans and small-scale candidate gene studies had not previously met with widespread success.
GWA studies follow a hypothesis-free approach and interrogate the majority of common SNPs across the human genome.
Sufficiently large sample sizes, stringent genotype QC, use of appropriate significance thresholds and replication of findings in independent datasets have been crucial determinants of GWA study success.
Further advances in genotyping and next generation sequencing technologies, facilitating the study of rare and structural variation, hold the promise of an improved understanding of the allelic architecture of complex disease.

FUNDING

Wellcome Trust (WT088885/Z/09/Z).

References

1. McCarthy MI. Growing evidence for diabetes susceptibility genes from genome scan data. Curr Diab Rep 2003;3:159–67.
2. Guan W, Pluzhnikov A, Cox NJ, et al. Meta-analysis of 23 type 2 diabetes linkage studies from the International Type 2 Diabetes Linkage Analysis Consortium. Hum Hered 2008;66:35–49.
3. John S, Shephard N, Liu G, et al. Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am J Hum Genet 2004;75:54–64.
4. Evans DM, Cardon LR. Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am J Hum Genet 2004;75:687–92.
5. Wiltshire S, Morris AP, McCarthy MI, et al. How useful is the fine-scale mapping of complex trait linkage peaks? Evaluating the impact of additional microsatellite genotyping on the posterior probability of linkage. Genet Epidemiol 2005;28:1–10.
6. Hugot JP, Chamaillard M, Zouali H, et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 2001;411:599–603.
7. Ogura Y, Bonen D, Inohara N, et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 2001;411:603–6.
8. Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 1995;59:323–6.
9. Altshuler D, Hirschhorn JN, Klannemark M, et al. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 2000;26:76–80.
10. Gloyn AL, Weedon MN, Owen KR, et al. Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 2003;52:568–72.
11. Frazer KA, Ballinger DG, Cox DR, et al. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449:851–61.
12. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001;29:233–7.
13. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–9.
14. Carlson CS, Eberle MA, Rieder MJ, et al. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 2004;74:106–20.
15. Ke X, Miretti MM, Broxholme J, et al. A comparison of tagging methods and their tagging space. Hum Mol Genet 2005;14:2757–67.
16. Barrett JC, Cardon LR. Evaluating coverage of genome-wide association studies. Nat Genet 2006;38:659–62.
17. Li M, Li C, Guan W. Evaluation of coverage variation of SNP chips for genome-wide association studies. Eur J Hum Genet 2008;16:635–43.
18. Whole-genome genotyping & CNV analysis: human610-quad beadchip. http://www.illumina.com/pages.ilmn?ID=248 (9 March 2009, date last accessed).
19. Weale ME, Depondt C, Macdonald SJ, et al. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet 2003;73:551–65.
20. Iles MM. Quantification and correction of bias in tagging SNPs caused by insufficient sample size and marker density by means of haplotype-dropping. Genet Epidemiol 2008;32:20–8.
21. Bhangale TR, Rieder MJ, Nickerson DA. Estimating coverage and power for genetic association studies using near-complete variation data. Nat Genet 2008;40:841–3.
22. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008;9:356–69.
23. International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–320.
24. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–78.
25. Plagnol V, Cooper JD, Todd JA, et al. A method to address differential bias in genotyping in large-scale association studies. PLoS Genet 2007;3:e74.
26. Teo YY, Inouye M, Small KS, et al. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 2007;23:2741–6.
27. Korn JM, Kuruvilla FG, McCarroll SA, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 2008;40:1253–60.
28. Clayton DG, Walker NM, Smyth DJ, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005;37:1243–6.
29. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.
30. Marchini J, Cardon LR, Phillips MS, et al. The effects of human population structure on large genetic association studies. Nat Genet 2004;36:512–17.
31. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55:997–1004.
32. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
33. Marchini J, Howie B, Myers S, et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007;39:906–13.
34. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B 1995;57:289–300.
35. Saxena R, Voight BF, Lyssenko V, et al. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 2007;316:1331–5.
36. Zeggini E, Weedon MN, Lindgren CM, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 2007;316:1336–41.
37. Scott LJ, Mohlke KL, Bonnycastle LL, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007;316:1341–5.
38. Sladek R, Rocheleau G, Rung J, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 2007;445:881–5.
39. Salonen JT, Uimari P, Aalto JM, et al. Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. Am J Hum Genet 2007;81:338–45.
40. Steinsthorsdottir V, Thorleifsson G, Reynisdottir I, et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 2007;39:770–5.
41. Yasuda K, Miyake K, Horikawa Y, et al. Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitus. Nat Genet 2008;40:1092–7.
42. Unoki H, Takahashi A, Kawaguchi T, et al. SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in East Asian and European populations. Nat Genet 2008;40:1098–102.
43. Zeggini E, Ioannidis JP. Meta-analysis in genome-wide association studies. Pharmacogenomics 2009;10:191–201.
44. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008;40:638–45.
45. Barrett JC, Hansoul S, Nicolae DL, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet 2008;40:955–62.
46. Weedon MN, Lango H, Lindgren CM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008;40:575–83.
47. Cooper JD, Smyth DJ, Smiles AM, et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nat Genet 2008;40:1399–401.
48. Willer CJ, Speliotes EK, Loos RJ, et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 2009;41:25–34.
49. Frazer KA, Murray SS, Schork NJ, et al. Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009;10:241–51.
50. Libioulle C, Louis E, Hansoul S, et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 2007;3:e58.
51. The copy number variation project. http://www.sanger.ac.uk/humgen/cnv (21 May 2008, date last accessed).
52. International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 2008;455:178–9.
53. Stefansson H, Rujescu D, Cichon S, et al. Large recurrent microdeletions associated with schizophrenia. Nature 2008;455:232–6.
54. Cichon S, Craddock N, Daly M, et al. Psychiatric GWAS Consortium Coordinating Committee. Genomewide association studies: history, rationale, and prospects for psychiatric disorders. Am J Psychiatry 2009;166:540–56.
55. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 2008;83:311–21.
56. 1000 genomes project. http://www.1000genomes.org (9 March 2009, date last accessed).

Finding common susceptibility variants for complex disease: past, present and future

Briefings in Functional Genomics and Proteomics, Volume 8 (5) – Jul 1, 2009



Publisher
Pubmed Central
Copyright
2009 The Author(s)
ISSN
1473-9550
eISSN
1477-4062
DOI
10.1093/bfgp/elp020

Abstract

The identification of complex disease susceptibility loci has been accelerated considerably by advances in high-throughput genotyping technologies, improved insight into correlation patterns of common variants and the availability of large-scale sample sets. Linkage scans and small-scale candidate gene studies have now given way to genome-wide association scans. In this review, we summarize insights gained from the past, highlight practical issues relating to the design and analysis of current state-of-the-art GWA studies and look into future trends in the field of human complex trait genetics.

Keywords: association study; complex disease; single nucleotide polymorphism; genome-wide association scan; meta-analysis; sequencing

Corresponding author. Eleftheria Zeggini, Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK. Tel: +44 1223 496868; Fax: +44 1223 496826; E-mail: eleftheria@sanger.ac.uk

Kalliope Panoutsopoulou is a postdoctoral research fellow at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK, working towards the identification of genetic variants conferring susceptibility to osteoarthritis.

Eleftheria Zeggini is an investigator in Human Genetics at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK, where she leads the Applied Statistical Genetics team. Her research focus is on design, analysis and interpretation issues in large-scale complex disease association and resequencing studies.

2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Common complex diseases have traditionally been ascribed to complicated networks of genetic and environmental factors. The search for genetic susceptibility loci has been much more straightforward for Mendelian disorders than for multifactorial traits, where numerous variants of modest or small effect sizes contribute to the genetic background of disease. The common disease–common variant and multiple rare variant hypotheses had been proposed as distinct scenarios and polarized the field of complex disease genetics for some time. However, emerging evidence indicates that the genetic aetiology of complex traits is likely to be based on a combination of multiple rare and common susceptibility loci.

The field of human complex trait genetics has undergone major transformation over the past decade. Researchers have gradually moved from family-based approaches for investigating linkage to association studies offering (and, lately, delivering) the promise of robust complex disease locus identification. The journey has witnessed study design trends come and go, with valuable lessons learnt from each such era. Rapid technological developments, coupled with the availability of larger sample sizes and a better understanding of human genome sequence variation, continue to facilitate progress in the field. In this review, we aim to distil lessons from the past few years in the field of complex disease genetics, describe the present state-of-the-art for finding common susceptibility loci and look into emerging themes for the near future.

PAST

Genetic association studies have, over the last decade, evolved from genome-wide linkage scans to candidate gene approaches, to gene-centric designs aiming to capture the majority of common variation and, ultimately, to genome-wide association (GWA) scans. Several factors have influenced this trajectory, including our understanding of human genome sequence variation, and ongoing development of genotyping technologies (moving from low- to medium- to high-throughput approaches).

Family-based linkage studies prevailed in the literature for several years as they constituted the only means of targeting variation genome-wide at the time. Linkage studies tended to lead to the identification of numerous peaks that were rarely reproduced in independent studies. For example, in type 2 diabetes (T2D), although more than 40 linkage scans have been performed, the overall picture has been one of multiple modest signals, few of which show evidence of replication [1, 2]. Linkage signals typically encompass several megabases of sequence and the resulting localization resolution is low [although this improved marginally when single nucleotide polymorphism (SNP)-based linkage scans were introduced] [3, 4]. Consortia formed for the meta-analysis of linkage scans of particular phenotypes served to distil the number of statistically believable linkage peaks [2] and promising signals were traditionally followed up by fine-mapping experiments [5]. Very few such endeavours have led to the identification of causal disease susceptibility variants [6, 7]. This is perhaps not surprising, as linkage disequilibrium (LD) mapping efforts under linkage peaks tended to make use of SNPs with common minor allele frequencies (MAFs), whereas linkage signals were more likely to reflect more penetrant effects of rare variants. Moreover, because of the relatively small number of families and microsatellite markers used, most of these studies may have been underpowered to detect many of the effects that association approaches have thus far discovered.

The field shifted towards association studies, exemplified over the last decade by the candidate gene study. Candidate gene studies focused on a few, if not just a single, variant(s) within a biologically plausible candidate gene. They were typically carried out in a few hundreds of disease cases and controls, or in a few hundreds of nuclear families, consisting of affected offspring and unaffected parents. The latter approach (transmission disequilibrium test) [8] reached high popularity levels in the nineties due to its property of being robust to population stratification. Although several notable exceptions exist (for example [9, 10] from the field of T2D), candidate gene studies on the whole did not deliver many robustly replicating disease susceptibility loci. This irreproducibility of results could be ascribed to a combination of several contributing factors: low power (as a result of small sample sizes) to detect what we now recognize as modest or small effects; limited understanding of disease aetiopathogenesis leading to inappropriate selection of candidate loci; low thresholds for declaring significance and over-interpretation of results; and inadequate capture of variation across the genes of interest.

The International HapMap Project [11] greatly increased our understanding of correlation patterns (LD) between common variants across the genome. This enabled the selection of maximally informative, non-redundant sets of markers across genes or regions of interest. A wide variety of haplotype-based and pairwise tagging methods were developed [12–15]. Tag SNP studies continue to be carried out; they employ information from relevant HapMap populations to select SNPs capturing the majority of common variation across targeted loci. These markers are then genotyped and analysed in the datasets of interest, and inferences about their proxy variants are made on the basis of the association patterns observed.

Advances in high-throughput, high-accuracy genotyping platforms marked a new era for association studies, enabling the concurrent examination of hundreds of thousands of SNPs. Sufficient power in GWA studies was facilitated by the availability of large-scale sample collections. Over the last few years, GWA scans have succeeded in detecting and establishing complex trait associations, and have started to provide valuable insights into disease aetiopathogenesis.

PRESENT

GWA studies undoubtedly constitute the present state-of-the-art in efforts to elucidate the genetic aetiology of complex phenotypes. Several commercial products offering the potential to simultaneously assay hundreds of thousands of SNPs genome-wide are available from companies such as Affymetrix and Illumina. These have varying SNP content and density, and have been designed using diverse marker selection strategies (Table 1). For example, arrays with an exon-centric SNP content, such as the Illumina Human-1, reflect strategies focusing on potentially functional variants. LD-based platforms contain tag sets of SNPs selected to maximize the amount of common variation captured on the basis of HapMap data. Affymetrix platforms comprise quasi-randomly distributed SNPs or a combination of random and tag SNPs. In recognition of their potential role in complex disease susceptibility, copy number variants (CNVs) are also increasingly featured.

Table 1: Overview of marker content and array design across commercially available platforms and coverage of common variation (MAF > 0.05) based on HapMap phase II data

Platform | Number of markers | Array design | Coverage in CEU(a) (%) | Coverage in JPT(b) + CHB(c) (%) | Coverage in YRI(d) (%) | Source
Illumina Human-1 | More than 109 000 | Gene | 26 | 28 | 12 | [16]
Illumina HumanHap300 | 317 511 | Tag | 75 | 63 | 28 | [16]
Affymetrix SNP Array 5.0 | 500 568 | Random | 65 | 66 | 41 | [16]
Illumina HumanHap550 | 555 352 | Tag | 87 | 83 | 50 | [17]
Illumina Human610 | 620 901 | Tag, CNV | 89 | 86 | 58 | [18]
Illumina HumanHap650Y | 660 917 | Tag | 87 | 84 | 60 | [17]
Affymetrix SNP Array 6.0 | More than 1 800 000 | Random + Tag, CNV | 83 | 84 | 62 | [17]
Illumina Human1M | 1 199 187 | Tag, CNV | 93 | 92 | 68 | [17]

(a) Utah residents with ancestry from northern and western Europe. (b) Japanese from Tokyo, Japan. (c) Han Chinese from Beijing, China. (d) Yoruba from Ibadan, Nigeria. CNV: copy number variation.

Table 1 summarizes the extent to which different platforms capture common (MAF > 0.05) variation based on published evaluations in the three different HapMap phase II populations [11]. Coverage in European- and East Asian-descent populations is very high and has substantially improved with next generation chips. Information capture in African-descent populations is lower, reflecting higher recombination rates and lower levels of inter-marker correlation. However, it has been shown theoretically that coverage of all common variation based on HapMap has been overestimated and that larger sample sizes and denser marker sets are required for more accurate estimation of tagging SNP efficacy [19, 20]. Overestimation of previously reported coverage estimates has also been empirically confirmed by the analysis of sequence-derived variation data from 76 genes in HapMap samples [21]. Although variation capture is an important consideration in GWA study design, it is not the sole determinant of power.

The statistical power of a GWA study to detect variants associated with disease is a function of sample size, the susceptibility locus effect magnitude, risk allele frequency of the queried SNP and its correlation with the causal variant. Although the allelic architecture of complex traits has not been fully characterized yet, recent GWA scans and follow-up studies have highlighted that common susceptibility loci are likely to have modest or small effect sizes [allelic odds ratios (ORs) between 1.1 and 1.5]. In a genome-wide setting, the large number of tests performed requires stringent thresholds for declaring statistical genome-wide significance (P = 5 × 10⁻⁸) [22, 23], necessitating large-scale sample sizes. For example, in order to achieve 90% power to detect a risk allele with 0.20 frequency and an allelic OR of 1.2 (at the genome-wide significance level), more than 6000 affected individuals and twice as many controls would be required (Figure 1). To achieve the same power to detect similar effects at lower frequency variants (frequency of 0.05 or less), a GWA study would need upwards of 20 000 cases (Figure 1).

Figure 1: Number of affected individuals required (given a case/control ratio of 1:2) in order to achieve 10, 50 and 90% power to detect an effect at α = 5 × 10⁻⁸ for variants with modest to low effect sizes (allelic odds ratios 1.10, 1.15 and 1.20) and varying risk allele frequencies: (a) 0.05, (b) 0.20, (c) 0.50 and (d) 0.90. Calculations assume complete LD between the causal and genotyped variant.

Along with sample size considerations, GWA studies have also given rise to several logistical challenges: for example, issues relating to automated but accurate genotype calling, programmatic data handling and parsing, genotype quality control (QC) standards and analytical considerations that did not previously apply to smaller scale studies.

Genotype calling is the process by which hybridization intensities on genome-wide chips are translated into genotypes. Typically, intensities are normalized and transformed into coordinates which yield distinct genotype clouds. As high call rate and accuracy of genotype calling are important factors in safe-guarding QC standards in GWA scans, a variety of genotype calling algorithms have been developed and continue to evolve [24–27]. The possible adverse effects of inaccurate genotype calling in downstream analyses have been recognized for a while [28]. Therefore, inspection of intensity plots for interesting association signals is an essential aspect of genotype QC.

Genotype QC is an extremely important step in GWA studies, as it can dramatically reduce the number of false positive associations. The field has converged to an essential set of QC checks; Figure 2 summarizes the sample- and SNP-based QC steps that are typically employed.

Figure 2: Flowchart of the main quality control steps in a GWA study.

SNP call rate is a good indicator of genotype probe performance. Removing SNPs with a greater proportion of missing genotypes is essential to control for false positives, as spurious associations can arise due to non-random missingness. Checking for gross departure from Hardy–Weinberg equilibrium (HWE) could help in identifying SNPs with genotyping errors (e.g. excess of heterozygotes). As clustering algorithms tend to perform less well for SNPs with low-frequency alleles, it is current practice in GWA studies to exclude rare SNPs from single point analyses (these are underpowered to detect effects anyway). Genotype calling algorithms have the potential to make incorrect calls. Therefore, inspecting intensity plots, though not feasible on a genome-wide scale, is necessary for SNPs with interesting association signals.

Sample call rate is a good indicator of hybridization performance; high rates of missingness usually indicate low DNA quality or problematic arrays. Discrepancies in gender assignment (SNP data versus phenotype data) can help identify sample mix-ups. Excess genome-wide heterozygosity may indicate possible contamination leading to a larger proportion of heterozygous genotypes. Accidentally duplicated and related individuals in large-scale studies can be identified through identity-by-descent estimation given identity-by-state information in a relatively large homogeneous sample [29]. Typically, the sample with the lowest call rate from each pair of related individuals is removed. Finally, ethnic outliers can be detected and either removed or accounted for in downstream analyses.

Population stratification can be a major confounding factor in GWA studies, both for case/control designs and population-based quantitative analyses. If undetected, it can lead to false positive associations due to differences in allele frequency between the different populations [30]. To guard against it, most GWA scans attempt to match cases and controls for broad ethnic background from the outset and then rely on statistical approaches to detect population substructure and correct for it [29, 31, 32]. Genomic control (λ) is an estimate of the degree of inflation of the test statistics genome-wide and can serve as a crude correction factor [31]. Principal component analysis [32] and multidimensional scaling [29] are methods employed to identify individuals of different ethnic origin visualized onto a two-dimensional projection on axes of genetic variation. Inferred principal components can be included as covariates in association analyses.

Directly typed SNPs in GWA studies are typically analysed by single-point methods, most frequently under the additive or multiplicative model. General models are less frequently tested as they increase dimensionality; dominant and recessive models are equally parsimonious but generally less powerful than the additive model. Multimarker tests (such as sliding haplotype window analyses) are less feasible at the genome-wide scale. However, imputation approaches have recently been developed to take into account information from multiple surrounding markers in order to infer genotypes at untyped loci [33]. Imputation therefore currently allows testing for association at >2.5 million markers genome-wide, thus maximizing information output from GWA studies, and additionally serves as an ideal tool for the combination of data from GWA scans that have been carried out on different platforms. The analysis of imputed data necessitates taking into account uncertainty by analysing the full genotype probability distribution appropriately.

The sheer number of SNPs tested for association with disease raises important statistical considerations about type I error and statistical significance levels. To account for the inflation in false positives, a variety of approaches, such as the conservative Bonferroni correction and the less stringent control of the false discovery rate [34], have been proposed. Obtaining empirical P-values after hundreds of thousands or millions of permutations is an alternative but prohibitively computer-intensive way to assess statistical significance. To overcome the multiple testing problem, stringent genome-wide significance thresholds have been proposed: adjustment for 1–2 million independent tests at common variants genome-wide has resulted in the aforementioned generally accepted significance threshold of P = 5 × 10⁻⁸ [22, 23]. In practice, most GWA studies prioritize signals for follow-up on the basis of their relative statistical strength for association and on evidence accrued from bioinformatics approaches. Replication in independent datasets (of the same variant, in the same direction, under the same model) constitutes the gold standard in genetic association studies of any scale.

T2D serves as a prime example of the success of the GWA scan approach. Over the past 2 years, multiple GWA scans have been published, greatly accelerating progress in identifying novel susceptibility variants for the disease [24, 35–42]. This first wave of studies collectively raised the number of established T2D loci to 11.

Approaches aiming to identify complex trait susceptibility loci have recently also extended to the meta-analysis of diverse scans carried out for the same phenotype. This move in the field has been brought about by the realization that effect sizes for common variants are becoming increasingly low. As Figure 1 attests, sample size is one of the most important factors in boosting power for an association study. Synergy across research groups, leading to the synthesis of GWA scan results, can greatly increase sample size and, hence, power to detect small individual effects. Several design and analytical challenges are associated with GWA scan meta-analysis (reviewed in [43]). These collaborative efforts have recently started to successfully extend the list of robustly replicating associations with complex traits [44–48]. For example, the Diabetes Genetics Initiative, Finland–United States Investigation of NIDDM and Wellcome Trust Case Control Consortium T2D scans undertook a three-way meta-analysis, which led to the identification of 6 novel susceptibility loci [44].

FUTURE

The first wave of GWA studies and meta-analyses conducted indicate that only a small amount of the genetic variance underlying the heritable component of common complex traits has been identified. For example, in the case of T2D, the so far identified loci account for <4% of the estimated heritability (reviewed in [49]). This reflects the fact that current studies involving thousands of individuals are still underpowered to discover most of the common genetic variants with the very modest to low effect sizes that are likely to exist. It is anticipated that sample sizes of many tens of thousands or even hundreds of thousands will be required to fulfil this purpose. The identification of further common variants with small effect sizes may not have immediate consequences in disease prediction and prognosis, but will hopefully continue to provide novel insights into implicated biological pathways, pointing to new targets for therapy. Therefore, the future is poised to continue in the same trend of large-scale consortia being formed to facilitate the accumulation of data and the combination of expertise, in order to make the next generation of GWA scan meta-analyses possible. These will in turn start to enable the investigation of gene–gene and gene–environment interactions, currently hindered by low power.

The associated SNPs uncovered by GWA scans are unlikely to be the functional polymorphisms. One of the major challenges that the field of complex disease genetics faces over the next few years is how best to explore information in association regions delineated by recombination hotspots, typically spanning several kilobases, in order to identify the truly causal variants. Deep resequencing in samples of interest and subsequent large-scale follow-up of interesting markers through fine-mapping is an emerging study design paradigm, enabled by next generation sequencing technologies. However, several study design issues remain unclear, including the choice of resequencing and fine-mapping samples and their ethnicity, sample size, spectrum of typed marker allele frequency and analytical approach. It is generally recognized that the benefits of fine-mapping will be finite, particularly in regions of very strong LD, and that functional studies will be necessary in order to pinpoint the truly causal variant. The availability of global gene expression profiles coupled with genotype data from the same samples can also serve as a valuable resource, as associated variants might display strong cis associations with expression of a nearby gene whose expression levels are causally linked with the underlying phenotype or disease trait [50].

The future of genetic association studies is poised to have an increasing focus on CNVs; this will be facilitated by ongoing efforts to provide a catalogue of structural variants (e.g. the CNV project [51]). Along with rare variants, CNVs could account for some of the missing complex trait heritability. For example, schizophrenia studies have uncovered CNV associations [52, 53] in a disease where GWA studies have not returned significant evidence for robust common SNP associations (reviewed in [54]).

Current studies are focused on common variants, which invariably have small effects. However, the field is now starting to recognize the role of rare variants, which can have larger effect sizes, in complex disease susceptibility. The analysis of lower frequency polymorphisms necessitates larger sample sizes and tailored analytical approaches in order to increase power [55]. The 1000 genomes project [56] will improve our understanding of variation at the lower end of the frequency spectrum and is expected to enhance information capture and interpretation in genetic association studies.

There is little doubt that large-scale sequencing studies will constitute the way forward for characterizing the allelic architecture of complex disease. Several challenges with respect to the design, analysis and interpretation of such studies continue to emerge and will undoubtedly keep researchers busy for the foreseeable future. The landscape of human complex disease genetics has witnessed major changes over the past 10 years, and is poised to change even more dramatically in the near future.

Key Points

- The genetic aetiology of complex traits is likely to be based on a combination of multiple common and rare susceptibility loci.
- Genome-wide linkage scans and small-scale candidate gene studies had not previously met with widespread success.
- GWA studies follow a hypothesis-free approach and interrogate the majority of common SNPs across the human genome.
- Sufficiently large sample sizes, stringent genotype QC, use of appropriate significance thresholds and replication of findings in independent datasets have been crucial determinants of GWA study success.
- Further advances in genotyping and next generation sequencing technologies, facilitating the study of rare and structural variation, hold the promise of an improved understanding of the allelic architecture of complex disease.

FUNDING

Wellcome Trust (WT088885/Z/09/Z).

References

1. McCarthy MI. Growing evidence for diabetes susceptibility genes from genome scan data. Curr Diab Rep 2003;3:159–67.
2. Guan W, Pluzhnikov A, Cox NJ, et al. Meta-analysis of 23 type 2 diabetes linkage studies from the International Type 2 Diabetes Linkage Analysis Consortium. Hum Hered 2008;66:35–49.
3. John S, Shephard N, Liu G, et al. Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am J Hum Genet 2004;75:54–64.
4. Evans DM, Cardon LR. Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am J Hum Genet 2004;75:687–92.
5. Wiltshire S, Morris AP, McCarthy MI, et al. How useful is the fine-scale mapping of complex trait linkage peaks? Evaluating the impact of additional microsatellite genotyping on the posterior probability of linkage. Genet Epidemiol 2005;28:1–10.
6. Hugot JP, Chamaillard M, Zouali H, et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease. Nature 2001;411:599–603.
7. Ogura Y, Bonen D, Inohara N, et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature 2001;411:603–6.
8. Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 1995;59:323–6.
9. Altshuler D, Hirschhorn JN, Klannemark M, et al. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 2000;26:76–80.
10. Gloyn AL, Weedon MN, Owen KR, et al. Large-scale association studies of variants in genes encoding the pancreatic beta-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 2003;52:568–72.
11. Frazer KA, Ballinger DG, Cox DR, et al. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449:851–61.
12. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001;29:233–7.
13. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–9.
14. Carlson CS, Eberle MA, Rieder MJ, et al. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 2004;74:106–20.
15. Ke X, Miretti MM, Broxholme J, et al. A comparison of tagging methods and their tagging space. Hum Mol Genet 2005;14:2757–67.
16. Barrett JC, Cardon LR. Evaluating coverage of genome-wide association studies. Nat Genet 2006;38:659–62.
17. Li M, Li C, Guan W. Evaluation of coverage variation of SNP chips for genome-wide association studies. Eur J Hum Genet 2008;16:635–43.
18. Whole-genome genotyping & CNV analysis: human610-quad beadchip. http://www.illumina.com/pages.ilmn?ID=248 (9 March 2009, date last accessed).
19. Weale ME, Depondt C, Macdonald SJ, et al. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet 2003;73:551–65.
20. Iles MM. Quantification and correction of bias in tagging SNPs caused by insufficient sample size and marker density by means of haplotype-dropping. Genet Epidemiol 2008;32:20–8.
21. Bhangale TR, Rieder MJ, Nickerson DA. Estimating coverage and power for genetic association studies using near-complete variation data. Nat Genet 2008;40:841–3.
22. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008;9:356–69.
23. International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–320.
24. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–78.
25. Plagnol V, Cooper JD, Todd JA, et al. A method to address differential bias in genotyping in large-scale association studies. PLoS Genet 2007;3:e74.
26. Teo YY, Inouye M, Small KS, et al. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 2007;23:2741–6.
27. Korn JM, Kuruvilla FG, McCarroll SA, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 2008;40:1253–60.
28. Clayton DG, Walker NM, Smyth DJ, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005;37:1243–6.
29. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.
30. Marchini J, Cardon LR, Phillips MS, et al. The effects of human population structure on large genetic association studies. Nat Genet 2004;36:512–17.
31. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55:997–1004.
32. Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
33. Marchini J, Howie B, Myers S, et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007;39:906–13.
34. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B 1995;57:289–300.
35. Saxena R, Voight BF, Lyssenko V, et al. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research, Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 2007;316:1331–5.
36. Zeggini E, Weedon MN, Lindgren CM, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 2007;316:1336–41.
37. Scott LJ, Mohlke KL, Bonnycastle LL, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007;316:1341–5.
38. Sladek R, Rocheleau G, Rung J, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 2007;445:881–5.
39. Salonen JT, Uimari P, Aalto JM, et al. Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. Am J Hum Genet 2007;81:338–45.
40. Steinsthorsdottir V, Thorleifsson G, Reynisdottir I, et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 2007;39:770–5.
41. Yasuda K, Miyake K, Horikawa Y, et al. Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitus. Nat Genet 2008;40:1092–7.
42. Unoki H, Takahashi A, Kawaguchi T, et al. SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in East Asian and European populations. Nat Genet 2008;40:1098–102.
43. Zeggini E, Ioannidis JP. Meta-analysis in genome-wide association studies. Pharmacogenomics 2009;10:191–201.
44. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008;40:638–45.
45. Barrett JC, Hansoul S, Nicolae DL, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat Genet 2008;40:955–62.
46. Weedon MN, Lango H, Lindgren CM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008;40:575–83.
47. Cooper JD, Smyth DJ, Smiles AM, et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nat Genet 2008;40:1399–401.
48. Willer CJ, Speliotes EK, Loos RJ, et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 2009;41:25–34.
49. Frazer KA, Murray SS, Schork NJ, et al. Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009;10:241–51.
50. Libioulle C, Louis E, Hansoul S, et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 2007;3:e58.
51. The copy number variation project. http://www.sanger.ac.uk/humgen/cnv (21 May 2008, date last accessed).
52. International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 2008;455:178–9.
53. Stefansson H, Rujescu D, Cichon S, et al. Large recurrent microdeletions associated with schizophrenia. Nature 2008;455:232–6.
54. Cichon S, Craddock N, Daly M, et al. Psychiatric GWAS Consortium Coordinating Committee, Genomewide association studies: history, rationale, and prospects for psychiatric disorders. Am J Psychiatry 2009;166:540–56.
55. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 2008;83:311–21.
56. 1000 genomes project. http://www.1000genomes.org (9 March 2009, date last accessed).

Journal

Briefings in Functional Genomics and Proteomics, PubMed Central

Published: Jul 1, 2009
