Haploview: analysis and visualization of LD and haplotype maps

Vol. 21 no. 2 2005, pages 263–265
BIOINFORMATICS APPLICATIONS NOTE
doi:10.1093/bioinformatics/bth457

J. C. Barrett (1,*), B. Fry (2), J. Maller (1) and M. J. Daly (1,3)

(1) Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA, (2) MIT Media Lab, Cambridge, MA 02139, USA and (3) Broad Institute of Harvard and MIT, Cambridge, MA, USA

Received on June 23, 2004; revised and accepted on July 23, 2004
Advance Access publication August 5, 2004

*To whom correspondence should be addressed.

Bioinformatics vol. 21 issue 2 © Oxford University Press 2004; all rights reserved.

ABSTRACT

Summary: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.

Availability: http://www.broad.mit.edu/mpg/haploview/

Contact: jcbarret@broad.mit.edu

INTRODUCTION

Knowledge of local linkage disequilibrium (LD) and common haplotype patterns in disease association and positional cloning studies is becoming increasingly widespread since it has become clear (Van Eerdewegh et al., 2002; Rioux et al., 2001; Geesaman et al., 2003; Stoll et al., 2004) that intelligent use of this information has the potential to make them much more comprehensive and efficient. Early studies identifying an unexpected extent of correlation and structure in haplotype patterns (Reich et al., 2001; Daly et al., 2001; Gabriel et al., 2002) have led to the initiation of the Human Haplotype Map project (HapMap) to make this information available to all medical genetics researchers (International HapMap Consortium, 2003). Given the dramatic increase in the size and number of disease association studies worldwide and the enormous amount of public genotype data from HapMap, tools for analyzing, interpreting and visualizing these data are of critical importance to researchers everywhere.

Haploview is designed to provide a comprehensive suite of tools for haplotype analysis for a wide variety of dataset sizes. Haploview generates marker quality statistics, LD information, haplotype blocks, population haplotype frequencies and single marker association statistics in a user-friendly format. All the features are customizable and all computations are performed in real time, even for datasets with hundreds of individuals and hundreds of markers.

FEATURES

Haploview accepts input in a variety of formats. Pedigree data can be loaded as either partially or fully phased chromosomes or as unphased diplotypes in the standard Linkage format. The latter format also allows the user to specify family structure information as well as disease affection or case/control status. Marker information, including name and location, is loaded separately. Haploview also directly accepts genotype data dumped from the Human HapMap website (http://www.hapmap.org). A graphical genome browser maintained at that site allows researchers to navigate to a particular region of the genome and dump HapMap genotype data for all genotyped markers in the selected region in a format accepted by Haploview.

Upon loading a dataset, the software presents to the user a series of marker genotyping quality metrics. These include a check for conformance with Hardy–Weinberg equilibrium, a tally of Mendelian inheritance errors and the percentage of individuals successfully genotyped for that marker. The program filters out markers which fall below a preset threshold for these tests. The user can adjust these thresholds as well as handpick markers to add or remove from the subsequent steps. At any time later in the process, the user may return to this quality control panel, add or remove additional markers, and have the changes immediately reflected in the ongoing analyses.

Haploview calculates several pairwise measures of LD, which it uses to create a graphical representation (Fig. 1). The user has the option to select one of several commonly used block definitions (Gabriel et al., 2002; Wang et al., 2002) to partition the region into segments of strong LD. Alternatively, the user may manually select groups of markers for subsequent haplotype analysis. This view also allows a number of different color schemes to represent the LD relationships. Further, the program allows the display of an 'analysis track' above the LD plot, to display continuous variables such as recombination rate estimates (McVean et al., 2004) (Fig. 1).

Fig. 1. Haploview LD display with recombination rate plotted above (left) and haplotypes display (right). Interface developed at MIT Media Lab by B.Fry (http://acg.media.mit.edu/people/fry/).

Once groups of markers are selected (either automatically or manually), the program generates haplotypes and their population frequencies (Fig. 1). This display shows lines to indicate transitions from one block to the next, with frequency corresponding to the thickness of the line, and also presents Hedrick's multiallelic D′, which represents the degree of LD between two blocks, treating each haplotype within a block as an 'allele' of that region. Again, customization is available for nearly all aspects of the display, including displaying alleles as letters, numbers or colored boxes and displaying only those haplotypes above an adjustable threshold in the population.

If affection status is included in the input file, Haploview also calculates the standard TDT statistic (for trio data) or simple χ² (for case/control data) for each marker that can be used for association studies. Future versions will include several haplotype-tag SNP selection methods as well as haplotype-based association testing and evaluation of significance using permutation testing. These final features allow the user to go from raw genotype data through exploring genetic associations in one easy-to-use software package. Haploview is maintained as an open source project (http://sourceforge.net/projects/haploview/), which allows external parties to add their own methods in addition to the continuing development by the authors.

Each of these views of the data is shown on a separate tab (Fig. 1), allowing the user to move from one to the next, with interactive modifications made by the user in any panel reflected in all the others. For example, one can return at any time to the review of marker quality and manually include or exclude individual markers; these changes are instantly reflected in the LD and haplotype panels. This provides the ability to analyze the data in real time. The information on each panel can also be exported to a PNG for use in presentations or publications or dumped to a text file. Additionally, the program has a fully functional command-line mode, which allows users to run all the analyses on one or more files at once without opening the GUI.

IMPLEMENTATION

Haploview is written entirely in Java, which means it is usable on any platform with Java 1.3 or later installed. Running on a 1.8 GHz Pentium 4 with 1 GB of RAM, Haploview can display a dataset with 200 markers genotyped in 400 individuals and adjust parameters with no noticeable delay. The program can also be used from the command line to run the LD calculations on very large datasets in comparatively small amounts of time. Haploview was able to compute 3.3 million pairwise LD values (comparisons of all markers closer than 500 kb in a 45 500-marker dataset) in 30 min.

Haploview uses a two-marker EM (ignoring missing data) to estimate the maximum-likelihood values of the four gamete frequencies, from which the D′, LOD and r² calculations derive. Haplotype phase and population frequency are inferred using a standard EM algorithm with a partition–ligation approach for blocks with more than 10 markers. Conformance with Hardy–Weinberg equilibrium is computed using an exact test (G.Abecasis and J.Wigginton, personal communication).

REFERENCES

Daly,M.J., Rioux,J.D., Schaffner,S.F., Hudson,T.J. and Lander,E.S. (2001) High-resolution haplotype structure in the human genome. Nat. Genet., 29, 229–232.

Gabriel,S.B., Schaffner,S.F., Nguyen,H., Moore,J.M., Roy,J., Blumenstiel,B., Higgins,J., Defelice,M., Lochner,A., Faggart,M. et al. (2002) The structure of haplotype blocks in the human genome. Science, 296, 2225–2229.

Geesaman,B.J., Benson,E., Brewster,S.J., Kunkel,L.M., Blanche,H., Thomas,G., Perls,T.T., Daly,M.J. and Puca,A.A. (2003) Haplotype-based identification of a microsomal transfer protein marker associated with the human lifespan. Proc. Natl Acad. Sci., USA, 100, 14115–20.

The International HapMap Consortium (2003) The International HapMap Project. Nature, 426, 789–796.

McVean,G.A., Myers,S.R., Hunt,S., Deloukas,P., Bentley,D.R. and Donnelly,P. (2004) The fine-scale structure of recombination rate variation in the human genome. Science, 304, 581–584.

Reich,D.E., Cargill,M., Bolk,S., Ireland,J., Sabeti,P.C., Richter,D.J., Lavery,T., Kouyoumjian,R., Farhadian,S.F., Ward,R. and Lander,E.S. (2001) Linkage disequilibrium in the human genome. Nature, 411, 199–204.

Rioux,J.D., Daly,M.J., Silverberg,M.S., Lindblad,K., Steinhart,H., Cohen,Z., Delmonte,T., Kocher,K., Miller,K., Guschwan,S. et al. (2001) Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat. Genet., 29, 223–228.

Stoll,M., Corneliussen,B., Costello,C.M., Waetzig,G.H., Mellgard,B., Kroch,W.A., Rosenstiel,P., Albrecht,M., Croucher,P.J., Seegert,D. et al. (2004) Genetic variation in DLG5 is associated with inflammatory bowel disease. Nat. Genet., 36, 476–480.

Van Eerdewegh,P., Little,R.D., Dupuis,J., Del Mastro,R.G., Falls,K., Simon,J., Jorrey,D., Pandit,S., McKenny,J., Braunschweiger,K. et al. (2002) Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature, 418, 426–430.

Wang,N., Akey,J.M., Zhang,K., Chakraborty,R. and Jin,L. (2002) Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet., 71, 1227–1234.
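The Linkage-format input and the per-marker genotyping percentage described under FEATURES can be illustrated with a short sketch. It assumes the conventional pedigree layout (family ID, individual ID, father ID, mother ID, sex, affection status, then two allele columns per marker, with 0 marking a missing allele); the article itself only says "standard Linkage format", and the names `Individual`, `parse_linkage_line` and `call_rate` are hypothetical, not Haploview's own code.

```python
from typing import List, NamedTuple, Tuple


class Individual(NamedTuple):
    family: str
    ident: str
    father: str
    mother: str
    sex: int
    affection: int
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_linkage_line(line: str) -> Individual:
    """Parse one whitespace-delimited pedigree row (assumed column layout)."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2:
        raise ValueError("expected 6 pedigree columns followed by allele pairs")
    alleles = fields[6:]
    # Consecutive columns form the two alleles of one marker.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return Individual(fields[0], fields[1], fields[2], fields[3],
                      int(fields[4]), int(fields[5]), genotypes)


def call_rate(individuals: List[Individual], marker_index: int) -> float:
    """Percentage of individuals with a non-missing genotype at one marker."""
    typed = sum(1 for ind in individuals
                if "0" not in ind.genotypes[marker_index])
    return 100.0 * typed / len(individuals)


rows = ["FAM1 3 1 2 1 2 1 2 0 0 4 4",
        "FAM1 4 1 2 2 1 1 1 2 4 4 4"]
people = [parse_linkage_line(r) for r in rows]
```

A marker whose call rate falls below a chosen threshold would then be a candidate for exclusion in the quality-control step; the threshold value itself is not given in the article.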
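The single-marker association statistics mentioned under FEATURES can be sketched with their textbook forms. This is an illustration under stated assumptions, not Haploview's implementation: the TDT is written as the McNemar-type statistic on transmitted versus untransmitted counts from heterozygous parents, and the case/control χ² is assumed to be a Pearson test on a 2×2 table of allele counts (the article does not say whether alleles or genotypes are tabulated). All function names are hypothetical.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """TDT chi-square (1 df): counts are transmissions of one allele from
    heterozygous parents to affected offspring versus non-transmissions."""
    informative = transmitted + untransmitted
    if informative == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / informative


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int) -> float:
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    n = case_a + case_b + control_a + control_b
    margins = ((case_a + case_b) * (control_a + control_b)
               * (case_a + control_a) * (case_b + control_b))
    if margins == 0:
        return 0.0  # a monomorphic marker or an empty group carries no signal
    return n * (case_a * control_b - case_b * control_a) ** 2 / margins


def p_value_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))
```

For example, 60 transmissions against 40 non-transmissions give a TDT statistic of 4.0, which corresponds to a nominal p-value of about 0.046.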
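The two-marker EM described under IMPLEMENTATION can be sketched as follows. This is a minimal illustration, assuming biallelic markers with genotypes coded as the number of copies (0, 1 or 2) of alleles A and B, and `None` for missing data; the coding, the starting values and the function names are assumptions, and the LOD score is omitted. Only double heterozygotes are phase-ambiguous, so the E-step splits them between the AB/ab and Ab/aB configurations in proportion to the current frequency estimates.

```python
def known_counts(genotypes):
    """Count phase-unambiguous haplotypes and the number of double hets."""
    n = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    double_hets = 0
    for x, y in genotypes:
        if x is None or y is None:
            continue  # individuals missing either genotype are ignored
        if x == 1 and y == 1:
            double_hets += 1
        elif x == 1:  # heterozygous at marker 1, homozygous at marker 2
            second = "B" if y == 2 else "b"
            n["A" + second] += 1
            n["a" + second] += 1
        else:  # homozygous at marker 1: both haplotypes share that allele
            first = "A" if x == 2 else "a"
            n[first + "B"] += y
            n[first + "b"] += 2 - y
    return n, double_hets


def estimate_gametes(genotypes, max_iter=1000, tol=1e-12):
    """EM estimate of the four gamete (haplotype) frequencies."""
    n, dh = known_counts(genotypes)
    total = sum(n.values()) + 2 * dh
    if total == 0:
        raise ValueError("no individuals genotyped at both markers")
    p = {h: 0.25 for h in n}
    for _ in range(max_iter):
        cis, trans = p["AB"] * p["ab"], p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5  # E-step
        new = {"AB": (n["AB"] + dh * w) / total,             # M-step
               "ab": (n["ab"] + dh * w) / total,
               "Ab": (n["Ab"] + dh * (1 - w)) / total,
               "aB": (n["aB"] + dh * (1 - w)) / total}
        change = max(abs(new[h] - p[h]) for h in p)
        p = new
        if change < tol:
            break
    return p


def ld_stats(p):
    """D, |D'| and r^2 from the four gamete frequencies."""
    p_a, p_b = p["AB"] + p["Ab"], p["AB"] + p["aB"]
    d = p["AB"] - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return {"D": d,
            "D_prime": abs(d) / d_max if d_max > 0 else 0.0,
            "r2": d * d / denom if denom > 0 else 0.0}
```

With three AB/AB individuals, three ab/ab individuals and four double heterozygotes, the estimates converge to frequencies of 0.5 for AB and ab, giving D′ and r² of 1; a sample in which all four haplotypes are equally frequent gives D = 0.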
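The article attributes its Hardy–Weinberg exact test to a personal communication and gives no details, so the following is only a sketch of one common formulation: conditional on the observed allele counts, every possible heterozygote count is assigned its exact probability, and the p-value is the total probability of outcomes no more likely than the one observed. The function name and the choice of log-gamma arithmetic are assumptions made for this illustration.

```python
import math


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact Hardy-Weinberg p-value for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(hets: int) -> float:
        # log P(hets | n, n_a) = log[ 2^hets n! / (aa! hets! bb!)
        #                             * n_a! n_b! / (2n)! ]
        aa = (n_a - hets) // 2
        bb = (n_b - hets) // 2
        return (hets * math.log(2.0)
                + math.lgamma(n + 1) - math.lgamma(aa + 1)
                - math.lgamma(hets + 1) - math.lgamma(bb + 1)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    rare = min(n_a, n_b)
    # The heterozygote count must have the same parity as the allele counts.
    probs = {h: math.exp(log_prob(h)) for h in range(rare % 2, rare + 1, 2)}
    observed = probs[n_ab]
    # Small relative slack so floating-point ties count as "as extreme".
    total = sum(q for q in probs.values() if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

Genotype counts of 1, 2 and 1 are the most probable configuration for their allele counts and give a p-value of 1, whereas 5, 0 and 5 (no heterozygotes at all) give 252/184756, roughly 0.0014.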

Haploview: analysis and visualization of LD and haplotype maps

Bioinformatics, Volume 21 (2): 3 – Aug 5, 2004

Publisher
Oxford University Press
Copyright
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bth457
pmid
15297300

Journal

Bioinformatics, Oxford University Press

Published: Aug 5, 2004
