Exome sequencing: the sweet spot before whole genomes

Human Molecular Genetics, 2010, Vol. 19, Review Issue 2, R145–R151
doi:10.1093/hmg/ddq333
Advance Access published on August 12, 2010

Jamie K. Teer¹ and James C. Mullikin²

¹Genetic Disease Research Branch and ²Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA

To whom correspondence should be addressed at: 5625 Fishers Lane, Rm 5N-01Q, MSC 9400, Bethesda, MD 20892-9400, USA. Tel: +1 3014962416; Fax: +1 3014800634; Email: mullikin@mail.nih.gov

Received July 7, 2010; Revised and Accepted August 4, 2010

Published by Oxford University Press 2010. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The development of massively parallel sequencing technologies, coupled with new massively parallel DNA enrichment technologies (genomic capture), has allowed the sequencing of targeted regions of the human genome in rapidly increasing numbers of samples. Genomic capture can target specific areas in the genome, including genes of interest and linkage regions, but this limits the study to what is already known. Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical part of the human genome, allowing larger numbers of samples than are currently practical with whole-genome sequencing. In this review, we briefly describe some of the methodologies currently used for genomic and exome capture and highlight recent applications of this technology.

INTRODUCTION

The introduction and widespread use of massively parallel sequencing has made it possible for individual laboratories to sequence a whole human genome. However, the cost and capacity required are still significant, especially considering that the function of much of the genome is still largely unknown. Before massively parallel sequencing, specific regions of the genome were targeted using PCR, followed by capillary sequencing. This approach was effective at narrowing the scope of investigation, but required a tightly defined guess as to which region should be targeted. Larger-scale studies have used this method [X-chromosome exons (1), human exome (2)], but this remains a major undertaking that is not feasible for many research groups. Recent studies have described new methods to target much larger regions of the human genome (up to 3 Mb) in a more cost- and time-efficient manner (reviewed in 3–6). Such methods, described as genome capture, genome partitioning, genome enrichment etc., are well suited to current massively parallel sequencing platforms, as they produce a pool of desired molecules that are separated by the parallel nature of the sequencing technologies themselves. Although these methods can cover more of the human genome in a shorter amount of time at reduced cost compared with PCR, they also require an educated guess as to which regions or genes may be interesting. Several of these methods have been extended to capture the human exome, eliminating the need to choose a subset of genes for interrogation and focussing on the best understood 1% of the genome, the protein-coding exons.

CAPTURE METHODS

Solid-phase hybridization

Solid-phase hybridization methods generally utilize probes complementary to the sequences of interest affixed to a solid support, such as microarrays (7–11) (Fig. 1A) or filters (12). The total DNA is applied to the probes, where the desired fragments hybridize. The non-targeted fragments are subsequently washed away, and the enriched DNA is eluted for sequencing. Recently, these methods have been improved using multiple enrichment cycles (13,14). Agilent, Roche/NimbleGen and Febit offer commercial kits implementing these methods.

Liquid-phase hybridization

Liquid-phase hybridization is similar to solid phase; the probes in this method are not attached to a solid matrix, but instead are biotinylated (Fig. 1B). Following hybridization, the biotinylated probes (with the complementary desired genomic DNA) are bound to magnetic streptavidin beads and are separated from the undesired DNA by washing. After elution, enriched DNA can be sequenced. Initial reports on this method used biotinylated RNA probes (15) (commercially available from Agilent), and recent methods use DNA probes (commercially available from Roche/NimbleGen).

Polymerase-mediated capture

Although all capture methods use polymerases to amplify captured fragments, these methods use polymerases in a more integral way. Padlock probe technology has been extended to develop Molecular Inversion Probes (MIP) and Spacer Multiplex Amplification ReacTion (SMART), in which a single probe acts as both a primer to start elongation and a receiver to end elongation and allow ligation (Fig. 1C). Subsequent digestion of linear DNA leaves only the closed circular extension/ligation products with the desired sequence [MIP (16–19), SMART (20)]. Primer extension capture (PEC) was developed with small amounts of DNA in mind (Fig. 1D). This method uses a biotinylated primer with sequence complementary to the DNA of interest. After annealing, the primer is extended, effectively generating a hybridization probe to capture the sequence of interest like other hybridization methods (21). Highly parallel PCR has been an effective method to prepare samples for capillary sequencing, and recent work has extended this idea using microfluidics. Instead of using plates with hundreds of wells, aqueous microdroplets can segregate thousands of individual reactions in the same tube, allowing for a much more highly parallel use of PCR (22) (commercially available from Raindance). Another commercially available kit uses restriction enzymes to fragment DNA; probes specific to the ends of desired fragments are used to amplify the desired sequence (Olink Genomics).

Regional capture

Other methods exist to isolate larger sections of the genome. Chromosome sorting (reviewed in 23) has long been useful for genomics. Massively parallel sequencing is well suited to sequence libraries generated by fragmenting flow-sorted chromosomes and offers a way to sequence a single chromosome. When odd chromosomal structures are present, or DNA is only available from a handful of molecules, microdissection of metaphase chromosomes followed by sequencing has been reported (24). Although these methods require highly specialized instruments, they do offer a powerful approach for unique cases.

Figure 1. Illustration of different capture methods. Light blue bars represent desired genomic sequence, red bars represent unwanted sequence. (A) Solid-phase hybridization. Bait probes (light blue and black) complementary to the desired sequence are synthesized on a microarray. Fragmented genomic DNA is applied, and the desired fragments hybridize. The array is washed, and desired fragments are eluted. (B) Liquid-phase hybridization. Bait probes (light blue and black) complementary to the desired regions are synthesized, often using microarray technology. The probes are generally biotinylated (asterisk). The bait probes are mixed with fragmented genomic DNA, and the desired fragments hybridize to baits in solution. Streptavidin beads (black circles) are added to allow physical separation. The bead-bait complexes are washed, and desired DNA is eluted. (C) MIP. Single-stranded probes composed of a universal linker backbone (black line) and arms complementary to the sequence flanking desired regions (red and white) are synthesized, often using microarray or microfluidics technology. The probes are added to genomic DNA and hybridize in an inverted manner. A polymerase (yellow oval) fills in the gap between the two arms. A ligase (yellow star) seals the nick, resulting in a closed single-strand circle. Genomic DNA is digested with exonucleases, and the captured DNA is amplified using sequences in the universal backbone. (D) PEC. Biotinylated primers (red and white) are added to fragmented genomic DNA, where they hybridize to the desired sequence. A polymerase (yellow oval) extends the primer, creating a tighter interaction. Streptavidin beads (black circles) are added and are used to physically separate the desired DNA from the unwanted DNA. The desired DNA is then eluted.

EXOME CAPTURE

Although many different methods for targeted capture have been described, only a few have been extended to target the human exome. These methods belong to the hybridization type and include array-based hybridization (9,25,26) and liquid-based hybridization (27) [products available from Agilent Technologies (SureSelect), Roche NimbleGen (SeqCap/SeqCap EZ)]. In the future, other methods may be able to scale up as well.

The term 'whole human exome' can be defined in many different ways. Two companies offer commercial kits for exome capture and have targeted the human consensus coding sequence regions (28), which cover 29 Mb of the genome. This is a more conservative set of genes and includes only protein-coding sequence. It covers 83% of the RefSeq coding exon bases. Both companies also target selected miRNAs, and extra regions can sometimes be added (Agilent). Although still a subset of the genome, exome capture allows the investigation of a more complete set of human genes with the cost and time advantages of genome capture.

APPLICATIONS

Following initial method descriptions, current research is applying genome capture methods to a variety of questions. From disease causation and diagnosis to evolutionary comparison of ancient genomes, the combination of genome capture and massively parallel sequencing is a powerful investigative tool.

Medical sequencing

One of the more common exome capture experiments will be the search for genetic variation underlying a particular disease. For some diseases, causative genes have been identified, and researchers can use custom captures to examine those genes for known and novel variants in their samples. For other diseases, whole exome capture is suitable, as the causative gene is unknown, or many different genes may contribute.

Several recent studies have captured and sequenced different regions of individual genomes with known causative variants or genes. These proof-of-principle experiments demonstrate the utility, as well as some shortcomings, of capture followed by massively parallel sequencing. Ng et al. (26) have used array-based hybridization to sequence 12 human exomes (28 Mb). The study included four unrelated individuals with Freeman–Sheldon syndrome, a dominantly inherited rare Mendelian disorder. The investigators were able to identify variants in the known causative gene in each sample. Interestingly, the known causative gene was the only candidate following the application of numerous filters, including requiring a gene to have a novel variant in each sample. In their study of neurofibromatosis type 1, Chou et al. (29) used custom array capture and pyrosequencing to target the 280 kb region containing the NF1 gene, which is known to harbor causal dominant mutations. The authors captured DNA from two different samples with known genotypes, but were initially only able to recover a known single-base deletion. The other known variant, an Alu sequence insertion, was only observed after de novo assembly of unmapped reads. Additionally, the authors found many positions at which the captured genotypes did not agree with Sanger sequencing confirmations. They found that while some discrepancies were due to pyrosequencing errors, others were misalignments from the numerous pseudogenes of NF1, illustrating one of the potential pitfalls of the method. Hoischen et al. (30) also used array-based capture (2 Mb) and pyrosequencing to re-identify known variants in five individuals with autosomal recessive ataxia. They were able to initially identify 6/7 known variants investigated; the seventh variant was visible only after adding three times more sequence, although at a low number of reads (2/9 reads contained the mutation). A known variant trinucleotide repeat was not included in the design, due to the repetitive nature of these variants, and was therefore not recovered. Raca et al. (31) searched for two known variants causative for Papillorenal syndrome using array-based capture targeting the causative gene, PAX2, as well as >100 candidate genes for other ocular disorders (370 kb), followed by pyrosequencing. They were able to identify a known substitution using the provided sequencing analysis software, but did not recover the known single-base deletion in a homopolymer run, despite seeing reads containing the variant. The authors concluded that the vendor-provided software was conservative when dealing with insertions/deletions in homopolymer runs, as pyrosequencing has a higher error rate with this type of sequence. Other analysis packages were able to identify the variant.

Although these studies were not designed to identify novel variants causative for disease, much can be learned from them. Importantly, not every known variant was recovered. This was due to low sequence depth at the variant position, as well as issues relating to repeat regions and alignment. One study estimated that the probability of detecting a causative variant in any given gene is 86%, although this ignores non-coding and structural variants (26). In order to ensure sufficient allele sampling, as well as to prevent sequencing errors from appearing to be actual variants, all four studies use or recommend a minimum sequence depth threshold, ranging from 8- to 30-fold depth of coverage. These recommendations will affect the amount of sequencing required for a given capture size and will therefore affect the cost of the experiment.

Targeted capture has also been used to identify novel genes that cause hereditary disorders. Novel, putative causative variants have recently been discovered for a variety of disorders [sensory/motor neuropathy with ataxia (32), Clericuzio-type poikiloderma with neutropenia (33), familial exudative vitreoretinopathy (34), recessive non-syndromic hearing loss (35), talipes equinovarus, atrial septal defect, Robin sequence, persistent left superior vena cava (36)] using genome capture to target linkage regions from the affected families. The identified variants were almost all non-synonymous substitutions, but follow-up studies on additional unrelated samples using Sanger sequencing also identified insertions/deletions in the same genes (33,35). Volpi et al. (33) identified a substitution that disrupted a splice site, resulting in an exon skip and a frameshift. Interestingly, Johnston et al. (36) were able to identify variants in two different families (one non-sense, one frameshifting insertion) without sequencing the probands, for which DNA was not available. These studies demonstrate the ability of genomic capture to discover different types of novel variants important for human disease.

In addition to custom capture studies, two whole exome studies have been recently reported. In the first, Choi et al. identified a novel coding variant in a consanguineous region of an affected individual. The variant was a homozygous missense substitution in SLC26A3, a gene in which mutations are known to cause congenital chloride-losing diarrhea (25). This genetic finding allowed the researchers to correct an earlier diagnosis of the patient's disorder. Variants in the same gene were present in other individuals, allowing the corrected diagnosis for them as well. In the second study, Ng et al. used exome capture to search for variants causing Miller syndrome in three unrelated families. They identified variants in DHODH in all three families, using filters for novel variants that fit inheritance models. These studies both showed that exome capture is an effective way to discover causative variants and genes and to correctly diagnose heritable disorders caused by variants in known genes.

Human evolution

Recent advances in the sequencing of ancient DNA have also benefited from targeted capture. Researchers used PEC to specifically target mitochondrial DNA from five Neandertal samples (21). The PEC method allowed complete coverage of the Neandertal mtDNA, using only 5–50 ng of amplified pyrosequencing library template. More recently, researchers used array-based capture to target, in Neandertal DNA, non-synonymous substitutions that have been fixed in humans since the divergence from the human/chimpanzee ancestor (37). Although the array-based capture did not have the low DNA requirements of PEC, the method allowed sequencing of a Neandertal sample containing 99.8% contaminating microbial DNA. Owing to the high contamination, this sample was unsuitable for shotgun sequencing, but targeted capture allowed recovery of almost all of the Neandertal sequence at the desired positions. The authors were then able to identify 88 substitutions that have become fixed in humans since the split from Neandertal, giving insight into what distinguishes us at the genetic, and perhaps molecular, level.

Exome capture has been used to investigate more recent variation as well. Researchers used whole exome capture to identify changes in allele frequency between high-altitude populations (Tibetans) and low-altitude populations (Han Chinese and Danes) (38). They were able to identify a number of genes likely to have been selected for as a part of adaptation to a high-altitude environment. Several of these genes were identified in other studies using microarray genotyping (39,40). This suggests that exome capture techniques are accurate and useful for these types of allelic frequency studies and would be especially useful for rarer SNPs that may not be included on the microarray platforms. Both recent and ancient genetic differences have been investigated using exome capture, allowing us to see a more complete view of our evolutionary history.

Biological

Basic biology questions are also being investigated on a much greater scale than previously possible using genome capture. Although the genetic information in DNA is frequently the initial focus of genome studies, epigenetic modification of the DNA also plays an important role in the biological function of an organism. Two groups used genome capture with padlock probes (19) or array-based capture (41) to investigate DNA methylation using bisulfite sequencing. Both studies found this to be very accurate when compared with the standard capillary methods. The latter study also showed that sensitivity using array-based capture was high: 86–91% of targeted bases were covered by 10 or more reads. An additional study focussed not on methylation status, but on genetic variation at CpG sites, which are subject to a higher mutation rate via 5-methylcytosine deamination (17). Using padlock probes, the researchers were able to determine genotypes for 65% of targeted bases. The accuracy was very high when compared with an independent genotype assessment. These CpG region studies show that capture is useful to focus on the desired regions and is effective, even on difficult (high GC content) regions.

Copy number variation (CNV) is another source of genetic variation implicated in disease. The detection of copy number changes is often performed using low-resolution methods, such as array-comparative genomic hybridization and single nucleotide polymorphism (SNP) microarrays. Conrad et al. (42) have used targeted sequencing to capture breakpoint regions and identify the actual breaks with a high resolution. They were able to identify breakpoints for a number of known CNVs and were then able to classify the breaks into likely repair mechanisms used. The authors point out that this method is useful for CNVs in simpler regions, as repeat elements and complex genomic regions present challenges both for capture and post-sequence alignment.

Capture is not limited to genomic DNA. Several studies have used targeted sequencing to investigate RNA as well. One group used padlock probes to target regions containing known RNA-editing sites (43). They were able to identify sites in 10 of 13 known edited genes, by comparing captures of genomic DNA and cDNA from various tissues. The authors chose 18 editing sites at random and confirmed 15 with capillary sequencing. This research showed that padlock capture techniques work with cDNA and can be used to identify sites of RNA editing. Hybridization capture was also shown to capture cDNA (44,45). In (44), the authors capture both cDNA and genomic DNA with an array-based method. They then determine allele-specific expression using both data sets. In (45), the authors use solution hybridization to focus on enriching cDNA from a set of genes of interest. They were able to effectively enrich these genes, suggesting that genes of low abundance could be detected without huge increases in total sequencing. Interestingly, they were also able to identify gene fusions, including fusions in which one gene was not targeted. Applying targeted sequencing to cDNA is another way to focus on specific questions, even without whole-genome sequence.

FUTURE

One of the main reasons for performing a capture experiment is the significantly increased cost and time required for whole-genome sequencing. However, the constant improvements to massively parallel sequencing technologies and the impending massively parallel single-molecule sequencing technologies will certainly reduce these cost and time barriers. One may wonder what role capture will play as whole-genome sequencing is no longer impractical. Although capture has inherent costs independent of sequencing, capture experiments focus on subsets of the whole genome and will therefore always require less sequencing. Thus, more capture experiments can be performed given a set amount of sequencing capacity. Higher sample numbers result in higher power to detect variation, a key metric for discovering causative variants, especially for more common disorders. An argument in favor of whole-genome sequencing is that it is unwise to limit the data by doing capture experiments; it may be worth the additional cost to sequence 'everything'. While this may be true, if researchers are confident that the desired genome subset (linkage regions, CpG islands, genes of interest etc.) is all they need to look at, more samples can be examined, and the data are limited to what is of interest. Data fatigue from attempting to interpret whole-genome sequence is not insignificant. Will an investigator be able to pick the important variants out of a list of millions of positions? Although capture data can also contain large numbers of variants, the number is nearly two orders of magnitude lower than that from whole-genome sequence, making secondary analyses much less onerous. This is particularly important when bioinformatics personnel and resources are limiting (annotating lists of hundreds of variants is possible to accomplish by hand; doing so for tens of thousands of variants is not). Therefore, it seems likely that targeted sequencing will be useful alongside whole-genome sequencing. Researchers will need to consider all aspects of a given project before deciding on whether to proceed with whole-genome or targeted sequencing. Fortunately, ever decreasing sequencing costs may allow mixed approaches. Targeted sequencing has been shown to be a robust, effective technique that leverages the unique aspects of massively parallel sequencing and has already yielded many exciting new discoveries.

Conflict of Interest statement. None declared.

FUNDING

The authors are supported by the Intramural Research Program of the National Human Genome Research Institute. Funding to pay the Open Access Charge was provided by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.

REFERENCES

1. Tarpey, P.S., Smith, R., Pleasance, E., Whibley, A., Edkins, S., Hardy, C., O’Meara, S., Latimer, C., Dicks, E., Menzies, A. et al. (2009) A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat. Genet., 41, 535–543.
2. Jones, S., Zhang, X., Parsons, D.W., Lin, J.C., Leary, R.J., Angenendt, P., Mankoo, P., Carter, H., Kamiyama, H., Jimeno, A. et al. (2008) Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science, 321, 1801–1806.
3. Garber, K. (2008) Fixing the front end. Nat. Biotechnol., 26, 1101–1104.
4. Summerer, D. (2009) Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing. Genomics, 94, 363–368.
5. Turner, E.H., Ng, S.B., Nickerson, D.A. and Shendure, J. (2009) Methods for genomic partitioning. Annu. Rev. Genomics Hum. Genet., 10, 263–284.
6. Mamanova, L., Coffey, A.J., Scott, C.E., Kozarewa, I., Turner, E.H., Kumar, A., Howard, E., Shendure, J. and Turner, D.J. (2010) Target-enrichment strategies for next-generation sequencing. Nat. Methods, 7, 111–118.
7. Albert, T.J., Molla, M.N., Muzny, D.M., Nazareth, L., Wheeler, D., Song, X., Richmond, T.A., Middle, C.M., Rodesch, M.J., Packard, C.J. et al. (2007) Direct selection of human genomic loci by microarray hybridization. Nat. Methods, 4, 903–905.
8. Okou, D.T., Steinberg, K.M., Middle, C., Cutler, D.J., Albert, T.J. and Zwick, M.E. (2007) Microarray-based genomic selection for high-throughput resequencing. Nat. Methods, 4, 907–909.
9. Hodges, E., Xuan, Z., Balija, V., Kramer, M., Molla, M.N., Smith, S.W., Middle, C.M., Rodesch, M.J., Albert, T.J., Hannon, G.J. et al. (2007) Genome-wide in situ exon capture for selective resequencing. Nat. Genet., 39, 1522–1527.
10. Hodges, E., Rooks, M., Xuan, Z., Bhattacharjee, A., Benjamin Gordon, D., Brizuela, L., Richard McCombie, W. and Hannon, G.J. (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat. Protoc., 4, 960–974.
11. Bau, S., Schracke, N., Kranzle, M., Wu, H., Stahler, P.F., Hoheisel, J.D., Beier, M. and Summerer, D. (2009) Targeted next-generation sequencing by specific capture of multiple genomic loci using low-volume microfluidic DNA arrays. Anal. Bioanal. Chem., 393, 171–175.
12. Herman, D.S., Hovingh, G.K., Iartchouk, O., Rehm, H.L., Kucherlapati, R., Seidman, J.G. and Seidman, C.E. (2009) Filter-based hybridization capture of subgenomes enables resequencing and copy-number detection. Nat. Methods, 6, 507–510.
13. Summerer, D., Wu, H., Haase, B., Cheng, Y., Schracke, N., Stahler, C.F., Chee, M.S., Stahler, P.F. and Beier, M. (2009) Microarray-based multicycle-enrichment of genomic subsets for targeted next-generation sequencing. Genome Res., 19, 1616–1621.
14. Lee, H., O’Connor, B.D., Merriman, B., Funari, V.A., Homer, N., Chen, Z., Cohn, D.H. and Nelson, S.F. (2009) Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing. BMC Genomics, 10, 646.
15. Gnirke, A., Melnikov, A., Maguire, J., Rogov, P., LeProust, E.M., Brockman, W., Fennell, T., Giannoukos, G., Fisher, S., Russ, C. et al. (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol., 27, 182–189.
16. Porreca, G.J., Zhang, K., Li, J.B., Xie, B., Austin, D., Vassallo, S.L., LeProust, E.M., Peck, B.J., Emig, C.J., Dahl, F. et al. (2007) Multiplex amplification of large sets of human exons. Nat. Methods, 4, 931–936.
17. Li, J.B., Gao, Y., Aach, J., Zhang, K., Kryukov, G.V., Xie, B., Ahlford, A., Yoon, J.K., Rosenbaum, A.M., Zaranek, A.W. et al. (2009) Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res., 19, 1606–1615.
18. Turner, E.H., Lee, C., Ng, S.B., Nickerson, D.A. and Shendure, J. (2009) Massively parallel exon capture and library-free resequencing across 16 genomes. Nat. Methods, 6, 315–316.
19. Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E.M., Antosiewicz-Bourget, J., Egli, D., Maherali, N., Park, I.H., Yu, J. et al. (2009) Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat. Biotechnol., 27, 353–360.
20. Krishnakumar, S., Zheng, J., Wilhelmy, J., Faham, M., Mindrinos, M. and Davis, R. (2008) A comprehensive assay for targeted multiplex amplification of human DNA sequences. Proc. Natl Acad. Sci. USA, 105, 9296–9301.
21. Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovic, D., Kucan, Z. et al. (2009) Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science, 325, 318–321.
22. Tewhey, R., Warner, J.B., Nakano, M., Libby, B., Medkova, M., David, P.H., Kotsopoulos, S.K., Samuels, M.L., Hutchison, J.B., Larson, J.W. et al. (2009) Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat. Biotechnol., 27, 1025–1031.
23. Ibrahim, S.F. and van den Engh, G. (2004) High-speed chromosome sorting. Chromosome Res., 12, 5–14.
24. Weise, A., Timmermann, B., Grabherr, M., Werber, M., Heyn, P., Kosyakova, N., Liehr, T., Neitzel, H., Konrat, K., Bommer, C. et al. (2010) High-throughput sequencing of microdissected chromosomal regions. Eur. J. Hum. Genet., 18, 457–462.
25. Choi, M., Scholl, U.I., Ji, W., Liu, T., Tikhonova, I.R., Zumbo, P., Nayir, A., Bakkaloglu, A., Ozen, S., Sanjad, S. et al. (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA, 106, 19096–19101.
26. Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Bigham, A.W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E.E. et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461, 272–276.
27. Bainbridge, M.N., Wang, M., Burgess, D.L., Kovar, C., Rodesch, M.J., D’Ascenzo, M., Kitzman, J., Wu, Y.Q., Newsham, I., Richmond, T.A. et al. (2010) Whole exome capture in solution with 3 Gbp of data. Genome Biol., 11, R62.
28. Pruitt, K.D., Harrow, J., Harte, R.A., Wallin, C., Diekhans, M., Maglott, D.R., Searle, S., Farrell, C.M., Loveland, J.E., Ruef, B.J. et al. (2009) The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res., 19, 1316–1323.
29. Chou, L.S., Liu, C.S., Boese, B., Zhang, X. and Mao, R. (2010) DNA sequence capture and enrichment by microarray followed by next-generation sequencing for targeted resequencing: neurofibromatosis type 1 gene as a model. Clin. Chem., 56, 62–72.
30. Hoischen, A., Gilissen, C., Arts, P., Wieskamp, N., van der Vliet, W., Vermeer, S., Steehouwer, M., de Vries, P., Meijer, R., Seiqueros, J. et al. (2010) Massively parallel sequencing of ataxia genes after array-based enrichment. Hum. Mutat., 31, 494–499.
31. Raca, G., Jackson, C., Warman, B., Bair, T. and Schimmenti, L.A. (2010) Next generation sequencing in research and diagnostics of ocular birth defects. Mol. Genet. Metab., 100, 184–192.
32. Brkanac, Z., Spencer, D., Shendure, J., Robertson, P.D., Matsushita, M., Vu, T., Bird, T.D., Olson, M.V. and Raskind, W.H. (2009) IFRD1 is a candidate gene for SMNA on chromosome 7q22–q23. Am. J. Hum. Genet., 84, 692–697.
33. Volpi, L., Roversi, G., Colombo, E.A., Leijsten, N., Concolino, D., Calabria, A., Mencarelli, M.A., Fimiani, M., Macciardi, F., Pfundt, R. et al. (2010) Targeted next-generation sequencing appoints c16orf57 as clericuzio-type poikiloderma with neutropenia gene. Am. J. Hum. Genet., 86, 72–76.
34. Nikopoulos, K., Gilissen, C., Hoischen, A., van Nouhuys, C.E., Boonstra, F.N., Blokland, E.A., Arts, P., Wieskamp, N., Strom, T.M., Ayuso, C. et al. (2010) Next-generation sequencing of a 40 Mb linkage interval reveals TSPAN12 mutations in patients with familial exudative vitreoretinopathy. Am. J. Hum. Genet., 86, 240–247.
35. Rehman, A.U., Morell, R.J., Belyantseva, I.A., Khan, S.Y., Boger, E.T., Shahzad, M., Ahmed, Z.M., Riazuddin, S., Khan, S.N. and Friedman, T.B. (2010) Targeted capture and next-generation sequencing identifies C9orf75, encoding taperin, as the mutated gene in nonsyndromic deafness DFNB79. Am. J. Hum. Genet., 86, 378–388.
36. Johnston, J.J., Teer, J.K., Cherukuri, P.F., Hansen, N.F., Loftus, S.K., Chong, K., Mullikin, J.C. and Biesecker, L.G. (2010) Massively parallel sequencing of exons on the X chromosome identifies RBM10 as the gene that causes a syndromic form of cleft palate. Am. J. Hum. Genet., 86, 743–748.
37. Burbano, H.A., Hodges, E., Green, R.E., Briggs, A.W., Krause, J., Meyer, M., Good, J.M., Maricic, T., Johnson, P.L., Xuan, Z. et al. (2010) Targeted investigation of the Neandertal genome by array-based sequence capture. Science, 328, 723–725.
38. Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z.X.P., Pool, J.E., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T.S. et al. (2010) Sequencing of 50 human exomes reveals adaptation to high altitude. Science, 329, 75–78.
39. Beall, C.M., Cavalleri, G.L., Deng, L., Elston, R.C., Gao, Y., Knight, J., Li, C., Li, J.C., Liang, Y., McCormack, M. et al. (2010) Natural selection on EPAS1 (HIF2alpha) associated with low hemoglobin concentration in Tibetan highlanders. Proc. Natl Acad. Sci. USA, 107, 11459–11464.
40. Simonson, T.S., Yang, Y., Huff, C.D., Yun, H., Qin, G., Witherspoon, D.J., Bai, Z., Lorenzo, F.R., Xing, J., Jorde, L.B. et al. (2010) Genetic evidence for high-altitude adaptation in Tibet. Science, 329, 72–75.
41. Hodges, E., Smith, A.D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., Zhang, M.Q., Ye, K., Bhattacharjee, A., Brizuela, L. et al. (2009) High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Res., 19, 1593–1605.
42. Conrad, D.F., Bird, C., Blackburne, B., Lindsay, S., Mamanova, L., Lee, C., Turner, D.J. and Hurles, M.E. (2010) Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat. Genet., 42, 385–391.
43. Li, J.B., Levanon, E.Y., Yoon, J.K., Aach, J., Xie, B., Leproust, E., Zhang, K., Gao, Y. and Church, G.M. (2009) Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science, 324, 1210–1213.
44. Heap, G.A., Yang, J.H., Downes, K., Healy, B.C., Hunt, K.A., Bockett, N., Franke, L., Dubois, P.C., Mein, C.A., Dobson, R.J. et al. (2010) Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum. Mol. Genet., 19, 122–134.
45. Levin, J.Z., Berger, M.F., Adiconis, X., Rogov, P., Melnikov, A., Fennell, T., Nusbaum, C., Garraway, L.A. and Gnirke, A. (2009) Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol., 10, R115.

Exome sequencing: the sweet spot before whole genomes

Human Molecular Genetics , Volume 19 (R2) – Aug 12, 2010


References (51)

Publisher
Pubmed Central
Copyright
Published by Oxford University Press 2010
ISSN
0964-6906
eISSN
1460-2083
DOI
10.1093/hmg/ddq333

Abstract

Human Molecular Genetics, 2010, Vol. 19, Review Issue 2 R145–R151
doi:10.1093/hmg/ddq333
Advance Access published on August 12, 2010

Exome sequencing: the sweet spot before whole genomes

Jamie K. Teer¹ and James C. Mullikin²,*

¹Genetic Disease Research Branch and ²Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA

Received July 7, 2010; Revised and Accepted August 4, 2010

*To whom correspondence should be addressed at: 5625 Fishers Lane, Rm 5N-01Q, MSC 9400, Bethesda, MD 20892-9400, USA. Tel: +1 3014962416; Fax: +1 3014800634; Email: mullikin@mail.nih.gov

Published by Oxford University Press 2010. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The development of massively parallel sequencing technologies, coupled with new massively parallel DNA enrichment technologies (genomic capture), has allowed the sequencing of targeted regions of the human genome in rapidly increasing numbers of samples. Genomic capture can target specific areas in the genome, including genes of interest and linkage regions, but this limits the study to what is already known. Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical part of the human genome, allowing larger numbers of samples than are currently practical with whole-genome sequencing. In this review, we briefly describe some of the methodologies currently used for genomic and exome capture and highlight recent applications of this technology.

INTRODUCTION

The introduction and widespread use of massively parallel sequencing has made it possible for individual laboratories to sequence a whole human genome. However, the cost and capacity required are still significant, especially considering that the function of much of the genome is still largely unknown. Before massively parallel sequencing, specific regions of the genome were targeted using PCR, followed by capillary sequencing. This approach was effective at narrowing the scope of investigation, but required a tightly defined guess as to which region should be targeted. Larger-scale studies have used this method [X-chromosome exons (1), human exome (2)], but this remains a major undertaking that is not feasible for many research groups. Recent studies have described new methods to target much larger regions of the human genome (up to 3 Mb) in a more cost- and time-efficient manner (reviewed in 3–6). Such methods, described as genome capture, genome partitioning, genome enrichment etc., are well suited to current massively parallel sequencing platforms, as they produce a pool of desired molecules that are separated by the parallel nature of the sequencing technologies themselves. Although these methods can cover more of the human genome in a shorter amount of time at reduced cost compared with PCR, they also require an educated guess as to which regions or genes may be interesting. Several of these methods have been extended to capture the human exome, eliminating the need to choose a subset of genes for interrogation and focussing on the best understood 1% of the genome, the protein-coding exons.

CAPTURE METHODS

Solid-phase hybridization

Solid-phase hybridization methods generally utilize probes complementary to the sequences of interest affixed to a solid support, such as microarrays (7–11) (Fig. 1A) or filters (12). The total DNA is applied to the probes, where the desired fragments hybridize. The non-targeted fragments are subsequently washed away, and the enriched DNA is eluted for sequencing. Recently, these methods have been improved using multiple enrichment cycles (13,14). Agilent, Roche/Nimblegen and Febit offer commercial kits implementing these methods.

Liquid-phase hybridization

Liquid-phase hybridization is similar to solid phase; the probes in this method are not attached to a solid matrix, but instead are biotinylated (Fig. 1B). Following hybridization, the biotinylated probes (with the complementary desired genomic DNA) are bound to magnetic streptavidin beads and are separated from the undesired DNA by washing. After elution, enriched DNA can be sequenced. Initial reports on this method used biotinylated RNA probes (15) (commercially available from Agilent), and recent methods use DNA probes (commercially available from Roche/Nimblegen).

Figure 1. Illustration of different capture methods. Light blue bars represent desired genomic sequence, red bars represent unwanted sequence. (A) Solid-phase hybridization. Bait probes (light blue and black) complementary to the desired sequence are synthesized on a microarray. Fragmented genomic DNA is applied, and the desired fragments hybridize. The array is washed, and desired fragments are eluted. (B) Liquid-phase hybridization. Bait probes (light blue and black) complementary to the desired regions are synthesized, often using microarray technology. The probes are generally biotinylated (asterisk). The bait probes are mixed with fragmented genomic DNA, and the desired fragments hybridize to baits in solution. Streptavidin beads (black circles) are added to allow physical separation. The bead-bait complexes are washed, and desired DNA is eluted. (C) MIP. Single-stranded probes composed of a universal linker backbone (black line) and arms complementary to the sequence flanking desired regions (red and white) are synthesized, often using microarray or microfluidics technology. The probes are added to genomic DNA and hybridize in an inverted manner. A polymerase (yellow oval) fills in the gap between the two arms. A ligase (yellow star) seals the nick, resulting in a closed single-strand circle. Genomic DNA is digested with exonucleases, and the captured DNA is amplified using sequences in the universal backbone. (D) PEC. Biotinylated primers (red and white) are added to fragmented genomic DNA, where they hybridize to the desired sequence. A polymerase (yellow oval) extends the primer, creating a tighter interaction. Streptavidin beads (black circles) are added and are used to physically separate the desired DNA from the unwanted DNA. The desired DNA is then eluted.

Polymerase-mediated capture

Although all capture methods use polymerases to amplify captured fragments, these methods use polymerases in a more integral way. Padlock probe technology has been extended to develop Molecular Inversion Probes (MIP) and Spacer Multiplex Amplification ReacTion (SMART), in which a single probe acts as both a primer to start elongation and a receiver to end elongation and allow ligation (Fig. 1C). Subsequent digestion of linear DNA leaves only the closed circular extension/ligation products with the desired sequence [MIP (16–19), SMART (20)]. Primer extension capture (PEC) was developed with small amounts of DNA in mind (Fig. 1D). This method uses a biotinylated primer with complementary sequence to the DNA of interest. After annealing, the primer is extended, effectively generating a hybridization probe to capture the sequence of interest like other hybridization methods (21). Highly parallel PCR has been an effective method to prepare samples for capillary sequencing, and recent work has extended this idea using microfluidics. Instead of using plates with hundreds of wells, aqueous microdroplets can segregate thousands of individual reactions in the same tube, allowing for a much more highly parallel use of PCR (22) (commercially available from Raindance). Another commercially available kit uses restriction enzymes to fragment DNA; probes specific to the ends of desired fragments are used to amplify the desired sequence (Olink Genomics).

Regional capture

Other methods exist to isolate larger sections of the genome. Chromosome sorting (reviewed in 23) has long been useful for genomics. Massively parallel sequencing is well suited to sequence libraries generated by fragmenting flow-sorted chromosomes and offers a way to sequence a single chromosome. When odd chromosomal structures are present, or DNA is only available from a handful of molecules, microdissection of metaphase chromosomes followed by sequencing has been reported (24). Although these methods require highly specialized instruments, they do offer a powerful approach for unique cases.

EXOME CAPTURE

Although many different methods for targeted capture have been described, only a few have been extended to target the human exome. These methods belong to the hybridization type and include array-based hybridization (9,25,26) and liquid-based hybridization (27) [products available from Agilent Technologies (SureSelect), Roche NimbleGen (SeqCap/SeqCap EZ)]. In the future, other methods may also be able to scale up.

The term ‘whole human exome’ can be defined in many different ways. Two companies offer commercial kits for exome capture and have targeted the human consensus coding sequence regions (28), which cover 29 Mb of the genome. This is a more conservative set of genes and includes only protein-coding sequence. It covers 83% of the RefSeq coding exon bases. Both companies also target selected miRNAs, and extra regions can sometimes be added (Agilent). Although still a subset of the genome, exome capture allows the investigation of a more complete set of human genes with the cost and time advantages of genome capture.

APPLICATIONS

Following initial method descriptions, current research is applying genome capture methods to a variety of questions. From disease causation and diagnosis to evolutionary comparison of ancient genomes, genome capture and massively parallel sequencing is a powerful investigative tool.

Medical sequencing

One of the more common exome capture experiments will be the search for genetic variation underlying a particular disease. For some diseases, causative genes have been identified, and researchers can use custom captures to examine those genes for known and novel variants in their samples. For other diseases, whole exome capture is suitable, as the causative gene is unknown, or many different genes may contribute.

Several recent studies have captured and sequenced different regions of individual genomes with known causative variants or genes. These proof-of-principle experiments demonstrate the utility, as well as some shortcomings, of capture followed by massively parallel sequencing. Ng et al. (26) have used array-based hybridization to sequence 12 human exomes (28 Mb). The study included four unrelated individuals with Freeman–Sheldon syndrome, a dominantly inherited rare Mendelian disorder. The investigators were able to identify variants in the known causative gene in each sample. Interestingly, the known causative gene was the only candidate following the application of numerous filters, including requiring a gene to have a novel variant in each sample. In their study of neurofibromatosis type 1, Chou et al. (29) used custom array capture and pyrosequencing to target the 280 kb region containing the NF1 gene, which is known to harbor causal dominant mutations. The authors captured DNA from two different samples with known genotypes, but were initially only able to recover a known single-base deletion. The other known variant, an Alu sequence insertion, was only observed after de novo assembly of unmapped reads. Additionally, the authors found many positions at which the captured genotypes did not agree with Sanger sequencing confirmations. They found that while some discrepancies were due to pyrosequencing errors, others were misalignments from the numerous pseudogenes of NF1, illustrating one of the potential pitfalls of the method. Hoischen et al. (30) also used array-based capture (2 Mb) and pyrosequencing to re-identify known variants in five individuals with autosomal recessive ataxia. They were able to initially identify 6/7 known variants investigated; the seventh variant was visible only after adding three times more sequence, although at a low number of reads (2/9 reads contained the mutation). A known variant trinucleotide repeat was not included in the design, due to the repetitive nature of these variants, and therefore not recovered. Raca et al. (31) searched for two known variants causative for Papillorenal syndrome using array-based capture targeting the causative gene, PAX2, as well as >100 candidate genes for other ocular disorders (370 kb), followed by pyrosequencing. They were able to identify a known substitution using the provided sequencing analysis software, but did not recover the known single-base deletion in a homo-polymer run, despite seeing reads containing the variant. The authors concluded the vendor-provided software was conservative when dealing with insertions/deletions in homo-polymer runs, as pyrosequencing has a higher error rate with this type of sequence. Other analysis packages were able to identify the variant.

Although these studies were not designed to identify novel variants causative for disease, much can be learned from them. Importantly, not every known variant was recovered. This was due to low sequence depth at the variant position, as well as issues relating to repeat regions and alignment. One study estimated that the probability of detecting a causative variant in any given gene is 86%, although this ignores non-coding and structural variants (26). In order to ensure sufficient allele sampling, as well as to prevent sequencing errors from appearing to be actual variants, all four studies use or recommend a minimum sequence depth threshold, ranging from 8- to 30-fold depth of coverage. These recommendations will affect the amount of sequencing required for a given capture size and will therefore affect the cost of the experiment.

Targeted capture has also been used to identify novel genes that cause hereditary disorders. Novel, putative causative variants have recently been discovered for a variety of disorders [sensory/motor neuropathy with ataxia (32), Clericuzio-type poikiloderma with neutropenia (33), familial exudative vitreoretinopathy (34), recessive non-syndromic hearing loss (35), talipes equinovarus, atrial septal defect, Robin sequence, persistent left superior vena cava (36)] using genome capture to target linkage regions from the affected families. The identified variants were almost all non-synonymous substitutions, but follow-up studies on additional unrelated samples using Sanger sequencing also identified insertions/deletions in the same genes (33,35). Volpi et al. (33) identified a substitution that disrupted a splice site, resulting in an exon skip and a frameshift. Interestingly, Johnston et al. (36) were able to identify variants in two different families (one non-sense, one frameshifting insertion) without sequencing the probands, for which DNA was not available. These studies demonstrate the ability of genomic capture to discover different types of novel variants important for human disease.

In addition to custom capture studies, two whole exome studies have been recently reported. In the first, Choi et al. identified a novel coding variant in a consanguineous region of an affected individual. The variant was a homozygous missense substitution in SLC26A3, a gene in which mutations are known to cause congenital chloride-losing diarrhea (25). This genetic finding allowed the researchers to correct an earlier diagnosis of the patient’s disorder. Variants in the same gene were present in other individuals, allowing the corrected diagnosis for them as well. In the second study, Ng et al. used exome capture to search for variants causing Miller syndrome in three unrelated families. They identified variants in DHODH in all three families, using filters for novel variants that fit inheritance models. These studies both showed that exome capture is an effective way to discover causative variants and genes and to correctly diagnose heritable disorders caused by variants in known genes.

Human evolution

Recent advances in the sequencing of ancient DNA have also benefited from targeted capture. Researchers used PEC to specifically target mitochondrial DNA from five Neandertal samples (21). The PEC method allowed complete coverage of the Neandertal mtDNA, using only 5–50 ng of amplified pyrosequencing library template. More recently, researchers used array-based capture to target, in Neandertal DNA, non-synonymous substitutions that have been fixed in humans since the divergence from the human/chimpanzee ancestor (37). Although the array-based capture did not have the low DNA requirements of PEC, the method allowed sequencing of a Neandertal sample containing 99.8% contaminating microbial DNA. Owing to the high contamination, this sample was unsuitable for shotgun sequencing, but targeted capture allowed recovery of almost all of the Neandertal sequence at the desired positions. The authors were then able to identify 88 substitutions that have become fixed in humans since the split from Neandertal, giving insight into what distinguishes us at the genetic, and perhaps molecular, level.

Exome capture has been used to investigate more recent variation as well. Researchers used whole exome capture to identify changes in allele frequency between high-altitude populations (Tibetans) and low-altitude populations (Han Chinese and Danes) (38). They were able to identify a number of genes likely to have been selected for as a part of adaptation to a high-altitude environment. Several of these genes were identified in other studies using microarray genotyping (39,40). This suggests that exome capture techniques are accurate and useful for these types of allelic frequency studies and would be especially useful for rarer SNPs that may not be included on the microarray platforms. Both recent and ancient genetic differences have been investigated using exome capture, allowing us to see a more complete view of our evolutionary history.

Biological

Basic biology questions are also being investigated on a much greater scale than previously possible using genome capture. Although the genetic information in DNA is frequently the initial focus of genome studies, epigenetic modification of the DNA also plays an important role in the biological function of an organism. Two groups used genome capture with padlock probes (19) or array-based capture (41) to investigate DNA methylation using bisulfite sequencing. Both studies found this to be very accurate when compared with the standard capillary methods. The latter study also showed that sensitivity using array-based capture was high: 86–91% of targeted bases were covered by 10 or more reads. An additional study focussed not on methylation status, but on genetic variation at CpG sites, which are subject to a higher mutation rate via 5-methylcytosine deamination (17). Using padlock probes, the researchers were able to determine genotypes for 65% of targeted bases. The accuracy was very high when compared with an independent genotype assessment. These CpG region studies show that capture is useful to focus on the desired regions and is effective, even on difficult (high GC content) regions.

Copy number variation (CNV) is another source of genetic variation implicated in disease. The detection of copy number changes is often performed using low-resolution methods, such as array-comparative genomic hybridization and single nucleotide polymorphism (SNP) microarrays. Conrad et al. (42) have used targeted sequencing to capture breakpoint regions and identify the actual breaks with a high resolution. They were able to identify breakpoints for a number of known CNVs and were then able to classify the breaks into likely repair mechanisms used. The authors point out that this method is useful for CNVs in simpler regions, as repeat elements and complex genomic regions present challenges both for capture and post-sequence alignment.

Capture is not limited to genomic DNA. Several studies have used targeted sequencing to investigate RNA as well. One group used padlock probes to target regions containing known RNA-editing sites (43). They were able to identify sites in 10 of 13 known edited genes, by comparing captures of genomic DNA and cDNA from various tissues. The authors chose 18 editing sites at random and confirmed 15 with capillary sequencing. This research showed that padlock capture techniques work with cDNA and can be used to identify sites of RNA editing. Hybridization capture was also shown to capture cDNA (44,45). In (44), the authors capture both cDNA and genomic DNA with an array-based method. They then determine allele-specific expression using both data sets. In (45), the authors use solution hybridization to focus on enriching cDNA from a set of genes of interest. They were able to effectively enrich these genes, suggesting that genes of low abundance could be detected without huge increases in total sequencing. Interestingly, they were also able to identify gene fusions, including fusions in which one gene was not targeted. Applying targeted sequencing to cDNA is another way to focus on specific questions, even without whole-genome sequence.

FUTURE

One of the main reasons for performing a capture experiment is the significantly increased cost and time required for whole-genome sequencing. However, the constant improvements to massively parallel sequencing technologies and the impending massively parallel single-molecule sequencing technologies will certainly reduce these cost and time barriers. One may wonder what role capture will play as whole-genome sequencing is no longer impractical. Although capture has inherent costs independent of sequencing, capture experiments focus on subsets of the whole genome and will therefore always require less sequencing. Thus, more capture experiments can be performed given a set amount of sequencing capacity. Higher sample numbers result in higher power to detect variation, a key metric for discovering causative variants, especially for more common disorders. An argument in favor of whole-genome sequencing is that it is unwise to limit the data by doing capture experiments; it may be worth the additional cost to sequence ‘everything’. While this may be true, if researchers are confident that the desired genome subset (linkage regions, CpG islands, genes of interest etc.) is all they need to look at, more samples can be examined, and the data are limited to what is of interest. Data fatigue from attempting to interpret whole-genome sequence is not insignificant. Will an investigator be able to pick out the important variants out of a list of millions of positions? Although capture data can also contain large numbers of variants, the number is nearly two orders of magnitude lower than that from whole-genome sequence, making secondary analyses much less onerous. This is particularly important when bioinformatics personnel and resources are limiting (annotating lists of hundreds of variants is possible to accomplish by hand; doing so for tens of thousands of variants is not). Therefore, it seems likely that targeted sequencing will be useful alongside whole-genome sequencing. Researchers will need to consider all aspects of a given project before deciding on whether to proceed with whole-genome or targeted sequencing. Fortunately, ever-decreasing sequencing costs may allow mixed approaches. Targeted sequencing has been shown to be a robust, effective technique that leverages the unique aspects of massively parallel sequencing and has already yielded many exciting new discoveries.

Conflict of Interest statement. None declared.

FUNDING

The authors are supported by the Intramural Research Program of the National Human Genome Research Institute. Funding to pay the Open Access Charge was provided by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.

REFERENCES

1. Tarpey, P.S., Smith, R., Pleasance, E., Whibley, A., Edkins, S., Hardy, C., O’Meara, S., Latimer, C., Dicks, E., Menzies, A. et al. (2009) A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat. Genet., 41, 535–543.
2. Jones, S., Zhang, X., Parsons, D.W., Lin, J.C., Leary, R.J., Angenendt, P., Mankoo, P., Carter, H., Kamiyama, H., Jimeno, A. et al. (2008) Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science, 321, 1801–1806.
3. Garber, K. (2008) Fixing the front end. Nat. Biotechnol., 26, 1101–1104.
4. Summerer, D. (2009) Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing. Genomics, 94, 363–368.
5. Turner, E.H., Ng, S.B., Nickerson, D.A. and Shendure, J. (2009) Methods for genomic partitioning. Annu. Rev. Genomics Hum. Genet., 10, 263–284.
6. Mamanova, L., Coffey, A.J., Scott, C.E., Kozarewa, I., Turner, E.H., Kumar, A., Howard, E., Shendure, J. and Turner, D.J. (2010) Target-enrichment strategies for next-generation sequencing. Nat. Methods, 7, 111–118.
7. Albert, T.J., Molla, M.N., Muzny, D.M., Nazareth, L., Wheeler, D., Song, X., Richmond, T.A., Middle, C.M., Rodesch, M.J., Packard, C.J. et al. (2007) Direct selection of human genomic loci by microarray hybridization. Nat. Methods, 4, 903–905.
8. Okou, D.T., Steinberg, K.M., Middle, C., Cutler, D.J., Albert, T.J. and Zwick, M.E. (2007) Microarray-based genomic selection for high-throughput resequencing. Nat. Methods, 4, 907–909.
9. Hodges, E., Xuan, Z., Balija, V., Kramer, M., Molla, M.N., Smith, S.W., Middle, C.M., Rodesch, M.J., Albert, T.J., Hannon, G.J. et al. (2007) Genome-wide in situ exon capture for selective resequencing. Nat. Genet., 39, 1522–1527.
10. Hodges, E., Rooks, M., Xuan, Z., Bhattacharjee, A., Benjamin Gordon, D., Brizuela, L., Richard McCombie, W. and Hannon, G.J. (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat. Protoc., 4, 960–974.
11. Bau, S., Schracke, N., Kranzle, M., Wu, H., Stahler, P.F., Hoheisel, J.D., Beier, M. and Summerer, D. (2009) Targeted next-generation sequencing by specific capture of multiple genomic loci using low-volume microfluidic DNA arrays. Anal. Bioanal. Chem., 393, 171–175.
12. Herman, D.S., Hovingh, G.K., Iartchouk, O., Rehm, H.L., Kucherlapati, R., Seidman, J.G. and Seidman, C.E. (2009) Filter-based hybridization capture of subgenomes enables resequencing and copy-number detection. Nat. Methods, 6, 507–510.
13. Summerer, D., Wu, H., Haase, B., Cheng, Y., Schracke, N., Stahler, C.F., Chee, M.S., Stahler, P.F. and Beier, M. (2009) Microarray-based multicycle-enrichment of genomic subsets for targeted next-generation sequencing. Genome Res., 19, 1616–1621.
14. Lee, H., O’Connor, B.D., Merriman, B., Funari, V.A., Homer, N., Chen, Z., Cohn, D.H. and Nelson, S.F. (2009) Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing. BMC Genomics, 10, 646.
15. Gnirke, A., Melnikov, A., Maguire, J., Rogov, P., LeProust, E.M., Brockman, W., Fennell, T., Giannoukos, G., Fisher, S., Russ, C. et al. (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol., 27, 182–189.
16. Porreca, G.J., Zhang, K., Li, J.B., Xie, B., Austin, D., Vassallo, S.L., LeProust, E.M., Peck, B.J., Emig, C.J., Dahl, F. et al. (2007) Multiplex amplification of large sets of human exons. Nat. Methods, 4, 931–936.
17. Li, J.B., Gao, Y., Aach, J., Zhang, K., Kryukov, G.V., Xie, B., Ahlford, A., Yoon, J.K., Rosenbaum, A.M., Zaranek, A.W. et al. (2009) Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res., 19, 1606–1615.
18. Turner, E.H., Lee, C., Ng, S.B., Nickerson, D.A. and Shendure, J. (2009) Massively parallel exon capture and library-free resequencing across 16 genomes. Nat. Methods, 6, 315–316.
19. Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E.M., Antosiewicz-Bourget, J., Egli, D., Maherali, N., Park, I.H., Yu, J. et al. (2009) Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat. Biotechnol., 27, 353–360.
20. Krishnakumar, S., Zheng, J., Wilhelmy, J., Faham, M., Mindrinos, M. and Davis, R. (2008) A comprehensive assay for targeted multiplex amplification of human DNA sequences. Proc. Natl Acad. Sci. USA, 105, 9296–9301.
21. Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovic, D., Kucan, Z. et al. (2009) Targeted
26. Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Bigham, A.W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E.E. et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461, 272–276.
27. Bainbridge, M.N., Wang, M., Burgess, D.L., Kovar, C., Rodesch, M.J., D’Ascenzo, M., Kitzman, J., Wu, Y.Q., Newsham, I., Richmond, T.A. et al. (2010) Whole exome capture in solution with 3 Gbp of data. Genome Biol., 11, R62.
28. Pruitt, K.D., Harrow, J., Harte, R.A., Wallin, C., Diekhans, M., Maglott, D.R., Searle, S., Farrell, C.M., Loveland, J.E., Ruef, B.J. et al. (2009) The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res., 19, 1316–1323.
29. Chou, L.S., Liu, C.S., Boese, B., Zhang, X. and Mao, R. (2010) DNA sequence capture and enrichment by microarray followed by next-generation sequencing for targeted resequencing: neurofibromatosis type 1 gene as a model. Clin. Chem., 56, 62–72.
30. Hoischen, A., Gilissen, C., Arts, P., Wieskamp, N., van der Vliet, W., Vermeer, S., Steehouwer, M., de Vries, P., Meijer, R., Seiqueros, J. et al. (2010) Massively parallel sequencing of ataxia genes after array-based enrichment. Hum. Mutat., 31, 494–499.
31. Raca, G., Jackson, C., Warman, B., Bair, T. and Schimmenti, L.A. (2010) Next generation sequencing in research and diagnostics of ocular birth defects. Mol. Genet. Metab., 100, 184–192.
32. Brkanac, Z., Spencer, D., Shendure, J., Robertson, P.D., Matsushita, M., Vu, T., Bird, T.D., Olson, M.V. and Raskind, W.H. (2009) IFRD1 is a candidate gene for SMNA on chromosome 7q22–q23. Am. J. Hum. Genet., 84, 692–697.
33. Volpi, L., Roversi, G., Colombo, E.A., Leijsten, N., Concolino, D., Calabria, A., Mencarelli, M.A., Fimiani, M., Macciardi, F., Pfundt, R. et al. (2010) Targeted next-generation sequencing appoints c16orf57 as clericuzio-type poikiloderma with neutropenia gene. Am. J. Hum. Genet., 86, 72–76.
34. Nikopoulos, K., Gilissen, C., Hoischen, A., van Nouhuys, C.E., Boonstra, F.N., Blokland, E.A., Arts, P., Wieskamp, N., Strom, T.M., Ayuso, C. et al. (2010) Next-generation sequencing of a 40 Mb linkage interval reveals TSPAN12 mutations in patients with familial exudative vitreoretinopathy. Am. J. Hum. Genet., 86, 240–247.
35. Rehman, A.U., Morell, R.J., Belyantseva, I.A., Khan, S.Y., Boger, E.T., Shahzad, M., Ahmed, Z.M., Riazuddin, S., Khan, S.N. and Friedman, T.B. (2010) Targeted capture and next-generation sequencing identifies C9orf75, encoding taperin, as the mutated gene in nonsyndromic deafness DFNB79. Am. J. Hum. Genet., 86, 378–388.
36. Johnston, J.J., Teer, J.K., Cherukuri, P.F., Hansen, N.F., Loftus, S.K., Chong, K., Mullikin, J.C. and Biesecker, L.G. (2010) Massively parallel sequencing of exons on the X chromosome identifies RBM10 as the gene that causes a syndromic form of cleft palate. Am. J. Hum. Genet., 86, 743–748.
37. Burbano, H.A., Hodges, E., Green, R.E., Briggs, A.W., Krause, J., Meyer, M., Good, J.M., Maricic, T., Johnson, P.L., Xuan, Z. et al. (2010) Targeted investigation of the Neandertal genome by array-based sequence capture. Science, 328, 723–725.
38. Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z.X.P., Pool, J.E., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T.S. et al. (2010) Sequencing of 50 human exomes reveals adaptation to high altitude. Science, 329, 75–78.
39. Beall, C.M., Cavalleri, G.L., Deng, L., Elston, R.C., Gao, Y., Knight, J., Li, C., Li, J.C., Liang, Y., McCormack, M. et al.
(2010) Natural selection retrieval and analysis of five Neandertal mtDNA genomes. Science, 325, on EPAS1 (HIF2alpha) associated with low hemoglobin concentration in 318–321. Tibetan highlanders. Proc. Natl Acad. Sci. USA, 107, 11459–11464. 22. Tewhey, R., Warner, J.B., Nakano, M., Libby, B., Medkova, M., David, P.H., 40. Simonson, T.S., Yang, Y., Huff, C.D., Yun, H., Qin, G., Witherspoon, D.J., Kotsopoulos, S.K., Samuels, M.L., Hutchison, J.B., Larson, J.W. et al. (2009) Bai, Z., Lorenzo, F.R., Xing, J., Jorde, L.B. et al. (2010) Genetic evidence for Microdroplet-based PCR enrichment for large-scale targeted sequencing. high-altitude adaptation in Tibet. Science, 329, 72–75. Nat. Biotechnol., 27, 1025–1031. 41. Hodges, E., Smith, A.D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., 23. Ibrahim, S.F. and van den Engh, G. (2004) High-speed chromosome Zhang, M.Q., Ye, K., Bhattacharjee, A., Brizuela, L. et al. (2009) High sorting. Chromosome Res., 12, 5–14. definition profiling of mammalian DNA methylation by array capture and 24. Weise, A., Timmermann, B., Grabherr, M., Werber, M., Heyn, P., single molecule bisulfite sequencing. Genome Res., 19, 1593–1605. Kosyakova, N., Liehr, T., Neitzel, H., Konrat, K., Bommer, C. et al. 42. Conrad, D.F., Bird, C., Blackburne, B., Lindsay, S., Mamanova, L., Lee, C., (2010) High-throughput sequencing of microdissected chromosomal Turner, D.J. and Hurles, M.E. (2010) Mutation spectrum revealed by regions. Eur. J. Hum. Genet., 18, 457–462. breakpoint sequencing of human germline CNVs. Nat. Genet., 42, 385–391. 25. Choi, M., Scholl, U.I., Ji, W., Liu, T., Tikhonova, I.R., Zumbo, P., Nayir, A., 43. Li, J.B., Levanon, E.Y., Yoon, J.K., Aach, J., Xie, B., Leproust, E., Zhang, K., Bakkaloglu, A., Ozen, S., Sanjad, S. et al. (2009) Genetic diagnosis by whole Gao, Y. and Church, G.M. (2009) Genome-wide identification of human exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. 
RNA editing sites by parallel DNA capturing and sequencing. Science, 324, USA, 106, 19096–19101. 1210–1213. Human Molecular Genetics, 2010, Vol. 19, Review Issue 2 R151 44. Heap, G.A., Yang, J.H., Downes, K., Healy, B.C., Hunt, K.A., Bockett, N., 45. Levin, J.Z., Berger, M.F., Adiconis, X., Rogov, P., Melnikov, A., Fennell, T., Franke, L., Dubois, P.C., Mein, C.A., Dobson, R.J. et al. (2010) Genome-wide Nusbaum, C., Garraway, L.A. and Gnirke, A. (2009) Targeted next- analysis of allelic expression imbalance in human primary cells by generation sequencing of a cancer transcriptome enhances detection of high-throughput transcriptome resequencing. Hum. Mol. Genet., 19, sequence variants and novel fusion transcripts. Genome Biol., 10, 122–134. R115.

Journal

Human Molecular Genetics, PubMed Central

Published: Aug 12, 2010
