DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome

Nature | Vol 456 | 6 November 2008 | doi:10.1038/nature07485 | ARTICLES

Timothy J. Ley (1,2,3,4)*, Elaine R. Mardis (2,3)*, Li Ding (2,3), Bob Fulton (3), Michael D. McLellan (3), Ken Chen (3), David Dooling (3), Brian H. Dunford-Shore (3), Sean McGrath (3), Matthew Hickenbotham (3), Lisa Cook (3), Rachel Abbott (3), David E. Larson (3), Dan C. Koboldt (3), Craig Pohl (3), Scott Smith (3), Amy Hawkins (3), Scott Abbott (3), Devin Locke (3), LaDeana W. Hillier (3,8), Tracie Miner (3), Lucinda Fulton (3), Vincent Magrini (2,3), Todd Wylie (3), Jarret Glasscock (3), Joshua Conyers (3), Nathan Sander (3), Xiaoqi Shi (3), John R. Osborne (3), Patrick Minx (3), David Gordon (8), Asif Chinwalla (3), Yu Zhao (1), Rhonda E. Ries (1), Jacqueline E. Payton (5), Peter Westervelt (1,4), Michael H. Tomasson (1,4), Mark Watson (3,4,5), Jack Baty (6), Jennifer Ivanovich (4,7), Sharon Heath (1,4), William D. Shannon (1,4), Rakesh Nagarajan (4,5), Matthew J. Walter (1,4), Daniel C. Link (1,4), Timothy A. Graubert (1,4), John F. DiPersio (1,4) & Richard K. Wilson (2,3,4)

Acute myeloid leukaemia is a highly malignant haematopoietic tumour that affects about 13,000 adults in the United States each year. The treatment of this disease has changed little in the past two decades, because most of the genetic events that initiate the disease remain undiscovered. Whole-genome sequencing is now possible at a reasonable cost and timeframe to use this approach for the unbiased discovery of tumour-specific somatic mutations that alter the protein-coding genes. Here we present the results obtained from sequencing a typical acute myeloid leukaemia genome, and its matched normal counterpart obtained from the same patient’s skin.
We discovered ten genes with acquired mutations; two were previously described mutations that are thought to contribute to tumour progression, and eight were new mutations present in virtually all tumour cells at presentation and relapse, the function of which is not yet known. Our study establishes whole-genome sequencing as an unbiased method for discovering cancer-initiating mutations in previously unidentified genes that may respond to targeted therapies.

We used massively parallel sequencing technology to sequence the genomic DNA of tumour and normal skin cells obtained from a patient with a typical presentation of French–American–British (FAB) subtype M1 acute myeloid leukaemia (AML) with normal cytogenetics. For the tumour genome, 32.7-fold ‘haploid’ coverage (98 billion bases) was obtained, and 13.9-fold coverage (41.8 billion bases) was obtained for the normal skin sample. Of the 2,647,695 well-supported single nucleotide variants (SNVs) found in the tumour genome, 2,584,418 (97.6%) were also detected in the patient’s skin genome, limiting the number of variants that required further study. For the purposes of this initial study, we restricted our downstream analysis to the coding sequences of annotated genes: we found only eight heterozygous, non-synonymous somatic SNVs in the entire genome. All were new, including mutations in protocadherin/cadherin family members (CDH24 and PCLKC (also known as PCDH24)), G-protein-coupled receptors (GPR123 and EBI2 (also known as GPR183)), a protein phosphatase (PTPRT), a potential guanine nucleotide exchange factor (KNDC1), a peptide/drug transporter (SLC15A1) and a glutamate receptor gene (GRINL1B). We also detected previously described, recurrent somatic insertions in the FLT3 and NPM1 genes. On the basis of deep readcount data, we determined that all of these mutations (except FLT3) were present in nearly all tumour cells at presentation and again at relapse 11 months later, suggesting that the patient had a single dominant clone containing all of the mutations. These results demonstrate the power of whole-genome sequencing to discover new cancer-associated mutations.

AML refers to a group of clonal haematopoietic malignancies that predominantly affect middle-aged and elderly adults. An estimated 13,000 people will develop AML in the United States in 2008, and 8,800 will die from it. Although the life expectancy from this disease has increased slowly over the past decade, the improvement is predominantly because of improvements in supportive care, not in the drugs or approaches used to treat patients.

For most patients with a ‘sporadic’ presentation of AML, it is not yet clear whether inherited susceptibility alleles have a role in the pathogenesis. Furthermore, the nature of the initiating or progression mutations is for the most part unknown. Recent attempts to identify additional progression mutations by extensively re-sequencing tyrosine kinase genes yielded very few previously unidentified mutations, and most were not recurrent (refs 4,5). Expression profiling studies have yielded signatures that correlate with specific cytogenetic subtypes of AML, but have not yet suggested new initiating mutations (refs 6–8). Recent studies using array-based comparative genomic hybridization and/or single nucleotide polymorphism (SNP) arrays, although identifying important gene mutations in acute lymphoblastic leukaemia (refs 9,10), have revealed very few recurrent submicroscopic somatic copy number variants in AML (M.J.W., manuscript in preparation, and refs 11–13). Together, these studies suggest that we have not yet discovered most of the relevant mutations that contribute to the pathogenesis of AML. We therefore believe that unbiased whole-genome sequencing will be required to identify most of these mutations. Until recently, this
approach has not been feasible because of the high cost of conventional capillary-based approaches and the large numbers of primary tumour cells required to yield the necessary genomic DNA. ‘Next-generation’ sequencing approaches, however, have changed this landscape.

Our group has pioneered the use of whole-genome re-sequencing and variant discovery approaches using the Illumina/Solexa technology with the genome of the nematode worm Caenorhabditis elegans as a proof-of-principle. This approach has distinct advantages in reduced cost, a markedly increased data production rate, and a low input requirement of DNA for library construction. In the present study, we used a similar approach to sequence the tumour genome of a single AML patient and the matched normal genome (derived from a skin biopsy) of the same patient. After alignment to the human reference genome, sequence variants were discovered in the tumour genome and compared to the patient’s normal sequence, to the dbSNP database, and to variants recently reported for two other human genomes (refs 15,16); revealing new single nucleotide and small insertion/deletion (indel) variants genome-wide. Somatic mutations were detected in genes not previously implicated in AML pathogenesis, demonstrating the need for unbiased whole-genome approaches to discover all mutations associated with cancer pathogenesis.

Author affiliations: 1 Department of Medicine, 2 Department of Genetics, 3 The Genome Center at Washington University, 4 Siteman Cancer Center, 5 Department of Pathology and Immunology, 6 Division of Biostatistics, and 7 Department of Surgery, Washington University School of Medicine, St. Louis, Missouri 63108, USA. 8 Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
*These authors contributed equally to this work.
© 2008 Macmillan Publishers Limited. All rights reserved.

Rationale for using the FAB M1 AML subtype for sequencing

Of the eight FAB subtypes of AML, M1 AML is one of the most common (~20% of all cases). No specific cytogenetic abnormalities or somatic initiating mutations have been identified for this subtype; in fact, about half of the patients with de novo M1 AML have normal cytogenetics (refs 17–19). The frequency of well-described progression mutations (for example, activating alleles of FLT3, KIT and RAS) is similar to that of other common FAB subtypes. We therefore decided to sequence the genome of tumour cells derived from a patient with M1 AML, because so little is known about the molecular pathogenesis of this common subtype. The criteria used to select the sample are outlined in Supplementary Information.

Case presentation of UPN 933124

The case presentation is described in detail in the Supplementary Information. In brief, a previously healthy woman in her mid-50s presented suddenly with fatigue and easy bruisability, and was found to have a peripheral white blood cell count of 105,000 cells per microlitre, with 85% myeloblasts. A bone marrow examination revealed 100% myeloblasts with morphological features and cell surface markers consistent with FAB M1 AML (Supplementary Fig. 1). Cytogenetic analysis of tumour cells revealed a normal 46,XX karyotype. Although the patient experienced a complete remission with conventional therapies, she relapsed at 11 months and expired 24 months after her initial diagnosis was made. At relapse, the bone marrow had 78% myeloblasts, and contained a new clonal cytogenetic abnormality, t(10;12)(p12;p13). Informed consent for whole-genome sequencing was subsequently obtained from her next of kin.

A typical M1 AML diploid genome and expression profile

The tumour sample from patient 933124 contained no somatic copy number changes at a resolution of ~5 kb (further confirmed on the NimbleGen 2.1M array platform, data not shown), and no evidence of copy number neutral loss-of-heterozygosity (LOH), indicating that the genome was essentially diploid at this level of resolution (see Supplementary Fig. 2). Further analysis of the 933124-derived tumour and skin samples showed 26 inherited copy number variants (that is, detected in both the tumour and skin samples). All but two of these had been previously reported in the Database of Genomic Variants (see Supplementary Table 1). All of the copy number variants detected in this genome were found in at least one other AML patient (89 other cases, mostly Caucasian, have been queried using the same SNP array platform), and all but one were found in at least one of the 160 Caucasian HapMap and Coriell samples that were studied on the same array platform (Supplementary Table 1).

To determine whether the tumour cells of 933124 were typical of M1 AML, we compared the expression signatures of 111 de novo AML cases using unsupervised clustering (Ward’s method, see Supplementary Information). The expression profile of patient 933124 clustered with multiple other M1 (and M2) AML cases with normal cytogenetics, suggesting that the genetic events underlying the pathogenesis of this case are similar to those of other cases exhibiting normal cytogenetics (Supplementary Fig. 3).

Coverage depth of the tumour and skin genomes

Because most of the acquired mutations in cancer genomes have been shown to be heterozygous, the complete sequencing of a cancer genome requires the detection of both alleles at most positions in the genome. We therefore designed sequence coverage metrics to define the point at which 90% diploid coverage had been reached. To minimize errors associated with any single platform or measurement, diploid coverage for this genome was assessed using a set of high-quality SNPs derived from two different SNP array platforms, Affymetrix 6.0 and Illumina Infinium 550K. For a SNP to be included in the high-quality set, the following criteria had to be satisfied: (1) identical genotypes were called from both assays at the same genomic positions, and (2) the resulting genotype was heterozygous. For the 933124 tumour genome, 46,494 heterozygous SNPs passed the above criteria and were defined as high-quality SNPs. For the skin samples, 46,572 high-quality SNPs were defined.

We performed 98 full runs on the Illumina Genome Analyser to achieve the targeted level of 90% diploid coverage as determined by coverage of the high-quality SNP set. Maq was used to perform alignment, determine consensus, and identify SNVs within the 98 billion bases generated from the tumour genome (see Table 1). Maq predicted a total of 3.81 million SNVs (Maq SNP quality ≥ 15) in the tumour genome, including matching heterozygous genotypes for 91.2% of the 46,494 high-quality SNPs. When we lowered the Maq SNP quality cutoff to 0, 94.06% high-quality SNPs were predicted. Further investigation of Maq alignments revealed coverage for both alleles at a further 5.38% of the high-quality SNPs, but Maq did not predict a SNP or matching heterozygous genotype owing to insufficient depth or quality of coverage. Extra analysis revealed coverage at 46,484 of 46,494 high-quality SNPs for at least one allele (that is, 99.98% haploid coverage for the tumour genome).

We sequenced the genome of normal skin cells from the same patient to enable the identification of inherited sequence variants in the tumour genome. Our targeted diploid coverage goal for the skin-derived genome was 80%. We achieved this goal with only 34 Solexa runs (41.8 billion bases), using improved reagents and longer read lengths to attain 82.6% diploid and 84.2% haploid coverage (Table 1).

To begin evaluating the quantity and quality of the detected sequence variants in the tumour and skin genomes, we compared the overlap and uniqueness of this genome’s variants with respect to the James D. Watson and J. Craig Venter genomes, and to dbSNP (v127; Fig. 1). Of the 3.68 million single nucleotide variants (SNVs; Maq SNP quality ≥ 15, excluding SNVs found on chromosome X) predicted by Maq in the tumour genome, 2.36 million were present in dbSNP, 2.36 million were detected in the skin genome (Fig. 1a), 1.50 million were detected in the Venter genome, and 1.58 million were found in the Watson genome (Fig. 1b). Ultimately, 1.70 million SNVs were unique to the 933124 tumour genome. On filtering the 933124 SNVs at different Maq quality values to determine the stability of results, we observed that the proportion of 933124 SNVs that also are in dbSNP increases from 63.9% to 69.48% when the Maq quality threshold score increases from 15 to 30, as expected.

Refining the detection of potential somatic mutations

Because the number of sequence variants initially detected by Maq was high, we developed improved filtering tools to effectively separate true variants from false positives. To this end, we generated an experimental data set by re-sequencing Maq-predicted SNVs, randomly selecting a training subset and a test data set, whose annotations and features were submitted to Decision Tree C4.5 (ref. 22). This approach identified parameters that separated true variants from false positives, revealing that SNV-supporting read counts (unique on the basis of read start position and base position in supporting reads), base quality and Maq quality scores are chief determinants for identifying false positives. Implementing rules obtained from the Decision Tree analysis resulted in 91.9% sensitivity and 83.5% specificity for validated SNVs.

Table 1 | Tumour and skin genome coverage from patient 933124

Measure | Tumour | Skin
Libraries | 4 | 3
Runs | 98 | 34
Reads obtained | 5,858,992,064 | 2,122,836,148
Reads passing quality filter | 3,025,923,365 | 1,228,177,690
Bases passing quality filter | 98,184,511,523 | 41,783,794,834
Reads aligned by Maq | 2,729,957,053 | 1,080,576,680
Reads unaligned by Maq | 295,966,312 | 138,276,594
SNVs detected with respect to hg18 (no Y) | 3,811,115 | 2,918,446
SNVs (chr 1–22) detected with respect to hg18 | 3,681,968 (100.0%) | 2,830,292 (100.0%)
SNVs also present in dbSNP | 2,368,458 (64.3%) | 2,161,695 (76.4%)
SNVs also present in Venter genome | 1,499,010 (40.7%) | 1,383,431 (48.9%)
SNVs also present in Watson genome | 1,573,435 (42.7%) | 1,456,822 (51.5%)
SNVs not in dbSNP/Venter/Watson | 1,223,830 (33.2%) | 591,131 (20.9%)
SNVs not in dbSNP/Venter/Watson/skin | 925,200 (25.1%) | –
HQ SNPs | 46,494 (100.0%) | 46,572 (100.0%)
HQ SNPs where reference allele is detected | 42,419 (91.2%) | 38,454 (82.6%)
HQ SNPs where variant allele is detected | 43,164 (92.9%) | 39,220 (84.2%)
HQ SNPs where both alleles are detected | 42,415 (91.2%) | 38,454 (82.6%)

Assessments are shown of the haploid and diploid coverage of the tumour and skin genomes from AML patient 933124. Chr, chromosome; hg18, human genome version 18; HQ, high quality.
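The coverage figures quoted in the text can be checked against the counts in Table 1. The following is a minimal sketch rather than the authors' pipeline: the function names are illustrative, and the haploid genome size of 3.0 × 10^9 bases is an assumption (the article does not state the value it used), chosen because it reproduces the reported 32.7-fold and 13.9-fold figures. Diploid coverage is taken here as the fraction of high-quality heterozygous SNPs at which both alleles were detected.

```python
# Counts are copied from Table 1 of the article.
# GENOME_SIZE_BP is an ASSUMPTION (not stated in the article); 3.0e9 bases
# reproduces the reported fold-coverage values.
GENOME_SIZE_BP = 3.0e9

SAMPLES = {
    "tumour": {"bases": 98_184_511_523, "hq_snps": 46_494, "both_alleles": 42_415},
    "skin": {"bases": 41_783_794_834, "hq_snps": 46_572, "both_alleles": 38_454},
}


def fold_coverage(bases, genome_size=GENOME_SIZE_BP):
    """Average number of times each base of a haploid genome is covered."""
    return bases / genome_size


def diploid_coverage_pct(both_alleles, hq_snps):
    """Percentage of high-quality heterozygous SNPs with both alleles detected."""
    return 100.0 * both_alleles / hq_snps


def summarize(samples=SAMPLES):
    """Return fold coverage and diploid coverage (rounded to 0.1) per sample."""
    return {
        name: {
            "fold": round(fold_coverage(s["bases"]), 1),
            "diploid_pct": round(
                diploid_coverage_pct(s["both_alleles"], s["hq_snps"]), 1
            ),
        }
        for name, s in samples.items()
    }


if __name__ == "__main__":
    for name, stats in summarize().items():
        print(f"{name}: {stats['fold']}-fold, {stats['diploid_pct']}% diploid coverage")
```

Under these assumptions the sketch gives 32.7-fold and 91.2% diploid coverage for the tumour, and 13.9-fold and 82.6% for the skin sample, matching the values reported above.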
Identification of somatic mutations in coding sequences

The patient had 3,813,205 sequence variants in her tumour genome, as defined by Maq scores of >15 (Table 1). Of these, 2,647,695 were supported by the Decision Tree analysis in the tumour genome, of which 2,584,418 (97.6%) were also detected in the skin genome (Fig. 2). The detailed algorithm for selecting putative somatic variants is described in Supplementary Information. Most of the 63,277 tumour-specific variants we detected were either present in dbSNP or were previously described in the Watson or Venter genomes (31,645), or occurred in non-genic regions (20,440). A total of 11,192 variants were located within the boundaries of annotated genes; 216 of these variants were in untranslated regions, and 10,735 were in introns (but not involving splice junctions) and were not explored further in our analysis. Of the coding sequence variants, 60 were synonymous, and not further evaluated. The remaining 181 variants were either non-synonymous, or were predicted to alter splice site function. By sequencing polymerase chain reaction (PCR)-generated amplicons from the tumour and skin samples (and also from the relapse tumour sample obtained 11 months after the original presentation), we determined that 152 of these variants were false positive (that is, wild type) calls, 14 were inherited SNPs, and eight were somatic mutations in both the original tumour and the relapse sample (Table 2). Seven variants could not be validated, either because the regions involved were repetitive, or because all attempts to obtain PCR amplicons failed. All of the PCR-amplified exons from the eight genes containing validated somatic mutations were sequenced in 187 further cases of AML using samples from our discovery and validation sets; no further somatic mutations were detected in these genes (data not shown). A description of how we estimated the false negative (12.45%) and false positive (0.06%) rates for SNVs over the entire genome is presented in Supplementary Information. Using these estimates, we can predict that very few somatic, non-synonymous variants were missed by our analysis of this deeply covered genome.

Figure 1 | Overlap of SNPs detected in 933124 and other genomes. a, Venn diagram of the overlap between SNPs detected in the 933124 tumour genome and the genomes of J. D. Watson and J. C. Venter. b, Venn diagram of the overlap among the 933124 tumour genome, the skin genome and dbSNP (ver. 127). SNVs were defined with a Maq SNP quality ≥ 15.

Figure 2 | Filters used to identify somatic point mutations in the tumour genome. See text for details. UTR, untranslated regions. The flow chart reads, from top to bottom:
3,813,205 tumour SNVs (Maq15)
2,647,695 well supported SNVs (decision tree)
2,584,418 present in skin (SNPs); 63,277 tumour-specific SNVs
31,645 in dbSNP/Watson/Venter; 31,632 new SNVs
20,440 in non-genic regions; 11,192 SNVs in genic regions
10,735 intronic; 216 in UTR; 241 SNVs in coding sequence
60 synonymous; 181 SNVs predicted to alter gene function (non-synonymous and splice junctions)
7 unable to be validated (technical failures); 14 validated as germline SNVs (SNPs); 8 validated as somatic SNVs (acquired mutations); 152 validated as wild type (false positives)

Table 2 | Non-synonymous somatic mutations detected in the AML sample

Gene | Consequence | Type | Solexa tumour reads WT:variant | Solexa skin reads WT:variant | Conservation score of mutant base | Mutations in other AML cases*
CDH24 | Y590X | Nonsense | 9:9 | 16:0 | 0.998 | 0/187
SLC15A1 | W77X | Nonsense | 15:12 | 19:0 | 1.000 | 0/187
KNDC1 | L799F | Missense | 7:8 | 20:0 | NA | 0/187
PTPRT | P1235L | Missense | 9:13 | 16:0 | 1.000 | 0/187
GRINL1B | R176H | Missense | 15:10 | 14:0 | NA | 0/187
GPR123 | T38I | Missense | 11:11 | 13:0 | NA | 0/187
EBI2 | A338V | Missense | 7:12 | 18:2 | 1.000 | 0/187
PCLKC | P1004L | Missense | 19:9 | 15:1 | 0.98 | 0/187
FLT3 | ITD | Indel | 18:12 | 8:0 | NA | 51/185
NPM1 | CATG ins | Indel | 36:6 | 33:0 | NA | 43/180

Ins, insertion; WT, wild type. * Patient cohort defined in ref. 23.

Defining mutation frequencies in the tumour sample

To better define the percentage of tumour cells that contained each of the discovered somatic mutations, we amplified each mutation-containing locus from non-amplified genomic DNA derived from the de novo and relapse tumour samples, and from the skin biopsy obtained at presentation. The resulting amplicons were sequenced using the Roche/454 FLX platform, and the frequency of reads containing the reference and variant alleles were defined (Fig. 3 and Table 3). Control amplicons containing a known heterozygous SNP in BRCA2 (encoding N372H) and a homozygous SNP in TP53 (encoding P72R) were analysed similarly. The BRCA2 SNP yielded ~50% variant frequencies in the tumour and skin samples, whereas nearly 100% of the TP53 alleles were variant in all three samples, as expected. Remarkably, all eight somatic SNVs were detected at ~50% frequencies in the primary tumour sample (100% blasts), and at ~40% frequencies in the relapse sample (78% blasts; if the variant frequencies are corrected for blast counts, that is, multiplied by 1.28, the frequencies at relapse also were ~50%). The NPMc (cytoplasmic nucleophosmin) mutation was also detected at a frequency of ~50%, but the FLT3 internal tandem duplication (ITD) allele was only detected in 35.1% of the 454 reads at diagnosis and 31.3% at relapse, suggesting that the mutation was not present in all tumour cells at diagnosis or relapse. Notably, the variant alleles also were detected at frequencies of ~5–13% in the skin sample. In retrospect, it is clear that the skin sample contained contaminating leukaemic cells, because the patient’s white blood cell count at presentation was 105,000 per microlitre, with 85% blasts. This information was used to inform the Decision Tree analysis described above: we allowed high-quality tumour variants to move forward in the discovery pipeline if they were detected at a low frequency (two or fewer reads) in the skin sample, as defined by a binomial test.

Figure 3 | Summary of Roche/454 FLX readcount data obtained for ten somatic mutations and two validated SNPs in the primary tumour, relapse tumour and skin specimens. The readcount data for the variant alleles in the primary tumour sample and relapse tumour sample are statistically different from that of the skin sample for all mutations (P < 0.000001 for all mutations, Fisher’s exact test, denoted by a single asterisk in all cases). Note that the normal skin sample was contaminated with leukaemic cells containing the somatic mutations. The patient’s white blood cell count was 105,000 (85% blasts) when the skin punch biopsy was obtained. [Bar chart: y-axis, Variant (%), 0 to 100; bars for primary tumour, relapse tumour and skin; x-axis, CDH24, SLC15A1, KNDC1, PTPRT, GRINL1B, GPR123, EBI2, PCLKC, FLT3, NPM1, BRCA2, TP53.]

Table 3 | 454 Readcount data for somatic mutations and known SNPs

Columns give Variant reads, Ref reads and Variant (%) for the primary AML sample (100% blasts), the skin sample and the relapse sample (78% blasts).

Gene | Consequence | Primary Variant | Primary Ref | Primary Variant (%) | Skin Variant | Skin Ref | Skin Variant (%) | Relapse Variant | Relapse Ref | Relapse Variant (%)
CDH24 | Y590X | 5672 | 4890 | 53.70 | 564 | 10358 | 5.16 | 3108 | 4599 | 40.33
SLC15A1 | W77X | 3817 | 4962 | 43.48 | 875 | 10773 | 7.51 | 4714 | 7173 | 39.66
KNDC1 | L799F | 4640 | 4848 | 48.90 | 770 | 8972 | 7.90 | 3883 | 6342 | 37.98
PTPRT | P1235L | 998 | 1058 | 48.54 | 126 | 1489 | 7.80 | 350 | 493 | 41.52
GRINL1B | R176H | 2211 | 2674 | 45.26 | 318 | 4461 | 6.65 | 1447 | 2070 | 41.14
GPR123 | T38I | 4618 | 4569 | 50.27 | 850 | 9751 | 8.02 | 3660 | 6057 | 37.67
EBI2 | A338V | 12750 | 15453 | 45.21 | 458 | 10088 | 4.34 | 2646 | 3627 | 42.18
PCLKC | P1004L | 992 | 855 | 53.71 | 341 | 3153 | 9.76 | 705 | 773 | 47.70
FLT3 | ITD | 4220 | 7810 | 35.08 | 3475 | 23159 | 13.05 | 3870 | 8495 | 31.30
NPM1 | CATG ins | 1550 | 1974 | 43.98 | 143 | 2390 | 5.65 | 2303 | 3910 | 37.07
BRCA2 | N372H | 778 | 752 | 50.85 | 763 | 876 | 46.55 | 285 | 303 | 48.47
TP53 | P72R | 8989 | 1 | 99.99 | 8161 | 0 | 100.00 | 7914 | 6 | 99.92

The differences between variant frequencies in primary or relapse tumour samples and skin were highly significant for all somatic mutations (P < 0.000001, Fisher’s exact test, one tailed). The BRCA2 variant is a known heterozygous SNP in this genome, and the TP53 variant is a known homozygous SNP.

Detecting insertions and deletions (indels)

To discover small indels (<6 bp) from sequence reads (32–35 bp long), we started with a set of 236 million reads that were not confidently aligned by Maq to the reference genome. We applied Cross_Match and BLAT to identify gapped alignments that are unique in the genome. To detect indels longer than 6 bp, we developed a ‘split reads’ algorithm (see Supplementary Information) that aligns subsegments of reads independently to the genome, and computes a mapping quality for the derived gapped alignment on the basis of the number of hits and the quality of the bases. These efforts resulted in the identification of 726 putative small indels (1 to 30 bp in size) that occur in coding exons, 393 of which (54.2%) were found in dbSNP. After manual review, we selected a set of 28 putative somatic coding indels for validation using PCR-based dye terminator sequencing. Of these putative indels, 22 were validated but were found present in both tumour and skin (15 of these were in dbSNP), two were false positive calls, two had no coverage, and two were previously validated somatic insertions in NPM1 (4 bp) and FLT3 (30 bp).

Discussion

Here we describe the sequencing and analysis of a primary human cancer genome using next-generation sequencing technology. Our patient’s tumour genome was essentially diploid, and contained ten non-synonymous somatic mutations that may be relevant for her disease. These mutations affect genes participating in several well-described pathways that are known to contribute to cancer pathogenesis, but most of these genes would not have been candidates for directed re-sequencing on the basis of our current understanding of cancer. Hence, these results justify the use of next-generation whole-genome sequencing approaches to reveal somatic mutations in cancer genomes.

As we demonstrated in our re-sequencing of the genome of the C. elegans N2 Bristol strain, and again in this study, massively parallel short-read sequencing provides an effective method for examining single nucleotide and short indel variants by comparison of the aligned reads to a reference genome sequence. By sequencing our patient’s tumour genome to a depth of >30-fold coverage, and gauging our ability to detect known heterozygous positions across the genome, we have produced a sufficient depth and breadth of sequence coverage to comprehensively discover somatic genome variants. A slightly lower coverage of the normal genome from this individual helped to identify nearly 98% of potential variants as being inherited, a critical filter that allowed us to more readily identify the true somatic mutations in this tumour. Our results strongly support the notion that hypothesis-driven (for example, candidate gene-based) examination of tumour genomes by PCR-directed or capture-based methods is inherently limited, and will miss key mutations. A further and important consideration is the demand for large amounts of genomic DNA by these techniques; this is a serious limitation when precious clinical samples are being studied. The Illumina/Solexa technology requires only ~1 µg of DNA per library, enabling the study of primary tumour DNA rather than requiring the use of tumour cell lines, which may contain genetic changes and adaptations required for immortalization and maintenance in tissue culture conditions.

A total of ten non-synonymous somatic mutations were identified in this patient’s tumour genome. Two are well-known AML-associated mutations, including an internal tandem duplication of the FLT3 receptor tyrosine kinase gene, which constitutively activates kinase signalling, and portends a poor prognosis (refs 5,24,25), and a four-base insertion in exon 12 of the NPM1 gene (NPMc) (refs 26–28). Both of these mutations are common (25–30%) in AML tumours, and are thought to contribute to progression of the disease rather than to cause it directly. Notably, the frequency of the mutant FLT3 allele in the primary and relapse tumour samples (35.08% and 31.30%, respectively) was significantly less than that of the other nine mutations (P < 0.000001 for both the primary and relapse samples). These data suggest that the FLT3 ITD may not have been present in all tumour cells, and further, that it may have been the last mutation acquired.

The other eight somatic mutations that we detected are all single base changes, and none has previously been detected in an AML genome. Four of the genes affected, however, are in gene families that are strongly associated with cancer pathogenesis (including PTPRT, CDH24, PCLKC and SLC15A1). The other four somatic mutations occurred in genes not previously implicated in cancer pathogenesis, but whose potential functions in metabolic pathways suggest mechanisms by which they could act to promote cancer (including KNDC1, GPR123, EBI2 and GRINL1B). We speculate about the roles of these mutations for the pathogenesis of this patient’s disease in Supplementary Information.

The importance of the eight newly defined somatic mutations for AML pathogenesis is not yet known, and will require functional validation studies in tissue culture cells and mouse models to assess their relevance. Even though we could not detect recurrent mutations in the limited AML sample set that we surveyed, several lines of evidence suggest that these mutations may not be random, ‘passenger’ mutations. First, somatic mutations in this genome are extremely rare. The rarity of somatic variants, and the normal diploid structure of the tumour genome, argues strongly against genetic instability or DNA repair defects in this tumour. Conceptually, this result is further supported by the very small number of somatic mutations discovered in the expressed tyrosine kinases of AML samples (refs 4,5); genetic instability does not seem to be a general feature of AML genomes.

Second, on the basis of the equivalent frequencies of the variant and wild-type alleles for the mutations in the tumour genome (except for FLT3 ITD), it is highly probable that all the mutations are heterozygous, and are present in virtually all of the tumour cells (Fig. 3). The latter suggests that these mutations may have all been selected for and retained because they are important for disease pathogenesis in this patient. Alternatively, all may have occurred simultaneously in the same leukaemia-initiating cell, but only a subset of the mutations (or an as-yet undetected mutation) is truly important for pathogenesis (that is, disease ‘drivers’ versus passengers). Although we suggest that the latter hypothesis is very unlikely on the basis of our current understanding of tumour progression, many more AML genomes will need to be sequenced to resolve this issue.

Third, the same mutations were detected in tumour cells in the relapse sample at approximately the same frequencies as in the primary sample. All of these mutations were therefore present in the resistant tumour cells that contributed to the patient’s relapse, further suggesting that a single clone contains all ten mutations. Fourth, seven of the ten genes containing somatic mutations were detectably expressed in the tumour sample. FLT3 and NPM1 messenger RNAs were highly expressed in this tumour sample, as they are in virtually all AML samples. We detected mRNA from the CDH24, SLC15A1 and EBI2 genes on the Affymetrix expression array, whereas expression of GRINL1B and PCLKC were detected by PCR with reverse transcription (RT–PCR; data not shown). Expression of KNDC1, PTPRT and GPR123 was not detected by either approach, but we cannot rule out expression of these genes in a small subset of tumour cells (for example, leukaemia-initiating cells). Furthermore, for the five point mutations where data are available, the mutated base is highly conserved across multiple species (Table 2).

Although we performed whole-genome sequencing on this cancer sample, we restricted our initial validation studies to the 1–2% of the genome that encodes genes. This raises the issue of whether sequencing the complementary DNA transcriptome of this tumour would have been a faster, cheaper and more efficient way of finding the mutations. Although this approach will undoubtedly be an important adjunct to whole-genome sequencing, there are several advantages to the approach we used: (1) coverage models for whole-genome libraries are at present better understood than for cDNA libraries, where transcript abundance can vary over many orders of magnitude; (2) even if the transcriptome had been sequenced, extensive characterization of the normal genome would have been required to distinguish inherited variants from somatic mutations; and (3) relevant non-synonymous mutations could be missed by cDNA sequencing, including mutations that result in RNA instability (splice variants, nonsense mutations), and/or mutations in genes expressed at low levels, or in only a small subset of tumour cells.

The additional non-coding and non-genic somatic variants in this genome (which we presently estimate at 500–1,000 on the basis of our calculated false positive and negative rates for non-synonymous mutations), will provide a rich source of potentially relevant sequence changes that will be better understood as more cancer genomes are sequenced.

In summary, we have successfully used a next-generation whole-genome sequencing approach to identify new candidate genes that may be relevant for AML pathogenesis. We cannot overemphasize the importance of parallel sequencing of the patient’s normal genome to determine which variants were inherited; the identification of the true somatic mutations in this tumour genome would not have been feasible without this approach. Furthermore, until hundreds (or perhaps thousands) of normal genomes and other AML tumours are sequenced, the contextual relevance of the mutations found in this

leukemia with normal cytogenetics: are we ready for a prognostically prioritized molecular classification? Blood 109, 431–448 (2007).
4. Loriaux, M. M. et al. High-throughput sequence analysis of the tyrosine kinome in acute myeloid leukemia. Blood 111, 4788–4796 (2008).
5. Tomasson, M. H. et al. Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia. Blood 111, 4797–4808 (2008).
6. Schoch, C. et al. Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proc. Natl Acad. Sci. USA 99, 10008–10013 (2002).
7. Bullinger, L. et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N. Engl. J. Med. 350, 1605–1616 (2004).
8. Valk, P. J. et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med. 350, 1617–1628 (2004).
9. Mullighan, C. G. et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446, 758–764 (2007).
10. Mullighan, C. G. et al. BCR–ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 453, 110–114 (2008).
11. Raghavan, M. et al. Genome-wide single nucleotide polymorphism analysis reveals frequent partial uniparental disomy due to somatic recombination in acute myeloid leukemias. Cancer Res. 65, 375–378 (2005).
12. Paulsson, K. et al. High-resolution genome-wide array-based comparative genome hybridization reveals cryptic chromosome changes in AML and MDS cases with trisomy 8 as the sole cytogenetic aberration. Leukemia 20, 840–846 (2006).
13. Rucker, F. G. et al. Disclosure of candidate genes in acute myeloid leukemia with complex karyotypes using microarray-based molecular characterization. J. Clin. Oncol. 24, 3887–3894 (2006).
14. Hillier, L. W. et al. Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5, 183–188 (2008).
15. Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008).
16. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
17. Byrd, J. C. et al. Pretreatment cytogenetic abnormalities are predictive of induction success, cumulative incidence of relapse, and overall survival in adult patients with de novo acute myeloid leukemia: results from Cancer and Leukemia Group B (CALGB 8461). Blood 100, 4325–4336 (2002).
18. Grimwade, D. et al.
The importance of diagnostic cytogenetics on outcome in genome will be unknown. Nevertheless, the somatic mutations that AML: analysis of 1,612 patients entered into the MRC AML 10 trial. The Medical we did find were neither predicted by the curation of previously Research Council Adult and Children’s Leukaemia Working Parties. Blood 92, defined cancer genes, nor by the study of this tumour using unbiased, 2322–2333 (1998). high-resolution array-based genomic approaches. For AML and 19. Mrozek, K., Heerema, N. A. & Bloomfield, C. D. Cytogenetics in acute leukemia. Blood Rev. 18, 115–136 (2004). other types of cancer, whole-genome sequencing may therefore be 20. Wendl, M. C. & Wilson, R. K. Aspects of coverage in medical DNA sequencing. the only effective means for discovering all of the mutations that are BMC Bioinformatics 9, 239 (2008). relevant for pathogenesis. 21. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res doi:10.1101/gr.078212.108 (in METHODS SUMMARY the press). 22. Quinlan, J. R. C4.5:Programs for Machine Learning 302 (Morgan Kaufmann Sequence end reads (average length for tumour genome, 32 bp, and for skin, Publishers, 1993). 35 bp) were generated from Illumina/Solexa fragment libraries derived from the 23. Link, D. C. et al. Distinct patterns of mutations occurring in de novo AML versus tumour or skin cells of patient 933124, using the Illumina Genome Analyser. The AML arising in the setting of severe congenital neutropenia. Blood 110, 1648–1655 analysed reads were aligned to the human reference genome (NCBI Build 36) (2007). using Maq . Coverage of the tumour and normal genomes was ascertained by 24. Frohling, S. et al. 
Identification of driver and passenger mutations of FLT3 by high- comparison to the patient’s heterozygous SNPs, established by compiling shared throughput DNA sequence analysis and functional assessment of candidate SNP calls monitored on the Affymetrix 6.0 and Illumina Infinium 550K geno- alleles. Cancer Cell 12, 501–513 (2007). 25. Levis, M. & Small, D. FLT3: ITDoes matter in leukemia. Leukemia 17, 1738–1752 typing platforms. We examined the Maq alignments by Decision Tree analysis to (2003). discover SNVs, as well as to identify copy number variants. Non-aligned reads 26. Falini, B. et al. Cytoplasmic nucleophosmin in acute myelogenous leukemia with a were further analysed for indel discovery. For all putative variants, we attempted normal karyotype. N. Engl. J. Med. 352, 254–266 (2005). validation using custom PCR and capillary sequencing on the ABI 3730 plat- 27. Thiede, C. et al. Prevalence and prognostic impact of NPM1 mutations in 1485 form. All validated somatic mutations were further analysed by Roche/454 adult patients with acute myeloid leukemia (AML). Blood 107, 4011–4020 sequencing of PCR-generated amplicons made from primary genomic DNA (2006). to compare readcounts of wild-type and mutant alleles in the primary tumour, 28. den Besten, W., Kuo, M. L., Williams, R. T. & Sherr, C. J. Myeloid leukemia- skin and relapse tumour samples. A complete description of the AML case associated nucleophosmin mutants perturb p53-dependent and independent sequenced, and the materials and methods used to generate this data set are activities of the Arf tumor suppressor protein. Cell Cycle 4, 1593–1598 (2005). provided in the Supplementary Information. 29. Kelly, L. M. et al. PML/RARa and FLT3-ITD induce an APL-like disease in a mouse model. Proc. Natl Acad. Sci. USA 99, 8283–8288 (2002). Sequence variant deposition in dbGaP. 
High-quality sequence variants defined by Decision Tree (2,647,695 variants) will be deposited in the dbGaP database Supplementary Information is linked to the online version of the paper at (http://www.ncbi.nlm.nih.gov/sites/entrez?Db5gap) for review by approved www.nature.com/nature. investigators. Acknowledgements We are grateful to our AML patients and their families, and to A. J. Siteman, whose generous and visionary gift provided the main funding source Received 28 May; accepted 16 September 2008. for this study. We thank G. Flance, D. Kipnis and K. Polonsky for their support, and 1. Jemal, A. et al. Cancer statistics, 2008. CA Cancer J. Clin. 58, 71–96 (2008). C. Bloomfield, M. Caligiuri and J. Vardiman from the Cancer and Leukemia Group B 2. Owen, C., Barnett, M. & Fitzgibbon, J. Familial myelodysplasia and acute myeloid for providing important AML samples for validation studies. We also thank the leukaemia—a review. Br. J. Haematol. 140, 123–132 (2008). staff of The Genome Center at Washington University for their support of and their 3. Mrozek, K., Marcucci, G., Paschka, P., Whitman, S. P. & Bloomfield, C. D. Clinical many contributions to this project, and H. Li of the Sanger Institute for assistance relevance of mutations and gene-expression changes in adult acute myeloid with the use of Maq. Further funding was provided by the National Cancer Institute Macmillan Publishers Limited. All rights reserved © 2008 ARTICLES NATURE |Vol 456 |6 November 2008 (T.J.L.), the National Human Genome Research Institute (R.K.W.), and the D.L.: data analysis. L.F.: production data oversight. T.W. and J.G.: data analysis Barnes-Jewish Hospital Foundation (T.J.L.). algorithm development. V.M.: next-generation platform development. J.C. and N.S.: primary next-generation data production. A.C.: analysis oversight for Author Contributions T.J.L. and R.K.W.: project conception and oversight. T.J.L. mutation discovery. Y.Z.: manual review of sequence variants. R.E.R. 
and M.J.W.: and E.R.M.: project leaders and analysis coordination. L.D.: supervised variant comparative genomic hybridization analyses. R.E.R.: cDNA expression analyses. discovery and characterization, decision tree analysis. D.E.L.: decision tree analysis J.E.P.: gene expression array analysis. P.W., M.W., J.I. and S.H.: clinical data and development. S.S.: automated variant detection by decision tree analysis. B.F.: specimen acquisition/processing/management. R.N.: bioinformatic analysis. J.B. variant validation oversight. B.F., P.M. and D.G.: Consed multiple sequence viewer and W.D.S.: statistical analysis. P.W., M.H.T., T.A.G., J.F.D. and D.C.L.: study development/programming. M.D.M.: auto-analysis and manual review of design, execution and analysis. T.J.L., E.R.M., D.D., D.L., L.W.H., P.W., M.H.T., validation data. K.C.: copy number analysis, variant detection algorithm D.C.L., T.A.G., J.F.D. and R.K.W.: manuscript preparation. development. D.C.K.: indel detection algorithm development. K.C. and L.W.H.: indel detection. D.D.: IT and data management, data analysis automation leader. Author Information The high-quality sequence variants have been deposited in the B.H.D.-S.: variant detection algorithm development. S.M. and M.T.: library dbGaP database (http://www.ncbi.nlm.nih.gov/sites/entrez?Db5gap) under the optimization and construction. L.C.: data generation scheduling and oversight. R.A. accession number phs000159.v1.p1. Reprints and permissions information is and T.M.: variant validation assays. X.S.: variant annotation pipeline development. available at www.nature.com/reprints. This paper is distributed under the terms of D.E.L.: variant annotation. J.R.O.: variant data management and pfam analysis. the Creative Commons Attribution-Non-Commercial-Share Alike licence, and is A.H.: validation assay design. C.P.: LIMS (Laboratory Information Management freely available to all readers at www.nature.com/nature. Correspondence and System) oversight. 
S.A.: LIMS trouble shooting/facilitation of variant detection. requests for materials should be addressed to E.R.M. (emardis@wustl.edu). Macmillan Publishers Limited. All rights reserved © 2008 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nature Springer Journals
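The Discussion reports that the mutant FLT3 allele was seen at a significantly lower frequency than the other nine mutations. The sketch below shows one way such a comparison could be reproduced from the primary-tumour 454 readcounts in Table 3. It is an illustration under stated assumptions, not the authors' procedure: the article does not say which test produced this P value, so a two-proportion z-test is swapped in here, and pooling the other nine mutations into a single group is likewise an assumption. The function names are hypothetical.

```python
import math

# 454 readcounts (variant reads, reference reads) for the primary tumour,
# transcribed from Table 3 of the article.
PRIMARY_COUNTS = {
    "CDH24": (5672, 4890),
    "SLC15A1": (3817, 4962),
    "KNDC1": (4640, 4848),
    "PTPRT": (998, 1058),
    "GRINL1B": (2211, 2674),
    "GPR123": (4618, 4569),
    "EBI2": (12750, 15453),
    "PCLKC": (992, 855),
    "FLT3": (4220, 7810),
    "NPM1": (1550, 1974),
}


def variant_allele_frequency(variant_reads, reference_reads):
    """Fraction of reads that carry the variant allele."""
    return variant_reads / (variant_reads + reference_reads)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions x1/n1 and x2/n2.

    Assumption: a stand-in test chosen for this sketch; the article does
    not name the test used for the FLT3 comparison.
    """
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_gene_to_rest(counts, gene):
    """Compare one gene's variant fraction with all other genes pooled.

    Assumption: pooling the remaining mutations is an illustrative choice.
    """
    variant, reference = counts[gene]
    rest_variant = sum(v for g, (v, r) in counts.items() if g != gene)
    rest_total = sum(v + r for g, (v, r) in counts.items() if g != gene)
    return two_proportion_z_test(
        variant, variant + reference, rest_variant, rest_total
    )


if __name__ == "__main__":
    for gene, (variant, reference) in PRIMARY_COUNTS.items():
        vaf = variant_allele_frequency(variant, reference)
        print(f"{gene:8s} variant allele frequency = {100 * vaf:.2f}%")
    z, p = compare_gene_to_rest(PRIMARY_COUNTS, "FLT3")
    print(f"FLT3 versus pooled others: z = {z:.1f}, P = {p:.3g}")
```

With these counts the FLT3 variant fraction is lower than the pooled fraction of the other nine mutations, in line with the direction reported in the text. An exact test on the same 2×2 counts (the article names Fisher’s exact test for its tumour-versus-skin comparisons) would be a reasonable alternative at these read depths.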


Publisher
Springer Journals
Copyright
Copyright © 2008 by The Author(s)
Subject
Science, Humanities and Social Sciences, multidisciplinary; Science, multidisciplinary
ISSN
0028-0836
eISSN
1476-4687
DOI
10.1038/nature07485

99.92 The differences between variant frequencies in primary or relapse tumour samples and skin were highly significant for all somatic mutations (P, 0.000001, Fisher’s exact test, one tailed). The BRCA2 variant is a known heterozygous SNP in this genome, and the TP53 variant is a known homozygous SNP. patient’s tumour genome was essentially diploid, and contained ten PTPRT, CDH24, PCLKC and SLC15A1). The other four somatic non-synonymous somatic mutations that may be relevant for her mutations occurred in genes not previously implicated in cancer disease. These mutations affect genes participating in several well- pathogenesis, but whose potential functions in metabolic pathways described pathways that are known to contribute to cancer patho- suggest mechanisms by which they could act to promote cancer genesis, but most of these genes would not have been candidates for (including KNDC1, GPR123, EBI2 and GRINL1B). We speculate directed re-sequencing on the basis of our current understanding of about the roles of these mutations for the pathogenesis of this cancer. Hence, these results justify the use of next-generation whole- patient’s disease in Supplementary Information. genome sequencing approaches to reveal somatic mutations in can- The importance of the eight newly defined somatic mutations for cer genomes. AML pathogenesis is not yet known, and will require functional As we demonstrated in our re-sequencing of the genome of the C. validation studies in tissue culture cells and mouse models to assess elegans N2 Bristol strain , and again in this study, massively parallel their relevance. Even though we could not detect recurrent mutations short-read sequencing provides an effective method for examining in the limited AML sample set that we surveyed, several lines of single nucleotide and short indel variants by comparison of the aligned evidence suggest that these mutations may not be random, ‘passen- reads to a reference genome sequence. 
By sequencing our patient’s ger’ mutations. First, somatic mutations in this genome are extremely tumour genome to a depth of.30-fold coverage, and gauging our rare. The rarity of somatic variants, and the normal diploid structure ability to detect known heterozygous positions across the genome, of the tumour genome, argues strongly against genetic instability or we have produced a sufficient depth and breadth of sequence coverage DNA repair defects in this tumour. Conceptually, this result is further to comprehensively discover somatic genome variants. A slightly lower supported by the very small number of somatic mutations discovered 4,5 coverage of the normal genome from this individual helped to identify in the expressed tyrosine kinases of AML samples ; genetic insta- nearly 98% of potential variants as being inherited, a critical filter that bility does not seem to be a general feature of AML genomes. allowed us to more readily identify the true somatic mutations in this Second, on the basis of the equivalent frequencies of the variant tumour. Our results strongly support the notion that hypothesis- and wild-type alleles for the mutations in the tumour genome (except driven (for example, candidate gene-based) examination of tumour for FLT3 ITD), it is highly probable that all the mutations are het- genomes by PCR-directed or capture-based methods is inherently erozygous, and are present in virtually all of the tumour cells (Fig. 3). limited, and will miss key mutations. A further and important consid- The latter suggests that these mutations may have all been selected for eration is the demand for large amounts of genomic DNA by these and retained because they are important for disease pathogenesis in techniques; this is a serious limitation when precious clinical samples this patient. Alternatively, all may have occurred simultaneously in are being studied. 
The Illumina/Solexa technology requires only,1 mg the same leukaemia-initiating cell, but only a subset of the mutations of DNA per library, enabling the study of primary tumour DNA rather (or an as-yet undetected mutation) is truly important for pathoge- than requiring the use of tumour cell lines, which may contain genetic nesis (that is, disease ‘drivers’ versus passengers). Although we sug- changes and adaptations required for immortalization and mainten- gest that the latter hypothesis is very unlikely on the basis of our ance in tissue culture conditions. current understanding of tumour progression, many more AML A total of ten non-synonymous somatic mutations were identified genomes will need to be sequenced to resolve this issue. in this patient’s tumour genome. Two are well-known AML-associated Third, the same mutations were detected in tumour cells in the mutations, including an internal tandem duplication of the FLT3 relapse sample at approximately the same frequencies as in the prim- receptor tyrosine kinase gene, which constitutively activates kinase ary sample. All of these mutations were therefore present in the 5,24,25 signalling, and portends a poor prognosis , and a four-base inser- resistant tumour cells that contributed to the patient’s relapse, fur- 26–28 tion in exon 12 of the NPM1 gene (NPMc) . Both of these mutations ther suggesting that a single clone contains all ten mutations. Fourth, are common (25–30%) in AML tumours, and are thought to contri- seven of the ten genes containing somatic mutations were detectably bute to progression of the disease rather than to cause it directly . expressed in the tumour sample. FLT3 and NPM1 messenger RNAs Notably, the frequency of the mutant FLT3 allele in the primary and were highly expressed in this tumour sample, as they are in virtually relapse tumour samples (35.08% and 31.30%, respectively) was all AML samples. 
We detected mRNA from the CDH24, SLC15A1 significantly less than that of the other nine mutations (P, 0.000001 and EBI2 genes on the Affymetrix expression array, whereas express- for both the primary and relapse samples). These data suggest that the ion of GRINL1B and PCLKC were detected by PCR with reverse FLT3 ITD may not have been present in all tumour cells, and further, transcription (RT–PCR; data not shown). Expression of KNDC1, that it may have been the last mutation acquired. PTPRT and GPR123 was not detected by either approach, but we The other eight somatic mutations that we detected are all single cannot rule out expression of these genes in a small subset of tumour base changes, and none has previously been detected in an AML cells (for example, leukaemia-initiating cells). Furthermore, for the genome. Four of the genes affected, however, are in gene families five point mutations where data are available, the mutated base is that are strongly associated with cancer pathogenesis (including highly conserved across multiple species (Table 2). Macmillan Publishers Limited. All rights reserved © 2008 NATURE |Vol 456 |6 November 2008 ARTICLES leukemia with normal cytogenetics: are we ready for a prognostically prioritized Although we performed whole-genome sequencing on this cancer molecular classification? Blood 109, 431–448 (2007). sample, we restricted our initial validation studies to the 1–2% of the 4. Loriaux, M. M. et al. High-throughput sequence analysis of the tyrosine kinome in genome that encodes genes. This raises the issue of whether sequen- acute myeloid leukemia. Blood 111, 4788–4796 (2008). cing the complementary DNA transcriptome of this tumour would 5. Tomasson, M. H. et al. Somatic mutations and germline sequence variants in the have been a faster, cheaper and more efficient way of finding the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia. Blood 111, 4797–4808 (2008). mutations. 
Although this approach will undoubtedly be an import- 6. Schoch, C. et al. Acute myeloid leukemias with reciprocal rearrangements can be ant adjunct to whole-genome sequencing, there are several advan- distinguished by specific gene expression profiles. Proc. Natl Acad. Sci. USA 99, tages to the approach we used: (1) coverage models for whole- 10008–10013 (2002). genome libraries are at present better understood than for cDNA 7. Bullinger, L. et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N. Engl. J. Med. 350, 1605–1616 libraries, where transcript abundance can vary over many orders of (2004). magnitude; (2) even if the transcriptome had been sequenced, 8. Valk, P. J. et al. Prognostically useful gene-expression profiles in acute myeloid extensive characterization of the normal genome would have been leukemia. N. Engl. J. Med. 350, 1617–1628 (2004). required to distinguish inherited variants from somatic mutations; 9. Mullighan, C. G. et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446, 758–764 (2007). and (3) relevant non-synonymous mutations could be missed by 10. Mullighan, C. G. et al. BCR–ABL1 lymphoblastic leukaemia is characterized by the cDNA sequencing, including mutations that result in RNA instability deletion of Ikaros. Nature 453, 110–114 (2008). (splice variants, nonsense mutations), and/or mutations in genes 11. Raghavan, M. et al. Genome-wide single nucleotide polymorphism analysis expressed at low levels, or in only a small subset of tumour cells. reveals frequent partial uniparental disomy due to somatic recombination in The additional non-coding and non-genic somatic variants in this acute myeloid leukemias. Cancer Res. 65, 375–378 (2005). 12. Paulsson, K. et al. 
High-resolution genome-wide array-based comparative genome (which we presently estimate at 500–1,000 on the basis of our genome hybridization reveals cryptic chromosome changes in AML and MDS calculated false positive and negative rates for non-synonymous cases with trisomy 8 as the sole cytogenetic aberration. Leukemia 20, 840–846 mutations), will provide a rich source of potentially relevant (2006). sequence changes that will be better understood as more cancer gen- 13. Rucker, F. G. et al. Disclosure of candidate genes in acute myeloid leukemia with complex karyotypes using microarray-based molecular characterization. J. Clin. omes are sequenced. Oncol. 24, 3887–3894 (2006). In summary, we have successfully used a next-generation whole- 14. Hillier, L. W. et al. Whole-genome sequencing and variant discovery in C. elegans. genome sequencing approach to identify new candidate genes that Nat. Methods 5, 183–188 (2008). may be relevant for AML pathogenesis. We cannot overemphasize 15. Wheeler, D. A. et al. The complete genome of an individual by massively parallel the importance of parallel sequencing of the patient’s normal genome DNA sequencing. Nature 452, 872–876 (2008). 16. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, to determine which variants were inherited; the identification of the e254 (2007). true somatic mutations in this tumour genome would not have been 17. Byrd, J. C. et al. Pretreatment cytogenetic abnormalities are predictive of induction feasible without this approach. Furthermore, until hundreds (or per- success, cumulative incidence of relapse, and overall survival in adult patients haps thousands) of normal genomes and other AML tumours are with de novo acute myeloid leukemia: results from Cancer and Leukemia Group B (CALGB 8461). Blood 100, 4325–4336 (2002). sequenced, the contextual relevance of the mutations found in this 18. Grimwade, D. et al. 
The importance of diagnostic cytogenetics on outcome in genome will be unknown. Nevertheless, the somatic mutations that AML: analysis of 1,612 patients entered into the MRC AML 10 trial. The Medical we did find were neither predicted by the curation of previously Research Council Adult and Children’s Leukaemia Working Parties. Blood 92, defined cancer genes, nor by the study of this tumour using unbiased, 2322–2333 (1998). high-resolution array-based genomic approaches. For AML and 19. Mrozek, K., Heerema, N. A. & Bloomfield, C. D. Cytogenetics in acute leukemia. Blood Rev. 18, 115–136 (2004). other types of cancer, whole-genome sequencing may therefore be 20. Wendl, M. C. & Wilson, R. K. Aspects of coverage in medical DNA sequencing. the only effective means for discovering all of the mutations that are BMC Bioinformatics 9, 239 (2008). relevant for pathogenesis. 21. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res doi:10.1101/gr.078212.108 (in METHODS SUMMARY the press). 22. Quinlan, J. R. C4.5:Programs for Machine Learning 302 (Morgan Kaufmann Sequence end reads (average length for tumour genome, 32 bp, and for skin, Publishers, 1993). 35 bp) were generated from Illumina/Solexa fragment libraries derived from the 23. Link, D. C. et al. Distinct patterns of mutations occurring in de novo AML versus tumour or skin cells of patient 933124, using the Illumina Genome Analyser. The AML arising in the setting of severe congenital neutropenia. Blood 110, 1648–1655 analysed reads were aligned to the human reference genome (NCBI Build 36) (2007). using Maq . Coverage of the tumour and normal genomes was ascertained by 24. Frohling, S. et al. 
Identification of driver and passenger mutations of FLT3 by high- comparison to the patient’s heterozygous SNPs, established by compiling shared throughput DNA sequence analysis and functional assessment of candidate SNP calls monitored on the Affymetrix 6.0 and Illumina Infinium 550K geno- alleles. Cancer Cell 12, 501–513 (2007). 25. Levis, M. & Small, D. FLT3: ITDoes matter in leukemia. Leukemia 17, 1738–1752 typing platforms. We examined the Maq alignments by Decision Tree analysis to (2003). discover SNVs, as well as to identify copy number variants. Non-aligned reads 26. Falini, B. et al. Cytoplasmic nucleophosmin in acute myelogenous leukemia with a were further analysed for indel discovery. For all putative variants, we attempted normal karyotype. N. Engl. J. Med. 352, 254–266 (2005). validation using custom PCR and capillary sequencing on the ABI 3730 plat- 27. Thiede, C. et al. Prevalence and prognostic impact of NPM1 mutations in 1485 form. All validated somatic mutations were further analysed by Roche/454 adult patients with acute myeloid leukemia (AML). Blood 107, 4011–4020 sequencing of PCR-generated amplicons made from primary genomic DNA (2006). to compare readcounts of wild-type and mutant alleles in the primary tumour, 28. den Besten, W., Kuo, M. L., Williams, R. T. & Sherr, C. J. Myeloid leukemia- skin and relapse tumour samples. A complete description of the AML case associated nucleophosmin mutants perturb p53-dependent and independent sequenced, and the materials and methods used to generate this data set are activities of the Arf tumor suppressor protein. Cell Cycle 4, 1593–1598 (2005). provided in the Supplementary Information. 29. Kelly, L. M. et al. PML/RARa and FLT3-ITD induce an APL-like disease in a mouse model. Proc. Natl Acad. Sci. USA 99, 8283–8288 (2002). Sequence variant deposition in dbGaP. 
High-quality sequence variants defined by Decision Tree (2,647,695 variants) will be deposited in the dbGaP database Supplementary Information is linked to the online version of the paper at (http://www.ncbi.nlm.nih.gov/sites/entrez?Db5gap) for review by approved www.nature.com/nature. investigators. Acknowledgements We are grateful to our AML patients and their families, and to A. J. Siteman, whose generous and visionary gift provided the main funding source Received 28 May; accepted 16 September 2008. for this study. We thank G. Flance, D. Kipnis and K. Polonsky for their support, and 1. Jemal, A. et al. Cancer statistics, 2008. CA Cancer J. Clin. 58, 71–96 (2008). C. Bloomfield, M. Caligiuri and J. Vardiman from the Cancer and Leukemia Group B 2. Owen, C., Barnett, M. & Fitzgibbon, J. Familial myelodysplasia and acute myeloid for providing important AML samples for validation studies. We also thank the leukaemia—a review. Br. J. Haematol. 140, 123–132 (2008). staff of The Genome Center at Washington University for their support of and their 3. Mrozek, K., Marcucci, G., Paschka, P., Whitman, S. P. & Bloomfield, C. D. Clinical many contributions to this project, and H. Li of the Sanger Institute for assistance relevance of mutations and gene-expression changes in adult acute myeloid with the use of Maq. Further funding was provided by the National Cancer Institute Macmillan Publishers Limited. All rights reserved © 2008 ARTICLES NATURE |Vol 456 |6 November 2008 (T.J.L.), the National Human Genome Research Institute (R.K.W.), and the D.L.: data analysis. L.F.: production data oversight. T.W. and J.G.: data analysis Barnes-Jewish Hospital Foundation (T.J.L.). algorithm development. V.M.: next-generation platform development. J.C. and N.S.: primary next-generation data production. A.C.: analysis oversight for Author Contributions T.J.L. and R.K.W.: project conception and oversight. T.J.L. mutation discovery. Y.Z.: manual review of sequence variants. R.E.R. 
and M.J.W.: and E.R.M.: project leaders and analysis coordination. L.D.: supervised variant comparative genomic hybridization analyses. R.E.R.: cDNA expression analyses. discovery and characterization, decision tree analysis. D.E.L.: decision tree analysis J.E.P.: gene expression array analysis. P.W., M.W., J.I. and S.H.: clinical data and development. S.S.: automated variant detection by decision tree analysis. B.F.: specimen acquisition/processing/management. R.N.: bioinformatic analysis. J.B. variant validation oversight. B.F., P.M. and D.G.: Consed multiple sequence viewer and W.D.S.: statistical analysis. P.W., M.H.T., T.A.G., J.F.D. and D.C.L.: study development/programming. M.D.M.: auto-analysis and manual review of design, execution and analysis. T.J.L., E.R.M., D.D., D.L., L.W.H., P.W., M.H.T., validation data. K.C.: copy number analysis, variant detection algorithm D.C.L., T.A.G., J.F.D. and R.K.W.: manuscript preparation. development. D.C.K.: indel detection algorithm development. K.C. and L.W.H.: indel detection. D.D.: IT and data management, data analysis automation leader. Author Information The high-quality sequence variants have been deposited in the B.H.D.-S.: variant detection algorithm development. S.M. and M.T.: library dbGaP database (http://www.ncbi.nlm.nih.gov/sites/entrez?Db5gap) under the optimization and construction. L.C.: data generation scheduling and oversight. R.A. accession number phs000159.v1.p1. Reprints and permissions information is and T.M.: variant validation assays. X.S.: variant annotation pipeline development. available at www.nature.com/reprints. This paper is distributed under the terms of D.E.L.: variant annotation. J.R.O.: variant data management and pfam analysis. the Creative Commons Attribution-Non-Commercial-Share Alike licence, and is A.H.: validation assay design. C.P.: LIMS (Laboratory Information Management freely available to all readers at www.nature.com/nature. Correspondence and System) oversight. 
S.A.: LIMS trouble shooting/facilitation of variant detection. requests for materials should be addressed to E.R.M. (emardis@wustl.edu). Macmillan Publishers Limited. All rights reserved © 2008
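The variant frequencies in Table 3, and the blast-count correction applied to the relapse sample under "Defining mutation frequencies in the tumour sample", follow from simple arithmetic on the 454 read counts. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the function names and the dictionary layout are hypothetical, and only two rows of Table 3 are copied in for illustration.

```python
# Illustrative sketch only; names below are hypothetical, not from the paper.
# Read counts are copied from Table 3 as (variant reads, reference reads).
TABLE3_COUNTS = {
    "CDH24": {"primary": (5672, 4890), "skin": (564, 10358), "relapse": (3108, 4599)},
    "FLT3": {"primary": (4220, 7810), "skin": (3475, 23159), "relapse": (3870, 8495)},
}

# Blast fraction reported for the relapse sample (78% blasts).
RELAPSE_BLAST_FRACTION = 0.78


def variant_percent(variant_reads, ref_reads):
    """Percentage of reads that carry the variant allele."""
    return 100.0 * variant_reads / (variant_reads + ref_reads)


def blast_corrected_percent(percent, blast_fraction):
    """Scale a variant percentage to what it would be in a pure blast population.

    Dividing by 0.78 is the same as the 'multiplied by 1.28' step in the text.
    This only makes sense for somatic variants; inherited SNPs are present in
    every cell and need no correction.
    """
    return percent / blast_fraction


for gene, samples in TABLE3_COUNTS.items():
    raw = variant_percent(*samples["relapse"])
    corrected = blast_corrected_percent(raw, RELAPSE_BLAST_FRACTION)
    print(f"{gene}: relapse {raw:.2f}% raw, {corrected:.2f}% after blast correction")
```

For a heterozygous mutation present in every tumour cell, the corrected value should sit near 50%, which is the pattern the text describes for the eight somatic SNVs; the FLT3 ITD stays well below that even after correction.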

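The significance statements attached to Figure 3 and Table 3 rest on a one-tailed Fisher's exact test that compares variant and reference read counts in a tumour sample with those in skin. The paper does not say which software was used, so the sketch below is an illustrative pure-Python implementation under that assumption; the function names are hypothetical. It sums hypergeometric probabilities in log space so that the large read counts in Table 3 do not overflow.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) with all margins fixed, i.e. the probability of seeing
    at least `a` variant reads in the first sample by chance.
    For very strong effects the result can underflow to 0.0.
    """
    row1 = a + b            # total reads in the first sample
    col1 = a + c            # total variant reads across both samples
    total = a + b + c + d
    log_denominator = log_comb(total, col1)
    p_value = 0.0
    for x in range(a, min(row1, col1) + 1):
        log_p = log_comb(row1, x) + log_comb(total - row1, col1 - x) - log_denominator
        p_value += math.exp(log_p)
    return min(p_value, 1.0)


# CDH24 counts from Table 3: primary tumour (variant, ref) versus skin (variant, ref).
p_cdh24 = fisher_one_tailed(5672, 4890, 564, 10358)
print("CDH24 primary vs skin: P below 1e-6?", p_cdh24 < 1e-6)
```

In practice an established statistics library would normally be preferred over a hand-written test; the version above is only meant to make the comparison behind the asterisks in Figure 3 concrete.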
Journal

Nature, Springer Journals

Published: Nov 6, 2008
