Molecular Markers for Sweet Sorghum Based on Microarray Expression Data

Molecular Markers for Sweet Sorghum Based on Microarray Expression Data Rice (2009) 2:129–142 DOI 10.1007/s12284-009-9029-8 Molecular Markers for Sweet Sorghum Based on Microarray Expression Data Martín Calviño & Mihai Miclaus & Rémy Bruggmann & Joachim Messing Received: 7 May 2009 /Accepted: 22 July 2009 /Published online: 15 August 2009 Springer Science + Business Media, LLC 2009 Abstract Using an Affymetrix sugarcane genechip, we (Varshney et al. 2005). Single-nucleotide polymorphisms previously identified 154 genes differentially expressed (SNPs) have become the marker of choice because of their between grain and sweet sorghum. Although many of these abundance and uniform distribution throughout the ge- genes have functions related to sugar and cell wall nome (Gupta et al. 2008;Varshney etal. 2005;Zhu and metabolism, dissection of the trait requires genetic analysis. Salmeron 2007). Around 90% of the genetic variation in Therefore, it would be advantageous to use microarray data any organism is attributed to SNPs (Varshney et al. 2005; for generation of genetic markers, shown in other species as Zhu and Salmeron 2007). They are discovered from single-feature polymorphisms (SFPs). As a test case, we genomic or expressed sequence tag sequences available used the GeSNP software to screen for SFPs between grain in databases or through sequencing of candidate genes, and sweet sorghum. Based on this screen, out of 58 PCR products, or even whole genomes (Varshney et al. candidate genes, 30 had single-nucleotide polymorphisms 2005;Zhu andSalmeron 2007). (SNPs) from which 19 had validated SFPs. The degree of Recent studies have described the use of transcript nucleotide polymorphism found between grain and sweet abundance data from RNA hybridizations to Affymetrix sorghum was in the order of one SNP per 248 base pairs, microarrays to discover genetic polymorphisms that can be with chromosome 8 being highly polymorphic. 
Indeed, utilized as markers for genotyping in mapping populations molecular markers could be developed for a third of the (Borevitz and Chory 2004; Gupta et al. 2008; Hazen and candidate genes, giving us a high rate of return by this Kay 2003; Shiu and Borevitz 2008; Zhu and Salmeron method. 2007). In an Affymetrix chip, each gene is represented by 11 different 25-bp oligonucleotides that cover features of Keywords Microarray analysis Single-feature the transcribed region of that gene (exons and 3′ untrans- polymorphism (SFP) Single-nucleotide polymorphism lated regions). Each of these features is described as a . . . . (SNP) Stem sugar Biofuel Sweet sorghum Sugarcane perfect match (PM) and mismatch (MM) oligonucleotide. The PM exactly matches the sequence of a standard genotype, whereas the MM differs from the PM by a single Introduction base substitution at the central, 13th position (Borevitz and Chory 2004; Hazen and Kay 2003; Zhu and Salmeron The development of molecular markers is essential for 2007). marker-assisted selection in plant breeding as well as to A new aspect of this approach is to discover sequence understand crop domestication and plant evolution polymorphisms in cultivars or variants of species, where one of them has been sequenced but where no sequence information is yet available from the other ones. Here, the : : : M. Calviño M. Miclaus R. Bruggmann J. Messing (*) hybridization data from microarrays not only measure Waksman Institute of Microbiology, Rutgers University, differential gene expression but also can yield information 190 Frelinghuysen Road, on sequence variation between two inbred lines. If two Piscataway, NJ 08854-8020, USA genotypes differ only in the amount of mRNA in a e-mail: messing@waksman.rutgers.edu 130 Rice (2009) 2:129–142 particular tissue, this should result in a relatively constant bioethanol per acreage than sugarcane, requires high input difference in hybridization throughout the 11 features. 
On costs, and is a major food and feed source. A crop that the other hand, if the two genotypes contain a genetic bridges between the two is the close relative, sorghum. polymorphism within a gene that coincides with one of Sorghum tolerates harsher environmental conditions than the particular features, this will produce differential sugarcane and maize, has a higher disease resistance than hybridization for that single feature. Such differences maize, and has a high stem sugar variant, sweet sorghum, have been described as single-feature polymorphisms which has potential yields of bioethanol like sugarcane. (SFPs) (Borevitz and Chory 2004;Borevitzetal. 2003; Moreover, sweet sorghum can be crossed with grain Hazen and Kay 2003;Zhu andSalmeron 2007). Thus, sorghum so that genetic analysis could uncover key expression microarrays hybridized with RNA are able to regulatory factors that would increase sugar and decrease provide us not only with phenotypic (variation in gene lignocellulose in the biomass. Therefore, sorghum could be expression) but also with genotypic (marker) data (Zhu used to identify both SFPs and ELPs linked to high sugar and Salmeron 2007). If two genotypes differ in the content. expression level of a particular gene, we can consider it We have recently reported the hybridization of RNAs as an expression level polymorphism or (ELP). Both ELPs derived from the stems of grain and sweet sorghum onto the and SFPs are dominant markers and can be mapped as sugarcane Affymetrix genechip (Calviño et al. 2008). A alleles in segregating populations (genetical genomics), previous study demonstrated that cross-species hybridiza- and ELPs can be considered as traits to determine tion did not affect the reproducibility of the microarray expression quantitative trait loci (eQTLs) (Coram et al. experiment (Cáceres et al. 2003). Moreover, an Affymetrix 2008;Jansenand Nap 2001). 
soybean genome array has been used to identify SFPs in the In Arabidopsis, SFPs have been used for several closely related species cowpea (Das et al. 2008). purposes such as mapping clock mutations through bulked Here, we have asked the question whether we could use segregant analysis (Hazen et al. 2005), the identification of the sugarcane chip analysis to extend the cross-species genes for flowering QTLs (Werner et al. 2005), high- concept in SFP discovery in the grasses. We report the density haplotyping of recombinant inbred lines (RILs) identification of SFPs in 58 sorghum genes by using the (West et al. 2006), and natural variation in genome-wide recently developed software GeSNP (Greenhall et al. 2007). DNA polymorphism (Borevitz et al. 2007). In plant species These genes were described in our previous study to be of agronomic importance, SFPs have been utilized to differentially expressed between grain and sweet sorghum identify genome-wide molecular markers in barley and rice (Calviño et al. 2008). The utility of GeSNP has been (Kumar et al. 2007; Potokina et al. 2008; Rostoks et al. successfully tested for SFP discovery in mice, humans, and 2005) as well as markers linked to Yr5 stripe rust resistance chimpanzees (Greenhall et al. 2007), but there is no report in wheat (Coram et al. 2008). However, an impediment to on plants yet. In order to experimentally validate the SFPs SFP discovery in crop plants based on DNA hybridization identified in sorghum, we sequenced fragments from 58 to Affymetrix expression arrays could be the size of gene genes and found SNPs in 30 of them, out of which 19 genes families (Borevitz et al. 2003; Varshney et al. 2005; Zhu had a validated SFP. Furthermore, we develop molecular and Salmeron 2007). Because the coding regions of many markers based on the SNPs found. 
The high experimental gene clusters that arose by tandem gene amplification are validation rate of SNPs of 50% of the candidate genes quite conserved, hybridization-based approaches would not shows the potential of this method for the development of be sufficient to distinguish between allelic and paralogous molecular markers and, in principle, the applicability to any copies (Xu and Messing 2008). Therefore, one would have trait of interest. to limit this analysis to low-copy genes. On the other hand, this approach does not aim at identifying candidate genes directly but rather linked genetic markers. Results An area where gene discovery has become of general interest is the utilization of biomass for the production of SFP discovery and validation from differentially expressed alternative fuels. Because desirable traits for biofuel crops genes in sorghum are very complex and involve many genes from different pathways, it becomes necessary to take genetic approaches Previously, we reported the use of an Affymetrix genechip to identify key genes so that molecular breeding can be from sugarcane to identify differentially expressed genes in employed to make performance improvements. The most the stem of grain and sweet sorghum (Calviño et al. 2008). successful biofuel crop today is sugarcane. However, it Such a cross-species hybridization (CSH) approach allowed cannot be grown in moderate climate. Maize, which is a us to identify 154 genes harboring expression level poly- major biofuel crop in the USA, has a much lower yield of morphisms between grain and sweet sorghum. In order to Rice (2009) 2:129–142 131 Table 1 Sorghum Genes with SFPs Predicted by the GeSNP discover single-feature polymorphisms within these genes Software as well, we uploaded the sugarcane Affymetrix CEL files previously obtained into the GeSNP software. Indeed, we Gene ID #SFPs #Validated SFPs #SNPs Sequence length found that, from 154 genes, 57 harbored a SFP with a t Ch1 value ≥7 (Fig. 
1 and Table 1). Based on existing data Sb01g005770 1 0 0 378 (Greenhall et al. 2007), we adopted a t value of 7 or higher Sb01g049890 1 1 2 401 as a threshold. Chromosomes 1, 2, and 3 had the highest Sb01g002050 1 0 0 429 number of genes displaying both ELPs and SFPs, whereas Sb01g033060 1 0 0 429 chromosomes 5 and 6 had the lowest number of ELPs and Sb01g013710 3 0 2 214 SFPs, respectively (Fig. 1). Sb01g043060 2 0 4 418 In order to validate the SFPs discovered and calculate Sb01g046550 2 0 0 318 the SFP discovery rate (SDR) of the GeSNP software, we cloned and sequenced the fragments from 57 genes Sb01g003700 1 0 0 455 Sb01g011740 1 0 0 233 harboring both ELPs and SFPs in addition to one gene harboring only SFPs (see below) from sweet sorghum Rio Sb01g006220 1 0 0 292 and aligned the sequences against the BTx623 reference Sb01g009520 2 0 0 404 genome. The software predicted a total of 125 SFPs (on Sb01g016110 5 0 0 397 average ∼2 per gene), and we could experimentally validate Sb01g044810 6 0 5 502 32 of them (Table 1). We calculated the SDR as 25.6% Ch2 ð SDR ¼½ ValidatedSFPs = TotalSFPs 100Þ. As expected, Sb02g006330 2 1 2 191 the SDR was dependent on the t value, with the lowest Sb02g000780 1 1 2 273 SDR (less than 10%) at t values between 7 and 10 and the Sb02g005440 1 0 0 464 highest SDR (80%) with t values from 22 to 25, respectively Sb02g036870 2 0 0 225 (Fig. 2a). Sb02g022510 1 0 0 552 Besides SFPs identified in genes that are differentially Sb02g006420 4 2 5 731 expressed, the GeSNP software also detected SFPs in Sb02g009980 3 2 2 363 genes that did not show differential expression under our Sb02g032470 2 0 1 438 experimental conditions (data not shown). Considering Ch3 the high success rate of SNPs discovered in genes having Sb03g039090 6 4 2 405 both SFPs and ELPs, we extended our screen to genes Sb03g037370 1 1 2 311 that have predicted SFPs with t values of 22 to 25 but no Sb03g009900 2 0 0 517 ELP. 
This analysis allowed us to identify 35 sugarcane Sb03g037360 2 0 0 400 probe pairs that matched the sorghum genome sequence Sb03g013840 4 0 0 139 Sb03g012420 3 2 1 144 Sb03g007840 1 0 2 355 Sb03g037870 6 0 0 333 Sb03g045390 1 0 0 558 Sb03g027710 1 0 1 341 Sb03g003190 2 0 0 454 Ch4 Sb04g028300 1 0 0 494 Sb04g027910 2 0 0 485 Sb04g021610 1 0 0 209 Sb04g037170 1 1 2 346 Sb04g019020 8 3 6 235 Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch9 Ch10 Sb04g005210 1 1 1 236 Sorghum chromosomes Ch5 ELPs SFPs t-value ≥7 Validated SFPs Sb05g001680 2 1 3 153 Ch6 Fig. 1 Histogram showing the proportion of ELPs and SFPs between BTx623 and Rio for each sorghum chromosome. The number of genes Sb06g015180 2 0 3 314 with ELPs previously reported by Calviño et al. 2008 were plotted for Sb06g026710 1 0 0 277 each chromosome along with the number of SFPs found in this study. Sb06g029500 2 0 0 486 Only SFPs with t values ≥7 were taken into consideration. Frequency 132 Rice (2009) 2:129–142 Table 1 (continued) not counted), 19 of them recognized a SNP between the sixth and the 13th positions. Gene ID #SFPs #Validated SFPs #SNPs Sequence length With regard to genes involved in our traits of interest, Ch7 that is, sugar accumulation and cell wall metabolism, we Sb07g001320 7 0 0 473 validated SFPs for five of them (Figs. 5 and 3). The Sb07g005930 1 1 2 436 SFPs in the cellulose synthase 1 and dolichyl-diphospho- Ch8 oligosaccharide genes was based on a SNP, whereas the Sb08g008320 1 1 7 447 SFP in the LysM gene was due to a 13-bp indel (Fig. 5a, b). Sb08g016302 1 0 3 268 This indel allowed us to develop an allele-specific PCR Sb08g020760 1 0 3 488 marker (Fig. 5d). In the case of the 4-coumarate coenzyme A Sb08g015010 4 0 0 484 ligase gene, the SFP was based on a mis-spliced intron in Sb08g002250 6 5 4 316 Rio (Fig. 5c). Sb08g002660 1 0 0 345 To calculate the number of SNPs per total sequence Ch9 length, we determined the genome size of the Rio line by Sb09g000820 1 1 2 394 flow cytometry. 
The Rio line appeared to have the same Sb09g023620 1 0 0 434 genome size than the sequenced BTx623 (data not shown). Based on 87 SNPs in 21,612 bp of sequence Sb09g006050 2 2 3 268 from both parental lines, we concluded that there is an Sb09g005280 2 1 1 527 average of one SNP every 248 base pairs of sequence Sb09g029170 1 0 10 406 between BTx623 and Rio. Taking in consideration that Ch10 the genome size is in the order of 730 Mbp (Paterson et Sb10g002230 1 0 2 398 al. 2009), we suggest that 2,938,800 SNPs could exist Sb10g007380 1 1 2 374 between grain sorghum BTx623 and sweet sorghum Rio Sb10g004540 1 0 0 255 Total 125 32 87 21,612 SFPs with t values ≥7 SFP Discovery Rate (SDR) vs t-value and have a high probability of representing SNPs in genes that have no ELPs between BTx623 and Rio but were expressed in the stem (see Table 2). For example, one of the sugarcane probe pairs (Sof.3814.1.S1_at) 50 matched a sorghum gene coding for fructose bispho- spate aldolase. Since the protein product of this gene has a role in the sucrose and starch metabolic pathway (our trait of interest), we cloned and sequenced the 7..10 11..14 15..18 19..21 22..25 >25 fragment containing the SFPs. As it is shown in Fig. 3, t-value we found six SNPs, two of which were recognized by Frequency distribution of t-values for validated SFPs three sugarcane probe pairs. This result indicates that our approach is able to efficiently detect SNPs. From the 58 genes that were sequenced, 19 genes (∼33%) had a validated SFP, and 11 genes (19%) harbored SNPs outside the probe pairs at different location than the one predicted by GeSNP. Therefore, the total SNP detection rate was ∼52%. A list of genes with validated SFPs as well as the nature of the nucleotide change/s is provided in Table 3. 7..10 11..14 15..18 19..21 22..25 >25 Most of the validated SFPs had probe pairs with t values t-value from 15 to 18 and greater than 25 (Fig. 2b). 
Since the SFP validation depends on the SNP position along the probe Fig. 2 The SFP discovery rate of GeSNP is dependent on the t value. The percentage of SFPs in sorghum genes that were validated through pair (Rostoks et al. 2005), we analyzed the SNP position sequencing (and thus represented true SNPs between BTx623 and from the edge of the sugarcane probe pair for those genes Rio) was plotted against their respective t values (a). For the validated with validated SFPs (Fig. 4). We found that, from a total of SFPs, we calculated the frequency distribution of their respective t values (b). 22 probe pairs (probes that recognized the same SNP were #Validated SFPs SDR % Rice (2009) 2:129–142 133 Table 2 Sugarcane Probe Pairs with t Values of 22–25 That Identify Sorghum Transcripts with SFPs but not ELPs Sugarcane probe set Probe pair # Sorghum bicolor ID Position Function t value=22 Sof.4093.2.S1_at 6 NGH Ch1_8313833..8313816 Sof.4567.1.S1_at 8 Sb01g044810 Ch1_67980922..67980946 MADS-box transcription factor Sof.5184.2.S1_a_at 6 Sb03g001160 Ch3_991187..991163 Similar to Os02g0294700 protein SofAffx.1284.1.S1_s_at 3 Sb03g008870 Ch3_9656668..9656644 Unknown Sof.5348.1.S1_at 11 Sb03g003510 Ch3_3731533..3731509 Ubiquitin-conjugating enzyme E2 Sof.2770.1.S1_at 4 Sb03g041770 Ch3_69253777..69253759 Unknown Sof.3851.1.S1_at 10 Sb05g004130 Ch5_4878250..4878268 60S ribosomal protein L3 Sof.2692.1.S1_at 5 Sb08g002250 Ch8_2360780..2360756 Cytochrome P450 Sof.4985.2.S1_a_at 10 Sb08g018480 Ch8_48581627..48581646 ATP-citrate synthase SofAffx.1129.1.S1_at 2 Sb08g021850 Ch8_53598165..53598144 Serine/threonine protein phosphatase SofAffx.1129.1.S1_at 9 Sb08g021850 Ch8_53598029..53598005 Serine/threonine protein phosphatase Sof.4246.1.S1_a_at 11 Sb09g005270 Ch9_6772194..6772216 Unknown t value=23 Sof.2535.1.A1_at 6 Sb02g011130 Ch2_18051363..18051363 Similar to putative RES protein Sof.1282.2.S1_a_at 11 NGH Ch2_57946767..57946743 Sof.1664.2.S1_a_at 1 Sb03g033760 Ch3_62018464..62018488 
Putative BURP domain-containing protein SofAffx.1284.1.S1_x_at 2 Sb03g008870 Ch3_9656190..9656166 Unknown Sof.497.2.S1_at 7 Sb07g027480 Ch7_62509159..62509135 3-Hydroxy-3-methylglutaryl-coA reductase Sof.1190.1.S1_at 8 Sb07g005930 Ch7_8393958..8393934 Unknown Sof.2692.1.S1_at 6 Sb08g002250 Ch8_2360760..2360736 Cytochrome P450 Sof.355.1.S1_at 8 Sb09g005570 Ch9_7345144..7345120 Heat shock protein t value=24 Sof.4310.1.S1_at 3 Sb01g028500 Ch1_49703504..49703480 Senescence-associated protein like Sof.4030.1.A1_at 10 Sb02g003450 Ch2_3915697..3915680 Similar to B0616E02-H0507E05.5 protein Sof.4972.1.S1_a_at 9 NGH Ch3_17046891..17046867 Sof.1835.1.S1_at 3 Sb03g033140 Ch3_61527980..61527956 Putative nuclear RNA binding protein A Sof.1003.1.S1_at 2 Sb05g002580 Ch5_2717665..2717641 Cytochrome P450 Sof.1694.1.A1_at 9 Sb06g033460 Ch6_61437575..61437596 Similar to H0913C04.1 protein Sof.3020.2.A1_at 4 Sb09g002960 Ch9_3216665..3216682 Aspartic proteinase t value=25 Sof.2803.1.S1_at 11 Sb01g043050 Ch1_66375993..66375971 Unknown Sof.1537.1.S1_at 7 Sb03g011270 Ch3_12484656..12484632 Mg-protoporphyrin IX monomethyl ester cyclase Sof.2992.1.A1_at 6 Sb04g037920 Ch4_67480989..67481008 Similar to Os04g0137500 Sof.1443.1.S1_at 7 Sb04g010990 Ch4_15758311..15758334 Unknown Sof.3814.1.S1_at 11 Sb04g019020 Ch4_44439307..44439289 Fructose bisphosphate aldolase Sof.3699.1.A1_at 4 Sb07g005850 Ch7_8311400..8311376 Equilibrative nucleoside transporter 1 Sof.2286.1.A1_at 2 Sb09g025350 Ch9_54815478..54815502 Similar to Os05g051300 Sof.1994.1.S1_x_at 7 Sb10g005375 Ch10_4802664..4802640 NGH Non-genic hit and that at least 0.4% of the genome could be poly- (4 SNPs/Kbp) (Fig. 6a). However, if we consider the morphic between the two lines. We also looked at the frequency of probe pairs with t values between 22 and 25 SNP density per sorghum chromosome in order to see if for each sorghum chromosome as it is shown in Fig. 6b, there is any difference among them. 
Surprisingly, we chromosome 3 had the highest number of probes. On the found that the level of polymorphism is higher for other hand, chromosome 8 had the second highest number chromosomes 8 and 9 and lower for chromosome 3 of probes with t values between 22 and 25 together with a compared to the average SNP density per Kb of sequence high SNP density (Fig. 6a, b). This might suggest an 134 Rice (2009) 2:129–142 Fructose bisphosphate aldolase BTx623 Rio 123456789 10 11 -200 Sugarcane probe pair Sof.3814.1.S1_at Query: Rio #8 Subject: Btx623 Sb04g019020 Ch_4: 44439290..44439522 #9 #11 Fig. 3 SFP validation for fructose bisphosphate aldolase. A fragment and11werevalidated. The blue lines represent the sugarcane probe from the gene fructose bisphosphate aldolase was cloned and sequenced pairs that are identical to either the Rio sequence (probe pairs #8 and #9) from both BTx623 and Rio and SNPs predicted by the probe pairs #8, 9, or identical to the BTx623 sequence (probe pair #11). unusual level of polymorphism for this chromosome for 18 (Table 4). We utilized the Single Nucleotide between BTx623 and Rio. However, we have not Amplified Polymorphism (SNAP) technique to develop sufficient data (genes sequenced) to test whether the markers based on SNPs (Drenkard et al. 2000), as it is SNP density differences among the chromosomes are shown for the gene alanine aminotransferase (Fig. 7). These statistically significant. markers were tested also in other grain and sweet sorghum Sorghum genes harboring validated SFPs allowed us to lines to see whether the SNPs were conserved or not investigate if such nucleotide substitutions were conserved (Table 4). In fact, we found a marker within the gene or not within grain sorghum BTx623, sweet sorghum Rio, Sb09g029170 that distinguished the grain sorghums from and sugarcane. Indeed, we found that from 22 SNPs the sweet sorghums cultivars used in this study. 
The protein discovered through 29 validated SFPs (one sugarcane probe product encoded by this gene is a putative ketol-acid pair can recognize more than one SNP), 15 of them were reductoisomerase enzyme that is involved in the biosyn- conserved between BTx623 and sugarcane, whereas only thesis of valine, leucine, and isoleucine amino acids (www. phytozome.net/cgi-bin/gbrowse/sorghum/). SNAP markers eight SNPs were conserved between Rio and sugarcane (Table 3). were also developed for the cellulose synthase 1 and dolichyl-diphospho-oligosaccharide genes (Fig. 5d). Development of molecular markers based on validated SFPs It has been suggested that Dale and Della sweet sorghums share a common genetic background (Ritter et The identification of SNPs between BTx623 and Rio al. 2007). In agreement with this, we found that from provided a direct way to develop molecular markers that ten SNAP markers that gave a PCR product in both can be used in mapping populations. From 58 candidate lines, they always represented the same allele (Table 4). genes, we were able to develop allele-specific PCR markers In addition, the sweet sorghum lines Top 76-6 and Simon Avg. scaled PM-MM Rice (2009) 2:129–142 135 Table 3 Nucleotide Change Conservation for Validated SFPs Between BTx623, Rio, and Sugarcane S. 
bicolor gene Position Sugarcane probe set Probe pair # t value BTx623-Rio-Sc SNP Sb02g006330 Ch2_7909203..7909180 Sof.1519.2.S1_at 8 23 C–T–C Sb02g000780 Ch2_628587..628568 Sof.1326.1.S1_a_at 5 15.2 A–G–G Sb02g006420 Ch2_8048752..8048728 Sof.2471.1.S1_at 5 34.1 C–A–C Ch2_8048741..8048717 6 19.8 Same Sb02g009980 Ch2_14533601..14533625 SofAffx.868.1.S1_s_at 9 13.7 A–T–A/C–T–C Ch2_14533610..14533630 10 12.9 Same Sb03g037370 Ch3_65336537..65336560 SofAffx.772.1.S1_s_at 7 19.1 C–G–C Sb03g012420 Ch3_14371043..14371019 Sof.2629.3.S1_a_at 8 38.2 C–T–C Ch3_14371036..14371016 9 19.4 Same Sb03g039090 Ch3_66876720..66876744 Sof.5269.1.S1_at 6 8.1 T–A–T/C–A–C Ch3_66876724..66876748 7 12 Same Ch3_66876727..66876751 8 17.1 Same Ch3_66876730..66876754 9 16.1 Same Ch3_66876734..66876758 10 45.8 Same Sb04g019020 Ch4_44439369..44439345 Sof.3814.1.S1_at 8 21.9 C–T–T Ch4_44439366..44439342 9 15.3 Same Ch4_44439307..44439289 11 25.5 T–G–T Sb04g037170 Ch4_66851287..66851311 Sof.151.1.S1_at 8 19.4 G–C–G Sb05g001680 Ch5_1816812..1816788 Sof.1902.1.S1_s_at 6 33.1 A–G–G Sb07g005930 Ch7_8393958..8393934 Sof.1190.1.S1_at 8 23.3 T–G–T Sb08g008320 Ch8_15917006..15917030 SofAffx.1412.1.A1_s_at 2 15.1 T–C–C Sb08g002250 Ch8_2360967..2360943 Sof.2692.1.S1_at 2 16.8 A–G–A Ch8_2360780..2360756 5 22.1 A–G–G Ch8_2360760..2360736 6 23.6 T–C–C Sb09g006050 Ch9_8732113..8732094 SofAffx.1438.1.A1_s_at 3 14.9 C–G–C Ch9_8732054..8732030 7 82.5 C–A–C Sb09g000820 Ch9_624173..624197 Sof.808.1.S1_at 8 29 G–C–G Sb09g005280 Ch9_6782917..6782941 Sof.5033.1.S1_at 9 15.1 A–G–G Sb10g007380 Ch10_7220153..7220177 SofAffx.287.1.S1_at 7 14 T–C–C Same means that a different probe pair recognizes the same SNP Sc Sugarcane SNP position on validated SFPs 4.5 have been identified as attractive contrasting pairs for mapping purposes based on their difference not only in 3.5 genetic distance (D) but also in sugar content (measured as Brix degree) (Ali et al. 2008). 
In our work, we identified 2.5 six SNAP markers within the genes Sb01g044810, Sb03g027710, Sb04g0037170, Sb08 g008320, 1.5 Sb09g006050, and Sb10g002230, respectively, which were polymorphic between Top 76-6 and Simon. These 0.5 markers will be useful for mapping purposes when these 123456789 10 11 12 13 lines are used as parents. SNP position from the edge of the sugarcane probe pair Fig. 4 The position of the SNP along the 25mer in the probe pair influences the SFP validation. The position of the SNP from the edge Discussion of the sugarcane probe pair was scored for each validated SFP. Most of the SNPs locate within positions 6 and 13 along the 25mer. If two A significant proportion of the phenotypic variation in or more SNPs were located on a single probe pair, their positions any organism can be attributed to polymorphisms at the along the 25mer were not counted and thus not included in the graph. 136 Rice (2009) 2:129–142 Cellulose synthase 1 Sb09g005280 LysM Sb01g049890 BTx623 800 BTx623 Rio Rio 0 200 123 45 6789 10 11 -200 123456789 10 11 Sugarcane probe pair Sof.3731.1.A1_at Sugarcane probe pair Sof.5033.1.S1_at 4-Coumarate coenzyme A ligase Sb04g005210 Dolichyl-diphospho-oligosaccharide Sb02g006330 BTx623 123 45 6789 10 11 -1000 Rio -2000 -3000 -4000 -5000 -6000 BTx623 -7000 Rio -8000 123456789 10 11 Sugarcane probe pair Sof.4734.1.S1_at Sugarcane probe pair Sof.1519.2.S1_at Fig. 5 GeSNP prediction of SFPs in sorghum genes related to biofuel and Rio, respectively (b). In Rio, the third intron of the gene 4- traits. The hybridization intensity between the perfect match and the coumarate coenzyme A ligase is mis-spliced and detected in the mismatch oligonucleotides was averaged and scaled (GeSNP software sugarcane probe pair #2 (c). Molecular markers for the genes lysM, output) and plotted against each sugarcane probe pair. 
Graphs are cellulose synthase 1, and dolichyl-diphospho-oligosaccharide were shown for four genes related to biofuel traits that have SFPs with t generated based on allele-specific PCR (d). In the case of lysM, a values of ≥7 and that were previously reported to be differentially primer spanning the 13-bp deletion in BTx623 was used to selectively expressed between grain sorghum BTx623 and sweet sorghum Rio amplify the allele from Rio. In the case of cellulose synthase 1 and (a). The SFP present in lysM identified a 13-bp indel, whereas the dolichyl-diphospho-oligosaccharide, primer pairs specific for the SNP SFPs present in cellulose synthase 1 and dolichyl-disphospho- in question were generated by the WebSNAPER software and tested oligosaccharide identified an A/G and G/A SNP between BTx623 empirically. DNA level. Thus, these DNA polymorphisms can be probe pair that is detected by the difference in hybridiza- used for genotyping, molecular mapping, and marker- tion affinity (Borevitz et al. 2003). In addition, SFPs assisted selection applications. The association of a present in a transcribed gene may be the underlying cause particular trait of interest with a DNA polymorphism is of the difference in a phenotype of interest. In most of the essential for breeding purposes. Microarrays have been cases, SNPs are the cause of SFPs as have been used to identify abundant DNA polymorphisms through- demonstrated by sequence analysis (Borevitz et al. 2003; out the genome (Gupta et al. 2008; Hazen and Kay Rostoks et al. 2005). 2003). In particular, ELPs and SFPs can be identified from Here, the goal was to identify SFPs from an Affymetrix RNA hybridization studies. SFPs are detected by oligo- sugarcane genechip dataset of closely related species nucleotide arrays and represent DNA polymorphisms (Calviño et al. 2008). The Affymetrix sugarcane genechip between genotypes within an individual oligonucleotide was used to survey the SFPs with the GeSNP software Avg. 
scaled PM-MM Avg. scaled PM-MM Rice (2009) 2:129–142 137 b d LysM A-specific primer Cellulose synthase 1 G-specific primer G-specific primer Dolichyl-diphospho-oligosaccharide A-specific primer 4-Coumarate coenzyme A ligase Sb04g005210 Ch4_5069094..5069216 Rio mis-spliced exon / intron CGCCGCCGTCGTGTCGTAAGTTGCTCATCGATACCGCCACAGCGCAGCCTGCGCGCTGCCAGTTTCTTAGGTCAACTGAATTCTGAA Probe pair #2 intron / exon AAACTTCTCCGTCTCTAACCTCAGAATGAAGGA Probe pair #2 Fig. 5 (continued). between two sorghum cultivars that differ in the accumu- could be due to the cross-hybridization of paralogous gene lation of fermentable sugars in their stems, with the targets to individual probes, which may affect the specificity objective to develop genetic markers for mapping purposes. of the SFP calling. This problem would also arise from using This is the first report to our knowledge of the use of next-generation sequencing for SNP detection. Nevertheless, GeSNP to identify SFPs within closely related grass species we could show that the use of expression analysis in and the development of molecular markers based on conjunction with GeSNP is an efficient and inexpensive validated SFPs. way to develop new molecular markers. The sugarcane probe pairs with t values between 22 and We cloned and sequenced gene fragments harboring SFPs with t values equal or higher than 7 from 58 sweet 25 had the highest SDR (80%) found in our study. One of sorghum genes comprising 125 SFPs in total. In this study, these probe pair sets matched a sorghum gene coding for we found a SFP discovery rate of 25.6%, which is sufficient fructose bisphosphate aldolase (cytoplasmic isozyme) and for most applications. Still, there are several possibilities to the identified SFP was confirmed through DNA sequence increase the SDR. First, the number of biological replicates analysis (Fig. 3). This gene codes for a glycolytic enzyme suggested for using the GeSNP software is 4 or more. 
In that catalyzes the cleavage of fructose 1,6 bisphosphate to contrast, we had only three replicates for both grain and glyceraldehyde 3-phosphate and dihydroxyacetone phos- sweet sorghum. Second, the cross-species hybridization of phate (Tsutsumi et al. 1994). sorghum RNAs to probe sets of the sugarcane array is not as One third (33%) of the 58 genes that we have sensitive as intra species hybridization. Third, false positives sequenced have a validated SFP. In addition, we could BTx623 BTx623 BTx623 Rio Rio Rio Heilong Heilong Heilong IS 9738C IS 9738C IS 9738C SC1063C SC1063C SC1063C Dale Dale Dale Della Della Della M81-E M81-E M81-E Top 76-6 Top 76-6 Simon Top 76-6 Simon Simon 138 Rice (2009) 2:129–142 SNP density per sorghum chromosome a detect SNPs in 19% of all sequenced genes at a different position than indicated by GeSNP. This is attributable to the fact that the probe pair set does only cover a part of the gene, which implies that any SNP outside this region is not reported by GeSNP. We estimated the average SNP density between BTx623 and Rio to one SNP every 248 bp. This is probably an underestimation because the sugarcane probe sets were designed from genic regions and are, therefore, more conserved than other regions in the genome. Ch1 Ch2 Ch3 Ch4 Ch8 Ch9 Although the sorghum chromosomes 1, 2, and 3 had the highest numbers for both ELPs and SFPs, chromo- Sugarcane probe pairs with t-values 22..25 that match genes in sorghum chromosomes somes 8 and 9 were the most polymorphic ones, measured as the number of SNPs per Kb sequence (Figs. 1 and 6). Our data are in agreement with a previous report by Ritter et al. (2007) in which amplified fragment-length polymor- phism markers on chromosome 8 could unambiguously distinguish grain from sweet sorghum lines (Ritter et al. 2007). Furthermore, sugar content QTLs have been located in this chromosome with a RIL derived from a dwarf derivative of Rio as one of the parents. 
In addition, we found that a marker within the gene Sb09g029170, coding for a putative ketol-acid reductoisomerase, could discriminate the grain sorghums from the sweet sorghum lines used in this study (Table 4). This enzyme is the second in the biosynthesis of the branched-chain amino acids valine, leucine, and isoleucine (Leung and Guddat 2009). When the SNPs found through validated SFPs were compared between BTx623, Rio, and sugarcane, we found that SNPs between BTx623 and sugarcane are twice as frequent as between Rio and sugarcane.

Allelic genetic diversity among sweet sorghum cultivars has previously been investigated based on simple sequence repeat markers (Ali et al. 2008). This study described the correlations between allelic diversity and the degree of stem sugar. Indeed, one could envision a simpler approach, using the microarray described here by hybridizing stem-derived RNAs from these lines to the sugarcane genechip, and identify both ELPs and SFPs for subsequent mapping of sugar content QTLs. Furthermore, the SNPs identified in our study provided us with the opportunity to develop molecular markers within genes. So far, there is no report of SNP-based molecular markers in transcribed genes in sorghum. The SFPs generated from transcriptome studies are also useful for the development of markers in those species that lack sequence resources such as Miscanthus and switchgrass, further extending the use of microarrays of one species for related ones.

Fig. 6 SNP density per sorghum chromosomes. The number of SNPs per kb of sequence was calculated based on the number of genes sequenced belonging to a given chromosome. Only those chromosomes with five or more genes sequenced are represented (a). Frequency distribution along sorghum chromosomes of sugarcane probe pairs with t values between 22 and 25 (b). [Axes: Ch1 to Ch10; SNP/Kb sequence; # Sugarcane probe pairs.]

Table 4 Primer Sequences of SNAP Markers within Sorghum Genes

S. bicolor gene ID  Allele  WebSNAPER primer sequence  PCR product size (bp)  Allele presence
Sb01g043060  T  F: GTAATATACTGACGCCAAAAGAGGCGGATT  306  BT
                R: TCAACTGCTGTTGTCGAGGACATTGG
             A  F: TGTAATATACTGACGCCAAAAGAGGCGACTT  307  Ri-Top
                R: TCAACTGCTGTTGTCGAGGACATTGG
Sb01g044810  C  F: CAATCCTGCTCCCCAATCCAGACC  334  BT-Da-De-Sim
                R: GATTACGAGATCAGCGGTCTGGAAAGAAA
             T  F: GCAATCCTGCTCCCCAATCCAGACT  335  Ri-He-IS-SC-M81-Top
                R: GATTACGAGATCAGCGGTCTGGAAAGAAA
Sb02g000780  A  F: TGGAGCAATACGAGGGCTACTCCAAA  118  BT
                R: AATCTTCAGAAACGCTCCATTTGTGCTG
             G  F: TGGAGCAATACGAGGGCTACTCCATG  118  Ri-He-IS-SC-Da-De-M81-Top-Sim
                R: AATCTTCAGAAACGCTCCATTTGTGCTG
Sb02g006330  G  F: TGTGGTACAGGTACACAAGCGAGAACATG  115  BT-IS-Da-De-M81
                R: CCTTACAGGCATAACGAGTATGAGAGATTCATAACA
             A  F: CTTATTTGTGGTACAGGTACACAAGCGAGAATAAA  121  Ri-Top-Sim
                R: CCTTACAGGCATAACGAGTATGAGAGATTCATAACA
Sb03g012420  C  F: GAAGCATTCTTTCCGATACAATATGGCCTATC  164  BT-He-SC-M81-Top-Sim
                R: TTCGATTAAAGGATTGTTGATGAAACTAGGGG
             T  F: GAAGCATTCTTTCCGATACAATATGGCCTACT  164  Ri-IS-Da
                R: TTCGATTAAAGGATTGTTGATGAAACTAGGGG
Sb03g007840  C  F: CCATAAATGTCATTGTGGAGACATCCGTTC  161  BT-He-IS-SC-M81-Top
                R: TGGAACGTCAAAACATTGACCGGAA
             T  F: AAATGTCATTGTGGAGACATCCGGGT  157  Ri-Da-Sim
                R: TGGAACGTCAAAACATTGACCGGAA
Sb03g027710  T  F: GGTCATCGGTGATGGTGGAGAACCT  343  BT
                R: GGGAATTCGATTATGTCCATCACACCC
             G  F: AGGTCATCGGTGATGGTGGAGATCTG  344  Ri-Da-Sim
                R: GGGAATTCGATTATGTCCATCACACCC
Sb03g039090  C  F: CGAACCCAACAACCTGTAACAATAAGCACTAC  326  BT-Da-De-Top-Sim
                R: GGAATTCGATTATCTCGGGGCTCATCTAC
             A  F: GAACCCAACAACCTGTAACAATAAGCAGAAA  325  Ri-M81
                R: GGAATTCGATTATCTCGGGGCTCATCTAC
Sb04g037170  G  F: CACAAGCGACTTGAAACTGCGCTG  131  BT-IS-SC-Top
                R: GGCTTGACAACTGCTTCAACCTCTGC
             C  F: CACAAGCGACTTGAAACTGCACCC  131  Ri-He-Da-De-M81-Sim
                R: GGCTTGACAACTGCTTCAACCTCTGC
Sb07g005930  T  F: CAGTTCTCCAATCCTTTCCTCTGTGGTCT  146  BT-He-SC-Da-M81
                R: GTGAGAAGCGTGGGATGCTCATCAG
             G  F: GTTCTCCAATCCTTTCCTCTGTGGTCG  144  Ri-IS-Top-Sim
                R: GTGAGAAGCGTGGGATGCTCATCAG
Sb08g020760  C  F: CAGAGGAAGCCCTTACACAGATCCGAC  1,400  BT-M81
                R: TACCCACAGGTCTGGAAAGGGCAAG
             T  F: CAGAGGAAGCCCTTACACAGATCCGAT  416  Ri-He-IS-SC-Top-Sim
                R: TACCCACAGGTCTGGAAAGGGCAAG
Sb08g008320  T  F: GCAGTGGAAGGACATCATTGCCCAT  174  BT-He-Da-M81-Sim
                R: CTCTTCCGGGACGCGACGTTC
             C  F: CAGTGGAAGGACATCATTGCCGTC  173  Ri-IS-SC-Top
                R: CTCTTCCGGGACGCGACGTTC
Sb09g005280  A  F: GCAGCACCGTCACCGGCACTA  142  BT
                R: GAGGCTCAATCAAGATCGTCTGCCC
             G  F: CAGCACCGTCACCGGCATCG  141  Ri-He-IS-SC-Da-De-M81-Top-Sim
                R: GAGGCTCAATCAAGATCGTCTGCCC
Sb09g029170  C  F: CTACTCTGAGATCATCAACGAGAGCGTGAAC  124  BT-He-SC-IS
                R: CCTAGATCCCAGGCGAGCCGTC
             T  F: CTACTCTGAGATCATCAACGAGAGCGTGTTT  124  Ri-Da-De-M81-Top-Sim
                R: CCTAGATCCCAGGCGAGCCGTC
Sb09g000820  G  F: TCGAGAGCGATGCCTTCTGACATTG  128  BT-Top
                R: CCATATCTCCAGCCATCTTCAATGTTGTG
             A  F: CGAGAGCGATGCCTTCTGACAGCA  130  Ri
                R: CCATATCTCCAGCCATCTTCAATGTTGTG
Sb09g006050  C  F: ATAGAAGGCAGAATGAACGCTGGAAAGC  105  BT-Top
                R: GGGCAAGCAGGCCTGGAACTTC
             A  F: AGAAGGCAGAATGAACGCTGGACTGA  103  Ri-He-IS-SC-Da-De-M81-Sim
                R: GGGCAAGCAGGCCTGGAACTTC
Sb10g007380  T  F: GAACTACAGACATGCACAAGGATAGCAGGTT  561  BT-Top
                R: ATTGCATTCAGGAAGCTCGCTCGA
             C  F: GAACTACAGACATGCACAAGGATAGCAGAGC  561  Ri-He-IS-SC-Da-De-M81
                R: ATTGCATTCAGGAAGCTCGCTCGA
Sb10g002230  G  F: CTTCAATCCGACAACCAAGTCGCTG  197  BT-He-IS-Top
                R: CTGGAACTGCAATGCGGCCATT
             A  F: GCTTCAATCCGACAACCAAGTCGCTA  197  Ri-SC-Da-De-M81-Sim
                R: CTGGAACTGCAATGCGGCCATT

BT BTx623, Ri Rio, He Heilong, IS IS 9738C, SC SC 1063C, Da Dale, De Della, M81 M81-E, Top Top76-6, Sim Simon
Only the cultivars that gave a PCR product were scored. If a cultivar was heterozygous for a particular allele, it was not scored.

Fig. 7 Development of a molecular marker for alanine aminotransferase based on SFP discovery and the SNAP technique. The SFP detected by the probe pair #5 in the sugarcane probe set Sof.1326.1.S1_a_at was validated through sequencing (a). Specific primers for either A or G nucleotides were designed with WebSNAPER (b) and tested through PCR in ten sorghum lines (c). [Recoverable panel content: alanine aminotransferase, Sb02g000780; y-axis: Avg. scaled PM-MM across sugarcane probe pairs 1 to 11. BTx623 sequence: ATTCATGGAGCAATACGAGGGCTACTCCAGAATGTGAACAA; Rio sequence: ATTCATGGAGCAATACGAGGGCTACTCCAGGATGTGAACAA. A-specific forward primer: TGGAGCAATACGAGGGCTACTCCAAA; G-specific forward primer: TGGAGCAATACGAGGGCTACTCCATG. Gel lanes (grain sorghums, then sweet sorghums): BTx623, IS 9738C, Heilong, SC 1063C, Dale, Rio, Della, Simon, M81-E, Top 76-6.]

Materials and methods

Plant material

The grain sorghum lines Heilong (accession number PI 563518), IS 9738C (PI 595715), and SC 1063C (PI 595741) were obtained from the National Plant Germplasm System (NPGS), USDA. The other lines used in this study were previously described (Calviño et al. 2008). Two-week-old seedlings were harvested for the extraction of genomic DNA.

SFP discovery and validation from Affymetrix transcript data

The microarray analysis for differentially expressed transcripts in stems of grain and sweet sorghum with a sugarcane genechip was previously described (Calviño et al. 2008). The CEL files from the microarray work were uploaded into the publicly available GeSNP software at http://porifera.ucsd.edu/~cabney/cgi-bin/geSNP.cgi, and an Excel file was obtained with all the probe sets in the array harboring an SFP together with their respective t values. The Excel file also contained the average hybridization intensity between the PM and MM probe pairs (average scaled PM–MM) as well as their variance values, which were converted to standard deviations. These values were used to generate the graphs displaying differences in hybridization intensity between BTx623 and Rio along the 11 sugarcane probe pairs for a given probe set.

From the transcripts previously described as being differentially expressed between grain sorghum BTx623 and sweet sorghum Rio, we selected those harboring SFPs with t values ≥7 for further validation through sequencing. In total, we sequenced gene fragments corresponding to 58 different genes.

Total RNA from Rio stem tissue was extracted at the time of flowering from three independent plants. RNA extraction was performed with the RNeasy Plant Mini Kit from QIAGEN. cDNA synthesis was performed for each of the three samples from 1 μg of total RNA with the SuperScript III First-Strand Synthesis kit from Invitrogen. The cDNAs from Rio were pooled and used for the amplification of genes with SFPs.

The reverse transcription polymerase chain reaction products were checked by agarose gel electrophoresis in order to verify that a single band amplification product from each gene was present. The PCR products were purified with the QIAquick PCR Purification kit from Qiagen and cloned into the pGEM-T easy vector from Promega. Twelve clones per gene were sequenced in order to identify any sequencing or reverse transcriptase errors. The consensus sequence for each gene was then used to find SNPs between BTx623 and Rio.

Development of molecular markers using WebSNAPER software

Once a SNP was identified between BTx623 and Rio for a particular gene of interest, the sequence harboring the SNP in question was uploaded into the publicly available WebSNAPER software (http://pga.mgh.harvard.edu/cgi-bin/snap3/websnaper3.cgi). The SNAP procedure has been previously described (Drenkard et al. 2000). Several primer pairs per SNP were tested, and the ones that successfully distinguished the SNP in one line or the other were selected. The primer sequences used to distinguish SNPs are provided in Table 4.

Genomic DNA from 2-week-old seedlings was extracted with the PrepEase Genomic DNA Isolation kit from USB. Several concentrations of genomic DNA were tested, and 50 ng was used for testing the SNAP primer pairs through PCR. The conditions used for PCR reaction were as follows: 94°C for 2 min, then 30× [94°C 30 s, 64°C 30 s, 72°C 30 min] and a final extension at 72°C for 2 min.

Acknowledgments

The research described in this manuscript was supported by the Selman A. Waksman Chair in Molecular Genetics to JM and by the sponsorship from the International Institute of Education (IIE), and the Fulbright Commission in Uruguay to MC. We thank Wenqin Wang and Todd Michael for their assistance in the measurement of BTx623 and Rio genome sizes through flow cytometry.

References

Ali M, Rajewski J, Baenziger P, Gill K, Eskridge K, Dweikat I. Assessment of genetic diversity and relationship among a collection of US sweet sorghum germplasm by SSR markers. Mol Breed. 2008;21:497–509.
Borevitz JO, Chory J. Genomics tools for QTL analysis and gene discovery. Curr Opin Plant Biol. 2004;7:132–6.
Borevitz JO, Liang D, Plouffe D, Chang HS, Zhu T, Weigel D, et al. Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003;13:513–23.
Borevitz JO, Hazen SP, Michael TP, Morris GP, Baxter IR, Hu TT, et al. Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proc Natl Acad Sci USA. 2007;104:12057–62.
Cáceres M, Lachuer J, Zapala MA, Redmond JC, Kudo L, Geschwind DH, et al. Elevated gene expression levels distinguish human from non-human primate brains. Proc Natl Acad Sci USA. 2003;100:13030–5.
Calviño M, Bruggmann R, Messing J. Screen of genes linked to high-sugar content in stems by comparative genomics. Rice. 2008;1:166–76.
Coram TE, Settles ML, Wang M, Chen X. Surveying expression level polymorphism and single-feature polymorphism in near-isogenic wheat lines differing for the Yr5 stripe rust resistance locus. Theor Appl Genet. 2008;117:401–11.
Das S, Bhat PR, Sudhakar C, Ehlers JD, Wanamaker S, Roberts PA, et al. Detection and validation of single feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array. BMC Genomics. 2008;9:107.
Drenkard E, Richter BG, Rozen S, Stutius ML, Angell NA, Mindrinos M, et al. A simple procedure for the analysis of single nucleotide polymorphisms facilitates map-based cloning in Arabidopsis. Plant Physiol. 2000;124:1483–92.
Greenhall JA, Zapala MA, Cáceres M, Libiger O, Barlow C, Schork NJ, et al. Detecting genetic variation in microarray expression data. Genome Res. 2007;17:1228–35.
Gupta PK, Rustgi S, Mir RR. Array-based high-throughput DNA markers for crop improvement. Heredity. 2008;101:5–18.
Hazen SP, Borevitz JO, Harmon FG, Pruneda-Paz JL, Schultz TF, Yanovsky MJ, et al. Rapid array mapping of circadian clock and developmental mutations in Arabidopsis. Plant Physiol. 2005;138:990–7.
Hazen SP, Kay SA. Gene arrays are not just for measuring gene expression. Trends Plant Sci. 2003;8:413–6.
Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet. 2001;17:388–91.
Kumar R, Qiu J, Joshi T, Valliyodan B, Xu D, Nguyen HT. Single feature polymorphism discovery in rice. PLoS ONE. 2007;2:e284.
Leung EW, Guddat LW. Conformational changes in a plant ketol-acid reductoisomerase upon Mg(2+) and NADPH binding as revealed by two crystal structures. J Mol Biol. 2009. doi:10.1016/j.jmb.2009.04.012.
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–6.
Potokina E, Druka A, Luo Z, Wise R, Waugh R, Kearsey M. Gene expression quantitative trait locus analysis of 16 000 barley genes reveals a complex pattern of genome-wide transcriptional regulation. Plant J. 2008;53:90–101.
Ritter KB, McIntyre CL, Godwin ID, Jordan DR, Chapman SC. An assessment of the genetic relationship between sweet and grain sorghums, within Sorghum bicolor ssp. bicolor (L.) Moench, using AFLP markers. Euphytica. 2007;157:161–76.
Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, et al. Single-feature polymorphism discovery in the barley transcriptome. Genome Biol. 2005;6:R54.
Shiu SH, Borevitz JO. The next generation of microarray research: applications in evolutionary and ecological genomics. Heredity. 2008;100:141–9.
Tsutsumi K, Kagaya Y, Hidaka S, Suzuki J, Tokairin Y, Hirai T, et al. Structural analysis of the chloroplastic and cytoplasmic aldolase-encoding genes implicated the occurrence of multiple loci in rice. Gene. 1994;141:215–20.
Varshney RK, Graner A, Sorrells ME. Genomics-assisted breeding for crop improvement. Trends Plant Sci. 2005;10:621–30.
Werner JD, Borevitz JO, Warthmann N, Trainer GT. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc Natl Acad Sci USA. 2005;102:2460–5.
West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, et al. High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Res. 2006;16:787–95.
Xu JH, Messing J. Organization of the prolamin gene family provides insight into the evolution of the maize genome and gene duplications in grass species. Proc Natl Acad Sci USA. 2008;105:14330–5.
Zhu T, Salmeron J. High-definition genome profiling for genetic marker discovery. Trends Plant Sci. 2007;12:1360–85.

Molecular Markers for Sweet Sorghum Based on Microarray Expression Data


References (27)

Publisher
Springer Journals
Copyright
Copyright © 2009 by Springer Science + Business Media, LLC
Subject
Life Sciences; Plant Sciences; Plant Genetics & Genomics; Plant Breeding/Biotechnology; Agriculture; Plant Ecology
ISSN
1939-8425
eISSN
1939-8433
DOI
10.1007/s12284-009-9029-8

Abstract

Rice (2009) 2:129–142
DOI 10.1007/s12284-009-9029-8

Molecular Markers for Sweet Sorghum Based on Microarray Expression Data

Martín Calviño & Mihai Miclaus & Rémy Bruggmann & Joachim Messing

Received: 7 May 2009 / Accepted: 22 July 2009 / Published online: 15 August 2009
Springer Science + Business Media, LLC 2009

M. Calviño, M. Miclaus, R. Bruggmann, J. Messing (*)
Waksman Institute of Microbiology, Rutgers University, 190 Frelinghuysen Road, Piscataway, NJ 08854-8020, USA
e-mail: messing@waksman.rutgers.edu

Abstract

Using an Affymetrix sugarcane genechip, we previously identified 154 genes differentially expressed between grain and sweet sorghum. Although many of these genes have functions related to sugar and cell wall metabolism, dissection of the trait requires genetic analysis. Therefore, it would be advantageous to use microarray data for generation of genetic markers, shown in other species as single-feature polymorphisms (SFPs). As a test case, we used the GeSNP software to screen for SFPs between grain and sweet sorghum. Based on this screen, out of 58 candidate genes, 30 had single-nucleotide polymorphisms (SNPs) from which 19 had validated SFPs. The degree of nucleotide polymorphism found between grain and sweet sorghum was in the order of one SNP per 248 base pairs, with chromosome 8 being highly polymorphic. Indeed, molecular markers could be developed for a third of the candidate genes, giving us a high rate of return by this method.

Keywords: Microarray analysis · Single-feature polymorphism (SFP) · Single-nucleotide polymorphism (SNP) · Stem sugar · Biofuel · Sweet sorghum · Sugarcane

Introduction

The development of molecular markers is essential for marker-assisted selection in plant breeding as well as to understand crop domestication and plant evolution (Varshney et al. 2005). Single-nucleotide polymorphisms (SNPs) have become the marker of choice because of their abundance and uniform distribution throughout the genome (Gupta et al. 2008; Varshney et al. 2005; Zhu and Salmeron 2007). Around 90% of the genetic variation in any organism is attributed to SNPs (Varshney et al. 2005; Zhu and Salmeron 2007). They are discovered from genomic or expressed sequence tag sequences available in databases or through sequencing of candidate genes, PCR products, or even whole genomes (Varshney et al. 2005; Zhu and Salmeron 2007).

Recent studies have described the use of transcript abundance data from RNA hybridizations to Affymetrix microarrays to discover genetic polymorphisms that can be utilized as markers for genotyping in mapping populations (Borevitz and Chory 2004; Gupta et al. 2008; Hazen and Kay 2003; Shiu and Borevitz 2008; Zhu and Salmeron 2007). In an Affymetrix chip, each gene is represented by 11 different 25-bp oligonucleotides that cover features of the transcribed region of that gene (exons and 3′ untranslated regions). Each of these features is described as a perfect match (PM) and mismatch (MM) oligonucleotide. The PM exactly matches the sequence of a standard genotype, whereas the MM differs from the PM by a single base substitution at the central, 13th position (Borevitz and Chory 2004; Hazen and Kay 2003; Zhu and Salmeron 2007).

A new aspect of this approach is to discover sequence polymorphisms in cultivars or variants of species, where one of them has been sequenced but where no sequence information is yet available from the other ones. Here, the hybridization data from microarrays not only measure differential gene expression but also can yield information on sequence variation between two inbred lines. If two genotypes differ only in the amount of mRNA in a particular tissue, this should result in a relatively constant difference in hybridization throughout the 11 features. On the other hand, if the two genotypes contain a genetic polymorphism within a gene that coincides with one of the particular features, this will produce differential hybridization for that single feature. Such differences have been described as single-feature polymorphisms (SFPs) (Borevitz and Chory 2004; Borevitz et al. 2003; Hazen and Kay 2003; Zhu and Salmeron 2007). Thus, expression microarrays hybridized with RNA are able to provide us not only with phenotypic (variation in gene expression) but also with genotypic (marker) data (Zhu and Salmeron 2007). If two genotypes differ in the expression level of a particular gene, we can consider it as an expression level polymorphism (ELP). Both ELPs and SFPs are dominant markers and can be mapped as alleles in segregating populations (genetical genomics), and ELPs can be considered as traits to determine expression quantitative trait loci (eQTLs) (Coram et al. 2008; Jansen and Nap 2001).

In Arabidopsis, SFPs have been used for several purposes such as mapping clock mutations through bulked segregant analysis (Hazen et al. 2005), the identification of genes for flowering QTLs (Werner et al. 2005), high-density haplotyping of recombinant inbred lines (RILs) (West et al. 2006), and natural variation in genome-wide DNA polymorphism (Borevitz et al. 2007). In plant species of agronomic importance, SFPs have been utilized to identify genome-wide molecular markers in barley and rice (Kumar et al. 2007; Potokina et al. 2008; Rostoks et al. 2005) as well as markers linked to Yr5 stripe rust resistance in wheat (Coram et al. 2008). However, an impediment to SFP discovery in crop plants based on DNA hybridization to Affymetrix expression arrays could be the size of gene families (Borevitz et al. 2003; Varshney et al. 2005; Zhu and Salmeron 2007). Because the coding regions of many gene clusters that arose by tandem gene amplification are quite conserved, hybridization-based approaches would not be sufficient to distinguish between allelic and paralogous copies (Xu and Messing 2008). Therefore, one would have to limit this analysis to low-copy genes. On the other hand, this approach does not aim at identifying candidate genes directly but rather linked genetic markers.

An area where gene discovery has become of general interest is the utilization of biomass for the production of alternative fuels. Because desirable traits for biofuel crops are very complex and involve many genes from different pathways, it becomes necessary to take genetic approaches to identify key genes so that molecular breeding can be employed to make performance improvements. The most successful biofuel crop today is sugarcane. However, it cannot be grown in moderate climate. Maize, which is a major biofuel crop in the USA, has a much lower yield of bioethanol per acreage than sugarcane, requires high input costs, and is a major food and feed source. A crop that bridges between the two is the close relative, sorghum. Sorghum tolerates harsher environmental conditions than sugarcane and maize, has a higher disease resistance than maize, and has a high stem sugar variant, sweet sorghum, which has potential yields of bioethanol like sugarcane. Moreover, sweet sorghum can be crossed with grain sorghum so that genetic analysis could uncover key regulatory factors that would increase sugar and decrease lignocellulose in the biomass. Therefore, sorghum could be used to identify both SFPs and ELPs linked to high sugar content.

We have recently reported the hybridization of RNAs derived from the stems of grain and sweet sorghum onto the sugarcane Affymetrix genechip (Calviño et al. 2008). A previous study demonstrated that cross-species hybridization did not affect the reproducibility of the microarray experiment (Cáceres et al. 2003). Moreover, an Affymetrix soybean genome array has been used to identify SFPs in the closely related species cowpea (Das et al. 2008).

Here, we have asked the question whether we could use the sugarcane chip analysis to extend the cross-species concept in SFP discovery in the grasses. We report the identification of SFPs in 58 sorghum genes by using the recently developed software GeSNP (Greenhall et al. 2007). These genes were described in our previous study to be differentially expressed between grain and sweet sorghum (Calviño et al. 2008). The utility of GeSNP has been successfully tested for SFP discovery in mice, humans, and chimpanzees (Greenhall et al. 2007), but there is no report on plants yet. In order to experimentally validate the SFPs identified in sorghum, we sequenced fragments from 58 genes and found SNPs in 30 of them, out of which 19 genes had a validated SFP. Furthermore, we develop molecular markers based on the SNPs found. The high experimental validation rate of SNPs of 50% of the candidate genes shows the potential of this method for the development of molecular markers and, in principle, the applicability to any trait of interest.

Results

SFP discovery and validation from differentially expressed genes in sorghum

Previously, we reported the use of an Affymetrix genechip from sugarcane to identify differentially expressed genes in the stem of grain and sweet sorghum (Calviño et al. 2008). Such a cross-species hybridization (CSH) approach allowed us to identify 154 genes harboring expression level polymorphisms between grain and sweet sorghum. In order to discover single-feature polymorphisms within these genes as well, we uploaded the sugarcane Affymetrix CEL files previously obtained into the GeSNP software. Indeed, we found that, from 154 genes, 57 harbored an SFP with a t value ≥7 (Fig. 1 and Table 1). Based on existing data (Greenhall et al. 2007), we adopted a t value of 7 or higher as a threshold. Chromosomes 1, 2, and 3 had the highest number of genes displaying both ELPs and SFPs, whereas chromosomes 5 and 6 had the lowest number of ELPs and SFPs, respectively (Fig. 1).

Fig. 1 Histogram showing the proportion of ELPs and SFPs between BTx623 and Rio for each sorghum chromosome. The number of genes with ELPs previously reported by Calviño et al. 2008 were plotted for each chromosome along with the number of SFPs found in this study. Only SFPs with t values ≥7 were taken into consideration. [Axes: Sorghum chromosomes (Ch1 to Ch10); Frequency. Series: ELPs; SFPs t-value ≥7; Validated SFPs.]

In order to validate the SFPs discovered and calculate the SFP discovery rate (SDR) of the GeSNP software, we cloned and sequenced the fragments from 57 genes harboring both ELPs and SFPs in addition to one gene harboring only SFPs (see below) from sweet sorghum Rio and aligned the sequences against the BTx623 reference genome. The software predicted a total of 125 SFPs (on average ∼2 per gene), and we could experimentally validate 32 of them (Table 1). We calculated the SDR as 25.6% (SDR = [Validated SFPs / Total SFPs] × 100). As expected, the SDR was dependent on the t value, with the lowest SDR (less than 10%) at t values between 7 and 10 and the highest SDR (80%) with t values from 22 to 25 (Fig. 2a).

Table 1 Sorghum Genes with SFPs Predicted by the GeSNP Software

Gene ID      #SFPs  #Validated SFPs  #SNPs  Sequence length
Ch1
Sb01g005770  1      0                0      378
Sb01g049890  1      1                2      401
Sb01g002050  1      0                0      429
Sb01g033060  1      0                0      429
Sb01g013710  3      0                2      214
Sb01g043060  2      0                4      418
Sb01g046550  2      0                0      318
Sb01g003700  1      0                0      455
Sb01g011740  1      0                0      233
Sb01g006220  1      0                0      292
Sb01g009520  2      0                0      404
Sb01g016110  5      0                0      397
Sb01g044810  6      0                5      502
Ch2
Sb02g006330  2      1                2      191
Sb02g000780  1      1                2      273
Sb02g005440  1      0                0      464
Sb02g036870  2      0                0      225
Sb02g022510  1      0                0      552
Sb02g006420  4      2                5      731
Sb02g009980  3      2                2      363
Sb02g032470  2      0                1      438
Ch3
Sb03g039090  6      4                2      405
Sb03g037370  1      1                2      311
Sb03g009900  2      0                0      517
Sb03g037360  2      0                0      400
Sb03g013840  4      0                0      139
Sb03g012420  3      2                1      144
Sb03g007840  1      0                2      355
Sb03g037870  6      0                0      333
Sb03g045390  1      0                0      558
Sb03g027710  1      0                1      341
Sb03g003190  2      0                0      454
Ch4
Sb04g028300  1      0                0      494
Sb04g027910  2      0                0      485
Sb04g021610  1      0                0      209
Sb04g037170  1      1                2      346
Sb04g019020  8      3                6      235
Sb04g005210  1      1                1      236
Ch5
Sb05g001680  2      1                3      153
Ch6
Sb06g015180  2      0                3      314
Sb06g026710  1      0                0      277
Sb06g029500  2      0                0      486
Ch7
Sb07g001320  7      0                0      473
Sb07g005930  1      1                2      436
Ch8
Sb08g008320  1      1                7      447
Sb08g016302  1      0                3      268
Sb08g020760  1      0                3      488
Sb08g015010  4      0                0      484
Sb08g002250  6      5                4      316
Sb08g002660  1      0                0      345
Ch9
Sb09g000820  1      1                2      394
Sb09g023620  1      0                0      434
Sb09g006050  2      2                3      268
Sb09g005280  2      1                1      527
Sb09g029170  1      0                10     406
Ch10
Sb10g002230  1      0                2      398
Sb10g007380  1      1                2      374
Sb10g004540  1      0                0      255
Total        125    32               87     21,612

SFPs with t values ≥7

Fig. 2 The SFP discovery rate of GeSNP is dependent on the t value. The percentage of SFPs in sorghum genes that were validated through sequencing (and thus represented true SNPs between BTx623 and Rio) was plotted against their respective t values (a). For the validated SFPs, we calculated the frequency distribution of their respective t values (b). [Panel titles: SFP Discovery Rate (SDR) vs t-value; Frequency distribution of t-values for validated SFPs. x-axis (t-value) bins: 7..10, 11..14, 15..18, 19..21, 22..25, and above 25. y-axes: SDR %; #Validated SFPs.]

Besides SFPs identified in genes that are differentially expressed, the GeSNP software also detected SFPs in genes that did not show differential expression under our experimental conditions (data not shown). Considering the high success rate of SNPs discovered in genes having both SFPs and ELPs, we extended our screen to genes that have predicted SFPs with t values of 22 to 25 but no ELP. This analysis allowed us to identify 35 sugarcane probe pairs that matched the sorghum genome sequence and have a high probability of representing SNPs in genes that have no ELPs between BTx623 and Rio but were expressed in the stem (see Table 2). For example, one of the sugarcane probe pairs (Sof.3814.1.S1_at) matched a sorghum gene coding for fructose bisphosphate aldolase. Since the protein product of this gene has a role in the sucrose and starch metabolic pathway (our trait of interest), we cloned and sequenced the fragment containing the SFPs. As shown in Fig. 3, we found six SNPs, two of which were recognized by three sugarcane probe pairs. This result indicates that our approach is able to efficiently detect SNPs.

Table 2 Sugarcane Probe Pairs with t Values of 22–25 That Identify Sorghum Transcripts with SFPs but not ELPs

Sugarcane probe set  Probe pair #  Sorghum bicolor ID  Position  Function
t value=22
Sof.4093.2.S1_at  6  NGH  Ch1_8313833..8313816
Sof.4567.1.S1_at  8  Sb01g044810  Ch1_67980922..67980946  MADS-box transcription factor
Sof.5184.2.S1_a_at  6  Sb03g001160  Ch3_991187..991163  Similar to Os02g0294700 protein
SofAffx.1284.1.S1_s_at  3  Sb03g008870  Ch3_9656668..9656644  Unknown
Sof.5348.1.S1_at  11  Sb03g003510  Ch3_3731533..3731509  Ubiquitin-conjugating enzyme E2
Sof.2770.1.S1_at  4  Sb03g041770  Ch3_69253777..69253759  Unknown
Sof.3851.1.S1_at  10  Sb05g004130  Ch5_4878250..4878268  60S ribosomal protein L3
Sof.2692.1.S1_at  5  Sb08g002250  Ch8_2360780..2360756  Cytochrome P450
Sof.4985.2.S1_a_at  10  Sb08g018480  Ch8_48581627..48581646  ATP-citrate synthase
SofAffx.1129.1.S1_at  2  Sb08g021850  Ch8_53598165..53598144  Serine/threonine protein phosphatase
SofAffx.1129.1.S1_at  9  Sb08g021850  Ch8_53598029..53598005  Serine/threonine protein phosphatase
Sof.4246.1.S1_a_at  11  Sb09g005270  Ch9_6772194..6772216  Unknown
t value=23
Sof.2535.1.A1_at  6  Sb02g011130  Ch2_18051363..18051363  Similar to putative RES protein
Sof.1282.2.S1_a_at  11  NGH  Ch2_57946767..57946743
Sof.1664.2.S1_a_at  1  Sb03g033760  Ch3_62018464..62018488  Putative BURP domain-containing protein
SofAffx.1284.1.S1_x_at  2  Sb03g008870  Ch3_9656190..9656166  Unknown
Sof.497.2.S1_at  7  Sb07g027480  Ch7_62509159..62509135  3-Hydroxy-3-methylglutaryl-coA reductase
Sof.1190.1.S1_at  8  Sb07g005930  Ch7_8393958..8393934  Unknown
Sof.2692.1.S1_at  6  Sb08g002250  Ch8_2360760..2360736  Cytochrome P450
Sof.355.1.S1_at  8  Sb09g005570  Ch9_7345144..7345120  Heat shock protein
t value=24
Sof.4310.1.S1_at  3  Sb01g028500  Ch1_49703504..49703480  Senescence-associated protein like
Sof.4030.1.A1_at  10  Sb02g003450  Ch2_3915697..3915680  Similar to B0616E02-H0507E05.5 protein
Sof.4972.1.S1_a_at  9  NGH  Ch3_17046891..17046867
Sof.1835.1.S1_at  3  Sb03g033140  Ch3_61527980..61527956  Putative nuclear RNA binding protein A
Sof.1003.1.S1_at  2  Sb05g002580  Ch5_2717665..2717641  Cytochrome P450
Sof.1694.1.A1_at  9  Sb06g033460  Ch6_61437575..61437596  Similar to H0913C04.1 protein
Sof.3020.2.A1_at  4  Sb09g002960  Ch9_3216665..3216682  Aspartic proteinase
t value=25
Sof.2803.1.S1_at  11  Sb01g043050  Ch1_66375993..66375971  Unknown
Sof.1537.1.S1_at  7  Sb03g011270  Ch3_12484656..12484632  Mg-protoporphyrin IX monomethyl ester cyclase
Sof.2992.1.A1_at  6  Sb04g037920  Ch4_67480989..67481008  Similar to Os04g0137500
Sof.1443.1.S1_at  7  Sb04g010990  Ch4_15758311..15758334  Unknown
Sof.3814.1.S1_at  11  Sb04g019020  Ch4_44439307..44439289  Fructose bisphosphate aldolase
Sof.3699.1.A1_at  4  Sb07g005850  Ch7_8311400..8311376  Equilibrative nucleoside transporter 1
Sof.2286.1.A1_at  2  Sb09g025350  Ch9_54815478..54815502  Similar to Os05g051300
Sof.1994.1.S1_x_at  7  Sb10g005375  Ch10_4802664..4802640

NGH Non-genic hit

Fig. 3 SFP validation for fructose bisphosphate aldolase. A fragment from the gene fructose bisphosphate aldolase was cloned and sequenced from both BTx623 and Rio, and SNPs predicted by the probe pairs #8, 9, and 11 were validated. The blue lines represent the sugarcane probe pairs that are identical to either the Rio sequence (probe pairs #8 and #9) or identical to the BTx623 sequence (probe pair #11). [Plot: BTx623 and Rio, Avg. scaled PM-MM across probe pairs 1 to 11 of sugarcane probe set Sof.3814.1.S1_at. Alignment: query Rio; subject BTx623 Sb04g019020, Ch_4: 44439290..44439522.]

From the 58 genes that were sequenced, 19 genes (∼33%) had a validated SFP, and 11 genes (19%) harbored SNPs outside the probe pairs at a different location than the one predicted by GeSNP. Therefore, the total SNP detection rate was ∼52%. A list of genes with validated SFPs as well as the nature of the nucleotide change(s) is provided in Table 3. Most of the validated SFPs had probe pairs with t values from 15 to 18 and greater than 25 (Fig. 2b). Since the SFP validation depends on the SNP position along the probe pair (Rostoks et al. 2005), we analyzed the SNP position from the edge of the sugarcane probe pair for those genes with validated SFPs (Fig. 4). We found that, from a total of 22 probe pairs (probes that recognized the same SNP were not counted), 19 of them recognized a SNP between the sixth and the 13th positions.

With regard to genes involved in our traits of interest, that is, sugar accumulation and cell wall metabolism, we validated SFPs for five of them (Figs. 5 and 3). The SFPs in the cellulose synthase 1 and dolichyl-diphospho-oligosaccharide genes were based on a SNP, whereas the SFP in the LysM gene was due to a 13-bp indel (Fig. 5a, b). This indel allowed us to develop an allele-specific PCR marker (Fig. 5d). In the case of the 4-coumarate coenzyme A ligase gene, the SFP was based on a mis-spliced intron in Rio (Fig. 5c).

To calculate the number of SNPs per total sequence length, we determined the genome size of the Rio line by flow cytometry. The Rio line appeared to have the same genome size as the sequenced BTx623 (data not shown). Based on 87 SNPs in 21,612 bp of sequence from both parental lines, we concluded that there is an average of one SNP every 248 base pairs of sequence between BTx623 and Rio. Taking into consideration that the genome size is in the order of 730 Mbp (Paterson et al. 2009), we suggest that 2,938,800 SNPs could exist between grain sorghum BTx623 and sweet sorghum Rio and that at least 0.4% of the genome could be polymorphic between the two lines. We also looked at the SNP density per sorghum chromosome in order to see if there is any difference among them. Surprisingly, we found that the level of polymorphism is higher for chromosomes 8 and 9 and lower for chromosome 3 compared to the average SNP density per Kb of sequence (4 SNPs/Kbp) (Fig. 6a). However, if we consider the frequency of probe pairs with t values between 22 and 25 for each sorghum chromosome as shown in Fig. 6b, chromosome 3 had the highest number of probes. On the other hand, chromosome 8 had the second highest number of probes with t values between 22 and 25 together with a high SNP density (Fig. 6a, b). This might suggest an unusual level of polymorphism for this chromosome between BTx623 and Rio. However, we do not have sufficient data (genes sequenced) to test whether the SNP density differences among the chromosomes are statistically significant.

Sorghum genes harboring validated SFPs allowed us to investigate if such nucleotide substitutions were conserved or not within grain sorghum BTx623, sweet sorghum Rio, and sugarcane. Indeed, we found that from 22 SNPs

for 18 (Table 4). We utilized the Single Nucleotide Amplified Polymorphism (SNAP) technique to develop markers based on SNPs (Drenkard et al. 2000), as shown for the gene alanine aminotransferase (Fig. 7). These markers were also tested in other grain and sweet sorghum lines to see whether the SNPs were conserved or not (Table 4). In fact, we found a marker within the gene Sb09g029170 that distinguished the grain sorghums from the sweet sorghum cultivars used in this study.
The protein discovered through 29 validated SFPs (one sugarcane probe product encoded by this gene is a putative ketol-acid pair can recognize more than one SNP), 15 of them were reductoisomerase enzyme that is involved in the biosyn- conserved between BTx623 and sugarcane, whereas only thesis of valine, leucine, and isoleucine amino acids (www. phytozome.net/cgi-bin/gbrowse/sorghum/). SNAP markers eight SNPs were conserved between Rio and sugarcane (Table 3). were also developed for the cellulose synthase 1 and dolichyl-diphospho-oligosaccharide genes (Fig. 5d). Development of molecular markers based on validated SFPs It has been suggested that Dale and Della sweet sorghums share a common genetic background (Ritter et The identification of SNPs between BTx623 and Rio al. 2007). In agreement with this, we found that from provided a direct way to develop molecular markers that ten SNAP markers that gave a PCR product in both can be used in mapping populations. From 58 candidate lines, they always represented the same allele (Table 4). genes, we were able to develop allele-specific PCR markers In addition, the sweet sorghum lines Top 76-6 and Simon Avg. scaled PM-MM Rice (2009) 2:129–142 135 Table 3 Nucleotide Change Conservation for Validated SFPs Between BTx623, Rio, and Sugarcane S. 
bicolor gene Position Sugarcane probe set Probe pair # t value BTx623-Rio-Sc SNP Sb02g006330 Ch2_7909203..7909180 Sof.1519.2.S1_at 8 23 C–T–C Sb02g000780 Ch2_628587..628568 Sof.1326.1.S1_a_at 5 15.2 A–G–G Sb02g006420 Ch2_8048752..8048728 Sof.2471.1.S1_at 5 34.1 C–A–C Ch2_8048741..8048717 6 19.8 Same Sb02g009980 Ch2_14533601..14533625 SofAffx.868.1.S1_s_at 9 13.7 A–T–A/C–T–C Ch2_14533610..14533630 10 12.9 Same Sb03g037370 Ch3_65336537..65336560 SofAffx.772.1.S1_s_at 7 19.1 C–G–C Sb03g012420 Ch3_14371043..14371019 Sof.2629.3.S1_a_at 8 38.2 C–T–C Ch3_14371036..14371016 9 19.4 Same Sb03g039090 Ch3_66876720..66876744 Sof.5269.1.S1_at 6 8.1 T–A–T/C–A–C Ch3_66876724..66876748 7 12 Same Ch3_66876727..66876751 8 17.1 Same Ch3_66876730..66876754 9 16.1 Same Ch3_66876734..66876758 10 45.8 Same Sb04g019020 Ch4_44439369..44439345 Sof.3814.1.S1_at 8 21.9 C–T–T Ch4_44439366..44439342 9 15.3 Same Ch4_44439307..44439289 11 25.5 T–G–T Sb04g037170 Ch4_66851287..66851311 Sof.151.1.S1_at 8 19.4 G–C–G Sb05g001680 Ch5_1816812..1816788 Sof.1902.1.S1_s_at 6 33.1 A–G–G Sb07g005930 Ch7_8393958..8393934 Sof.1190.1.S1_at 8 23.3 T–G–T Sb08g008320 Ch8_15917006..15917030 SofAffx.1412.1.A1_s_at 2 15.1 T–C–C Sb08g002250 Ch8_2360967..2360943 Sof.2692.1.S1_at 2 16.8 A–G–A Ch8_2360780..2360756 5 22.1 A–G–G Ch8_2360760..2360736 6 23.6 T–C–C Sb09g006050 Ch9_8732113..8732094 SofAffx.1438.1.A1_s_at 3 14.9 C–G–C Ch9_8732054..8732030 7 82.5 C–A–C Sb09g000820 Ch9_624173..624197 Sof.808.1.S1_at 8 29 G–C–G Sb09g005280 Ch9_6782917..6782941 Sof.5033.1.S1_at 9 15.1 A–G–G Sb10g007380 Ch10_7220153..7220177 SofAffx.287.1.S1_at 7 14 T–C–C Same means that a different probe pair recognizes the same SNP Sc Sugarcane SNP position on validated SFPs 4.5 have been identified as attractive contrasting pairs for mapping purposes based on their difference not only in 3.5 genetic distance (D) but also in sugar content (measured as Brix degree) (Ali et al. 2008). 
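The allele-specific primer logic behind the SNAP markers can be illustrated with the alanine aminotransferase example (Fig. 7). This is a minimal sketch, assuming the SNAP design principle of Drenkard et al. (2000) as we understand it: the 3′-terminal base of each primer matches one allele, and an extra deliberate mismatch near the 3′ end sharpens discrimination. The four sequences are transcribed from Fig. 7; the helper name `mismatches` and the 20-base seed used to place the primer on the template are illustrative assumptions and are not part of WebSNAPER.

```python
# Sequences transcribed from Fig. 7 (alanine aminotransferase, Sb02g000780).
BTX623 = "ATTCATGGAGCAATACGAGGGCTACTCCAGAATGTGAACAA"
RIO = "ATTCATGGAGCAATACGAGGGCTACTCCAGGATGTGAACAA"
A_PRIMER = "TGGAGCAATACGAGGGCTACTCCAAA"  # A-specific forward primer
G_PRIMER = "TGGAGCAATACGAGGGCTACTCCATG"  # G-specific forward primer


def mismatches(primer: str, template: str, seed: int = 20) -> list[int]:
    """Return 0-based primer positions that differ from the template.

    Hypothetical helper: the primer is placed on the template by an exact
    match of its first `seed` bases, then compared base by base.
    """
    start = template.find(primer[:seed])
    if start < 0:
        raise ValueError("primer seed not found in template")
    site = template[start:start + len(primer)]
    if len(site) < len(primer):
        raise ValueError("template ends before the primer's 3' end")
    return [i for i, (p, t) in enumerate(zip(primer, site)) if p != t]


if __name__ == "__main__":
    last = len(A_PRIMER) - 1  # index of the 3'-terminal base
    for name, primer in (("A-specific", A_PRIMER), ("G-specific", G_PRIMER)):
        for line, template in (("BTx623", BTX623), ("Rio", RIO)):
            mm = mismatches(primer, template)
            terminal = "mismatched" if last in mm else "matched"
            print(f"{name} primer on {line}: mismatches at {mm}, 3' base {terminal}")
```

On these sequences each primer carries one internal mismatch (position 24) against its own allele and an additional 3′-terminal mismatch (position 25) against the other allele, which is consistent with the A-specific primer amplifying BTx623 and the G-specific primer amplifying Rio.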
In our work, we identified six SNAP markers, within the genes Sb01g044810, Sb03g027710, Sb04g037170, Sb08g008320, Sb09g006050, and Sb10g002230, which were polymorphic between Top 76-6 and Simon. These markers will be useful for mapping purposes when these lines are used as parents.

Fig. 4 The position of the SNP along the 25mer in the probe pair influences the SFP validation. The position of the SNP from the edge of the sugarcane probe pair was scored for each validated SFP. Most of the SNPs locate within positions 6 and 13 along the 25mer. If two or more SNPs were located on a single probe pair, their positions along the 25mer were not counted and thus not included in the graph. X-axis: SNP position from the edge of the sugarcane probe pair (1 to 13).

Fig. 5 GeSNP prediction of SFPs in sorghum genes related to biofuel traits. The hybridization intensity between the perfect match and the mismatch oligonucleotides was averaged and scaled (GeSNP software output) and plotted against each sugarcane probe pair. Graphs are shown for four genes related to biofuel traits that have SFPs with t values of ≥7 and that were previously reported to be differentially expressed between grain sorghum BTx623 and sweet sorghum Rio (a). The SFP present in lysM identified a 13-bp indel, whereas the SFPs present in cellulose synthase 1 and dolichyl-diphospho-oligosaccharide identified an A/G and G/A SNP between BTx623 and Rio, respectively (b). In Rio, the third intron of the gene 4-coumarate coenzyme A ligase is mis-spliced and detected in the sugarcane probe pair #2 (c). Molecular markers for the genes lysM, cellulose synthase 1, and dolichyl-diphospho-oligosaccharide were generated based on allele-specific PCR (d). In the case of lysM, a primer spanning the 13-bp deletion in BTx623 was used to selectively amplify the allele from Rio. In the case of cellulose synthase 1 and dolichyl-diphospho-oligosaccharide, primer pairs specific for the SNP in question were generated by the WebSNAPER software and tested empirically.
Graph panels (average scaled PM-MM for BTx623 and Rio against sugarcane probe pairs 1 to 11): cellulose synthase 1, Sb09g005280 (probe set Sof.5033.1.S1_at); LysM, Sb01g049890 (Sof.3731.1.A1_at); 4-coumarate coenzyme A ligase, Sb04g005210 (Sof.4734.1.S1_at); dolichyl-diphospho-oligosaccharide, Sb02g006330 (Sof.1519.2.S1_at).
Sequence panel: 4-coumarate coenzyme A ligase, Sb04g005210, Ch4_5069094..5069216, Rio mis-spliced, with probe pair #2 marked at both junctions.
exon / intron: CGCCGCCGTCGTGTCGTAAGTTGCTCATCGATACCGCCACAGCGCAGCCTGCGCGCTGCCAGTTTCTTAGGTCAACTGAATTCTGAA
intron / exon: AAACTTCTCCGTCTCTAACCTCAGAATGAAGGA
PCR panel: LysM; cellulose synthase 1 (A-specific and G-specific primers); dolichyl-diphospho-oligosaccharide (G-specific and A-specific primers); lanes BTx623, Rio, Heilong, IS 9738C, SC1063C, Dale, Della, M81-E, Top 76-6, and Simon.

Discussion

A significant proportion of the phenotypic variation in any organism can be attributed to polymorphisms at the DNA level. Thus, these DNA polymorphisms can be used for genotyping, molecular mapping, and marker-assisted selection applications. The association of a particular trait of interest with a DNA polymorphism is essential for breeding purposes. Microarrays have been used to identify abundant DNA polymorphisms throughout the genome (Gupta et al. 2008; Hazen and Kay 2003). In particular, ELPs and SFPs can be identified from RNA hybridization studies. SFPs are detected by oligonucleotide arrays and represent DNA polymorphisms between genotypes within an individual oligonucleotide probe pair that are detected by the difference in hybridization affinity (Borevitz et al. 2003). In addition, SFPs present in a transcribed gene may be the underlying cause of the difference in a phenotype of interest. In most cases, SNPs are the cause of SFPs, as has been demonstrated by sequence analysis (Borevitz et al. 2003; Rostoks et al. 2005).

Here, the goal was to identify SFPs from an Affymetrix sugarcane genechip dataset of closely related species (Calviño et al. 2008). The Affymetrix sugarcane genechip was used to survey the SFPs with the GeSNP software between two sorghum cultivars that differ in the accumulation of fermentable sugars in their stems, with the objective of developing genetic markers for mapping purposes. This is, to our knowledge, the first report of the use of GeSNP to identify SFPs within closely related grass species and of the development of molecular markers based on validated SFPs.

We cloned and sequenced gene fragments harboring SFPs with t values equal to or higher than 7 from 58 sweet sorghum genes comprising 125 SFPs in total. In this study, we found an SFP discovery rate of 25.6%, which is sufficient for most applications. Still, there are several possibilities to increase the SDR. First, the number of biological replicates suggested for using the GeSNP software is 4 or more. In contrast, we had only three replicates for both grain and sweet sorghum. Second, the cross-species hybridization of sorghum RNAs to probe sets of the sugarcane array is not as sensitive as intraspecies hybridization. Third, false positives could be due to the cross-hybridization of paralogous gene targets to individual probes, which may affect the specificity of the SFP calling. This problem would also arise from using next-generation sequencing for SNP detection. Nevertheless, we could show that the use of expression analysis in conjunction with GeSNP is an efficient and inexpensive way to develop new molecular markers.

The sugarcane probe pairs with t values between 22 and 25 had the highest SDR (80%) found in our study. One of these probe pair sets matched a sorghum gene coding for fructose bisphosphate aldolase (cytoplasmic isozyme), and the identified SFP was confirmed through DNA sequence analysis (Fig. 3). This gene codes for a glycolytic enzyme that catalyzes the cleavage of fructose 1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate (Tsutsumi et al. 1994).

One third (33%) of the 58 genes that we sequenced have a validated SFP. In addition, we could detect SNPs in 19% of all sequenced genes at a position different from the one indicated by GeSNP. This is attributable to the fact that the probe pair set covers only a part of the gene, which implies that any SNP outside this region is not reported by GeSNP. We estimated the average SNP density between BTx623 and Rio at one SNP every 248 bp. This is probably an underestimate because the sugarcane probe sets were designed from genic regions, which are more conserved than other regions in the genome.

Fig. 6 panel titles: (a) SNP density per sorghum chromosome, shown for Ch1, Ch2, Ch3, Ch4, Ch8, and Ch9; (b) sugarcane probe pairs with t-values 22..25 that match genes in sorghum chromosomes.

Although the sorghum chromosomes 1, 2, and 3 had the highest numbers for both ELPs and SFPs, chromosomes 8 and 9 were the most polymorphic ones, measured as the number of SNPs per Kb of sequence (Figs. 1 and 6). Our data are in agreement with a previous report in which amplified fragment-length polymorphism markers on chromosome 8 could unambiguously distinguish grain from sweet sorghum lines (Ritter et al. 2007). Furthermore, sugar content QTLs have been located in this chromosome with a RIL derived from a dwarf derivative of Rio as one of the parents.
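The rates and densities quoted in the Results and Discussion can be rechecked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original analysis; the inputs are the counts stated in the text and in the totals row of Table 1 (125 SFPs tested, 32 validated, 87 SNPs in 21,612 bp, 58 genes sequenced), and the variable names are our own.

```python
# Counts as reported in the text and in the totals row of Table 1.
SFPS_TESTED = 125
SFPS_VALIDATED = 32
GENES_SEQUENCED = 58
GENES_WITH_VALIDATED_SFP = 19
GENES_WITH_SNP_ELSEWHERE = 11   # SNPs outside the predicted probe pair
SNPS_FOUND = 87
BP_SEQUENCED = 21_612
GENOME_BP = 730_000_000          # approximate genome size (Paterson et al. 2009)

# SFP discovery rate: validated SFPs over all SFPs tested.
sdr_pct = 100 * SFPS_VALIDATED / SFPS_TESTED

# Gene-level detection rates.
validated_gene_pct = 100 * GENES_WITH_VALIDATED_SFP / GENES_SEQUENCED
other_snp_gene_pct = 100 * GENES_WITH_SNP_ELSEWHERE / GENES_SEQUENCED
total_detection_pct = validated_gene_pct + other_snp_gene_pct

# SNP density and its genome-wide extrapolation.
bp_per_snp = BP_SEQUENCED / SNPS_FOUND
polymorphic_pct = 100 * SNPS_FOUND / BP_SEQUENCED
genome_wide_snps = GENOME_BP * SNPS_FOUND / BP_SEQUENCED

if __name__ == "__main__":
    print(f"SFP discovery rate: {sdr_pct:.1f}%")
    print(f"Genes with a validated SFP: {validated_gene_pct:.0f}%")
    print(f"Genes with SNPs elsewhere: {other_snp_gene_pct:.0f}%")
    print(f"Total SNP detection rate: {total_detection_pct:.0f}%")
    print(f"One SNP every {bp_per_snp:.0f} bp ({polymorphic_pct:.1f}% of sites)")
    print(f"Extrapolated genome-wide SNPs: {genome_wide_snps:,.0f}")
```

The extrapolation assumes that the density measured in genic fragments applies uniformly across the genome, which, as noted above, probably underestimates the true number; the computed value comes out close to, but not exactly at, the 2,938,800 quoted in the Results, presumably because of rounding.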
In addition, we found that a marker within the gene Sb09g029170, coding for a putative ketol-acid reductoisomerase, could discriminate the grain sorghums from the sweet sorghum lines used in this study (Table 4). This enzyme is the second in the biosynthesis of the branched-chain amino acids valine, leucine, and isoleucine (Leung and Guddat 2009). When the SNPs found through validated SFPs were compared between BTx623, Rio, and sugarcane, we found that SNPs conserved between BTx623 and sugarcane were about twice as frequent as those conserved between Rio and sugarcane.

Fig. 6 SNP density per sorghum chromosomes. The number of SNPs per kb of sequence was calculated based on the number of genes sequenced belonging to a given chromosome. Only those chromosomes with five or more genes sequenced are represented (a). Frequency distribution along sorghum chromosomes (Ch1 to Ch10) of sugarcane probe pairs with t values between 22 and 25 (b). Y-axes: SNP/Kb sequence (a); # sugarcane probe pairs (b).

Allelic genetic diversity among sweet sorghum cultivars has previously been investigated based on simple sequence repeat markers (Ali et al. 2008). That study described the correlations between allelic diversity and the degree of stem sugar. Indeed, one could envision a simpler approach: hybridizing stem-derived RNAs from these lines to the sugarcane genechip described here and identifying both ELPs and SFPs for subsequent mapping of sugar content QTLs. Furthermore, the SNPs identified in our study provided us with the opportunity to develop molecular markers within genes. So far, there is no report of SNP-based molecular markers in transcribed genes in sorghum. The SFPs generated from transcriptome studies are also useful for the development of markers in species that lack sequence resources, such as Miscanthus and switchgrass, further extending the use of microarrays of one species for related ones.

Table 4 Primer Sequences of SNAP Markers within Sorghum Genes
S. bicolor gene ID | Allele | WebSNAPER primer sequence | PCR product size (bp) | Allele presence
Sb01g043060 | T | F: GTAATATACTGACGCCAAAAGAGGCGGATT; R: TCAACTGCTGTTGTCGAGGACATTGG | 306 | BT
Sb01g043060 | A | F: TGTAATATACTGACGCCAAAAGAGGCGACTT; R: TCAACTGCTGTTGTCGAGGACATTGG | 307 | Ri-Top
Sb01g044810 | C | F: CAATCCTGCTCCCCAATCCAGACC; R: GATTACGAGATCAGCGGTCTGGAAAGAAA | 334 | BT-Da-De-Sim
Sb01g044810 | T | F: GCAATCCTGCTCCCCAATCCAGACT; R: GATTACGAGATCAGCGGTCTGGAAAGAAA | 335 | Ri-He-IS-SC-M81-Top
Sb02g000780 | A | F: TGGAGCAATACGAGGGCTACTCCAAA; R: AATCTTCAGAAACGCTCCATTTGTGCTG | 118 | BT
Sb02g000780 | G | F: TGGAGCAATACGAGGGCTACTCCATG; R: AATCTTCAGAAACGCTCCATTTGTGCTG | 118 | Ri-He-IS-SC-Da-De-M81-Top-Sim
Sb02g006330 | G | F: TGTGGTACAGGTACACAAGCGAGAACATG; R: CCTTACAGGCATAACGAGTATGAGAGATTCATAACA | 115 | BT-IS-Da-De-M81
Sb02g006330 | A | F: CTTATTTGTGGTACAGGTACACAAGCGAGAATAAA; R: CCTTACAGGCATAACGAGTATGAGAGATTCATAACA | 121 | Ri-Top-Sim
Sb03g012420 | C | F: GAAGCATTCTTTCCGATACAATATGGCCTATC; R: TTCGATTAAAGGATTGTTGATGAAACTAGGGG | 164 | BT-He-SC-M81-Top-Sim
Sb03g012420 | T | F: GAAGCATTCTTTCCGATACAATATGGCCTACT; R: TTCGATTAAAGGATTGTTGATGAAACTAGGGG | 164 | Ri-IS-Da
Sb03g007840 | C | F: CCATAAATGTCATTGTGGAGACATCCGTTC; R: TGGAACGTCAAAACATTGACCGGAA | 161 | BT-He-IS-SC-M81-Top
Sb03g007840 | T | F: AAATGTCATTGTGGAGACATCCGGGT; R: TGGAACGTCAAAACATTGACCGGAA | 157 | Ri-Da-Sim
Sb03g027710 | T | F: GGTCATCGGTGATGGTGGAGAACCT; R: GGGAATTCGATTATGTCCATCACACCC | 343 | BT
Sb03g027710 | G | F: AGGTCATCGGTGATGGTGGAGATCTG; R: GGGAATTCGATTATGTCCATCACACCC | 344 | Ri-Da-Sim
Sb03g039090 | C | F: CGAACCCAACAACCTGTAACAATAAGCACTAC; R: GGAATTCGATTATCTCGGGGCTCATCTAC | 326 | BT-Da-De-Top-Sim
Sb03g039090 | A | F: GAACCCAACAACCTGTAACAATAAGCAGAAA; R: GGAATTCGATTATCTCGGGGCTCATCTAC | 325 | Ri-M81
Sb04g037170 | G | F: CACAAGCGACTTGAAACTGCGCTG; R: GGCTTGACAACTGCTTCAACCTCTGC | 131 | BT-IS-SC-Top
Sb04g037170 | C | F: CACAAGCGACTTGAAACTGCACCC; R: GGCTTGACAACTGCTTCAACCTCTGC | 131 | Ri-He-Da-De-M81-Sim
Sb07g005930 | T | F: CAGTTCTCCAATCCTTTCCTCTGTGGTCT; R: GTGAGAAGCGTGGGATGCTCATCAG | 146 | BT-He-SC-Da-M81
Sb07g005930 | G | F: GTTCTCCAATCCTTTCCTCTGTGGTCG; R: GTGAGAAGCGTGGGATGCTCATCAG | 144 | Ri-IS-Top-Sim
Sb08g020760 | C | F: CAGAGGAAGCCCTTACACAGATCCGAC; R: TACCCACAGGTCTGGAAAGGGCAAG | 1,400 | BT-M81
Sb08g020760 | T | F: CAGAGGAAGCCCTTACACAGATCCGAT; R: TACCCACAGGTCTGGAAAGGGCAAG | 416 | Ri-He-IS-SC-Top-Sim
Sb08g008320 | T | F: GCAGTGGAAGGACATCATTGCCCAT; R: CTCTTCCGGGACGCGACGTTC | 174 | BT-He-Da-M81-Sim
Sb08g008320 | C | F: CAGTGGAAGGACATCATTGCCGTC; R: CTCTTCCGGGACGCGACGTTC | 173 | Ri-IS-SC-Top
Sb09g005280 | A | F: GCAGCACCGTCACCGGCACTA; R: GAGGCTCAATCAAGATCGTCTGCCC | 142 | BT
Sb09g005280 | G | F: CAGCACCGTCACCGGCATCG; R: GAGGCTCAATCAAGATCGTCTGCCC | 141 | Ri-He-IS-SC-Da-De-M81-Top-Sim
Sb09g029170 | C | F: CTACTCTGAGATCATCAACGAGAGCGTGAAC; R: CCTAGATCCCAGGCGAGCCGTC | 124 | BT-He-SC-IS
Sb09g029170 | T | F: CTACTCTGAGATCATCAACGAGAGCGTGTTT; R: CCTAGATCCCAGGCGAGCCGTC | 124 | Ri-Da-De-M81-Top-Sim
Sb09g000820 | G | F: TCGAGAGCGATGCCTTCTGACATTG; R: CCATATCTCCAGCCATCTTCAATGTTGTG | 128 | BT-Top
Sb09g000820 | A | F: CGAGAGCGATGCCTTCTGACAGCA; R: CCATATCTCCAGCCATCTTCAATGTTGTG | 130 | Ri
Sb09g006050 | C | F: ATAGAAGGCAGAATGAACGCTGGAAAGC; R: GGGCAAGCAGGCCTGGAACTTC | 105 | BT-Top
Sb09g006050 | A | F: AGAAGGCAGAATGAACGCTGGACTGA; R: GGGCAAGCAGGCCTGGAACTTC | 103 | Ri-He-IS-SC-Da-De-M81-Sim
Sb10g007380 | T | F: GAACTACAGACATGCACAAGGATAGCAGGTT; R: ATTGCATTCAGGAAGCTCGCTCGA | 561 | BT-Top
Sb10g007380 | C | F: GAACTACAGACATGCACAAGGATAGCAGAGC; R: ATTGCATTCAGGAAGCTCGCTCGA | 561 | Ri-He-IS-SC-Da-De-M81
Sb10g002230 | G | F: CTTCAATCCGACAACCAAGTCGCTG; R: CTGGAACTGCAATGCGGCCATT | 197 | BT-He-IS-Top
Sb10g002230 | A | F: GCTTCAATCCGACAACCAAGTCGCTA; R: CTGGAACTGCAATGCGGCCATT | 197 | Ri-SC-Da-De-M81-Sim
BT BTx623, Ri Rio, He Heilong, IS IS 9738C, SC SC 1063C, Da Dale, De Della, M81 M81-E, Top Top 76-6, Sim Simon
Only the cultivars that gave a PCR product were scored. If a cultivar was heterozygous for a particular allele, it was not scored.

Materials and methods

Plant material

The grain sorghum lines Heilong (accession number PI 563518), IS 9738C (PI 595715), and SC 1063C (PI 595741) were obtained from the National Plant Germplasm System (NPGS), USDA. The other lines used in this study were previously described (Calviño et al. 2008). Two-week-old seedlings were harvested for the extraction of genomic DNA.

SFP discovery and validation from Affymetrix transcript data

The microarray analysis for differentially expressed transcripts in stems of grain and sweet sorghum with a sugarcane genechip was previously described (Calviño et al. 2008). The CEL files from the microarray work were uploaded into the publicly available GeSNP software at http://porifera.ucsd.edu/∼cabney/cgi-bin/geSNP.cgi, and an Excel file was obtained with all the probe sets in the array harboring an SFP together with their respective t values. The Excel file also contained the average hybridization intensity between the PM and MM probe pairs (average scaled PM–MM) as well as their variance values, which were converted to standard deviations. These values were used to generate the graphs displaying differences in hybridization intensity between BTx623 and Rio along the 11 sugarcane probe pairs for a given probe set.

From the transcripts previously described as being differentially expressed between grain sorghum BTx623 and sweet sorghum Rio, we selected those harboring SFPs with t values ≥7 for further validation through sequencing. In total, we sequenced gene fragments corresponding to 58 different genes.

Total RNA from Rio stem tissue was extracted at the time of flowering from three independent plants. RNA extraction was performed with the RNeasy Plant Mini Kit from QIAGEN. cDNA synthesis was performed for each of the three samples from 1 μg of total RNA with the SuperScript III First-Strand Synthesis kit from Invitrogen. The cDNAs from Rio were pooled and used for the amplification of genes with SFPs.

The reverse transcription polymerase chain reaction products were checked by agarose gel electrophoresis in order to verify that a single-band amplification product from each gene was present. The PCR products were purified with the QIAquick PCR Purification kit from QIAGEN and cloned into the pGEM-T Easy vector from Promega. Twelve clones per gene were sequenced in order to identify any sequencing or reverse transcriptase errors. The consensus sequence for each gene was then used to find SNPs between BTx623 and Rio.

Development of molecular markers using WebSNAPER software

Once a SNP was identified between BTx623 and Rio for a particular gene of interest, the sequence harboring the SNP in question was uploaded into the publicly available WebSNAPER software (http://pga.mgh.harvard.edu/cgi-bin/snap3/websnaper3.cgi). The SNAP procedure has been previously described (Drenkard et al. 2000). Several primer pairs per SNP were tested, and the ones that successfully distinguished the SNP in one line or the other were selected. The primer sequences used to distinguish SNPs are provided in Table 4.

Genomic DNA from 2-week-old seedlings was extracted with the PrepEase Genomic DNA Isolation kit from USB. Several concentrations of genomic DNA were tested, and 50 ng was used for testing the SNAP primer pairs through PCR. The conditions used for the PCR reaction were as follows: 94°C for 2 min, then 30× [94°C 30 s, 64°C 30 s, 72°C 30 min] and a final extension at 72°C for 2 min.

Fig. 7 Development of a molecular marker for alanine aminotransferase based on SFP discovery and the SNAP technique. The SFP detected by the probe pair #5 in the sugarcane probe set Sof.1326.1.S1_a_at was validated through sequencing (a). Specific primers for either A or G nucleotides were designed with WebSNAPER (b) and tested through PCR in ten sorghum lines (c).
Panel a: alanine aminotransferase, Sb02g000780; average scaled PM-MM for BTx623 and Rio against sugarcane probe pairs 1 to 11 of Sof.1326.1.S1_a_at.
Panel b:
A-specific primer, forward: TGGAGCAATACGAGGGCTACTCCAAA
BTx623: ATTCATGGAGCAATACGAGGGCTACTCCAGAATGTGAACAA
Rio: ATTCATGGAGCAATACGAGGGCTACTCCAGGATGTGAACAA
G-specific primer, forward: TGGAGCAATACGAGGGCTACTCCATG
Panel c: A- and G-specific reactions for the grain sorghums (BTx623, IS 9738C, Heilong, SC 1063C) and the sweet sorghums (Dale, Rio, Della, Simon, M81-E, Top 76-6).

Acknowledgments

The research described in this manuscript was supported by the Selman A. Waksman Chair in Molecular Genetics to JM and by the sponsorship from the International Institute of Education (IIE), and the Fulbright Commission in Uruguay to MC. We thank Wenqin Wang and Todd Michael for their assistance in the measurement of BTx623 and Rio genome sizes through flow cytometry.

References

Ali M, Rajewski J, Baenziger P, Gill K, Eskridge K, Dweikat I. Assessment of genetic diversity and relationship among a collection of US sweet sorghum germplasm by SSR markers. Mol Breed. 2008;21:497–509.
Borevitz JO, Chory J. Genomics tools for QTL analysis and gene discovery. Curr Opin Plant Biol. 2004;7:132–6.
Borevitz JO, Liang D, Plouffe D, Chang HS, Zhu T, Weigel D, et al. Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003;13:513–23.
Borevitz JO, Hazen SP, Michael TP, Morris GP, Baxter IR, Hu TT, et al. Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proc Natl Acad Sci USA. 2007;104:12057–62.
Cáceres M, Lachuer J, Zapala MA, Redmond JC, Kudo L, Geschwind DH, et al. Elevated gene expression levels distinguish human from non-human primate brains. Proc Natl Acad Sci USA. 2003;100:13030–5.
Calviño M, Bruggmann R, Messing J. Screen of genes linked to high-sugar content in stems by comparative genomics. Rice. 2008;1:166–76.
Coram TE, Settles ML, Wang M, Chen X. Surveying expression level polymorphism and single-feature polymorphism in near-isogenic wheat lines differing for the Yr5 stripe rust resistance locus. Theor Appl Genet. 2008;117:401–11.
Das S, Bhat PR, Sudhakar C, Ehlers JD, Wanamaker S, Roberts PA, et al. Detection and validation of single feature polymorphisms in cowpea (Vigna unguiculata L. Walp) using a soybean genome array. BMC Genomics. 2008;9:107.
Drenkard E, Richter BG, Rozen S, Stutius ML, Angell NA, Mindrinos M, et al. A simple procedure for the analysis of single nucleotide polymorphisms facilitates map-based cloning in Arabidopsis. Plant Physiol. 2000;124:1483–92.
Greenhall JA, Zapala MA, Cáceres M, Libiger O, Barlow C, Schork NJ, et al. Detecting genetic variation in microarray expression data. Genome Res. 2007;17:1228–35.
Gupta PK, Rustgi S, Mir RR. Array-based high-throughput DNA markers for crop improvement. Heredity. 2008;101:5–18.
Hazen SP, Borevitz JO, Harmon FG, Pruneda-Paz JL, Schultz TF, Yanovsky MJ, et al. Rapid array mapping of circadian clock and developmental mutations in Arabidopsis. Plant Physiol. 2005;138:990–7.
Hazen SP, Kay SA. Gene arrays are not just for measuring gene expression. Trends Plant Sci. 2003;8:413–6.
Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet. 2001;17:388–91.
Kumar R, Qiu J, Joshi T, Valliyodan B, Xu D, Nguyen HT. Single feature polymorphism discovery in rice. PLoS ONE. 2007;2:e284.
Leung EW, Guddat LW. Conformational changes in a plant ketol-acid reductoisomerase upon Mg(2+) and NADPH binding as revealed by two crystal structures. J Mol Biol. 2009. doi:10.1016/j.jmb.2009.04.012.
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–6.
Potokina E, Druka A, Luo Z, Wise R, Waugh R, Kearsey M. Gene expression quantitative trait locus analysis of 16 000 barley genes reveals a complex pattern of genome-wide transcriptional regulation. Plant J. 2008;53:90–101.
Ritter KB, McIntyre CL, Godwin ID, Jordan DR, Chapman SC. An assessment of the genetic relationship between sweet and grain sorghums, within Sorghum bicolor ssp. bicolor (L.) Moench, using AFLP markers. Euphytica. 2007;157:161–76.
Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, et al. Single-feature polymorphism discovery in the barley transcriptome. Genome Biol. 2005;6:R54.
Shiu SH, Borevitz JO. The next generation of microarray research: applications in evolutionary and ecological genomics. Heredity. 2008;100:141–9.
Tsutsumi K, Kagaya Y, Hidaka S, Suzuki J, Tokairin Y, Hirai T, et al. Structural analysis of the chloroplastic and cytoplasmic aldolase-encoding genes implicated the occurrence of multiple loci in rice. Gene. 1994;141:215–20.
Varshney RK, Graner A, Sorrells ME. Genomics-assisted breeding for crop improvement. Trends Plant Sci. 2005;10:621–30.
Werner JD, Borevitz JO, Warthmann N, Trainer GT. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc Natl Acad Sci USA. 2005;102:2460–5.
West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, et al. High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Res. 2006;16:787–95.
Xu JH, Messing J. Organization of the prolamin gene family provides insight into the evolution of the maize genome and gene duplications in grass species. Proc Natl Acad Sci USA. 2008;105:14330–5.
Zhu T, Salmeron J. High-definition genome profiling for genetic marker discovery. Trends Plant Sci. 2007;12:1360–85.

Journal

Rice, Springer Journals

Published: Aug 15, 2009
