The Promoter Signatures in Rice LEA Genes Can Be Used to Build a Co-expressing LEA Gene Network

Rice (2008) 1:177–187
DOI 10.1007/s12284-008-9017-4

Stuart Meier & Chris Gehring & Cameron Ross MacPherson & Mandeep Kaur & Monique Maqungo & Sheela Reuben & Samson Muyanga & Ming-Der Shih & Fu-Jin Wei & Samart Wanchana & Ramil Mauleon & Aleksandar Radovanovic & Richard Bruskiewich & Tsuyoshi Tanaka & Bijayalaxmi Mohanty & Takeshi Itoh & Rod Wing & Takashi Gojobori & Takuji Sasaki & Sanjay Swarup & Yue-ie Hsing & Vladimir B. Bajic

Received: 3 July 2008 / Accepted: 31 October 2008 / Published online: 22 November 2008
© The Author(s) 2008. This article is published with open access at Springerlink.com

Abstract

Coordinated transcriptional modulation of large gene sets depends on the combinatorial use of cis-regulatory motifs in promoters. We postulate that promoter content similarities are diagnostic for co-expressing genes that function coherently during specific cellular responses. To find the co-expressing genes we propose an ab initio method that identifies motif families in promoters of target gene groups, maps these families to the promoters of all genes in the genome, and determines the best matches of each of the target group gene promoters with all other promoters. When the method was tested in rice starting from a group of co-expressing Late Embryogenesis Abundant (LEA) genes, we obtained a promoter similarity-based network that contained candidate genes that could plausibly complement the function of LEA genes. Importantly, 73.36% of 244 genes predicted by our method were experimentally confirmed to co-express with the LEA genes in maturing rice embryos, making this methodology a promising tool for biological systems analyses.

Keywords: Transcription regulation, Co-expression, Co-regulation

Stuart Meier, Chris Gehring, Yue-ie Hsing, and Vladimir Bajic are the first authors.

Electronic supplementary material: The online version of this article (doi:10.1007/s12284-008-9017-4) contains supplementary material, which is available to authorized users.

S. Meier, C. R. MacPherson, M. Kaur, M. Maqungo, S. Muyanga, A. Radovanovic, B. Mohanty, V. B. Bajic (*)
South African National Bioinformatics Institute (SANBI), University of the Western Cape, Cape Town, South Africa
e-mail: vlad@sanbi.ac.za
URL: www.sanbi.ac.za/people/faculty/professors/vlad-bajic/

M.-D. Shih, F.-J. Wei, Y.-i. Hsing
Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan

S. Wanchana, R. Mauleon, R. Bruskiewich
International Rice Research Institute (IRRI), Metro Manila, Philippines

T. Tanaka, T. Itoh, T. Sasaki
National Institute of Agrobiological Sciences, Tsukuba, Japan

C. Gehring
Department of Biotechnology, University of the Western Cape, Cape Town, South Africa

R. Wing
Department of Plant Sciences, University of Arizona, Tucson, USA

S. Reuben, S. Swarup
Department of Biological Sciences, National University of Singapore, Singapore, Singapore

T. Gojobori
National Institute of Genetics, Shizuoka, Japan

Introduction

In the post-genomic sequencing era, computationally based tools are required to help decipher biologically meaningful information from the masses of sequence data generated. Computationally based homology comparisons are commonly used to infer gene functions based on similarities to previously functionally annotated genes. While extremely useful, homology comparisons are somewhat limited to the identification of ‘more of the same’ type of genes and fail to provide information regarding the temporal, spatial, and stimulus-specific context in which the gene is expressed and active. This problem is particularly apparent when considering large gene families within a genome that share high sequence similarity yet function within distinct cellular responses. Alternatively, although global gene expression studies, such as microarray, can reveal transcriptional responses of entire genomes in a single experiment, they only provide expression profiles at specific time points in response to a specific stimulus. Furthermore, the biological roles of many of the genes identified in large-scale expression studies are not well characterized, and such studies do not link genes to specific regulatory pathways since the expression profile can be a direct or indirect result of the treatments.

In eukaryotes, many cellular processes require the coherent participation of multiple gene products, as is evident from the co-expression of large sets of genes in response to specific stimuli [10, 27, 29]. Furthermore, a number of studies have shown that genes that have been confirmed to be co-expressed in response to a range of conditions have correlated functional relationships, including physical interactions between their proteins [1, 16, 17, 25, 34]. Collectively, these studies indicate that cells possess a mechanism that coordinates the expression of genes that are involved in common functional responses.

According to the cis-regulatory logic [2, 6], the regulation of eukaryotic gene expression is critically dictated by the combinational presence (and effect) of regulatory motifs, or signatures, in their promoters, which is necessitated by the specific binding requirements of transcription factors (TFs) [1, 2, 4, 6, 23]. Genomic sequences contain these regulatory motifs encoded mainly in promoter regions of individual genes.

We hypothesize that promoter content similarity can therefore be used to identify groups of co-expressed genes that function coherently during defined cellular processes, including changes in growth and development programs or environmental challenges. We report here a method we developed that, based on promoter content similarity, builds a putative transcriptional regulatory network of co-expressed genes. We use the term ‘network’ to define a group of genes that are linked by the fact they contain a common set of motifs in their promoters that we believe could be causative for their transcriptional regulation. The promoters of these genes would presumably bind common TFs, thus connecting the genes into a putative transcriptional regulatory gene network.

We tested our method in rice using a group of late embryogenesis abundant (LEA) genes that were confirmed to be co-expressed in developing embryos. In plants, the LEA genes are believed to function in protecting cellular components during developmentally induced desiccation in embryos and during water deficit stress in vegetative tissue [9]. We, therefore, hypothesize that other genes that are determined to share the most similar promoter motif combinations with the LEA genes will function coherently with them in achieving a common cellular response, which will be manifested by their co-expression with the LEA genes. Experimental validation of our predictions shows that 73.36% of the 244 predicted genes co-express with the LEA genes. In addition, a literature analysis indicated that the function of many of the genes could plausibly complement the function of the LEA genes.

Results

Method outline

We have developed a method that builds a putative transcriptional network of co-expressed genes based on their sharing highly similar promoter contents. The network building relies on a reference target gene group (TGG) that is defined in terms of being co-expressed in response to a specific biological condition. A typical example could be a cluster of co-expressing genes identified in a microarray expression experiment. The promoters of these genes are then collectively assessed for the presence of specific signatures in the form of specific motif combinations that we believe could be causative for their transcriptional responses. The signatures identified in each of the individual promoters of the TGG are then mapped to other promoters in the genome. These signatures thus serve to identify other genes that share the most similar promoter content and thus have the greatest potential to be co-regulated with each gene of the TGG. The method generates a putative transcriptional network that contains groups of candidate genes that we predict will co-express and function coherently with the TGG in producing a common cellular response. This method extends the regulatory relationships of a TGG to other candidate genes and thus links them to a well-defined biological response, providing insights into the biological context in which the gene(s) functions.

The method described above briefly consists of the following steps (details of which, related to the implementation we made, are given in the “Methods” section):

1. Determine the target gene group based on their co-expression in a common systemic response.
2. Identify promoters for the TGG genes.
3. Identify enriched motif families in the promoters of the TGG.
4. Map identified motif families to all promoters of the genome. Overlapping of mapped motifs is allowed.
5. For each of the promoters of the TGG, search for other promoters in the genome that share the highest number of the mapped motifs with the individual TGG promoter. We hypothesize that genes associated with these identified promoters have a high probability to co-express with the genes in TGG under the same biological conditions.

Identification of TGG and construction of a putative co-expressing gene network

We tested our method on the recently sequenced rice genome and used 31 annotated LEA genes as the TGG. These LEA genes were all determined to be co-expressed in mature rice embryos as determined from massively parallel signature sequencing (MPSS) expression data (see Supplementary File 1). The Dragon Motif Builder (DMB) program was used to identify 30 enriched motif families in the promoters of the LEA genes (Table 1). For each of the motif families, the consensus motif was determined. The PATCH program of the Transfac database suite indicated that 21 of the 30 identified consensus motifs conform to known plant cis-elements and 19 of these contain sequences that correspond to binding sites for known plant TFs (Table 1), some of which have been shown to regulate the expression of LEA genes (Table 1 and Supplementary File 2).

The presence and abundance of the 30 motif families in the individual promoters of the TGG was used to build promoter signatures for each of the individual LEA genes. These signatures were then used to map to the most similar promoters in the genome and thus identify other genes that have the greatest potential to be co-regulated with the LEA genes. A summary of the average spatial distribution of each of the 30 identified motifs in the promoters of the predicted genes is depicted in Fig. 1. This analysis identified an additional 244 genes that shared the highest number of common motifs with each of the individual promoters of the LEA genes (see Supplementary File 3). A complete network diagram (see Supplementary File 4) was constructed to illustrate the edge relationship between the TGG and the predicted genes. Figure 2 illustrates such a relationship between a single LEA gene and its neighbors in the network.

A detailed literature search that was performed for 110 of the 244 identified genes indicated that the function of many of the genes, which possessed functional descriptions, could reasonably be linked to embryo development and water deficit stress responses (Supplementary File 5).

Experimental validation of predicted gene co-expression

Experimental validation of our method was obtained using semi-quantitative reverse transcriptase polymerase chain reaction (RT-PCR) and MPSS expression analysis to determine if the predicted genes are co-expressed with the LEA genes in maturing embryos. The results show that (based on RT-PCR and MPSS) 179 (73.36%) out of the 244 genes tested were co-expressed with the LEA genes (see Supplementary Files 6 and 7). A more detailed analysis revealed a strong positive correlation between the number of motifs shared between genes and the percentage that were co-expressed (Fig. 3). We found that 100% of the predicted genes that shared 27 or more motifs with the LEA genes were co-expressed with the LEA genes, compared to the 73.36% for the overall prediction success. This analysis thus provides compelling experimental support for our method since it illustrates an extremely high correlation coefficient between the number of shared motifs and co-expression (correlation coefficient=0.97) when we consider genes with 22 or more shared motifs.

Further, in order to test whether the proportion of our predicted genes found to be expressed in maturing embryos was significantly greater than that for the whole rice genome, we performed a global MPSS expression analysis to determine the percentage of all non-transposable element (TE) genes that are expressed in maturing rice embryos. According to TIGR v.5, there are 41,047 non-TE genes in the rice genome. Using MPSS analysis of matured rice embryos, we found that 27.99% (11,488) non-TE genes are expressed in maturing rice embryos with a TPM≥1, and 20.07% (8,241) non-TE genes are expressed with a TPM≥4 (TPM stands for ‘transcripts per million’). Consequently, the enrichment of the experimentally confirmed genes that co-express with LEA genes in our computationally predicted gene set, relative to those from the whole rice genome that express in maturing embryos, is characterized by the p values of 1.90e−044 (TPM≥1) and 1.37e−067 (TPM≥4). These p values represent the p values corrected for multiplicity testing (see details in “Methods”). Therefore, the successful expression rate of 73.4% of our predicted genes is significantly higher (see p values) using both cut-off criteria and provides strong support for the method applied here.

Table 1 Identified Consensus Promoter Motifs in Original LEA Genes and the Plant TFs That Were Predicted to Bind to Them in the PATCH Program

Consensus motif | Species/gene identifier | Position | Score | Predicted plant TF | Site binding pattern sequence
1 GAGAAGAAG | AT$PHYA_01 | 2 (−) | 100 | CAMTA3 | TCTTCT
2 GGCGCGYGG | AT$AVP1_01 | 2 (−) | 91.7 | (VOZ1&2)2, CAMTA1 | ACGCGC
 | RICE$ZB8_02 | 3 (+)(−) | 91.7 | CBT | CGCGCG
 | AS$CBT_01 | 3 (+) | 91.7 | CBT | CGCGCG CACGCG
 | MAIZE$ADH1P_01&03 | 5 (+) | 90.0 | No match | CGTGG
 | DAUCE$DC3_04 | 3 (−) | 91.7 | DPBF-1, DPBF-2 | CACGCG
3 CCGTCGWCC | AT$H4_05 | 1 (+) | 100 | | CCGTCG
 | AT$COR15A_01 | 3 (−) | 90 | ANT, CBF1, CBF2, DREB1A, ERFLP1, TSI1 | CCGAC
 | AT$RD29B_01 | 3 (−) | 90 | CBF1 | CCGAC
 | AT$COR78_01 | 3 (−) | 90 | ANT, CBF1, CBF2, DREB1A | CCGAC
 | AT$COR15B_01 | 3 (−) | 90 | CBF1, CBF2, DREB1A | CCGAC
 | RAPE$BN115_01 | 3 (−) | 90 | CBF17, CBF5 | CCGAC
 | AT$FL0521F13_01 | 3 (−) | 91.7 | DREB1A | GCCGAC
 | BAR$HVA1_03 | 3 (−) | 90 | CBF1, CBF2 | CCGAC
 | GOSHI$LEAD113_01 | 3 (−) | 90 | DBP1 | CCGAC
 | AT$COR78_01 | 3 (−) | 90 | ANT, CBF1, CBF2, DREB1A | CCGAC
 | AT$COR15A_03 | 3 (−) | 91.7 | DREB1A | GCCGAC
 | AS$CEF1_02 | 3 (−) | 90 | CEF1 | CCGAC
 | GOSHI$LEAD113_01 | 3 (−) | 90 | DBP1 | CCGAC
4 GCGGAGAAG | No match
5 GCVGGGCAG | MAIZE$ADH11S_06 | 3 (−) | 90 | GCBP-1, Sp1 | GCCCC
6 AACADCAAA | WHEAT$CATHB_08 | 1 (−), 2 (−) | 90 | GAMYB | TTGTT
7 AGCAGCAGC | No match
8 MCCGACGGC | AT$COR15A_03&04 | 1 (+) | 91.7 | DREB1A | GCCCAG
 | MAIZE$DHN1_01 | 1 (+) | 91.7 | DBF1, DBF2 | ACCGAC
 | AS$TINY2_01 | 1 (+) | 91.7 | TINY2 | ACCGAC
 | HELAN$HSP176_02 | 1 (−) | 91.7 | No match | GTCGGT
 | AT$COR15A_01 | 2 (+) | 100 | ANT, CBF1, CBF2, DREB1A, ERFLP1, TSI1 | CCGAC
 | RAPE$BN115_02 | 2 (+) | 100 | CBF17, CBF5 | CCGAC
 | AS$DREBLP1_01 | 2 (+) | 100 | DREBLP1 | CCGAC
 | GOSHI$LEAD113_01 | 2 (+) | 100 | DBP1 | CCGAC
 | AT$H4_05 | 3 (−) | 100 | No match | CCGTCG
9 ACACATACG | No match
10 TTCMTTTCA | DAUCE$EXT_02 | 1 (−) | 92.86 | No match | AAATGAA
 | POT$KST1_01 | 1 (−) | 90 | DOF1 | AAAAG
 | BAR$CPI_01 | 3 (−) | 90 | PBF, SED | AAAGG
 | AT$WUSCHEL_01 | 4 (−) | 91.7 | No match | TGAAAA
11 AWATTATAT | No match
12 CGGCGSCGG | AT$HLS1_01 | 2 (−) | 91.7 | ATERF7, ERF-1,2,3,4,5, ERFLP1 | GCCGCC
 | TO$NP24PP_0 | 2 (−) | 91.7 | ERF-1,2,3,4 | GCCGCC
 | AS$GCCBOX_02 | 2 (−) | 91.7 | Pti4 | GCCGCC
 | AS$CEF1_01 | 2 (−) | 91.7 | CEF1 | GCCGCC
 | BAR$HVA1_04 | 3 (+) | 90 | CBF1 | GCCGCC
 | AT$H4_05 | 4 (−) | 91.7 | No match | CCGTCG
 | AT$COR15A_0 | 5 (−) | 90 | ANT; CBF1,2; DREB1A, ERFLP1, TSI | CCGAC
 | RAPE$BN115_01 | 5 (−) | 90 | CBF17, CBF5 | CCGAC
13 CTTCTTCCT | No match
14 AAAATAATA | SOYBN$VSPB_03 | 1 (−) | | | TATTTT
15 AAATYGARA | AS$ARR10_17 | 2 (−) | 90 | ARR10 | CGATT
16 AGAAGATCA | AT$PHYA_01 | 1 (−) | 100 | CAMTA3 | TCTTCT
17 RCAGCAGCA | No match
18 CGCGCGGCG | RICE$ZB8_02 | 1 (+) | 100 | CBT | CGCGCG
19 GTTAMATAT | AT$CAB2_03 | 1 (+) | 90 | GT-3a | GTTAC
 | PV$PHS_03 | 2 (+) | 90 | No match | TTAAA
 | RICE$ZB8_01 | 2 (−) | 92.86 | TBP2 | TATTTAA
 | MAIZE$PMS1_ | 3 (+) | 92.86 | No match | TAAATAT
20 TTGYTTAAT | WHEAT$CATHB | 1 (+) | 90 | GAMYB | TTGTT
 | AS$ARR10_18 | 2 (+) | 90 | ARR10 | TGATT
 | PEA$RS3A_03 | 3 (+) | 91.67 | GT-1, GT-1a, SBF-1 | GGTTAA
 | OAT$PHYA3_0 | 3 (+) | 92.86 | No match | GGTTAAT
 | RICE$PHYA_0 | 3 (+) | 92.86 | GT-1, GT-2 | GGTTAAT
 | PV$PHS_03 | 4 (+) | 91.67 | No match | TTTAAT
 | PV$PHS_03 | 4 (−) | 90 | No match | TTAAA
21 TGTACTCSC | TO$LAP171A_ | 3 (−) | 100 | JAMYC2 | GAGTA
22 MSGATGRTG | BARL$CAB11_12 | 2 (−) | 90 | MCB1, MCB2 | CATCC
23 AGCACACAT | No match
24 CMAAAAGCT | AS$PF1_01&02 | 2 (−) | 90 | PF1 | TTTTT
 | POT$KST1_01 | 3 (+) | 100 | DOF1 | AAAAG
25 CGGCTCGCC | No match
26 GAATGGATG | WHEAT$CAB1_ | 4 (−) | 100 | MCB1, MCB2 | ATCCA
 | BARL$CAB11&12 | 5 (−) | 100 | MCB1, MCB2 | CATCC
27 ATCAAGGAA | AT$ATBZIP60 | 2 (+) | 100 | No match | TCAAG
28 TGGCGCCGC | No match
29 GCCGSGGCC | MAIZE$ADH1P&11 | 2 (−) | 91.67 | No match | CCCCGG
 | MAIZE$ADH1P | 3 (+) | 90 | No match | CGTGG
 | AS$mEMBP_17 | 3 (−) | 91.67 | EmBP-1a | GCCACG
 | MAIZE$ADH11 | 4 (−) | 90 | GCBP-1, Sp1 | GCCCC
30 AATTTTRGT | AS$PF1_01 | 3 (+) | 90 | PF1 | TTTTT

Species/gene identifier represents species and gene acronyms ($) and consecutive site number in which the identified motif is found in plant genes. Position indicates the position and strand within the consensus motif where the TF is predicted to bind; score is a measure of the match between the consensus sequence and the known binding site sequence with 100 being perfect

Fig. 1 The average spatial distribution of all 30 identified motifs relative to the TSS across promoters of all 244 predicted genes.

Fig. 2 Network diagram depicting the link/edge relationship between a single LEA gene (yellow) and its associated predicted genes (purple) that share the highest number of common promoter motifs.

Discussion

The promoter regions of eukaryotic genes contain important regulatory elements that are largely responsible for coordinating their transcriptional responses [2, 6]. We have developed a method that, based on promoter content similarity, constructed a putative network of genes that we predicted to be co-expressed with LEA genes in maturing rice embryos. Experimental verification of our predictions determined that 179 (73.36%) out of 244 of the predicted genes co-expressed with the LEA genes in maturing rice embryos. This value is significantly greater than the proportion of all rice genes that were determined, based on MPSS experimental data, to be expressed in maturing rice embryos, being 27.99% (TPM≥1) and 20.07% (TPM≥4). These findings are consistent with a number of other studies in plants that have used promoter motif analysis to link gene groups to defined biological processes [11, 31].

In plants, the LEA genes are believed to function in protecting cellular components during developmentally induced desiccation in embryos and during water deficit stress in vegetative tissue [9]. We identified a group of 31 LEA genes that were experimentally determined (MPSS) to be co-expressed in maturing rice embryos (Supplementary File 1), and using ab initio methodology, we identified 30 enriched motif families in the promoters of these genes (Table 1). We believe these motifs could be causative for co-expression of the LEA genes in maturing embryos.

An analysis of the consensus family motifs using the PATCH program in the Transfac database indicated that 19 of the 30 consensus motifs contain sequences that correspond to experimentally confirmed binding sites for specific plant TFs (Table 1). A number of these TFs correspond to those that are well-established regulators of transcriptional responses during water-deficit-related abiotic stresses and embryo development, which are both well-established conditions that induce the expression of LEA genes in plants (see Supplementary File 2 for description of TFs) [24]. In brief, according to the PATCH program, the sequences of some of these motifs correspond to abscisic acid (ABA) response elements (ABRE, ABA being a key abiotic stress-activated plant hormone) and dehydration response elements (DRE), which are considered master switches in regulating drought-, cold-, and high salt-responsive gene expression in plants including that of LEA genes [8]. Additionally, a number of motifs that regulate endosperm-specific gene expression were also identified, including DNA binding with one finger (DOF) class prolamine-box binding factors (PBF [33]) and MYB class GAMYB TFs [7]. The identification of these motifs in the promoters of the LEA genes is consistent with their being representative of promoter elements that would regulate the transcription of LEA genes and other genes regulated during abiotic stresses and during embryo development.

The occurrences of each of these motifs in the promoter of each LEA gene were used to build a promoter signature for each individual LEA gene. The signature for each LEA gene was then used to identify other genes in the rice genome that contained the most similar signature (by way of the highest number of shared motifs) and thus have the greatest potential to be co-regulated with the LEA genes. This analysis identified an additional 244 rice genes that were included in a putative co-expressing LEA gene network. There was an enrichment of some motifs in promoter regions ranging from 0 to 200 nucleotides downstream of the transcription start site (TSS; Fig. 1). This observation is consistent with a study in Arabidopsis which documented that promoters have a compact nature [31].

A detailed literature search that was performed for 110 of the 244 identified genes indicated that the function of many of the genes, which possessed functional descriptions, could reasonably be linked to LEA gene functions during embryo development and water deficit stress responses (Supplementary File 5).

The putative LEA co-expressing gene network included a number of genes encoding lipid transfer/seed storage proteins, lipolytic enzymes, and amino acid transporters which may be involved in the building/mobilization of storage reserves during seed embryonic development. Further, numerous A1 peptidases were also present, which have been shown to be expressed in developing seed pods and be involved in the proteolytic processing and maturation of seed storage proteins in numerous plant species including rice [12] and additionally have proteolytic roles during water deficit stress [5].

The list also included genes involved in abiotic stress signaling including ABA-inducible kinases and some well-characterized components of the phosphatidylinositol second messenger signaling pathway [26], cellular protection and detoxification, photosynthesis, ion transport, and cell cycle regulators. It is also worth noting that 44 hypothetical proteins with unknown functions were identified and confirmed to be co-expressed with the LEA genes, thus linking them to a specific biological response. These genes are thus interesting candidates for future studies investigating systemic late embryogenesis and/or drought-response-related genes that can be targeted for biotechnological interventions.

As previously stated, experimental validation of our putative LEA gene network determined that 179 of the 244 genes (73.36%) co-expressed with the LEA genes in maturing rice embryos. The high success rate of our predictions is put into perspective when considering other studies that have attempted to identify groups of co-responsive genes based on the presence of specific cis-elements in their promoters. Attempts to identify ABA-responsive genes in plants have reported success rates of 67.5% in Arabidopsis [35] and 49% in rice [22], with the latter being considered particularly high by the authors. It is also noteworthy that the success rate reported for Arabidopsis was based on their top 40 predicted genes and not all predicted genes as in our study. Further, both these studies were dependent on knowledge of well-defined experimentally determined cis-elements for their analysis.

The high success rate of our study also compares quite favorably with similar studies performed in non-plant organisms. In Drosophila, the identification of Dorsal responsive genes based on the presence of known cis-elements yielded a 34% success rate [18], while in the nematode, Caenorhabditis elegans, the use of defined cis-elements that are characteristic to target gene promoters reported an overall success rate of 72% for 57 arbitrarily selected predictions of interneuron AIY-expressed genes [32]. That analysis, however, required the use of defined AIY motifs and phylogenetic footprinting over genomic sequence data from two nematodes to identify candidate genes. In comparison, our method predicted 244 genes using genomic sequence data from a single organism. Contrary to Wenick and Hobert [32], all of our predictions were experimentally tested, with no selection bias, and 73.36% of genes were confirmed to co-express with the LEA genes. As noted above and depicted in Fig. 3, the success rate of our method increases up to 100% if we select the top ranked genes, i.e., those that share 27 or more common motifs. The strong positive correlation between motif number and co-expression provides compelling evidence that supports the biological relevance of the identified motifs. Furthermore, the positive expression of 73.36% of our predicted genes significantly exceeded that from all rice genes determined to express in maturing rice embryos by MPSS, being 20.1% (when using the TPM≥4 cut-off used for LEA and predicted genes). Even when applying a less stringent positive expression criterion (TPM≥1), only 28.0% of rice genes were positively expressed, providing compelling support for our method as also demonstrated with the previously determined p values for the enrichment of experimentally confirmed expression in our predicted gene set.

Fig. 3 Correlation between the percentage of predicted genes confirmed to co-express in the maturing embryo and the number of motifs they share with the original LEA genes.

We were intrigued to determine if the sole presence of our motifs that correspond with ABRE and DRE in the promoters of the predicted 244 genes could alone account for the found expression in mature embryos. Our results show that out of 179 genes co-expressing with LEA genes, 72.07% (129/179) contain motifs related to DRE or ABRE or both. At the same time, for the predicted genes that were not co-expressed with the LEA genes (65), we observe that 69.23% (45/65) contain motifs that correspond to DRE or ABRE or both (see Supplementary File 9). We therefore conclude that the presence of DRE or ABRE motifs, or both, in promoters is not itself sufficient to account for the accurate prediction of co-expression. Thus, other motifs that we identified appear to be required and may act synergistically with these to secure specific gene co-expression.

The higher overall success rate of our study may reflect the limitations in other methods that are dependent on using known and exclusive types of cis-elements in predicting co-responsive genes. Since not all cis-elements are known and transcription regulation most likely results from the presence and a combinational use of multiple transcription factor binding sites [21], computational identification of such sites can provide a rapid and cost-effective method for identifying groups of co-expressing genes with high success. Additionally, the use of computationally derived motifs allows a global spectrum of application of the method since it can be applied to any biological process occurring within a eukaryotic organism without relying on or being restricted to well-studied processes in well-studied organisms that have experimentally confirmed cis-elements available.

This computationally based prediction technique is particularly useful and applicable to newly sequenced eukaryotic genomes from species for which there is little global expression data available. The technique can be used to build putative transcriptional networks of genes based on promoter motif content similarity, which we predict to function coherently in response to defined biological conditions. Thus, both genes with and without known assigned functions can be linked to specific biological processes based on their promoter similarities and their predicted co-expression (under specific conditions) with genes of well-defined functions. Although this study was performed in rice, we believe it can be applied to a wide range of eukaryotes, including other plant species, animals, humans, and fungi since gene transcription is critically regulated by the combinational presence and use of specific cis-regulatory sequences in the promoter regions of genes in eukaryotic organisms in general [30].

In comparison to other approaches, our method (a) predicts co-expressing genes by selecting the best matches for each promoter of the TGG relying on the specific promoter motif combinations, (b) does not require previously defined models of transcription factor binding sites or knowledge of specific transcription factors that control TGG, (c) uses sequence data of only one genome, and (d) is applicable to any genome.

Conclusions

In summary, we demonstrate that similarities in promoter composition, interpreted in terms of the pool and number of shared motifs, can be used to identify putative transcriptional networks of genes that co-express with rice LEA genes. A literature analysis indicates that many of these genes could plausibly function coherently with the LEA genes during developmentally induced desiccation in the embryo. This type of analysis can greatly contribute towards understanding the function of newly annotated genes since it can be used to functionally associate them with genes that have well-defined functions in specific biological processes. Further, it provides valuable information regarding the transcriptional regulation of functionally related gene networks which could greatly facilitate biotechnological manipulations to improve cellular responses to specific biological conditions.

Methods

Target gene group and promoters

The first step of the analysis is the selection of a target gene group. In our case we identified 31 rice LEA genes through MPSS analysis [19] that were determined to be expressed in mature rice embryos (see Supplementary File 1). MPSS provides a comprehensive assessment of gene expression by generating short sequence tags, each 20 bp long, produced from a defined position for each transcript. Promoter sequences for genes covering the region [−2,000, +200] relative to the transcription start site were obtained from the International Rice Genome Sequencing Project [14].

Motif identification

To identify motifs enriched in the promoter regions, we used the Dragon Motif Builder system (http://apps.sanbi.ac.za/MotifBuilder/index.php) [13]. In total, we identified 30 enriched motif families with motifs of nine nucleotides in length. We used the following parameters: method=EM2, threshold=0.875, and the random DNA background with equal proportion of the four nucleotides. The details about the algorithm of DMB and a guide for interpretation of its results can be found on the system’s website. The spatial distribution of all 30 motifs in the promoter regions of the predicted genes was determined (Fig. 1).

Determining promoters with similar content

We used the position weight matrix of each of the 30 motif families identified with DMB, and with the same threshold used for motif identification, we predicted motifs on the promoters of all rice genes. Then, for each of the promoters of genes from the TGG, we searched all other promoters that shared with it the highest number of common annotated promoter motifs. We limited the number of predicted promoters/genes to the top three promoters that shared the highest number of common promoter motifs with the TGG. If it was not possible to limit the number of candidate promoters to three, we extended the set of associated promoters to include all those promoters that had the highest number of promoter elements. These associations were then used to generate a TGG-like transcriptional regulatory network (see Supplementary File 4).

Experimental confirmation of co-expression of predicted genes with the TGG

The rice cultivar Tainung 67 (Oryza sativa L. ssp. japonica) was grown in the paddy field at the Academia Sinica campus. Embryos were harvested and dissected from the seeds at 15–20 or 25 days after pollination (DAP) and designated as milky stage embryos (ME) and yellow stage embryos (YE), respectively. The total RNA was extracted using Trizol (Invitrogen). First-strand cDNA was synthesized using standard protocols and SuperScript III reverse transcriptase (Invitrogen). The PCR reaction was performed using the primer sets listed in Supplementary File 8. The amplification was performed using 30 or 35 cycles consisting of 15 s at 94°C, 30 s at 60°C, and 60 s at 72°C, following an initial denaturation cycle of 2 min at 94°C. The final extension step was performed at 72°C for 3 min. The PCR products were separated by electrophoresis and stained with ethidium bromide. The sample collection, RT-PCR, and gel analysis were all performed in duplicate.

The RT-PCR gel image (Supplementary File 7) intensities were graded using standard techniques. A value of 0 was assigned to genes when no PCR products were detected. Samples that gave positive products were assigned a value of 1 (weakest) to 5 (strongest). The values presented in Supplementary File 6 are the average values determined by three independent assessments.

Massively parallel signature sequencing data

RNA samples were extracted from ME and YE. The RNA samples were sent to Illumina Company for custom service of MPSS analysis [19]. The total tag number received from Illumina was 3,520,358. The raw number was normalized to a metric of TPM. Positive expression of LEA genes based on MPSS data was limited to a minimal signal of at least 4 TPM according to Brandenberger et al. [3].

The percentage of all genomic non-transposable element genes that are expressed in maturing rice embryos was determined using global MPSS expression analysis. This analysis was performed using a cut-off of at least 4 TPM (as used for positive selection of LEA genes and predicted genes) and also with the less stringent cut-off of at least 1 TPM.

Statistical test for enrichment

We calculate the p values for the enrichment of the set of experimentally confirmed genes that co-express with LEA genes in our computationally predicted gene set, relative to the whole rice genome. We used Fisher’s exact right-side test based on hypergeometric distribution and corrected for multiplicity testing by the Bonferroni method. The parameters used are as follows:

Genes predicted to co-express with the LEA genes: n=244
Genes with experimentally confirmed expression (out of 244): k=179
Total number of genes in the rice genome: N=41,047
Total number of rice genes expressing in embryo (TPM≥1): K=11,488
Bonferroni correction factor=41,047
p value=4.62e−049, corrected for multiplicity testing p value=1.90e−044

Genes predicted to co-express with the LEA genes: n=
Genes with experimentally confirmed expression (out
of 244): k=179 186 Rice (2008) 1:177–187 expression and relation to drought susceptibility. FEBS Lett Total number of genes in the rice genome: N=41,047 2001;492:242–6. Total number of rice genes expressing in embryo 6. Davidson EH, McClay DR, Hood L. Regulatory gene networks (TPM≥4): K=8,241 and the properties of the developmental process. Proc Natl Acad Bonferroni correction factor=41,047 Sci U S A 2003;100:1475–80. 7. Diaz I, Vicente-Carbajosa J, Abraham Z, Martinez M, Isabel-La p value=3.33e−072, corrected for multiplicity testing Moneda I, Carbonero P. The GAMYB protein from barley p value=1.37e−067 interacts with the DOF transcription factor BPBF and activates endosperm-specific genes during seed development. Plant J 2002;29:453–64. Online database 8. Dubouzet JG, Sakuma Y, Ito Y, Kasuga M, Dubouzet EG, Miura S, Seki M, Shinozaki K, Yamaguchi-Shinozaki K. OsDREB genes We have created an online Dragon Database for Explora- in rice, Oryza sativa L., encode transcription activators that function in drought-, high-salt- and cold-responsive gene expres- tion of late embryogenesis abundant genes in rice (http:// sion. Plant J 2003;33:751–63. apps.sanbi.ac.za/dlea) to allow access to our results and 9. Dure LI, Crouch M, Harada J, Ho T-HD, Mundy J, Quatrano R, data. Using Rice Annotation Project (RAP, eg: Os01 Thomas T, Sung ZR. Common amino acid sequence domains among g0159600) or TIGR (LOC_Os01g06630) identifiers, one the LEA proteins of higher plants. Plant Mol Biol 1989;12:475–86. 10. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis can access the promoter details for individual genes. This and display of genome-wide expression patterns. Proc Natl Acad provides information on the number of occurrences and Sci U S A 1998;95:14863–8. spatial location of all motifs present in individual gene 11. Geisler M, Kleczkowski LA, Karpinski S. A universal algorithm promoters. 
Further, the database also contains information for genome-wide in silico identification of biologically significant gene promoter putative cis-regulatory-elements; identification of generated with the DMB algorithm that illustrates the new elements for reactive oxygen species and sucrose signaling in spatial distribution of the best motifs from each of the Arabidopsis. Plant J 2006;45:384–98. motif families in the promoters of the TGG (LEA genes). 12. Hiraiwa N, Kondo M, Nishimura M, Hara-Nishimura I. An The site also contains links to the RAP database (http:// aspartic endopeptidase is involved in the breakdown of propep- tides of storage proteins in protein-storage vacuoles of plants. Eur rapdb.dna.affrc.go.jp/) that provides additional information J Biochem 1997;246:133–41. on gene annotations [15, 20, 28]. 13. Huang E, Yang L, Chowdhary R, Kassim A, Bajic V. An algo- rithm for ab initio DNA motif detection. In: Bajic VB, Tan TW, Acknowledgments SM received postdoctoral fellowship from editors. Information processing and living systems. Singapore: NBN; CG received support from NRF; CRM received support from World Scientific; 2005. p. 611–4. SSABMI program; MK received postdoctoral fellowship from the 14. International Rice Genome Sequencing Project. The map-based Claude Leon Foundation; MM received support from NBN and NRF sequence of the rice genome. Nature 2005;436:793–800. FA2006040900002; YIH received support from NSC; VBB received 15. Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, partial support from the DST/NRF Research Chair grant, NBN, and Antonio BA, Aono H, Apweiler R, Bruskiewich R, Bureau T, NRF grants FA2007051400013, ICD2006071000003, and Burr F, Costa DO, Fuks G, Habara T, Haberer G, Han B, Harada FA2006040900002. 
E, Hiraki AT, Hirochika H, Hoen D, Hokari H, Hosokawa S, Hsing YI, Ikawa H, Ikeo K, Imanishi T, Ito Y, Jaiswal P, Kanno M, Kawahara Y, Kawamura T, Kawashima H, Khurana JP, Open Access This article is distributed under the terms of the Kikuchi S, Komatsu S, Koyanagi KO, Kubooka H, Lieberherr Creative Commons Attribution Noncommercial License which per- D, Lin YC, Lonsdale D, Matsumoto T, Matsuya A, McCombie mits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited. WR, Messing J, Miyao A, Mulder N, Nagamura Y, Nam J, Namiki N, Numa H, Nurimoto S, O’Donovan C, Ohyanagi H, Okido T, OOta S, Osato N, Palmer LE, Quetier F, Raghuvanshi S, Saichi N, Sakai H, Sakai Y, Sakata K, Sakurai T, Sato F, Sato Y, Schoof H, Seki M, Shibata M, Shimizu Y, Shinozaki K, Shinso Y, References Singh NK, Smith-White B, Takeda J, Tanino M, Tatusova T, Thongjuea S, Todokoro F, Tsugane M, Tyagi AK, Vanavichit A, 1. Allocco DJ, Kohane IS, Butte AJ. Quantifying the relationship Wang A, Wing RA, Yamaguchi K, Yamamoto M, Yamamoto N, between co-expression, co-regulation and gene function. BMC Yu Y, Zhang H, Zhao Q, Higo K, Burr B, Gojobori T, Sasaki T. Bioinformatics 2004;5:18. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome 2. Beer MA, Tavazoie S. Predicting gene expression from sequence. Res 2007;17:175–83. Cell 2004;117:185–98. 16. Jansen R, Greenbaum D, Gerstein M. Relating whole-genome 3. Brandenberger R, Khrebtukova I, Thies RS, Miura T, Jingli C, expression data with protein–protein interactions. Genome Res Puri R, Vasicek T, Lebkowski J, Rao M. MPSS profiling of 2002;12:37–46. human embryonic stem cells. BMC Dev Biol 2004;4:10. 17. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression 4. Brazma A, Jonassen I, Vilo J, Ukkonen E. Predicting gene analysis of human genes across many microarray data sets. 
regulatory elements in silico on a genomic scale. Genome Res Genome Res 2004;14:1085–94. 1998;8:1202–15. 5. Cruz de Carvalho MH, d’Arcy-Lameta A, Roy-Macauley H, 18. Markstein M, Markstein P, Markstein V, Levine MS. Genome- wide analysis of clustered Dorsal binding sites identifies putative Gareil M, El Maarouf H, Pham-Thi AT, Zuily-Fodil Y. Aspartic protease in leaves of common bean (Phaseolus vulgaris L.) and target genes in the Drosophila embryo. Proc Natl Acad Sci U S A cowpea (Vigna unguiculata L. Walp): enzymatic activity, gene 2002;99:763–8. Rice (2008) 1:177–187 187 187 19. Nobuta K, Venu RC, Lu C, Belo A, Vemaraju K, Kulkarni K, Sato Y, Shinso Y, Suzuki M, Takeda J, Tanino M, Todokoro F, Wang W, Pillay M, Green PJ, Wang GL, Meyers BC. An Yamaguchi K, Yamamoto N, Yamasaki C, Imanishi T, Okido T, expression atlas of rice mRNAs and small RNAs. Nat Biotechnol Tada M, Ikeo K, Tateno Y, Gojobori T, Lin YC, Wei FJ, Hsing YI, 2007;25:473–7. Zhao Q, Han B, Kramer MR, McCombie RW, Lonsdale D, 20. Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, O’Donovan CC, Whitfield EJ, Apweiler R, Koyanagi KO, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, Ikeo K, Khurana JP, Raghuvanshi S, Singh NK, Tyagi AK, Haberer G, Itoh T, Gojobori T, Sasaki T. The Rice Annotation Project Fujisawa M, Hosokawa S, Ito Y, Ikawa H, Shibata M, Yamamoto Database (RAP-DB): hub for Oryza sativa ssp. japonica genome M, Bruskiewich RM, Hoen DR, Bureau TE, Namiki N, Ohyanagi information. Nucleic Acids Res 2006;34:D741–744. H, Sakai Y, Nobushima S, Sakata K, Barrero RA, Sato Y, 21. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory Souvorov A, Smith-White B, Tatusova T, An S, An G, OOta S, networks by combinatorial analysis of promoter elements. Nat Fuks G, Fuks G, Messing J, Christie KR, Lieberherr D, Kim H, Genet 2001;29:153–9. Zuccolo A, Wing RA, Nobuta K, Green PJ, Lu C, Meyers BC, 22. Ross C, Shen QJ. 
Computational prediction and experimental Chaparro C, Piegu B, Panaud O, Echeverria M. The Rice verification of HVA1-like abscisic acid responsive promoters in Annotation Project Database (RAP-DB): 2008 update. Nucleic rice (Oryza sativa). Plant Mol Biol 2006;62:233–46. Acids Res 2008;36:D1028–1033. 23. Schulze A, Downward J. Navigating gene expression using 29. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. microarrays—a technology review. Nat Cell Biol 2001;3:E190–195. Systematic determination of genetic network architecture. Nat 24. Shinozaki K, Yamaguchi-Shinozaki K. Molecular responses to Genet 1999;22:281–5. dehydration and low temperature: differences and cross-talk 30. Tuch BB, Li H, Johnson AD. Evolution of eukaryotic transcrip- between two stress signaling pathways. Curr Opin Plant Biol tion circuits. Science 2008;319:1797–9. 2000;3:217–23. 31. Vandepoele K, Casneuf T, Van de PY. Identification of novel 25. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression regulatory modules in dicotyledonous plants using expression data network for global discovery of conserved genetic modules. and comparative genomics. Genome Biol 2006;7:R103. Science 2003;302:249–55. 32. Wenick AS, Hobert O. Genomic cis-regulatory architecture and 26. Takahashi S, Katagiri T, Hirayama T, Yamaguchi-Shinozaki K, trans-acting regulators of a single interneuron-specific gene Shinozaki K. Hyperosmotic stress induces a rapid and transient battery in C. elegans. Dev Cell 2004;6:757–70. increase in inositol 1,4,5-trisphosphate independent of abscisic acid 33. Wu C, Washida H, Onodera Y, Harada K, Takaiwa F. Quantitative in Arabidopsis cell culture. Plant Cell Physiol 2001;42:214–22. nature of the Prolamin-box, ACGT and AACA motifs in a rice 27. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, glutelin gene promoter: minimal cis-element requirements for Dmitrovsky E, Lander ES, Golub TR. Interpreting patterns of endosperm-specific gene expression. Plant J 2000;23:415–21. 
gene expression with self-organizing maps: methods and applica- 34. Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XJ. A tion to hematopoietic differentiation. Proc Natl Acad Sci U S A graph-based approach to systematically reconstruct human tran- 1999;96:2907–12. scriptional regulatory modules. Bioinformatics 2007;23:i577–86. 28. Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, 35. Zhang W, Ruan J, Ho TH, You Y, Yu T, Quatrano RS. Cis- Numa H, Sakai H, Wu J, Itoh T, Sasaki T, Aono R, Fujii Y, regulatory element based targeted gene finding: genome-wide Habara T, Harada E, Kanno M, Kawahara Y, Kawashima H, identification of abscisic acid- and abiotic stress-responsive genes Kubooka H, Matsuya A, Nakaoka H, Saichi N, Sanbonmatsu R, in Arabidopsis thaliana. Bioinformatics 2005;21:3074–81. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Rice Springer Journals
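As a rough check on the figures listed under "Statistical test for enrichment", the right-sided hypergeometric tail probability and its Bonferroni correction can be recomputed from the four published parameters (N, K, n, k). The sketch below is only an illustration using the Python standard library: the function names are hypothetical, and it assumes that the reported Fisher's exact right-side test is equivalent to the upper tail P(X ≥ k) of a hypergeometric distribution, which may differ in detail from the software the authors used.

```python
from fractions import Fraction
from math import comb


def hypergeom_right_tail(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size (all non-TE rice genes)
    K: genes in the population expressed in embryo
    n: sample size (predicted genes)
    k: predicted genes confirmed as expressed
    """
    total = comb(N, n)
    upper = min(n, K)
    # Exact integer arithmetic avoids underflow for very small tails.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return float(Fraction(tail, total))


def bonferroni(p, m):
    """Bonferroni-corrected p value for m tests, capped at 1."""
    return min(1.0, p * m)


# Parameters as listed in the Methods (hypothetical variable names).
N, n, k = 41_047, 244, 179
for label, K in (("TPM>=1", 11_488), ("TPM>=4", 8_241)):
    p = hypergeom_right_tail(N, K, n, k)
    print(label, f"p={p:.2e}", f"corrected={bonferroni(p, N):.2e}")
```

Exact integer binomials are used here instead of floating-point log-gamma sums because the tails are far below ordinary double-precision summation accuracy; the cost is negligible for n=244.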



Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2008
Subject
Life Sciences; Plant Sciences; Plant Genetics & Genomics; Plant Breeding/Biotechnology; Agriculture; Plant Ecology
ISSN
1939-8425
eISSN
1939-8433
DOI
10.1007/s12284-008-9017-4

Abstract

Rice (2008) 1:177–187
DOI 10.1007/s12284-008-9017-4

The Promoter Signatures in Rice LEA Genes Can Be Used to Build a Co-expressing LEA Gene Network

Stuart Meier & Chris Gehring & Cameron Ross MacPherson & Mandeep Kaur & Monique Maqungo & Sheela Reuben & Samson Muyanga & Ming-Der Shih & Fu-Jin Wei & Samart Wanchana & Ramil Mauleon & Aleksandar Radovanovic & Richard Bruskiewich & Tsuyoshi Tanaka & Bijayalaxmi Mohanty & Takeshi Itoh & Rod Wing & Takashi Gojobori & Takuji Sasaki & Sanjay Swarup & Yue-ie Hsing & Vladimir B. Bajic

Received: 3 July 2008 / Accepted: 31 October 2008 / Published online: 22 November 2008
© The Author(s) 2008. This article is published with open access at Springerlink.com

Abstract Coordinated transcriptional modulation of large gene sets depends on the combinatorial use of cis-regulatory motifs in promoters. We postulate that promoter content similarities are diagnostic for co-expressing genes that function coherently during specific cellular responses. To find the co-expressing genes we propose an ab initio method that identifies motif families in promoters of target gene groups, map these families to the promoters of all genes in the genome, and determine the best matches of each of the target group gene promoters with all other promoters. When the method was tested in rice starting from a group of co-expressing Late Embryogenesis Abundant (LEA) genes, we obtained a promoter similarity-based network that contained candidate genes that could plausibly complement the function of LEA genes. Importantly, 73.36% of 244 genes predicted by our method were experimentally confirmed to co-express with the LEA genes in maturing rice embryos, making this methodology a promising tool for biological systems analyses.

Stuart Meier, Chris Gehring, Yue-ie Hsing, and Vladimir Bajic are the first authors.
Electronic supplementary material The online version of this article (doi:10.1007/s12284-008-9017-4) contains supplementary material, which is available to authorized users. : : : : : : S. Meier C. R. MacPherson M. Kaur M. Maqungo M.-D. Shih F.-J. Wei Y.-i. Hsing : : : S. Muyanga A. Radovanovic B. Mohanty V. B. Bajic (*) Institute of Plant and Microbial Biology, Academia Sinica, South African National Bioinformatics Institute (SANBI), Taipei, Taiwan University of the Western Cape, : : Cape Town, South Africa S. Wanchana R. Mauleon R. Bruskiewich e-mail: vlad@sanbi.ac.za International Rice Research Institute (IRRI), URL: www.sanbi.ac.za/people/faculty/professors/vlad-bajic/ Metro Manila, Philippines : : T. Tanaka T. Itoh T. Sasaki National Institute of Agrobiological Sciences, C. Gehring Tsukuba, Japan Department of Biotechnology, University of the Western Cape, R. Wing Cape Town, South Africa Department of Plant Sciences, University of Arizona, Tucson, USA S. Reuben S. Swarup T. Gojobori Department of Biological Sciences, National Institute of Genetics, National University of Singapore, Shizuoka, Japan Singapore, Singapore 178 Rice (2008) 1:177–187 . . Keywords Transcription regulation Co-expression developed that based on promoter content similarity builds Co-regulation a putative transcriptional regulatory network of co- expressed genes. We use the term ‘network’ to define a group of genes that are linked by the fact they contain a Introduction common set of motifs in their promoters that we believe could be causative for their transcriptional regulation. The In the post-genomic sequencing era, computationally based promoters of these genes would presumably bind common tools are required to help decipher biologically meaningful TFs thus connecting the genes into a putative transcription- information from the masses of sequence data generated. al regulatory gene network. 
Computationally based homology comparisons are com- We tested our method in rice using a group of late monly used to infer gene functions based on similarities to embryogenesis abundant (LEA) genes that were confirmed previously functionally annotated genes. While extremely to be co-expressed in developing embryos. In plants, the useful, homology comparisons are somewhat limited to the LEA genes are believed to function in protecting cellular identification of ‘more of the same’ type of genes and fail components during developmentally induced desiccation in to provide information regarding the temporal, spatial, and embryos and during water deficit stress in vegetative tissue stimulus-specific context in which the gene is expressed [9]. We, therefore, hypothesize that other genes that are and active. This problem is particularly apparent when determined to share the most similar promoter motif considering within a genome large gene families which combinations with the LEA genes will function coherently share high sequence similarity yet function within distinct with them in achieving a common cellular response, which cellular responses. Alternatively, although global gene will be manifested by their co-expression with the LEA expression studies, such as microarray, can reveal transcrip- genes. Experimental validation of our predictions shows tional responses of entire genomes in a single experiment, that 73.36% of the 244 predicted genes co-express with the they only provide expression profiles at specific time points LEA genes. In addition, a literature analysis indicated that in response to a specific stimulus. Furthermore, the biological the function of many of the genes could plausibly roles of many of the genes identified in large-scale expression complement the function of the LEA genes. studies are not well characterized and do not link genes to specific regulatory pathways since the expression profile can be a direct or indirect result of the treatments. 
Results In eukaryotes, many cellular processes require the coherent participation of multiple gene products as evident Method outline by the co-expression of large sets of genes in response to specific stimuli [10, 27, 29]. Furthermore, a number of We have developed a method that builds a putative studies have shown that genes that have been confirmed to transcriptional network of co-expressed genes based on them be co-expressed in response to a range of conditions have sharing highly similar promoter contents. The network correlated functional relationships, including physical inter- building relies on a reference target gene group (TGG) that actions between their proteins [1, 16, 17, 25, 34,]. is defined in terms of being co-expressed in response to a Collectively, these studies indicate that cells possess a specific biological condition. A typical example could be a mechanism that coordinates the expression of genes that are cluster of co-expressing genes identified in a microarray involved in common functional responses. expression experiment. The promoters of these genes are Accordingtothe cis-regulatory logic [2, 6], the then collectively assessed for the presence of specific regulation of eukaryotic gene expression is critically signatures in the form of specific motif combinations that dictated by the combinational presence (and effect) of we believe could be causative for their transcriptional regulatory motifs, or signatures, in their promoters which is responses. The signatures identified in each of the individual necessitated by the specific binding requirements of promoters of the TGG are then mapped to other promoters in transcription factors (TFs) [1, 2, 4, 6, 23]. Genomic the genome. These signatures thus serve to identify other sequences contain these regulatory motifs encoded mainly genes that share the most similar promoter content and thus in promoter regions of individual genes. 
have the greatest potential to be co-regulated with each gene We hypothesize that promoter content similarity can of the TGG. The method generates a putative transcriptional therefore be used to identify groups of co-expressed genes network that contains groups of candidate genes that we that function coherently during defined cellular processes, predict will co-express and function coherently with the TGG including changes in growth and development programs or in producing a common cellular response. This method environmental challenges. We report here a method we extends the regulatory relationships of a TGG to other Rice (2008) 1:177–187 179 179 candidate genes and thus links them to a well-defined promoters of the LEA genes (see Supplementary File 3). A biological response providing insights into the biological complete network diagram (see Supplementary File 4) was context in which the gene(s) functions. constructed to illustrate the edge relationship between the The method described above briefly consists of the TGG and the predicted genes. Figure 2 illustrates such a following steps (details of which, related to the implemen- relationship between a single LEA gene and its neighbors tation we made, are given in the “Methods” section): in the network. A detailed literature search that was performed for 110 1. Determine the target gene group based on their co- of the 244 identified genes indicated that the function of expression in a common systemic response. many of the genes, which possessed functional descrip- 2. Identify promoters for the TGG genes. tions, could reasonably be linked to embryo development 3. Identify enriched motif families in the promoters of the and water deficit stress responses (Supplementary File 5). TGG. 4. Map identified motif families to all promoters of the Experimental validation of predicted gene co-expression genome. Overlapping of mapped motifs is allowed. 5. 
For each of the promoters of the TGG, search for other Experimental validation of our method was obtained using promoters in the genome that share the highest number semi-quantitative reverse transcriptase polymerase chain of the mapped motifs with the individual TGG reaction (RT-PCR) and MPSS expression analysis to promoter. We hypothesize that genes associated with determine if the predicted genes are co-expressed with the these identified promoters have a high probability to LEA genes in maturing embryos. The results show that co-express with the genes in TGG under the same (based on RT-PCR and MPSS) 179 (73.36%) out of the 244 biological conditions. genes tested were co-expressed with the LEA genes (see Supplementary Files 6 and 7). A more detailed analysis revealed a strong positive correlation between the number Identification of TGG and construction of a putative of motifs shared between genes and the percentage that co-expressing gene network were co-expressed (Fig. 2). We found that 100% of the predicted genes that shared 27 or more motifs with the LEA We tested our method on the recently sequenced rice genes were co-expressed with the LEA genes, compared to genome and used 31 annotated LEA genes as the TGG. the 73.36% for the overall prediction success. This analysis These LEA genes were all determined to be co-expressed in thus provides compelling experimental support for our mature rice embryos as determined from massively parallel method since it illustrates an extremely high correlation signature sequencing (MPSS) expression data (see Supple- coefficient between the number of shared motifs and co- mentary File 1). The Dragon Motif Builder (DMB) program expression (correlation coefficient=0.97) when we consider was used to identify 30 enriched motif families in the pro- genes with 22 or more shared motifs. moters of the LEA genes (Table 1). 
For each of the motif Further, in order to test whether the proportion of our families, the consensus motif was determined. The PATCH predicted genes found to be expressed in maturing embryos program of the Transfac database suite indicated that 21 of was significantly greater than that for the whole rice the 30 identified consensus motifs conform to known plant genome, we performed a global MPSS expression analysis cis-elements and 19 of these contain sequences that to determine the percentage of all non-transposable element correspond to binding sites for known plant TFs (Table 1) (TE) genes that are expressed in maturing rice embryos. some of which have been shown to regulate the expression of According to TIGR v.5, there are 41,047 non-TE genes in LEA genes (Table 1 and Supplementary File 2). the rice genome. Using MPSS analysis of matured rice The presence and abundance of the 30 motif families in embryos, we found that 27.99% (11,488) non-TE genes are the individual promoters of the TGG was used to build expressed in maturing rice embryos with a TPM≥1, and promoter signatures for each of the individual LEA genes. 20.07% (8241) non-TE genes are expressed with a TPM≥4 These signatures were then used to map to the most similar (TPM stands for ‘transcripts per million’). Consequently, promoters in the genome and thus identify other genes that the enrichment of the experimentally confirmed genes that have the greatest potential to be co-regulated with the LEA co-express with LEA genes in our computationally predicted genes. A summary of the average spatial distribution of gene set, relative to those from the whole rice genome that each of the 30 identified motifs in the promoters of the express in maturing embryos, is characterized by the p values predicted genes is depicted in Fig. 1.Thisanalysis of 1.90e−044 (TPM≥1) and 1.37e−067 (TPM≥4). 
These p identified an additional 244 genes that shared the highest values represent the p values corrected for multiplicity number of common motifs with each of the individual testing (see details in “Methods”). Therefore, the successful 180 Rice (2008) 1:177–187 Table 1 Identified Consensus Promoter Motifs in Original LEA Genes and the Plant TFs That Were Predicted to Bind to Them in the PATCH Program Consensus motif Species/gene identifier Position Score Predicted plant TF Site binding pattern sequence 1 GAGAAGAAG AT$PHYA_01 2 (−) 100 CAMTA3 TCTTCT 2 GGCGCGYGG AT$AVP1_01 2 (−) 91.7 (VOZ1&2)2, CAMTA1 ACGCGC RICE$ZB8_02 3 (+)(−) 91.7 CBT CGCGCG AS$CBT_01 3 (+) 91.7 CBT CGCGCG CACGCG MAIZE 5 (+) 90.0 No match CGTGG $ADH1P_01&03 DAUCE$DC3_04 3 (−) 91.7 DPBF-1, DPBF-2 CACGCG 3 CCGTCGWCC AT$H4_05 1 (+) 100 CCGTCG AT$COR15A_01 3 (−) 90 ANT, CBF1, CBF2, DREB1A, CCGAC ERFLP1, TSI1 AT$RD29B_01 3 (−) 90 CBF1 CCGAC AT$COR78_01 3 (−) 90 ANT, CBF1, CBF2, DREB1A CCGAC AT$COR15B_01 3 (−) 90 CBF1, CBF2, DREB1A CCGAC RAPE$BN115_01 3 (−) 90 CBF17, CBF5 CCGAC AT$FL0521F13_01 3 (−) 91.7 DREB1A GCCGAC BAR$HVA1_03 3 (−) 90 CBF1, CBF2 CCGAC GOSHI$LEAD113_01 3 (−) 90 DBP1 CCGAC AT$COR78_01 3 (−) 90 ANT, CBF1, CBF2, DREB1A CCGAC AT$COR15A_03 3 (−) 91.7 DREB1A GCCGAC AS$CEF1_02 3 (−) 90 CEF1 CCGAC GOSHI$LEAD113_01 3 (−) 90 DBP1 CCGAC 4 GCGGAGAAG No match 5 GCVGGGCAG MAIZE$ADH11S_06 3 (−) 90 GCBP-1, Sp1 GCCCC 6 AACADCAAA WHEAT$CATHB_08 1 (−), 2 90 GAMYB TTGTT (−) 7 AGCAGCAGC No match 8 MCCGACGGC AT$COR15A_03&04 1 (+) 91.7 DREB1A GCCCAG MAIZE$DHN1_01 1 (+) 91.7 DBF1, DBF2 ACCGAC AS$TINY2_01 1 (+) 91.7 TINY2 ACCGAC HELAN$HSP176_02 1 (−) 91.7 No match GTCGGT AT$COR15A_01 2 (+) 100 ANT, CBF1, CBF2, DREB1A, CCGAC ERFLP1, TSI1 RAPE$BN115_02 2 (+) 100 CBF17, CBF5 CCGAC AS$DREBLP1_01 2 (+) 100 DREBLP1 CCGAC GOSHI$LEAD113_01 2 (+) 100 DBP1 CCGAC AT$H4_05 3 (−) 100 No match CCGTCG 9 ACACATACG No match 10 TTCMTTTCA DAUCE$EXT_02 1 (−) 92.86 No match AAATGAA POT$KST1_01 1 (−) 90 DOF1 AAAAG BAR$CPI_01 3 
(−) 90 PBF, SED AAAGG AT$WUSCHEL_01 4 (−) 91.7 No match TGAAAA 11 AWATTATAT No match 12 CGGCGSCGG AT$HLS1_01 2 (−) 91.7 ATERF7, ERF-1,2,3,4,5, ERFLP1 GCCGCC TO$NP24PP_0 2 (−) 91.7 ERF-1,2,3,4 GCCGCC AS$GCCBOX_02 2 (−) 91.7 Pti4 GCCGCC AS$CEF1_01 2 (−) 91.7 CEF1 GCCGCC BAR$HVA1_04 3 (+) 90 CBF1 GCCGCC AT$H4_05 4 (−) 91.7 No match CCGTCG AT$COR15A_0 5 (−) 90 ANT; CBF1,2; DREB1A, ERFLP1, CCGAC TSI RAPE$BN115_01 5 (−) 90 CBF17, CBF5 CCGAC 13 CTTCTTCCT No match 14 AAAATAATA SOYBN$VSPB_03 1 (−) TATTTT Rice (2008) 1:177–187 181 181 Table 1 (continued) Consensus motif Species/gene identifier Position Score Predicted plant TF Site binding pattern sequence 15 AAATYGARA AS$ARR10_17 2 (−) 90 ARR10 CGATT 16 AGAAGATCA AT$PHYA_01 1 (−) 100 CAMTA3 TCTTCT 17 RCAGCAGCA No match 18 CGCGCGGCG RICE$ZB8_02 1 (+) 100 CBT CGCGCG 19 GTTAMATAT AT$CAB2_03 1 (+) 90 GT-3a GTTAC PV$PHS_03 2 (+) 90 No match TTAAA RICE$ZB8_01 2 (−) 92.86 TBP2 TATTTAA MAIZE$PMS1_ 3 (+) 92.86 No match TAAATAT 20 TTGYTTAAT WHEAT$CATHB 1 (+) 90 GAMYB TTGTT AS$ARR10_18 2 (+) 90 ARR10 TGATT PEA$RS3A_03 3 (+) 91.67 GT-1, GT-1a, SBF-1 GGTTAA OAT$PHYA3_0 3 (+) 92.86 No match GGTTAAT RICE$PHYA_0 3 (+) 92.86 GT-1, GT-2 GGTTAAT PV$PHS_03 4 (+) 91.67 No match TTTAAT PV$PHS_03 4 (−) 90 No match TTAAA 21 TGTACTCSC TO$LAP171A_ 3 (−) 100 JAMYC2 GAGTA 22 MSGATGRTG BARL$CAB11_12 2 (−) 90 MCB1, MCB2 CATCC 23 AGCACACAT No match 24 CMAAAAGCT AS$PF1_01&02 2 (−) 90 PF1 TTTTT POT$KST1_01 3 (+) 100 DOF1 AAAAG 25 CGGCTCGCC No match 26 GAATGGATG WHEAT$CAB1_ 4 (−) 100 MCB1, MCB2 ATCCA BARL$CAB11&12 5 (−) 100 MCB1, MCB2 CATCC 27 ATCAAGGAA AT$ATBZIP60 2 (+) 100 No match TCAAG 28 TGGCGCCGC No match 29 GCCGSGGCC MAIZE$ADH1P&11 2 (−) 91.67 No match CCCCGG MAIZE$ADH1P 3 (+) 90 No match CGTGG AS$mEMBP_17 3 (−) 91.67 EmBP-1a GCCACG MAIZE$ADH11 4 (−) 90 GCBP-1, Sp1 GCCCC 30 AATTTTRGT AS$PF1_01 3 (+) 90 PF1 TTTTT Species/gene identifier represents species and gene acronyms ($) and consecutive site number in which the identified motif is found in plant 
genes. Position indicates the position and strand within the consensus motif where the TF is predicted to bind; score is a measure of the match between the consensus sequence and the known binding site sequence with 100 being perfect

expression rate of 73.4% of our predicted genes is significantly higher (see p values) using both cut-off criteria and provides strong support for the method applied here.

Discussion

The promoter regions of eukaryotic genes contain important regulatory elements that are largely responsible for coordinating their transcriptional responses [2, 6]. We have developed a method that, based on promoter content similarity, constructed a putative network of genes that we predicted to be co-expressed with LEA genes in maturing rice embryos. Experimental verification of our predictions determined that 179 (73.36%) out of 244 of the predicted genes co-expressed with the LEA genes in maturing rice embryos. This value is significantly greater than the proportion of all rice genes that were determined, based on MPSS experimental data, to be expressed in maturing rice embryos, being 27.99% (TPM≥1) and 20.07% (TPM≥4). These findings are consistent with a number of other studies in plants that have used promoter motif analysis to link gene groups to defined biological processes [11, 31].

In plants, the LEA genes are believed to function in protecting cellular components during developmentally induced desiccation in embryos and during water deficit stress in vegetative tissue [9]. We identified a group of 31 LEA genes that were experimentally determined (MPSS) to be co-expressed in maturing rice embryos (Supplementary File 1), and using ab initio methodology, we identified 30 enriched motif families in the promoters of these genes (Table 1). We believe these motifs could be causative for the co-expression of the LEA genes in maturing embryos.

An analysis of the consensus family motifs using the PATCH program in the Transfac database indicated that 19 of the 30 consensus motifs contain sequences that correspond to experimentally confirmed binding sites for specific plant TFs (Table 1). A number of these TFs correspond to well-established regulators of transcriptional responses during water-deficit-related abiotic stresses and embryo development, which are both well-established conditions that induce the expression of LEA genes in plants (see Supplementary File 2 for description of TFs) [24]. In brief, according to the PATCH program, the sequences of some of these motifs correspond to abscisic acid (ABA) response elements (ABRE, ABA being a key abiotic stress-activated plant hormone) and dehydration response elements (DRE), which are considered master switches in regulating drought-, cold-, and high-salt-responsive gene expression in plants, including that of LEA genes [8]. Additionally, a number of motifs that regulate endosperm-specific gene expression were also identified, including DNA binding with one finger (DOF) class prolamine-box binding factors (PBF [33]) and MYB class GAMYB TFs [7]. The identification of these motifs in the promoters of the LEA genes is consistent with their being representative of promoter elements that would regulate the transcription of LEA genes and other genes regulated during abiotic stresses and during embryo development.

The occurrences of each of these motifs in the promoter of each LEA gene were used to build a promoter signature for each individual LEA gene. The signature for each LEA gene was then used to identify other genes in the rice genome that contained the most similar signature (by way of the highest number of shared motifs) and thus have the greatest potential to be co-regulated with the LEA genes. This analysis identified an additional 244 rice genes that were included in a putative co-expressing LEA gene network. There was an enrichment of some motifs in promoter regions ranging from 0 to 200 nucleotides downstream of the transcription start site (TSS; Fig. 1). This observation is consistent with a study in Arabidopsis which documented that promoters have a compact nature [31].

Fig. 1 The average spatial distribution of all 30 identified motifs relative to the TSS across promoters of all 244 predicted genes.

Fig. 2 Network diagram depicting the link/edge relationship between a single LEA gene (yellow) and its associated predicted genes (purple) that share the highest number of common promoter motifs.

A detailed literature search that was performed for 110 of the 244 identified genes indicated that the function of many of the genes that possessed functional descriptions could reasonably be linked to LEA gene functions during embryo development and water deficit stress responses (Supplementary File 5).

The putative LEA co-expressing gene network included a number of genes encoding lipid transfer/seed storage proteins, lipolytic enzymes, and amino acid transporters, which may be involved in the building/mobilization of storage reserves during seed embryonic development. Further, numerous A1 peptidases were also present; these have been shown to be expressed in developing seed pods, to be involved in the proteolytic processing and maturation of seed storage proteins in numerous plant species including rice [12], and additionally to have proteolytic roles during water deficit stress [5].

The list also included genes involved in abiotic stress signaling, including ABA-inducible kinases and some well-characterized components of the phosphatidylinositol second messenger signaling pathway [26], cellular protection and detoxification, photosynthesis, ion transport, and cell cycle regulators. It is also worth noting that 44 hypothetical proteins with unknown functions were identified and confirmed to be co-expressed with the LEA genes, thus linking them to a specific biological response. These genes are thus interesting candidates for future studies investigating systemic late embryogenesis and/or drought-response-related genes that can be targeted for biotechnological interventions.

As previously stated, experimental validation of our putative LEA gene network determined that 179 of the 244 genes (73.36%) co-expressed with the LEA genes in maturing rice embryos. The high success rate of our predictions is put into perspective when considering other studies that have attempted to identify groups of co-responsive genes based on the presence of specific cis-elements in their promoters. Attempts to identify ABA-responsive genes in plants have reported success rates of 67.5% in Arabidopsis [35] and 49% in rice [22], with the latter being considered particularly high by the authors. It is also noteworthy that the success rate reported for Arabidopsis was based on their top 40 predicted genes and not all predicted genes as in our study. Further, both these studies were dependent on knowledge of well-defined, experimentally determined cis-elements for their analysis.

The high success rate of our study also compares quite favorably with similar studies performed in non-plant organisms. In Drosophila, the identification of Dorsal responsive genes based on the presence of known cis-elements yielded a 34% success rate [18], while in the nematode Caenorhabditis elegans, the use of defined cis-elements that are characteristic of target gene promoters reported an overall success rate of 72% for 57 arbitrarily selected predictions of interneuron AIY-expressed genes [32]. That analysis, however, required the use of defined AIY motifs and phylogenetic footprinting over genomic sequence data from two nematodes to identify candidate genes. In comparison, our method predicted 244 genes using genomic sequence data from a single organism. In contrast to Wenick and Hobert [32], all of our predictions were experimentally tested, with no selection bias, and 73.36% of genes were confirmed to co-express with the LEA genes. As noted above and depicted in Fig. 3, the success rate of our method increases up to 100% if we select the top ranked genes, i.e., those that share 27 or more common motifs. The strong positive correlation between motif number and co-expression provides compelling evidence that supports the biological relevance of the identified motifs. Furthermore, the positive expression of 73.36% of our predicted genes significantly exceeded that from all rice genes determined to express in maturing rice embryos by MPSS, being 20.1% (when using the TPM≥4 cut-off used for LEA and predicted genes). Even when applying a less stringent positive expression criterion (TPM≥1), only 28.0% of rice genes were positively expressed, providing compelling support for our method, as also demonstrated with the previously determined p values for the enrichment of experimentally confirmed expression in our predicted gene set.

Fig. 3 Correlation between the percentage of predicted genes confirmed to co-express in the maturing embryo and the number of motifs they share with the original LEA genes.

We were intrigued to determine if the sole presence of our motifs that correspond with ABRE and DRE in the promoters of the predicted 244 genes could alone account for the found expression in mature embryos. Our results show that out of 179 genes co-expressing with LEA genes, 72.07% (129/179) contain motifs related to DRE or ABRE or both. At the same time, for the predicted genes that were not co-expressed with the LEA genes (65), we observe that 69.23% (45/65) contain motifs that correspond to DRE or ABRE or both (see Supplementary File 9). We therefore conclude that the presence of DRE or ABRE motifs, or both, in promoters is not itself sufficient to account for the accurate prediction of co-expression. Thus, other motifs that we identified appear to be required and may act synergistically with these to secure specific gene co-expression.

The higher overall success rate of our study may reflect the limitations in other methods that are dependent on using known and exclusive types of cis-elements in predicting co-responsive genes. Since not all cis-elements are known and transcription regulation most likely results from the presence and combinatorial use of multiple transcription factor binding sites [21], computational identification of such sites can provide a rapid and cost-effective method for identifying groups of co-expressing genes with high success. Additionally, the use of computationally derived motifs allows a global spectrum of application of the method since it can be applied to any biological process occurring within a eukaryotic organism without relying on or being restricted to well-studied processes in well-studied organisms that have experimentally confirmed cis-elements available.

This computationally based prediction technique is particularly useful and applicable to newly sequenced eukaryotic genomes from species for which there is little global expression data available. The technique can be used to build putative transcriptional networks of genes based on promoter motif content similarity, which we predict to function coherently in response to defined biological conditions. Thus, both genes with and without known assigned functions can be linked to specific biological processes based on their promoter similarities and their predicted co-expression (under specific conditions) with genes of well-defined functions. Although this study was performed in rice, we believe it can be applied to a wide range of eukaryotes, including other plant species, animals, humans, and fungi, since gene transcription is critically regulated by the combinatorial presence and use of specific cis-regulatory sequences in the promoter regions of genes in eukaryotic organisms in general [30].

In comparison to other approaches, our method (a) predicts co-expressing genes by selecting the best matches for each promoter of the TGG relying on the specific promoter motif combinations, (b) does not require previously defined models of transcription factor binding sites or knowledge of specific transcription factors that control TGG, (c) uses sequence data of only one genome, and (d) is applicable to any genome.

Conclusions

In summary, we demonstrate that similarities in promoter composition, interpreted in terms of the pool and number of shared motifs, can be used to identify putative transcriptional networks of genes that co-express with rice LEA genes. A literature analysis indicates that many of these genes could plausibly function coherently with the LEA genes during developmentally induced desiccation in the embryo. This type of analysis can greatly contribute towards understanding the function of newly annotated genes since it can be used to functionally associate them with genes that have well-defined functions in specific biological processes. Further, it provides valuable information regarding the transcriptional regulation of functionally related gene networks, which could greatly facilitate biotechnological manipulations to improve cellular responses to specific biological conditions.

Methods

Target gene group and promoters

The first step of the analysis is the selection of a target gene group (TGG). In our case we identified 31 rice LEA genes through MPSS analysis [19] that were determined to be expressed in mature rice embryos (see Supplementary File 1). MPSS provides a comprehensive assessment of gene expression by generating short sequence tags, each 20 bp long, produced from a defined position for each transcript. Promoter sequences for genes covering the region [−2,000, +200] relative to the transcription start site were obtained from the International Rice Genome Sequencing Project [14].

Motif identification

To identify motifs enriched in the promoter regions, we used the Dragon Motif Builder (DMB) system (http://apps.sanbi.ac.za/MotifBuilder/index.php) [13]. In total, we identified 30 enriched motif families with motifs of nine nucleotides in length. We used the following parameters: method = EM2, threshold = 0.875, and the random DNA background with equal proportion of the four nucleotides. The details about the algorithm of DMB and a guide for interpretation of its results can be found on the system’s website. The spatial distribution of all 30 motifs in the promoter regions of the predicted genes was determined (Fig. 1).

Determining promoters with similar content

We used the position weight matrix of each of the 30 motif families identified with DMB and, with the same threshold used for motif identification, we predicted motifs on the promoters of all rice genes. Then, for each of the promoters of genes from the TGG, we searched all other promoters that shared with it the highest number of common annotated promoter motifs. We limited the number of predicted promoters/genes to the top three promoters that shared the highest number of common promoter motifs with the TGG. If it was not possible to limit the number of candidate promoters to three, we extended the set of associated promoters to include all those promoters that had the highest number of promoter elements. These associations were then used to generate a TGG-like transcriptional regulatory network (see Supplementary File 4).

Experimental confirmation of co-expression of predicted genes with the TGG

The rice cultivar Tainung 67 (Oryza sativa L. ssp. japonica) was grown in the paddy field at the Academia Sinica campus. Embryos were harvested and dissected from the seeds at 15–20 or 25 days after pollination (DAP) and designated as milky stage embryos (ME) and yellow stage embryos (YE), respectively. The total RNA was extracted using Trizol (Invitrogen). First-strand cDNA was synthesized using standard protocols and SuperScript III reverse transcriptase (Invitrogen). The PCR reaction was performed using the primer sets listed in Supplementary File 8. The amplification was performed using 30 or 35 cycles consisting of 15 s at 94°C, 30 s at 60°C, and 60 s at 72°C, following an initial denaturation cycle of 2 min at 94°C. The final extension step was performed at 72°C for 3 min. The PCR products were separated by electrophoresis and stained with ethidium bromide. The sample collection, RT-PCR, and gel analysis were all performed in duplicate.

The RT-PCR gel image (Supplementary File 7) intensities were graded using standard techniques. A value of 0 was assigned to genes when no PCR products were detected. Samples that gave positive products were assigned a value of 1 (weakest) to 5 (strongest). The values presented in Supplementary File 6 are the average values determined by three independent assessments.

Massively parallel signature sequencing data

RNA samples were extracted from ME and YE. The RNA samples were sent to Illumina Company for custom service of MPSS analysis [19]. The total tag number received from Illumina was 3,520,358. The raw number was normalized to a metric of TPM. Positive expression of LEA genes based on MPSS data was limited to a minimal signal of at least 4 TPM according to Brandenberger et al. [3].

The percentage of all genomic non-transposable element genes that are expressed in maturing rice embryos was determined using global MPSS expression analysis. This analysis was performed using a cut-off of at least 4 TPM (as used for positive selection of LEA genes and predicted genes) and also with the less stringent cut-off of at least 1 TPM.

Statistical test for enrichment

We calculated the p values for the enrichment of the experimentally confirmed genes that co-express with LEA genes in our computationally predicted gene set, relative to the whole rice genome. We used Fisher’s exact right-side test based on the hypergeometric distribution and corrected for multiple testing by the Bonferroni method. The parameters used are as follows:

Genes predicted to co-express with the LEA genes: n=244
Genes with experimentally confirmed expression (out of 244): k=179
Total number of genes in the rice genome: N=41,047
Total number of rice genes expressing in embryo (TPM≥1): K=11,488
Bonferroni correction factor=41,047
p value=4.62e−049, corrected for multiplicity testing p value=1.90e−044

Genes predicted to co-express with the LEA genes: n=244
Genes with experimentally confirmed expression (out of 244): k=179
Total number of genes in the rice genome: N=41,047
Total number of rice genes expressing in embryo (TPM≥4): K=8,241
Bonferroni correction factor=41,047
p value=3.33e−072, corrected for multiplicity testing p value=1.37e−067

Online database

We have created an online Dragon Database for Exploration of late embryogenesis abundant genes in rice (http://apps.sanbi.ac.za/dlea) to allow access to our results and data. Using Rice Annotation Project (RAP, e.g., Os01g0159600) or TIGR (LOC_Os01g06630) identifiers, one can access the promoter details for individual genes. This provides information on the number of occurrences and spatial location of all motifs present in individual gene promoters. Further, the database also contains information generated with the DMB algorithm that illustrates the spatial distribution of the best motifs from each of the motif families in the promoters of the TGG (LEA genes). The site also contains links to the RAP database (http://rapdb.dna.affrc.go.jp/) that provides additional information on gene annotations [15, 20, 28].

Acknowledgments SM received a postdoctoral fellowship from NBN; CG received support from NRF; CRM received support from the SSABMI program; MK received a postdoctoral fellowship from the Claude Leon Foundation; MM received support from NBN and NRF FA2006040900002; YIH received support from NSC; VBB received partial support from the DST/NRF Research Chair grant, NBN, and NRF grants FA2007051400013, ICD2006071000003, and FA2006040900002.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. Allocco DJ, Kohane IS, Butte AJ. Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 2004;5:18.
2. Beer MA, Tavazoie S. Predicting gene expression from sequence. Cell 2004;117:185–98.
3. Brandenberger R, Khrebtukova I, Thies RS, Miura T, Jingli C, Puri R, Vasicek T, Lebkowski J, Rao M. MPSS profiling of human embryonic stem cells. BMC Dev Biol 2004;4:10.
4. Brazma A, Jonassen I, Vilo J, Ukkonen E. Predicting gene regulatory elements in silico on a genomic scale. Genome Res 1998;8:1202–15.
5. Cruz de Carvalho MH, d’Arcy-Lameta A, Roy-Macauley H, Gareil M, El Maarouf H, Pham-Thi AT, Zuily-Fodil Y. Aspartic protease in leaves of common bean (Phaseolus vulgaris L.) and cowpea (Vigna unguiculata L. Walp): enzymatic activity, gene expression and relation to drought susceptibility. FEBS Lett 2001;492:242–6.
6. Davidson EH, McClay DR, Hood L. Regulatory gene networks and the properties of the developmental process. Proc Natl Acad Sci U S A 2003;100:1475–80.
7. Diaz I, Vicente-Carbajosa J, Abraham Z, Martinez M, Isabel-La Moneda I, Carbonero P. The GAMYB protein from barley interacts with the DOF transcription factor BPBF and activates endosperm-specific genes during seed development. Plant J 2002;29:453–64.
8. Dubouzet JG, Sakuma Y, Ito Y, Kasuga M, Dubouzet EG, Miura S, Seki M, Shinozaki K, Yamaguchi-Shinozaki K. OsDREB genes in rice, Oryza sativa L., encode transcription activators that function in drought-, high-salt- and cold-responsive gene expression. Plant J 2003;33:751–63.
9. Dure LI, Crouch M, Harada J, Ho T-HD, Mundy J, Quatrano R, Thomas T, Sung ZR. Common amino acid sequence domains among the LEA proteins of higher plants. Plant Mol Biol 1989;12:475–86.
10. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.
11. Geisler M, Kleczkowski LA, Karpinski S. A universal algorithm for genome-wide in silico identification of biologically significant gene promoter putative cis-regulatory-elements; identification of new elements for reactive oxygen species and sucrose signaling in Arabidopsis. Plant J 2006;45:384–98.
12. Hiraiwa N, Kondo M, Nishimura M, Hara-Nishimura I. An aspartic endopeptidase is involved in the breakdown of propeptides of storage proteins in protein-storage vacuoles of plants. Eur J Biochem 1997;246:133–41.
13. Huang E, Yang L, Chowdhary R, Kassim A, Bajic V. An algorithm for ab initio DNA motif detection. In: Bajic VB, Tan TW, editors. Information processing and living systems. Singapore: World Scientific; 2005. p. 611–4.
14. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 2005;436:793–800.
15. Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, Bureau T, Burr F, Costa DO, Fuks G, Habara T, Haberer G, Han B, Harada E, Hiraki AT, Hirochika H, Hoen D, Hokari H, Hosokawa S, Hsing YI, Ikawa H, Ikeo K, Imanishi T, Ito Y, Jaiswal P, Kanno M, Kawahara Y, Kawamura T, Kawashima H, Khurana JP, Kikuchi S, Komatsu S, Koyanagi KO, Kubooka H, Lieberherr D, Lin YC, Lonsdale D, Matsumoto T, Matsuya A, McCombie WR, Messing J, Miyao A, Mulder N, Nagamura Y, Nam J, Namiki N, Numa H, Nurimoto S, O’Donovan C, Ohyanagi H, Okido T, OOta S, Osato N, Palmer LE, Quetier F, Raghuvanshi S, Saichi N, Sakai H, Sakai Y, Sakata K, Sakurai T, Sato F, Sato Y, Schoof H, Seki M, Shibata M, Shimizu Y, Shinozaki K, Shinso Y, Singh NK, Smith-White B, Takeda J, Tanino M, Tatusova T, Thongjuea S, Todokoro F, Tsugane M, Tyagi AK, Vanavichit A, Wang A, Wing RA, Yamaguchi K, Yamamoto M, Yamamoto N, Yu Y, Zhang H, Zhao Q, Higo K, Burr B, Gojobori T, Sasaki T. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res 2007;17:175–83.
16. Jansen R, Greenbaum D, Gerstein M. Relating whole-genome expression data with protein–protein interactions. Genome Res 2002;12:37–46.
17. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Res 2004;14:1085–94.
18. Markstein M, Markstein P, Markstein V, Levine MS. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc Natl Acad Sci U S A 2002;99:763–8.
19. Nobuta K, Venu RC, Lu C, Belo A, Vemaraju K, Kulkarni K, Wang W, Pillay M, Green PJ, Wang GL, Meyers BC. An expression atlas of rice mRNAs and small RNAs. Nat Biotechnol 2007;25:473–7.
20. Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, Ikeo K, Itoh T, Gojobori T, Sasaki T. The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res 2006;34:D741–744.
21. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001;29:153–9.
22. Ross C, Shen QJ. Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa). Plant Mol Biol 2006;62:233–46.
23. Schulze A, Downward J. Navigating gene expression using microarrays—a technology review. Nat Cell Biol 2001;3:E190–195.
24. Shinozaki K, Yamaguchi-Shinozaki K. Molecular responses to dehydration and low temperature: differences and cross-talk between two stress signaling pathways. Curr Opin Plant Biol 2000;3:217–23.
25. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science 2003;302:249–55.
26. Takahashi S, Katagiri T, Hirayama T, Yamaguchi-Shinozaki K, Shinozaki K. Hyperosmotic stress induces a rapid and transient increase in inositol 1,4,5-trisphosphate independent of abscisic acid in Arabidopsis cell culture. Plant Cell Physiol 2001;42:214–22.
27. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci U S A 1999;96:2907–12.
28. Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, Wu J, Itoh T, Sasaki T, Aono R, Fujii Y, Habara T, Harada E, Kanno M, Kawahara Y, Kawashima H, Kubooka H, Matsuya A, Nakaoka H, Saichi N, Sanbonmatsu R, Sato Y, Shinso Y, Suzuki M, Takeda J, Tanino M, Todokoro F, Yamaguchi K, Yamamoto N, Yamasaki C, Imanishi T, Okido T, Tada M, Ikeo K, Tateno Y, Gojobori T, Lin YC, Wei FJ, Hsing YI, Zhao Q, Han B, Kramer MR, McCombie RW, Lonsdale D, O’Donovan CC, Whitfield EJ, Apweiler R, Koyanagi KO, Khurana JP, Raghuvanshi S, Singh NK, Tyagi AK, Haberer G, Fujisawa M, Hosokawa S, Ito Y, Ikawa H, Shibata M, Yamamoto M, Bruskiewich RM, Hoen DR, Bureau TE, Namiki N, Ohyanagi H, Sakai Y, Nobushima S, Sakata K, Barrero RA, Sato Y, Souvorov A, Smith-White B, Tatusova T, An S, An G, OOta S, Fuks G, Fuks G, Messing J, Christie KR, Lieberherr D, Kim H, Zuccolo A, Wing RA, Nobuta K, Green PJ, Lu C, Meyers BC, Chaparro C, Piegu B, Panaud O, Echeverria M. The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res 2008;36:D1028–1033.
29. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet 1999;22:281–5.
30. Tuch BB, Li H, Johnson AD. Evolution of eukaryotic transcription circuits. Science 2008;319:1797–9.
31. Vandepoele K, Casneuf T, Van de PY. Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics. Genome Biol 2006;7:R103.
32. Wenick AS, Hobert O. Genomic cis-regulatory architecture and trans-acting regulators of a single interneuron-specific gene battery in C. elegans. Dev Cell 2004;6:757–70.
33. Wu C, Washida H, Onodera Y, Harada K, Takaiwa F. Quantitative nature of the Prolamin-box, ACGT and AACA motifs in a rice glutelin gene promoter: minimal cis-element requirements for endosperm-specific gene expression. Plant J 2000;23:415–21.
34. Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XJ. A graph-based approach to systematically reconstruct human transcriptional regulatory modules. Bioinformatics 2007;23:i577–86.
35. Zhang W, Ruan J, Ho TH, You Y, Yu T, Quatrano RS. Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana. Bioinformatics 2005;21:3074–81.
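Supplementary sketch: locating consensus motifs in a promoter. The consensus motifs in Table 1 use IUPAC ambiguity codes (for example S = C or G, W = A or T). The study itself scanned promoters with position weight matrices from Dragon Motif Builder at a threshold of 0.875; the snippet below is only a simplified stand-in that matches a consensus string exactly on the forward strand. The function names and the example promoter strings are hypothetical and serve illustration only.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets finditer report overlapping occurrences.
    return re.compile("(?=(" + body + "))")


def find_sites(consensus, promoter):
    """Return (0-based start, matched sequence) for each forward-strand hit."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(promoter.upper())]


# Hypothetical promoter fragments, not taken from the rice genome.
print(find_sites("CGGCGSCGG", "TTCGGCGCCGGAA"))
print(find_sites("AWATTATAT", "GGATATTATATCC"))
```

Exact consensus matching is stricter than a thresholded weight matrix, so it would be expected to report fewer sites than the published analysis; scanning the reverse complement would also be needed to recover the (−) strand positions listed in Table 1.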
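Supplementary sketch: ranking promoters by shared motifs. The promoter-matching step in the Methods selects, for each target gene, the promoters sharing the highest number of motifs, limited to the top three and extended when ties prevent a clean cut. A minimal sketch of that selection follows, under two assumptions that are ours and not stated in the source: a promoter signature is represented as the set of motif family numbers present in the promoter, and "extended" is read as keeping every candidate that ties with the third-ranked score. Gene names and signatures are invented for the example.

```python
def rank_by_shared_motifs(target_signature, candidate_signatures, top_n=3):
    """Return (gene, shared motif count) pairs for the best-matching promoters.

    Candidates are ranked by the number of motif families shared with the
    target; ties with the score at position top_n are all kept (assumed rule).
    """
    scores = {
        gene: len(target_signature & motifs)
        for gene, motifs in candidate_signatures.items()
    }
    # Sort by descending score, then by gene name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= top_n:
        return ranked
    cutoff = ranked[top_n - 1][1]
    return [(gene, score) for gene, score in ranked if score >= cutoff]


# Hypothetical signatures: motif family numbers found in each promoter.
lea_signature = {1, 2, 3, 4, 5}
candidates = {
    "geneA": {1, 2, 3, 4},
    "geneB": {1, 2, 3},
    "geneC": {2, 3, 9},
    "geneD": {4, 5, 7},
    "geneE": {8},
}
print(rank_by_shared_motifs(lea_signature, candidates))
```

In this toy case geneC and geneD tie for third place with two shared motifs each, so four genes are returned rather than three.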
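Supplementary sketch: the enrichment test. The right-sided Fisher's exact test described in the Methods corresponds to the upper tail of a hypergeometric distribution, P(X ≥ k), with the parameters N, K, n, and k listed there, followed by a Bonferroni correction with factor 41,047. The sketch below computes that tail with exact integer arithmetic from the Python standard library. It is our own illustration, not the authors' code, and the function names are hypothetical; the values it prints may differ slightly from the published p values depending on the software the authors used.

```python
from math import comb


def hypergeom_right_tail(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K are 'expressed'."""
    upper = min(n, K)
    # Exact integer count of favourable draws, summed over the upper tail.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Python's int / int division stays accurate even for very large integers.
    return favourable / comb(N, n)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


N = 41_047  # total number of genes in the rice genome
n = 244     # genes predicted to co-express with the LEA genes
k = 179     # predicted genes with experimentally confirmed expression

for label, K in (("TPM>=1", 11_488), ("TPM>=4", 8_241)):
    p = hypergeom_right_tail(N, K, n, k)
    print(label, "p =", p, "Bonferroni-corrected p =", bonferroni(p, N))
```

Exact integers are used instead of floating-point factorials because the individual binomial coefficients here are far larger than a double can hold, while their ratio is a very small but representable number.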

Journal

Rice, Springer Journals

Published: Dec 1, 2008

Keywords: Transcription regulation; Co-expression; Co-regulation
