
starBase: a database for exploring microRNA–mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data

Nucleic Acids Research, 2011, Vol. 39, Database issue D202–D209
Published online 29 October 2010
doi:10.1093/nar/gkq1056

Jian-Hua Yang, Jun-Hao Li, Peng Shao, Hui Zhou, Yue-Qin Chen and Liang-Hu Qu*

Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, P. R. China

Received August 12, 2010; Revised and Accepted October 13, 2010

*To whom correspondence should be addressed. Tel: 86 20 84112399; Fax: 86 20 84036551; Email: lssqlh@mail.sysu.edu.cn
Present address: Liang-Hu Qu, Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, PR China.

The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify Argonaute interaction sites and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base). We developed it to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments from six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified 1 million Ago-binding clusters in animals and 2 million cleaved target clusters in plants. We analyzed these clusters together with target sites predicted by 6 miRNA target prediction programs. This identified approximately 400 000 miRNA–target regulatory relationships from CLIP-Seq data and approximately 66 000 from Degradome-Seq data. Furthermore, two web servers are provided for discovering novel miRNA target sites from CLIP-Seq and Degradome-Seq data. Our web implementation supports diverse query types and exploration of common targets, gene ontologies and pathways. starBase is available at http://starbase.sysu.edu.cn/.

INTRODUCTION

MicroRNAs (miRNAs) are endogenous ~22-nt RNAs that direct the post-transcriptional repression of protein-coding genes (1,2). By base pairing to mRNAs, miRNAs mediate translational repression or mRNA degradation (1–3). Functional studies indicate that miRNAs participate in the regulation of numerous cellular processes, such as proliferation, apoptosis, differentiation and the cell cycle (1–3).

Thousands of miRNAs have been identified in animals and plants by cloning and deep sequencing, but determining the targets of these miRNAs is an ongoing challenge (1–4). To date, many target prediction programs have been developed. These include TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10) for animal miRNA targets, and miRU (11) and TargetFinder (12) for plant miRNA targets. In addition, several resources systematically collect and describe experimentally validated miRNA targets [TarBase (13), miRecords (14)] and predicted miRNA targets [miRGator (15), MiRNAMap (16)]. However, miRNA regulation of an animal mRNA requires base pairing with only a few nucleotides in the 3′-UTR of the target mRNA. As a result, different target prediction programs produce different results and have high false positive rates (4,17–19). Plant miRNA targets have been predicted on the basis of their extensive and often conserved complementarity to miRNAs (1,2). Even so, researchers must still spend substantial time and effort validating predicted targets that turn out to be false.

In the past several years, significant efforts have been made to determine biologically relevant miRNA–target interactions using high-throughput experimental approaches. Several recent studies have isolated targets in animals and plants using two methods. The first is cross-linking and Argonaute (Ago) immunoprecipitation coupled with high-throughput sequencing [CLIP-Seq, also referred to as high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP)] (20–22). The second is high-throughput degradome sequencing [Degradome-Seq, also referred to as 'parallel analysis of RNA ends' (PARE)] (23–27). Applying CLIP-Seq and Degradome-Seq has significantly reduced the rate of false positive predictions of miRNA binding sites. It has also reduced the size of the search space for miRNA target sites (20–27). The growing volume of CLIP-Seq and Degradome-Seq data has created a strong demand for an integrated database that facilitates the annotation and analysis of these data.

To meet this need, we developed the starBase database, which we introduce here. starBase provides integrative, interactive and versatile display of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data, along with their comprehensive annotation and discovery. It covers six organisms: human, mouse, Caenorhabditis elegans, Arabidopsis thaliana, Oryza sativa and Vitis vinifera (Figure 1). starBase contains information on hundreds of thousands of miRNA–target regulatory relationships and millions of Ago-binding sites and cleavage sites (Table 1). In addition, we developed two novel web servers that identify miRNA binding sites or cleavage sites from CLIP-Seq and Degradome-Seq data. By comprehensively integrating Ago CLIP-Seq and Degradome-Seq data, this database is expected to be a considerable resource for researchers who investigate new miRNA–target interactions and develop next-generation miRNA target prediction algorithms.

MATERIALS AND METHODS

We compiled twenty-one Ago or TNRC6 CLIP-Seq sequence data sets and 10 Degradome-Seq sequence data sets from eight related studies (20–27). They were downloaded from the NCBI GEO database (28) or obtained from the Supplementary Data of the original articles (20–27). Ago and TNRC6 CLIP-Seq reads were mapped to genomes using the Bowtie program (version 0.12.0) (29) with options -a --best --strata -v 2 -m 1. Reads with multiple equivalent hits to the genome were discarded. Degradome-Seq data were mapped to genomes using Bowtie (version 0.12.0) (29) with options -a -m 10 -v 0, and to cDNA sequences with options -a -v 0. Overlapping reads were grouped into clusters; each cluster contains at least one sequence read and has a minimum length of 20 nt. We also obtained three curated cluster sets from the supplementary material of the original articles (20–22):
- the highly reliable Ago (ALG-1) CLIP-Seq clusters in L4-stage wild-type (wt) worms;
- the top Ago or TNRC6 CLIP-Seq clusters in human HEK293 cells;
- the Ago CLIP-Seq clusters/peaks in mouse neocortex.

Figure 1. System overview of the starBase core framework. All results generated by starBase are deposited in MySQL relational databases and displayed in the visual browser and web pages.

Table 1. Data statistics in starBase

Species | Libraries | Mapped reads | Peak clusters | Relationships | Known targets
Human | 13 | 1 137 137 | 390 990 | 156 751 | 536
Mouse | 5 | 2 831 042 | 725 145 | 232 196 | 132
Caenorhabditis elegans | 3 | 178 639 | 24 699 | 12 904 | 53
Arabidopsis | 6 | 5 575 026 | 669 589 | 25 688 | /
Oryza sativa | 3 | 6 448 130 | 1 009 870 | 37 254 | /
Vitis vinifera | 1 | 2 404 808 | 431 180 | 3179 | /

The table lists the following for each of the six organisms (human, mouse, C. elegans, Arabidopsis, Oryza sativa and Vitis vinifera):
- the number of libraries (CLIP-Seq and Degradome-Seq);
- mapped reads;
- peak clusters (CLIP-Seq clusters or Degradome-Seq clusters);
- relationships (miRNA–target regulatory relationships);
- known animal miRNA target sites.

'/' indicates that known plant miRNA target sites are not listed, because miRecords (14) provides only animal miRNA target sites. Some miRNA–target regulatory relationships involve miRNAs from the same family and involve mRNA isoforms.

Animal miRNA target sites predicted by 5 bioinformatics programs [TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10)] were downloaded from their corresponding websites. Predicted target site coordinates from TargetScan (5), PicTar (7) and PITA (9) were converted to coordinates of recently released genomes. The conversion used the liftOver utility from the University of California Santa Cruz (UCSC) bioinformatics websites (30). Target sequences from RNA22 (10) and miRanda (8) were aligned to genomes with the Bowtie program (29) to determine their genome coordinates. Plant miRNA target sites were predicted by the CleaveLand program (version 2.0) (31). We considered only miRNA–target interactions whose CleaveLand alignment score did not exceed the cutoff threshold of 7.0. Experimentally validated target sites were downloaded from the miRecords website (14). These validated target sites were then mapped to genomes with Bowtie (29) to determine their genome coordinates.

The ClipSearch program was developed to search CLIP-Seq data for seed-matched sites (8-mer, 7-mer-m8 and 7-mer-A1) (2,5). The DegradomeSearch program was developed to search Degradome-Seq clusters for nearly perfect complements of miRNA sequences. For each species, Degradome-Seq cluster sequences were extended by an additional 15 nt in both the 5′ and 3′ directions. These extended sequences were extracted and used as the DegradomeSearch input data set. The DegradomeSearch web server aligns miRNAs to the extended clusters using segemehl (version 0.093) (32). Interactions between miRNA and target were scored according to previously described methods (31,33,34). For each interaction, we searched the genome-wide cleavage sites pre-deposited in the MySQL database for cleavage tags at the 10th nucleotide of the alignment.

All known miRNAs were downloaded from miRBase [release 15.0 (35)]. All refGenes were downloaded from the UCSC bioinformatics websites (30). Gene Ontology (GO) annotations (36), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (37) and BioCarta pathways for refGenes were extracted from the UCSC Table Browser (30). Known non-coding RNA genes were downloaded from Ensembl (38) or UCSC (30), or were obtained from related literature (39–41). Genome sequences were obtained from the following sources:
- human (UCSC hg19), mouse (UCSC mm9, NCBI Build 37) and C. elegans (WS190): UCSC bioinformatics websites (30);
- Arabidopsis (TAIR9): TIGR (42);
- rice: MSU Rice Genome Annotation Website (43);
- grapevine: Genoscope website (44).

DATABASE CONTENT

Genomic landscape of Ago-binding sites and miRNA-cleavage sites

To study genome-wide Ago–RNA interaction patterns, we grouped 4 million mapped CLIP-Seq reads into about 1 million clusters (details in the 'Materials and Methods' section; Table 1). The sequencing depth distribution of these clusters is presented as target peaks (t-peaks), displayed in our deepView genome browser (Figure 1 and Supplementary Figure S1). This display allows peak patterns from different Ago proteins, cell lines and tissues to be compared directly to determine miRNA target sites. In general, clusters at bona fide binding sites form higher peaks than those arising from biological noise (Figure 1 and Supplementary Figure S1). All clusters were intersected with annotated genomic elements. In each species, >10% of clusters overlapped known 3′-UTR regions (Supplementary Table S1). Intriguingly, in each species >10% of clusters also overlapped CDS regions and >7% overlapped intron regions (Supplementary Table S1).

The same strategy was applied to group 14 million mapped Degradome-Seq reads into about 2 million clusters (Table 1). Most of these clusters overlapped mRNA sequences. To study patterns of RNA degradation, we constructed genome-wide target plots (t-plots) (23,45) (Supplementary Figure S2) by plotting the abundance of each cleavage signature along the genome sequences. As described by German et al. (23,45), t-plots can distinguish true cleavage sites from background noise. For bona fide miRNA targets, cleavage tags at the cleavage site are generally more abundant than tags at other positions. This makes true sites fairly easy to recognize by simple inspection of the t-plots (23,45) (Supplementary Figure S2).

Annotation and identification of miRNA–target regulatory relationships

To investigate animal miRNA–target regulatory relationships, we intersected all CLIP-Seq peak clusters with animal miRNA target sites predicted by five target prediction programs. These were TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10). In total, we identified approximately 400 000 regulatory relationships between 1348 miRNAs and 26 296 genes (Table 1 and Supplementary Table S2). Filtering candidates with CLIP-Seq data significantly reduced the number of predictions from each program. This suggests that the different computational approaches may generate many false positive predictions.

To provide insights into the function of each miRNA, we carried out a comprehensive gene set analysis of the miRNA target sets. The analysis combined KEGG pathways (37), BioCarta pathways and Gene Ontology (GO) categories (36).

We applied the CleaveLand program (version 2.0) (31) to plant Degradome-Seq data. This identified approximately 66 000 miRNA–target regulatory relationships involving 25 579 genes and 856 miRNAs (Table 1 and Supplementary Table S2). Because many Degradome-Seq libraries from diverse tissues are integrated, this analysis resolves these regulatory relationships at enhanced resolution.

Predicting target sites of small RNAs from CLIP-Seq and Degradome-Seq data

The growing volume of CLIP-Seq and Degradome-Seq data also creates a strong demand for web-based tools that predict small RNA target sites from these data. We developed two web-based tools, ClipSearch and DegradomeSearch, to screen potential miRNA binding sites and cleavage sites. ClipSearch predicts biological miRNA–target interactions by searching for 8-mer and 7-mer sites that match the seed region of the miRNA. It searches for these sites in CLIP-Seq clusters that overlap the 3′-UTRs of known genes. ClipSearch does not use cross-species sequence conservation to filter candidates, so it can discover non-conserved miRNA binding sites.

DegradomeSearch predicts functional miRNA–target interactions by searching for sites with a near-perfect match to the whole miRNA sequence. It searches for these sites in Degradome-Seq clusters that overlap mRNAs (details in the 'Materials and Methods' section). Interactions are scored with a scheme that has successfully identified miRNA target sites in plants (31,33,34). By default, DegradomeSearch reports miRNA–target interactions with a penalty score not exceeding 7.0 and at least one cleavage tag. Users can reduce false positives by choosing a lower penalty score cutoff or by raising the minimum number of cleavage tags.

WEB INTERFACE

starBase provides various query interfaces and graphical visualization pages to facilitate analysis of the CLIP-Seq and Degradome-Seq data sets and exploration of miRNA–target interactions. Our improved deepView Genome Browser (46) provides an integrated view of many data types (Figure 2 and Supplementary Figures S1–S2). These include mapped reads, predicted and known miRNA targets, ncRNAs, protein-coding genes, clusters, target-peaks and target-plots. Bench biologists can use the genome browser to compare, side by side, the t-peak or t-plot maps from multiple experiments and the binding-site conservation reported by all target prediction programs. Clicking a track item in the browser opens a detailed page about that item. The page also links to external resources such as NCBI, UCSC and TAIR for more comprehensive information.

Web-based exploration of miRNA–target regulation relationships

We provide two web interfaces, CLIP-Seq and Degradome-Seq, that display miRNA–target interaction relationships (Supplementary Figures S3–S4). Users can browse the relationships by entering a gene name or selecting a microRNA name. As the user types a gene name in the search box, suggested gene names appear in a list box. The user can then choose a gene from the list or finish typing the full gene name. Users can also search for the intersection of targets among selected target prediction programs. Search results are listed in a miRNA–target table. In the CLIP-Seq section, the table shows the number of binding sites predicted by each program and the number of CLIP-Seq reads. Clicking a number in the table opens a detailed page for that miRNA–target interaction. Clicking a column title sorts the interactions by features such as the number of binding sites, miRNA name or gene name. The detailed page includes:
- a description of the target gene;
- the GO terms of the gene;
- the pathways in which the target gene is involved;
- the number of CLIP-Seq reads (Supplementary Figure S3).

This information allows the user to filter the putative targets further.

The Degradome-Seq section is organized similarly to the CLIP-Seq section. Its table lists the target genes, their genomic coordinates, the penalty score of each miRNA–mRNA interaction and the number of reads at each cleavage site. Clicking a target gene in the table opens a page with detailed information on the miRNA–target interactions (Supplementary Figure S4).

Web-based tools for predicting target sites of novel small RNAs from CLIP-Seq and Degradome-Seq data

starBase provides two simple, user-friendly interfaces for predicting small RNA target sites from CLIP-Seq and Degradome-Seq data (Supplementary Figures S5–S6). The user first selects an organism. For ClipSearch, the user then enters nucleotides 2–8 of a mature sequence; for DegradomeSearch, the user enters a full mature miRNA sequence. After submission, a typical run takes several minutes. Users can reduce false positives in several ways:
- in ClipSearch, by filtering candidate targets by site type (8-mer, 7-mer-m8 and 7-mer-A1) (2,5);
- in DegradomeSearch, by limiting the penalty score;
- in either tool, by using the sequencing depth of a target site or cleavage site.

The ClipSearch output has three parts: the site type, information about the target gene and visual sequence alignments against the matching CLIP-Seq cluster (Supplementary Figure S7). The DegradomeSearch output also has three parts: the penalty score, the miRNA–mRNA interaction map and the number of cleavage-site reads in each experiment (Supplementary Figure S8). A link to the deepView genome browser lets the user view various features of each target region.

Figure 2. Illustrative screen shots from the deepView browser. The deepView browser provides an integrated view of CLIP-Seq and Degradome-Seq data, known and predicted miRNA target sites, protein-coding genes, ncRNA genes, miRNAs, strand-specific peak clusters, genome-wide target-peaks and target-plots.

DISCUSSION AND CONCLUSIONS

Our global analysis of Ago CLIP-Seq and Degradome-Seq data from 31 experiments in six organisms provides a comprehensive integrated map of miRNA–target interactions. The large number of Ago-binding sites and cleavage sites identified in this study reveals an extensive and complex interaction map among Ago proteins, miRNAs and target RNAs (Table 1).

Our initial analysis found that most CLIP-Seq and Degradome-Seq clusters could not be clearly assigned as miRNA targets (Supplementary Table S1). They may instead be bound by novel small RNAs, or by miRNAs that follow unexpected binding rules. One such rule is centered pairing (center sites), recently reported by Bartel and colleagues (47). Moreover, numerous CLIP-Seq clusters lay outside the 3′-UTRs of genes, suggesting that miRNAs may also bind coding regions and 5′-UTRs. Such binding has been reported for ribosomal protein regulation by miR-10a (48), and for regulation of Nanog, Oct4 and Sox2 by miR-134, miR-296 and miR-470 (49). Recent reports revealed that Ago proteins also play roles in miRNA-directed cleavage (47) and in miRNA processing (50) in animals. One might therefore speculate that a substantial number of Argonaute-catalyzed cleavage sites are hidden in these data. In plants, much of the Degradome-Seq signal may not reflect miRNA-directed cleavage but rather by-products of other degradation pathways. Nevertheless, we anticipate that future investigation of these data will provide important insights into the rules governing miRNA–target interactions.

Other miRNA target-related databases, including TarBase (13), miRecords (14), miRGator (15) and MiRNAMap (16), collect only predicted or experimentally supported targets. By comparison, starBase has the following distinctive features:
(i) CLIP-Seq and Degradome-Seq are the newest high-throughput technologies for transcriptome-wide identification of miRNA target sites in animals and plants (20–27). starBase is the first database to provide comprehensive analysis of public CLIP-Seq and Degradome-Seq data.
(ii) The genome-wide t-peak and t-plot maps generated by starBase allow users to easily search these signatures for miRNA cleavage sites or binding sites.
(iii) The improved deepView browser in starBase provides an integrated view of multidimensional data to facilitate research on miRNA regulatory networks (Figure 2; Supplementary Figures S1–S2).
(iv) Two web-based tools, ClipSearch and DegradomeSearch, can predict animal and plant small RNA target sites from CLIP-Seq and Degradome-Seq data. We expect these tools to help more researchers search for target sites of novel miRNAs or endo-siRNAs in the ever-increasing amounts of CLIP-Seq and Degradome-Seq data.
(v) starBase also provides GO annotations and biological pathways for miRNA targets (Figure 1). These associated terms may provide valuable insights into the regulatory role and function of each miRNA.

We have provided a variety of information to facilitate exploration of miRNA–target interaction maps. Some CLIP-Seq clusters with few reads may simply represent experimental or biological noise. Users can filter such clusters further by checking three things: whether they overlap bona fide clusters from the original articles, how many reads map to them and how many CLIP-Seq experiments contain them. We expect that the data, web-based tools and integrative, interactive and versatile display provided by starBase will aid future experimental and computational studies of new miRNA target sites and miRNA–target interaction features.

FUTURE DIRECTIONS

CLIP-Seq and Degradome-Seq technologies provide powerful ways to study biologically relevant miRNA–target interactions at the transcriptome-wide level. As these technologies are applied to more species, cell lines, tissues and conditions, we will continuously maintain and update the database to keep pace. We will also continue to expand storage space and improve the computational efficiency of our servers for storing and analyzing these new data. In addition, we intend to integrate CLIP-Seq data from other RNA binding proteins (51) into starBase, such as PUM2 (22), Nova (52), FOX2 (53) and PTB (54). This will improve our understanding of eukaryotic regulatory networks.

AVAILABILITY

The starBase database is freely available at http://starbase.sysu.edu.cn/. All starBase data files can be freely downloaded and used under the GNU General Public License.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Ling-ling Zheng for her valuable comments; Markus Hafner and Thomas Tuschl for providing the PAR-CLIP data; Robert Darnell for providing the HITS-CLIP clusters and peaks; and Gene Yeo for providing CLIP-Seq data.

FUNDING

- National Natural Science Foundation of China (No. 30830066, U0631001, 30900820);
- Ministry of Science and Technology of China, National Basic Research Program (No. 2005CB724600, 2011CB811300);
- Ministry of Education of China and Guangdong Province (No. IRT0447, NSF05200303, 9451027501002591);
- China Postdoctoral Science Foundation (No. 20080440800, 200902348).

Funding for open access charge: Ministry of Science and Technology of China, National Basic Research Program (No. 2011CB811300).

Conflict of interest statement. None declared.

REFERENCES

1. Bartel,D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 281–297.
2. Bartel,D.P. (2009) MicroRNAs: target recognition and regulatory functions. Cell, 136, 215–233.
3. Filipowicz,W., Bhattacharyya,S.N. and Sonenberg,N. (2008) Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat. Rev. Genet., 9, 102–114.
4. Rajewsky,N. (2006) microRNA target predictions in animals. Nat. Genet., 38(Suppl.), S8–13.
5. Lewis,B.P., Burge,C.B. and Bartel,D.P. (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15–20.
6. Lewis,B.P., Shih,I.H., Jones-Rhoades,M.W., Bartel,D.P. and Burge,C.B. (2003) Prediction of mammalian microRNA targets. Cell, 115, 787–798.
7. Krek,A., Grun,D., Poy,M.N., Wolf,R., Rosenberg,L., Epstein,E.J., MacMenamin,P., da Piedade,I., Gunsalus,K.C., Stoffel,M. et al. (2005) Combinatorial microRNA target predictions. Nat. Genet., 37, 495–500.
8. John,B., Enright,A.J., Aravin,A., Tuschl,T., Sander,C. and Marks,D.S. (2004) Human MicroRNA targets. PLoS Biol., 2, e363.
9. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007) The role of site accessibility in microRNA target recognition. Nat. Genet., 39, 1278–1284.
10. Miranda,K.C., Huynh,T., Tay,Y., Ang,Y.S., Tam,W.L., Thomson,A.M., Lim,B. and Rigoutsos,I. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126, 1203–1217.
11. Zhang,Y. (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res., 33, W701–W704.
12. Fahlgren,N. and Carrington,J.C. (2010) miRNA target prediction in plants. Methods Mol. Biol., 592, 51–57.
13. Papadopoulos,G.L., Reczko,M., Simossis,V.A., Sethupathy,P. and Hatzigeorgiou,A.G. (2009) The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res., 37, D155–D158.
14. Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. and Li,T. (2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110.
15. Nam,S., Kim,B., Shin,S. and Lee,S. (2008) miRGator: an integrated system for functional annotation of microRNAs. Nucleic Acids Res., 36, D159–D164.
16. Hsu,S.D., Chu,C.H., Tsou,A.P., Chen,S.J., Chen,H.C., Hsu,P.W., Wong,Y.H., Chen,Y.H., Chen,G.H. and Huang,H.D. (2008) miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes. Nucleic Acids Res., 36, D165–D169.
17. Ambros,V. (2004) The functions of animal microRNAs. Nature, 431, 350–355.
18. Baek,D., Villen,J., Shin,C., Camargo,F.D., Gygi,S.P. and Bartel,D.P. (2008) The impact of microRNAs on protein output. Nature, 455, 64–71.
19. Sethupathy,P., Megraw,M. and Hatzigeorgiou,A.G. (2006) A guide through present computational approaches for the identification of mammalian microRNA targets. Nat. Methods, 3, 881–886.
20. Chi,S.W., Zang,J.B., Mele,A. and Darnell,R.B. (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 460, 479–486.
21. Zisoulis,D.G., Lovci,M.T., Wilbert,M.L., Hutt,K.R., Liang,T.Y., Pasquinelli,A.E. and Yeo,G.W. (2010) Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat. Struct. Mol. Biol., 17, 173–179.
22. Hafner,M., Landthaler,M., Burger,L., Khorshid,M., Hausser,J., Berninger,P., Rothballer,A., Ascano,M. Jr, Jungkamp,A.C., Munschauer,M. et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129–141.
23. German,M.A., Pillay,M., Jeong,D.H., Hetawal,A., Luo,S., Janardhanan,P., Kannan,V., Rymarquis,L.A., Nobuta,K., German,R. et al. (2008) Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol., 26, 941–946.
24. Addo-Quaye,C., Eshoo,T.W., Bartel,D.P. and Axtell,M.J. (2008) Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol., 18, 758–762.
25. Wu,L., Zhang,Q., Zhou,H., Ni,F., Wu,X. and Qi,Y. (2009) Rice MicroRNA effector complexes and targets. Plant Cell, 21, 3421–3435.
26. Pantaleo,V., Szittya,G., Moxon,S., Miozzi,L., Moulton,V., Dalmay,T. and Burgyan,J. (2010) Identification of grapevine microRNAs and their targets using high throughput sequencing and degradome analysis. Plant J., 62, 960–976.
27. Zhou,M., Gu,L., Li,P., Song,X., Wei,L., Chen,Z. and Cao,X. (2010) Degradome sequencing reveals endogenous small RNA targets in rice (Oryza sativa L. ssp. indica). Front. Biol., 5, 67–90.
28. Barrett,T., Troup,D.B., Wilhite,S.E., Ledoux,P., Rudnev,D., Evangelista,C., Kim,I.F., Soboleva,A., Tomashevsky,M., Marshall,K.A. et al. (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res., 37, D885–D890.
29. Langmead,B., Trapnell,C., Pop,M. and Salzberg,S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.
30. Rhead,B., Karolchik,D., Kuhn,R.M., Hinrichs,A.S., Zweig,A.S., Fujita,P.A., Diekhans,M., Smith,K.E., Rosenbloom,K.R., Raney,B.J. et al. (2010) The UCSC Genome Browser database: update 2010. Nucleic Acids Res., 38, D613–D619.
31. Addo-Quaye,C., Miller,W. and Axtell,M.J. (2009) CleaveLand: a pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics, 25, 130–131.
32. Hoffmann,S., Otto,C., Kurtz,S., Sharma,C.M., Khaitovich,P., Vogel,J., Stadler,P.F. and Hackermuller,J. (2009) Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol., 5, e1000502.
33. Jones-Rhoades,M.W. and Bartel,D.P. (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell, 14, 787–799.
34. Allen,E., Xie,Z., Gustafson,A.M. and Carrington,J.C. (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell, 121, 207–221.
35. Griffiths-Jones,S., Saini,H.K., van Dongen,S. and Enright,A.J. (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res., 36, D154–D158.
36. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
37. Kanehisa,M. and Goto,S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
38. Kersey,P.J., Lawson,D., Birney,E., Derwent,P.S., Haimel,M., Herrero,J., Keenan,S., Kerhornou,A., Koscielny,G., Kahari,A. et al. (2010) Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res., 38, D563–D569.
39. Lestrade,L. and Weber,M.J. (2006) snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res., 34, D158–D162.
40. Chan,P.P. and Lowe,T.M. (2009) GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res., 37, D93–D97.
41. Brown,J.W., Echeverria,M., Qu,L.H., Lowe,T.M., Bachellerie,J.P., Huttenhofer,A., Kastenmayer,J.P., Green,P.J., Shaw,P. and Marshall,D.F. (2003) Plant snoRNA database. Nucleic Acids Res., 31, 432–435.
42. Chan,A.P., Rabinowicz,P.D., Quackenbush,J., Buell,C.R. and Town,C.D. (2007) Plant database resources at The Institute for Genomic Research. Methods Mol. Biol., 406, 113–136.
43. Ouyang,S., Zhu,W., Hamilton,J., Lin,H., Campbell,M., Childs,K., Thibaud-Nissen,F., Malek,R.L., Lee,Y., Zheng,L. et al. (2007) The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res., 35, D883–D887.
44. Jaillon,O., Aury,J.M., Noel,B., Policriti,A., Clepet,C., Casagrande,A., Choisne,N., Aubourg,S., Vitulo,N., Jubin,C. et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449, 463–467.
45. German,M.A., Luo,S., Schroth,G., Meyers,B.C. and Green,P.J. (2009) Construction of Parallel Analysis of RNA Ends (PARE) libraries for the study of cleaved miRNA targets and the RNA degradome. Nat. Protoc., 4, 356–362.
46. Yang,J.H., Shao,P., Zhou,H., Chen,Y.Q. and Qu,L.H. (2010) deepBase: a database for deeply annotating and mining deep sequencing data. Nucleic Acids Res., 38, D123–D130.
47. Shin,C., Nam,J.W., Farh,K.K., Chiang,H.R., Shkumatava,A. and Bartel,D.P. (2010) Expanding the microRNA targeting code: functional sites with centered pairing. Mol. Cell, 38, 789–802.
48. Orom,U.A., Nielsen,F.C. and Lund,A.H. (2008) MicroRNA-10a binds the 5′UTR of ribosomal protein mRNAs and enhances their translation. Mol. Cell, 30, 460–471.
49. Tay,Y., Zhang,J., Thomson,A.M., Lim,B. and Rigoutsos,I. (2008) MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature, 455, 1124–1128.
50. Karginov,F.V., Cheloufi,S., Chong,M.M., Stark,A., Smith,A.D. and Hannon,G.J. (2010) Diverse endonucleolytic cleavage sites in the mammalian transcriptome depend upon microRNAs, Drosha, and additional nucleases. Mol. Cell, 38, 781–788.
51. Licatalosi,D.D. and Darnell,R.B. RNA processing and its regulation: global insights into biological networks. Nat. Rev. Genet., 11, 75–87.
52. Licatalosi,D.D., Mele,A., Fak,J.J., Ule,J., Kayikci,M., Chi,S.W.,
53. Yeo,G.W., Coufal,N.G., Liang,T.Y., Peng,G.E., Fu,X.D. and Gage,F.H. (2009) An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol., 16, 130–137.
54. Xue,Y., Zhou,Y., Wu,T., Zhu,T., Ji,X., Kwon,Y.S., Zhang,C., Yeo,G., Black,D.L., Sun,H. et al. (2009) Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping.
Clark,T.A., Schweitzer,A.C., Blume,J.E., Wang,X. et al. (2008) Mol. Cell, 36, 996–1006. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 456, 464–469. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nucleic Acids Research Oxford University Press



Publisher: Oxford University Press
Copyright: © The Author(s) 2010. Published by Oxford University Press.
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gkq1056
PMID: 21037263


Nucleic Acids Research, 2011, Vol. 39, Database issue, D202–D209
Published online 29 October 2010
doi:10.1093/nar/gkq1056

starBase: a database for exploring microRNA–mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data

Jian-Hua Yang, Jun-Hao Li, Peng Shao, Hui Zhou, Yue-Qin Chen and Liang-Hu Qu*

Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, P. R. China

Received August 12, 2010; Revised and Accepted October 13, 2010

*To whom correspondence should be addressed. Tel: +86 20 84112399; Fax: +86 20 84036551; Email: lssqlh@mail.sysu.edu.cn
Present address: Liang-Hu Qu, Biotechnology Research Center, Sun Yat-sen University, Guangzhou 510275, PR China.
© The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify Argonaute interaction sites and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base), which we developed to facilitate the comprehensive exploration of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments in six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified about 1 million Ago-binding clusters in animals and about 2 million cleaved target clusters in plants. By analyzing these clusters together with target sites predicted by six miRNA target prediction programs, we identified approximately 400 000 and approximately 66 000 miRNA–target regulatory relationships from CLIP-Seq and Degradome-Seq data, respectively. Furthermore, we provide two web servers for discovering novel miRNA target sites from CLIP-Seq and Degradome-Seq data. Our web implementation supports diverse query types and the exploration of common targets, gene ontologies and pathways. starBase is available at http://starbase.sysu.edu.cn/.

INTRODUCTION

MicroRNAs (miRNAs) are endogenous ~22 nt RNAs that direct the post-transcriptional repression of protein-coding genes (1,2). By base pairing to mRNAs, miRNAs mediate translational repression or mRNA degradation (1–3). Functional studies indicate that miRNAs participate in the regulation of numerous cellular processes, such as proliferation, apoptosis, differentiation and the cell cycle (1–3).

Thousands of miRNAs have been identified in animals and plants by cloning and deep sequencing, but determining the targets of these miRNAs remains an ongoing challenge (1–4). To date, a large number of target prediction programs have been developed, such as TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10) for animal miRNA targets, and miRU (11) and TargetFinder (12) for plant miRNA targets. In addition, several resources have been established to systematically collect and describe both experimentally validated miRNA targets [TarBase (13), miRecords (14)] and predicted miRNA targets [miRGator (15), MiRNAMap (16)]. However, because miRNA regulation of an animal mRNA requires base pairing with only a few nucleotides of the target mRNA's 3′-UTR, different target prediction programs produce different results and have high false positive rates (4,17–19). Although plant miRNA targets have been predicted on the basis of their extensive and often conserved complementarity to miRNAs (1,2), substantial time and effort must still be spent attempting to validate predicted miRNA targets that turn out to be false.

In the past several years, significant efforts have been made to determine biologically relevant miRNA–target interactions using high-throughput experimental approaches. Several recent studies have reported the use of cross-linking and Argonaute (Ago) immunoprecipitation coupled with high-throughput sequencing [CLIP-Seq, also referred to as high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP)] (20–22) and high-throughput degradome sequencing [Degradome-Seq, also referred to as 'parallel analysis of RNA ends' (PARE)] (23–27) to isolate targets in animals and plants. The application of CLIP-Seq and Degradome-Seq methods has significantly reduced the rate of false positive predictions of miRNA binding sites and has also reduced the size of the search space for miRNA target sites (20–27). The increasing amount of CLIP-Seq and Degradome-Seq data has created a strong demand among researchers for an integrated database that facilitates the annotation and analysis of these data.

To meet this need, we developed the starBase database, which we introduce in this study. starBase facilitates the integrative, interactive and versatile display, as well as the comprehensive annotation and discovery, of miRNA–target interaction maps from CLIP-Seq and Degradome-Seq data from six organisms: human, mouse, Caenorhabditis elegans, Arabidopsis thaliana, Oryza sativa and Vitis vinifera (Figure 1). starBase contains information on hundreds of thousands of miRNA–target regulatory relationships, as well as millions of Ago-binding sites and cleavage sites (Table 1). In addition, two novel web servers were developed to identify miRNA binding sites or cleavage sites from CLIP-Seq and Degradome-Seq data. By comprehensively integrating Ago CLIP-Seq and Degradome-Seq data, this database is expected to provide considerable resources to help researchers investigate new miRNA–target interactions and develop next-generation miRNA target prediction algorithms.

MATERIALS AND METHODS

Twenty-one Ago or TNRC6 CLIP-Seq sequence data sets and 10 Degradome-Seq sequence data sets were compiled from eight related studies (20–27) and either downloaded from the NCBI GEO database (28) or obtained from the Supplementary Data of the original articles (20–27). Ago and TNRC6 CLIP-Seq reads were mapped to genomes using the Bowtie program (version 0.12.0) (29) with the options -a --best --strata -v 2 -m 1. Reads with multiple equivalent hits to the genome were discarded. Degradome-Seq data were mapped to genomes and to cDNA sequences using Bowtie (version 0.12.0) (29) with the options -a -m 10 -v 0 and -a -v 0, respectively. Overlapping reads were grouped into clusters, with each cluster containing at least one sequence read and having a minimum length of 20 nt. In addition, the high-reliability Ago (ALG-1) CLIP-Seq clusters in L4-stage wild-type (wt) worms, the top Ago or TNRC6 CLIP-Seq clusters in human HEK293 cells and the Ago CLIP-Seq clusters/peaks in mouse neocortex were obtained from the supplementary material of the original articles (20–22).

Figure 1. System overview of the starBase core framework. All results generated by starBase are deposited in MySQL relational databases and displayed in the visual browser and web pages.
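The read-clustering step described above can be illustrated with a short Python sketch. It is not starBase's implementation: the `Read` and `Cluster` types and the `cluster_reads` function are hypothetical, coordinates are assumed to be 0-based and half-open, reads are merged only when they share a chromosome and strand and overlap by at least 1 nt, and the 20-nt minimum is assumed to mean that shorter clusters are discarded.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Read:
    """One uniquely mapped read (hypothetical representation)."""
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass
class Cluster:
    chrom: str
    strand: str
    start: int
    end: int
    n_reads: int

    @property
    def length(self) -> int:
        return self.end - self.start


def cluster_reads(reads, min_length=20):
    """Merge overlapping reads on the same chromosome and strand into clusters.

    Clusters shorter than min_length nt are dropped (assumed interpretation
    of the 20-nt minimum cluster length).
    """
    clusters = []
    ordered = sorted(reads, key=lambda r: (r.chrom, r.strand, r.start))
    for (chrom, strand), group in groupby(ordered, key=attrgetter("chrom", "strand")):
        current = None
        for read in group:
            if current is not None and read.start < current.end:
                # Read overlaps the open cluster: extend it and count the read.
                current.end = max(current.end, read.end)
                current.n_reads += 1
            else:
                # Gap (or first read): close the previous cluster, open a new one.
                if current is not None:
                    clusters.append(current)
                current = Cluster(chrom, strand, read.start, read.end, 1)
        if current is not None:
            clusters.append(current)
    return [c for c in clusters if c.length >= min_length]
```

For example, two plus-strand reads at 100–125 and 110–135 merge into one 35-nt cluster supported by two reads, while an isolated 15-nt read at 200–215 is removed by the length filter. Sorting once and then sweeping linearly keeps the cost at O(n log n) for n reads.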
Table 1. Data statistics in starBase

Species                 Libraries  Mapped reads  Peak clusters  Relationships  Known targets
Human                          13     1 137 137        390 990        156 751            536
Mouse                           5     2 831 042        725 145        232 196            132
Caenorhabditis elegans          3       178 639         24 699         12 904             53
Arabidopsis                     6     5 575 026        669 589         25 688              /
Oryza sativa                    3     6 448 130      1 009 870         37 254              /
Vitis vinifera                  1     2 404 808        431 180          3 179              /

Statistics show the numbers of libraries (CLIP-Seq and Degradome-Seq), mapped reads, peak clusters (CLIP-Seq clusters or Degradome-Seq clusters), relationships (miRNA–target regulatory relationships) and known animal miRNA target sites for the six organisms (human, mouse, C. elegans, Arabidopsis, Oryza sativa and Vitis vinifera). '/' indicates that known plant miRNA target sites are not included, because miRecords (14) provides only animal miRNA target sites. Note that some miRNA–target regulatory relationships involve miRNAs from the same family and mRNA isoforms.

Animal miRNA target sites predicted by five bioinformatics programs [TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10)] were downloaded from their corresponding websites. Predicted target site coordinates from TargetScan (5), PicTar (7) and PITA (9) were converted to coordinates on recently released genomes using the liftOver utility from the University of California Santa Cruz (UCSC) bioinformatics websites (30). Target sequences from RNA22 (10) and miRanda (8) were aligned to genomes with Bowtie (29) to determine their genome coordinates. Plant miRNA target sites were predicted by the CleaveLand program (version 2.0) (31). We considered only miRNA–target interactions whose CleaveLand alignment score did not exceed the cutoff threshold of 7.0. Experimentally validated target sites were downloaded from the miRecords website (14) and mapped to genomes with Bowtie (29) to determine their genome coordinates.

The ClipSearch program was developed to search CLIP-Seq data for 6–8-mers (8-mer, 7-mer-m8 and 7-mer-A1) (2,5). The DegradomeSearch program was developed to search Degradome-Seq clusters for nearly perfect complements of miRNA sequences. Degradome-Seq cluster sequences, extended by an additional 15 nt in both the 5′- and 3′-directions for each species, were extracted and used as the DegradomeSearch input data set. The DegradomeSearch web server aligns miRNAs to the extended clusters using segemehl (version 0.093) (32). Interactions between a miRNA and its target were scored according to previously described methods (31,33,34). For each interaction, we searched the genome-wide cleavage sites pre-deposited in the MySQL database to determine whether there were cleavage tags at the 10th nucleotide of the alignment.

All known miRNAs were downloaded from miRBase [release 15.0 (35)]. All refGenes were downloaded from the UCSC bioinformatics websites (30). Gene Ontology (GO) annotations (36), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (37) and BioCarta pathways for refGenes were extracted from the UCSC Table Browser (30). Known non-coding RNA genes were downloaded from Ensembl (38) or UCSC (30) or obtained from the related literature (39–41). Human (UCSC hg19), mouse (UCSC mm9, NCBI Build 37) and C. elegans (WS190) genome sequences were downloaded from the UCSC bioinformatics websites (30). Arabidopsis (TAIR9) genome sequences were downloaded from TIGR (42). Rice genome sequences were downloaded from the MSU Rice Genome Annotation website (43). Grapevine genome sequences were downloaded from the Genoscope website (44).

DATABASE CONTENT

Genomic landscape of Ago-binding sites and miRNA-cleavage sites

To study genome-wide Ago–RNA interaction patterns, we grouped 4 million mapped CLIP-Seq reads into about 1 million clusters (details in the 'Materials and Methods' section; Table 1). The sequencing depth distribution of these clusters is presented in the form of target peaks (t-peaks), which are displayed in our deepView genome browser (Figure 1 and Supplementary Figure S1). This display allows a direct comparison of the peak patterns generated from different Ago proteins, cell lines and tissues to determine miRNA target sites. In general, clusters corresponding to bona fide binding sites show higher peaks than those corresponding to biological noise (Figure 1 and Supplementary Figure S1). All clusters were intersected with annotated genomic elements, and >10% of clusters were found to overlap known 3′-UTR regions in each species (Supplementary Table S1). Intriguingly, >10% and >7% of clusters overlapped CDS and intron regions, respectively, in each species (Supplementary Table S1).

The same strategy was applied to group 14 million mapped Degradome-Seq reads into about 2 million clusters (Table 1). The majority of these clusters overlapped mRNA sequences. To study patterns of RNA degradation, genome-wide target plots (t-plots) (23,45) (Supplementary Figure S2) were constructed by plotting the abundance of each cleavage signature along the genome sequences. As described by German et al. (23,45), these t-plots can be used to distinguish true cleavage sites from background noise. In general, for bona fide miRNA targets, the cleavage tags at the cleavage site are more abundant than those at other positions, making them fairly easy to distinguish by simple inspection of the t-plots (23,45) (Supplementary Figure S2).

Annotation and identification of miRNA–target regulatory relationships

To investigate animal miRNA–target regulatory relationships, animal miRNA target sites predicted by the five target prediction programs [TargetScan (5,6), PicTar (7), miRanda (8), PITA (9) and RNA22 (10)] were intersected with all CLIP-Seq peak clusters. In total, we identified approximately 400 000 regulatory relationships between 1348 miRNAs and 26 296 genes (Table 1 and Supplementary Table S2). Using CLIP-Seq data to filter candidates substantially reduced the number of sites predicted by each target prediction program, suggesting that the different computational approaches may generate a considerable number of false positive predictions. To provide insights into the function of each miRNA, we carried out a comprehensive gene set analysis of the miRNA target sets by combining KEGG pathways (37), BioCarta pathways and Gene Ontology (GO) categories (36).

We applied the CleaveLand program (version 2.0) (31) to plant Degradome-Seq data and identified approximately 66 000 miRNA–target regulatory relationships involving 25 579 genes and 856 miRNAs (Table 1 and Supplementary Table S2). Because it integrates a large number of Degradome-Seq libraries from diverse tissues, this analysis provides enhanced resolution of these regulatory relationships.

Predicting target sites of small RNAs from CLIP-Seq and Degradome-Seq data

The increasing amount of CLIP-Seq and Degradome-Seq data also creates a strong demand for web-based tools that predict target sites of small RNAs from these data. Two web-based tools, ClipSearch and DegradomeSearch, were developed to screen potential miRNA binding sites and cleavage sites. ClipSearch predicts biological miRNA–target interactions by searching for 8-mer and 7-mer sites that match the seed region of the miRNA, restricting the search to CLIP-Seq clusters that overlap the 3′-UTR of known genes. Because it does not use cross-species sequence conservation to filter candidates, ClipSearch can discover non-conserved miRNA binding sites.

DegradomeSearch predicts functional miRNA–target interactions by searching for sites with a near-perfect match to the whole miRNA sequence, restricting the search to Degradome-Seq clusters that overlap mRNA (details in the 'Materials and Methods' section). Interactions are scored with a scheme that has successfully identified miRNA target sites in plants (31,33,34). In its default setting, DegradomeSearch reports miRNA–target interactions with a penalty score not exceeding 7.0 and at least one cleavage tag. False positives (and the number of predicted results) can be reduced by choosing a lower penalty score cutoff or by raising the minimum number of cleavage tags.

WEB INTERFACE

The starBase database provides various query interfaces and graphical visualization pages to facilitate analysis of the CLIP-Seq and Degradome-Seq data sets and exploration of miRNA–target interactions. Our improved deepView Genome Browser (46) provides an integrated view of mapped reads, predicted and known miRNA targets, ncRNAs, protein-coding genes, clusters, target-peaks and target-plots (Figure 2 and Supplementary Figures S1–S2). Bench biologists can use the genome browser to simultaneously compare the t-peak or t-plot maps generated from multiple experiments and the conservation of binding sites across all target prediction programs. Clicking a track item in the browser opens a detailed page with further information on that item, or links to external resources such as NCBI, UCSC and TAIR for more comprehensive information.

Web-based exploration of miRNA–target regulatory relationships

We provide two web interfaces, CLIP-Seq and Degradome-Seq, for displaying miRNA–target interaction relationships (Supplementary Figures S3–S4). Users can browse the relationships by entering a gene name or by selecting a miRNA name. When a user starts typing a gene name in the search box, suggested gene names are displayed in a list box; the user can then either choose a gene from the list or finish typing the full gene name. Users can also search for intersections among targets by choosing the target prediction programs of interest. The search results are listed in a miRNA–target table. In the CLIP-Seq section, the table shows the number of binding sites predicted by each prediction program and the number of CLIP-Seq reads. Clicking a number in the table opens a detailed page with further information on that miRNA–target interaction, and clicking a column title sorts the miRNA–target interactions by features such as the number of binding sites, miRNA name or gene name. The detailed information for a miRNA–target interaction includes a description of the target gene, the GO terms of the gene, the pathways the target gene is involved in and the number of CLIP-Seq reads (Supplementary Figure S3). This information allows the user to filter the putative targets further.

The Degradome-Seq section is organized similarly to the CLIP-Seq section. The target genes, genomic coordinates, penalty scores of the miRNA–mRNA interactions and the number of sequence reads at the cleavage sites are all presented in a table. Clicking a target gene in the table opens a page showing detailed information on the miRNA–target interactions (Supplementary Figure S4).

Figure 2. Illustrative screen shots from the deepView browser. The deepView browser provides an integrated view of CLIP-Seq and Degradome-Seq data, known and predicted miRNA target sites, protein-coding genes, ncRNA genes, miRNAs, strand-specific peak clusters, genome-wide target-peaks and target-plots.
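The ClipSearch description above says it looks for 8-mer and 7-mer sites matching the miRNA seed. The following Python sketch shows one way to classify the three canonical site types (2,5) in a single target sequence. It is an illustrative re-implementation, not ClipSearch itself: `find_seed_sites` and `revcomp` are hypothetical names, both sequences are assumed to be given 5′→3′ (T is converted to U), and choosing which 3′-UTR-overlapping CLIP-Seq clusters to scan is left to the caller.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna: str, target: str):
    """Return sorted (start, site_type) tuples for canonical seed sites.

    Site types (target read 5'->3'):
      8-mer    : pairs with miRNA nt 2-8, followed by an A opposite nt 1
      7-mer-m8 : pairs with miRNA nt 2-8, no A opposite nt 1
      7-mer-A1 : pairs with miRNA nt 2-7, followed by an A opposite nt 1
    start is the 0-based index of the site's first nucleotide in target.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if len(mirna) < 8:
        raise ValueError("miRNA must be at least 8 nt long")

    site_2_8 = revcomp(mirna[1:8])  # complementary to miRNA nt 2-8
    site_2_7 = revcomp(mirna[1:7])  # complementary to miRNA nt 2-7
    partner_of_nt8 = site_2_8[0]    # target base paired with miRNA nt 8

    sites = []
    # Zero-width lookaheads also report overlapping occurrences.
    for match in re.finditer(f"(?={site_2_8})", target):
        i = match.start()
        has_a1 = target[i + 7:i + 8] == "A"
        sites.append((i, "8-mer" if has_a1 else "7-mer-m8"))
    for match in re.finditer(f"(?={site_2_7}A)", target):
        i = match.start()
        if i > 0 and target[i - 1] == partner_of_nt8:
            continue  # same site already reported as an 8-mer
        sites.append((i, "7-mer-A1"))
    return sorted(sites)
```

For let-7a (UGAGGUAGUAGGUUGUAUAGUU), nucleotides 2–8 are GAGGUAG, so the 8-mer site is CUACCUCA. On the test string GGCUACCUCAGGCUACCUCUGGGUACCUCAGG, the function returns [(2, '8-mer'), (12, '7-mer-m8'), (23, '7-mer-A1')]. Scanning for the nt 2–8 match first and then skipping 7-mer-A1 hits that lie inside an 8-mer ensures each site is counted once.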
Web-based tools for predicting target sites of novel small RNAs from CLIP-Seq and Degradome-Seq data

starBase provides two simple, user-friendly interfaces that allow users to predict target sites of small RNAs from CLIP-Seq and Degradome-Seq data (Supplementary Figures S5–S6). The user selects an organism and then enters nucleotides 2–8 of a mature sequence (for ClipSearch) or a complete mature miRNA sequence (for DegradomeSearch). After submission, a typical run may take several minutes to finish. To reduce false positives among the targets predicted by ClipSearch, the user can filter candidate targets by site type: 8-mer, 7-mer-m8 or 7-mer-A1 (2,5). For DegradomeSearch, the user can limit the penalty score to reduce false positive predictions. The sequencing depth of a target site or cleavage site can be used to further reduce false positives. The output of ClipSearch consists of three parts: the site type, information about the target gene and visual sequence alignments matched to a specific CLIP-Seq cluster (Supplementary Figure S7). The output of DegradomeSearch also consists of three parts: the penalty score, the miRNA–mRNA interaction map and the number of sequence reads at the cleavage site in different experiments (Supplementary Figure S8). A link to the deepView genome browser is also provided so that the user can view various features of each target region.

DISCUSSION AND CONCLUSIONS

Our global analysis of Ago CLIP-Seq and Degradome-Seq data derived from 31 experiments in six organisms provides a comprehensive, integrated map of miRNA–target interactions. The large number of Ago-binding sites and cleavage sites identified in this study reveals an extensive and complex interaction map among Ago proteins, miRNAs and target RNAs (Table 1).

Our initial analysis found that the majority of CLIP-Seq and Degradome-Seq clusters could not be clearly predicted to be miRNA targets (Supplementary Table S1), implying that they may be bound by novel small RNAs, or by miRNAs that follow unexpected binding rules, such as the centered pairing (centered sites) recently reported by Bartel and colleagues (47). Moreover, numerous CLIP-Seq clusters were not located within the 3′-UTR of a gene, indicating that miRNAs may also bind to the coding region and the 5′-UTR, as has been reported for ribosomal protein regulation by miR-10a (48) and for Nanog, Oct4 and Sox2 regulation by miR-134, miR-296 and miR-470 (49). Recent reports revealed that the Ago protein also plays a role in miRNA-derived cleavage (47) and in miRNA processing (50) in animals. One might therefore speculate that a substantial number of Argonaute-catalyzed cleavage sites are hidden in these data. In plants, much of the Degradome-Seq data may represent not miRNA-derived cleavage sites but by-products of other degradation pathways. Nevertheless, we anticipate that future investigation of these data may provide important insights into the rules governing miRNA–target interactions.

Compared with other miRNA target-related databases, including TarBase (13), miRecords (14), miRGator (15) and MiRNAMap (16), which collect only predicted targets or experimentally supported targets, starBase has the following distinctive features: (i) CLIP-Seq and Degradome-Seq are the newest high-throughput technologies for the transcriptome-wide identification of miRNA target sites in animals and plants (20–27), and starBase is the first database to provide comprehensive analysis of public CLIP-Seq and Degradome-Seq data; (ii) the genome-wide t-peak and t-plot maps generated by starBase allow users to easily search within these signatures for miRNA cleavage sites or binding sites; (iii) our improved deepView browser in starBase provides an integrated view of multidimensional data to facilitate research on miRNA regulatory networks (Figure 2, Supplementary Figures S1–S2); (iv) two web-based tools, ClipSearch and DegradomeSearch, can be used to predict animal and plant target sites of small RNAs from CLIP-Seq and Degradome-Seq data, and we expect that access to these tools will enable more researchers to search for target sites of novel miRNAs or endo-siRNAs in the ever-increasing amounts of CLIP-Seq and Degradome-Seq data; (v) starBase also provides users with the GO annotations and biological pathways of miRNA targets (Figure 1), which may provide valuable insights into the regulatory role and function of each miRNA.

We have provided a variety of information to facilitate the exploration of miRNA–target interaction maps. Although some CLIP-Seq clusters with small read numbers may simply represent experimental or biological noise, users can further filter these clusters by checking whether they overlap bona fide clusters obtained from the original articles, how many reads map to the cluster and how many CLIP-Seq experiments include it. We expect that the data, web-based tools and the integrative, interactive and versatile display provided by starBase will aid future experimental and computational studies in discovering new miRNA target sites and miRNA–target interaction features.

FUTURE DIRECTIONS

CLIP-Seq and Degradome-Seq technologies have provided powerful ways to study biologically relevant miRNA–target interactions at the transcriptome-wide level. As these technologies are applied to a broader set of species, cell lines, tissues and conditions, we will continuously maintain and update the database to keep up with these developments. Moreover, we will continue to increase the storage space and improve the computational efficiency of our servers for storing and analyzing these new data. In addition, we intend to integrate CLIP-Seq data from other RNA binding proteins (51), such as PUM2 (22), Nova (52), FOX2 (53) and PTB (54), into starBase to improve our understanding of eukaryotic regulatory networks.

AVAILABILITY

The starBase database is freely available at http://starbase.sysu.edu.cn/. All starBase data files can be freely downloaded and used according to the GNU Public License.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Ling-ling Zheng for her valuable comments; Markus Hafner and Thomas Tuschl for providing the PAR-CLIP data; Robert Darnell for providing the HITS-CLIP clusters and peaks; and Gene Yeo for providing CLIP-Seq data.

FUNDING

National Natural Science Foundation of China (No. 30830066, U0631001, 30900820); Ministry of Science and Technology of China, National Basic Research Program (No. 2005CB724600, 2011CB811300); the funds from the Ministry of Education of China and Guangdong Province (No. IRT0447, NSF05200303, 9451027501002591); China Postdoctoral Science Foundation (No. 20080440800, 200902348). Funding for open access charge: Ministry of Science and Technology of China, National Basic Research Program (No. 2011CB811300).

Conflict of interest statement. None declared.

REFERENCES

1. Bartel,D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 281–297.
2. Bartel,D.P. (2009) MicroRNAs: target recognition and regulatory functions. Cell, 136, 215–233.
3. Filipowicz,W., Bhattacharyya,S.N. and Sonenberg,N. (2008) Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat. Rev. Genet., 9, 102–114.
4. Rajewsky,N. (2006) microRNA target predictions in animals. Nat. Genet., 38(Suppl.), S8–13.
5. Lewis,B.P., Burge,C.B. and Bartel,D.P. (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15–20.
6. Lewis,B.P., Shih,I.H., Jones-Rhoades,M.W., Bartel,D.P. and Burge,C.B. (2003) Prediction of mammalian microRNA targets. Cell, 115, 787–798.
7. Krek,A., Grun,D., Poy,M.N., Wolf,R., Rosenberg,L., Epstein,E.J., MacMenamin,P., da Piedade,I., Gunsalus,K.C., Stoffel,M. et al. (2005) Combinatorial microRNA target predictions. Nat. Genet., 37, 495–500.
8. John,B., Enright,A.J., Aravin,A., Tuschl,T., Sander,C. and Marks,D.S. (2004) Human MicroRNA targets. PLoS Biol., 2, e363.
9. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007) The role of site accessibility in microRNA target recognition. Nat. Genet., 39, 1278–1284.
10. Miranda,K.C., Huynh,T., Tay,Y., Ang,Y.S., Tam,W.L., Thomson,A.M., Lim,B. and Rigoutsos,I. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126, 1203–1217.
11. Zhang,Y. (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res., 33, W701–W704.
12. Fahlgren,N. and Carrington,J.C. (2010) miRNA target prediction in plants. Methods Mol. Biol., 592, 51–57.
13. Papadopoulos,G.L., Reczko,M., Simossis,V.A., Sethupathy,P. and Hatzigeorgiou,A.G. (2009) The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res., 37, D155–D158.
14. Xiao,F., Zuo,Z., Cai,G., Kang,S., Gao,X. and Li,T. (2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110.
15. Nam,S., Kim,B., Shin,S. and Lee,S. (2008) miRGator: an integrated system for functional annotation of microRNAs. Nucleic Acids Res., 36, D159–D164.
16. Hsu,S.D., Chu,C.H., Tsou,A.P., Chen,S.J., Chen,H.C., Hsu,P.W., Wong,Y.H., Chen,Y.H., Chen,G.H. and Huang,H.D. (2008) miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes. Nucleic Acids Res., 36, D165–D169.
17. Ambros,V. (2004) The functions of animal microRNAs. Nature, 431, 350–355.
18. Baek,D., Villen,J., Shin,C., Camargo,F.D., Gygi,S.P. and Bartel,D.P. (2008) The impact of microRNAs on protein output. Nature, 455, 64–71.
19. Sethupathy,P., Megraw,M. and Hatzigeorgiou,A.G. (2006) A guide through present computational approaches for the identification of mammalian microRNA targets. Nat. Methods, 3, 881–886.
20. Chi,S.W., Zang,J.B., Mele,A. and Darnell,R.B. (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 460, 479–486.
21. Zisoulis,D.G., Lovci,M.T., Wilbert,M.L., Hutt,K.R., Liang,T.Y., Pasquinelli,A.E. and Yeo,G.W. (2010) Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat. Struct. Mol. Biol., 17, 173–179.
22. Hafner,M., Landthaler,M., Burger,L., Khorshid,M., Hausser,J., Berninger,P., Rothballer,A., Ascano,M. Jr, Jungkamp,A.C., Munschauer,M. et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129–141.
23. German,M.A., Pillay,M., Jeong,D.H., Hetawal,A., Luo,S., Janardhanan,P., Kannan,V., Rymarquis,L.A., Nobuta,K., German,R. et al.
29. Langmead,B., Trapnell,C., Pop,M. and Salzberg,S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.
30. Rhead,B., Karolchik,D., Kuhn,R.M., Hinrichs,A.S., Zweig,A.S., Fujita,P.A., Diekhans,M., Smith,K.E., Rosenbloom,K.R., Raney,B.J. et al. (2010) The UCSC Genome Browser database: update 2010. Nucleic Acids Res., 38, D613–D619.
31. Addo-Quaye,C., Miller,W. and Axtell,M.J. (2009) CleaveLand: a pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics, 25, 130–131.
32. Hoffmann,S., Otto,C., Kurtz,S., Sharma,C.M., Khaitovich,P., Vogel,J., Stadler,P.F. and Hackermuller,J. (2009) Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol., 5, e1000502.
33. Jones-Rhoades,M.W. and Bartel,D.P. (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell, 14, 787–799.
34. Allen,E., Xie,Z., Gustafson,A.M. and Carrington,J.C. (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell, 121, 207–221.
35. Griffiths-Jones,S., Saini,H.K., van Dongen,S. and Enright,A.J. (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res., 36, D154–D158.
36. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
37. Kanehisa,M. and Goto,S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30.
38. Kersey,P.J., Lawson,D., Birney,E., Derwent,P.S., Haimel,M., Herrero,J., Keenan,S., Kerhornou,A., Koscielny,G., Kahari,A. et al. (2010) Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res., 38, D563–D569.
39. Lestrade,L. and Weber,M.J. (2006) snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res., 34, D158–D162.
40. Chan,P.P. and Lowe,T.M. (2009) GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res., 37, D93–D97.
41. Brown,J.W., Echeverria,M., Qu,L.H., Lowe,T.M., Bachellerie,J.P., Huttenhofer,A., Kastenmayer,J.P., Green,P.J., Shaw,P. and Marshall,D.F. (2003) Plant snoRNA database. Nucleic Acids Res., 31, 432–435.
42. Chan,A.P., Rabinowicz,P.D., Quackenbush,J., Buell,C.R. and Town,C.D. (2007) Plant database resources at The Institute for Genomic Research. Methods Mol. Biol., 406, 113–136.
43. Ouyang,S., Zhu,W., Hamilton,J., Lin,H., Campbell,M., Childs,K., Thibaud-Nissen,F., Malek,R.L., Lee,Y., Zheng,L. et al. (2007) The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res., 35, D883–D887.
(2008) Global identification of microRNA-target 44. Jaillon,O., Aury,J.M., Noel,B., Policriti,A., Clepet,C., RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol., Casagrande,A., Choisne,N., Aubourg,S., Vitulo,N., Jubin,C. et al. 26, 941–946. (2007) The grapevine genome sequence suggests ancestral 24. Addo-Quaye,C., Eshoo,T.W., Bartel,D.P. and Axtell,M.J. (2008) hexaploidization in major angiosperm phyla. Nature, 449, Endogenous siRNA and miRNA targets identified by sequencing 463–467. of the Arabidopsis degradome. Curr Biol., 18, 758–762. 45. German,M.A., Luo,S., Schroth,G., Meyers,B.C. and Green,P.J. 25. Wu,L., Zhang,Q., Zhou,H., Ni,F., Wu,X. and Qi,Y. (2009) Rice (2009) Construction of Parallel Analysis of RNA Ends (PARE) MicroRNA effector complexes and targets. Plant Cell, 21, libraries for the study of cleaved miRNA targets and the RNA 3421–3435. degradome. Nat. Protoc., 4, 356–362. 26. Pantaleo,V., Szittya,G., Moxon,S., Miozzi,L., Moulton,V., 46. Yang,J.H., Shao,P., Zhou,H., Chen,Y.Q. and Qu,L.H. (2010) Dalmay,T. and Burgyan,J. (2010) Identification of grapevine deepBase: a database for deeply annotating and mining deep microRNAs and their targets using high throughput sequencing sequencing data. Nucleic Acids Res., 38, D123–D130. and degradome analysis. Plant J., 62, 960–976. 47. Shin,C., Nam,J.W., Farh,K.K., Chiang,H.R., Shkumatava,A. and 27. Zhou,M., Gu,L., Li,P., Song,X., Wei,L., Chen,Z. and Cao,X. Bartel,D.P. (2010) Expanding the microRNA targeting code: (2010) Degradome sequencing reveals endogenous small functional sites with centered pairing. Mol. Cell, 38, 789–802. RNA targets in rice (Oryza sativa L. ssp. indica). Front. Biol., 5, 48. Orom,U.A., Nielsen,F.C. and Lund,A.H. (2008) MicroRNA-10a 67–90. binds the 5 UTR of ribosomal protein mRNAs and enhances 28. Barrett,T., Troup,D.B., Wilhite,S.E., Ledoux,P., Rudnev,D., their translation. Mol. Cell, 30, 460–471. Evangelista,C., Kim,I.F., Soboleva,A., Tomashevsky,M., 49. 
Tay,Y., Zhang,J., Thomson,A.M., Lim,B. and Rigoutsos,I. (2008) Marshall,K.A. et al. (2009) NCBI GEO: archive for MicroRNAs to Nanog, Oct4 and Sox2 coding regions high-throughput functional genomic data. Nucleic Acids Res., 37, modulate embryonic stem cell differentiation. Nature, 455, D885–D890. 1124–1128. Nucleic Acids Research, 2011, Vol. 39, Database issue D209 50. Karginov,F.V., Cheloufi,S., Chong,M.M., Stark,A., Smith,A.D. 53. Yeo,G.W., Coufal,N.G., Liang,T.Y., Peng,G.E., Fu,X.D. and and Hannon,G.J. (2010) Diverse endonucleolytic cleavage sites in Gage,F.H. (2009) An RNA code for the FOX2 splicing regulator the mammalian transcriptome depend upon microRNAs, Drosha, revealed by mapping RNA-protein interactions in stem cells. and additional nucleases. Mol. Cell, 38, 781–788. Nat. Struct. Mol. Biol., 16, 130–137. 51. Licatalosi,D.D. and Darnell,R.B. RNA processing and its 54. Xue,Y., Zhou,Y., Wu,T., Zhu,T., Ji,X., Kwon,Y.S., Zhang,C., regulation: global insights into biological networks. Yeo,G., Black,D.L., Sun,H. et al. (2009) Genome-wide analysis Nat. Rev. Genet., 11, 75–87. of PTB-RNA interactions reveals a strategy used by the general 52. Licatalosi,D.D., Mele,A., Fak,J.J., Ule,J., Kayikci,M., Chi,S.W., splicing repressor to modulate exon inclusion or skipping. Clark,T.A., Schweitzer,A.C., Blume,J.E., Wang,X. et al. (2008) Mol. Cell, 36, 996–1006. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 456, 464–469.

Journal: Nucleic Acids Research, Oxford University Press

Published: Jan 30, 2011
