starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data

Nucleic Acids Research, 2014, Vol. 42, Database issue, D92–D97. Published online 1 December 2013. doi:10.1093/nar/gkt1248

Jun-Hao Li, Shun Liu, Hui Zhou, Liang-Hu Qu* and Jian-Hua Yang*

RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, PR China

Received September 15, 2013; Revised October 28, 2013; Accepted November 9, 2013

*To whom correspondence should be addressed. Tel: +86 20 84112399; Fax: +86 20 84036551; Email: lssqlh@mail.sysu.edu.cn
Correspondence may also be addressed to Jian-Hua Yang. Tel: +86 20 84112517; Fax: +86 20 84036551; Email: yangjh7@mail.sysu.edu.cn

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Although microRNAs (miRNAs), other non-coding RNAs (ncRNAs) (e.g. lncRNAs, pseudogenes and circRNAs) and competing endogenous RNAs (ceRNAs) have been implicated in cell-fate determination and in various human diseases, surprisingly little is known about the regulatory interaction networks among the multiple classes of RNAs. In this study, we developed starBase v2.0 (http://starbase.sysu.edu.cn/) to systematically identify the RNA–RNA and protein–RNA interaction networks from 108 CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data sets generated by 37 independent studies. By analyzing millions of RNA-binding protein binding sites, we identified 9000 miRNA-circRNA, 16 000 miRNA-pseudogene and 285 000 protein–RNA regulatory relationships. Moreover, starBase v2.0 has been updated to provide the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date. We identified 10 000 ceRNA pairs from CLIP-supported miRNA target sites. By combining 13 functional genomic annotations, we developed miRFunction and ceRNAFunction web servers to predict the function of miRNAs and other ncRNAs from the miRNA-mediated regulatory networks. Finally, we developed interactive web implementations to provide visualization, analysis and downloading of the aforementioned large-scale data sets. This study will greatly expand our understanding of ncRNA functions and their coordinated regulatory networks.

INTRODUCTION

Eukaryotic genomes encode thousands of short and long non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs), pseudogenes and circular RNAs (circRNAs). These RNA molecules are emerging as key regulators of diverse cellular processes, including proliferation, apoptosis, differentiation and the cell cycle (1–7).

Although many studies that address these ncRNAs have focused on defining their protein-coding gene regulatory functions, increasing numbers of researchers are assessing the regulatory interactions between ncRNA classes, as well as the relationships between RNA-binding proteins (RBPs) and ncRNAs. Several well-characterized lncRNAs (e.g. HOTAIR) exert their functions cooperatively with RBPs (e.g. EZH2) in cancers (8,9). Multiple classes of ncRNAs (lncRNAs, circRNAs, pseudogenes) and protein-coding mRNAs function as key competing endogenous RNAs (ceRNAs) and ‘super-sponges’ to regulate the expression of mRNAs in plants and mammalian cells (4,6,7,10–14). However, the understanding of ceRNA mechanisms and their consequences is in its infancy, and further experimental evidence and large-scale bioinformatic efforts for ceRNAs are needed. Despite these intriguing studies of individual miRNA-ncRNA and protein–RNA interactions, generalizing these findings to thousands of RNAs remains a daunting challenge.

Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq, HITS-CLIP, PAR-CLIP, CLASH, iCLIP) provide powerful ways to identify biologically relevant miRNA-target and RBP–RNA interactions (5,15,16). The application of CLIP-Seq methods has reliably identified Argonaute (Ago) and other RBP binding sites (5,15,16). We and others have used CLIP-Seq data generated from HEK293 cells to characterize miRNA-mRNA and miRNA-lncRNA interactions (17–21). With the increasing amount of CLIP-Seq data available, there is a great need to integrate these large-scale data sets to explore the miRNA-pseudogene, miRNA-circRNA and protein–RNA interactions and to further construct ceRNA regulatory networks involving mRNAs and ncRNAs.

To facilitate the annotation, visualization, analysis and discovery of these interaction networks from large-scale CLIP-Seq data, we have updated starBase (17) to version 2.0 (starBase v2.0) (Figure 1). In starBase v2.0, we performed a large-scale integration of public RBP binding sites generated by high-throughput CLIP-Seq technology and provided the most comprehensive RBP data set presently available for various cell types. By analyzing millions of Ago and other RBP binding sites, we constructed the most comprehensive miRNA-lncRNA, miRNA-pseudogene, miRNA-circRNA, miRNA-mRNA and protein–RNA interaction networks.

MATERIALS AND METHODS

Integration of Ago and other RBP binding sites from published CLIP data

HITS-CLIP, PAR-CLIP, iCLIP and CLASH data were retrieved from the Gene Expression Omnibus (22), the supplementary data of original references or directly from authors on request (Supplementary Table S1). Ago PAR-CLIP raw data were preprocessed with the FASTX-Toolkit v0.0.13 and reanalyzed using PARalyzer v1.1 (23), whereas other CLIP-identified binding site clusters/peaks were used directly. All binding site coordinates were converted to the hg19, mm9/mm10 and ce6/ce10 assemblies, respectively, by using the UCSC LiftOver Tool (24).

Ago CLIP-supported miRNA target prediction from public database

Conserved miRNA families were defined as those labeled with ‘highly conserved’ or ‘conserved’ in TargetScan Release 6.2 (25). miRNA IDs from miRBase Release 20 were used (26). Genomic coordinates of the target sites of these conserved miRNAs predicted by TargetScan (25), miRanda/mirSVR (27), PITA (28), Pictar 2.0 (19) and RNA22 (29) were collected and converted to the hg19, mm9/mm10 and ce6/ce10 assemblies, respectively, using LiftOver. The resulting coordinates were intersected with the previously described Ago CLIP clusters using BEDTools v2.16.2 (30). The target sites that overlap with any entry of the Ago CLIP clusters were considered as CLIP-supported sites.

MicroRNA target scanning in annotated transcripts

Human gene annotations were acquired from GENCODE v17 (31). Protein-coding transcripts were defined as those with ‘protein_coding’ gene biotype and ‘protein_coding’ transcript biotype. The lncRNA transcripts were defined as those with ‘processed_transcript’, ‘lincRNA’, ‘3prime_overlapping_ncrna’, ‘antisense’, ‘non_coding’, ‘sense_intronic’ or ‘sense_overlapping’ gene biotype.
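The CLIP-support filter described above keeps a predicted target site only if it overlaps at least one Ago CLIP cluster. The authors performed this intersection with BEDTools; the sketch below is not their pipeline but a minimal pure-Python illustration of the same overlap rule. The function name `clip_supported_sites`, the tuple layout and the strand-aware matching are assumptions made for this example, and coordinates are taken to be 0-based and half-open, as in BED files.

```python
from collections import defaultdict


def clip_supported_sites(predicted_sites, clip_clusters):
    """Return the predicted sites that overlap at least one CLIP cluster.

    Each interval is a (chrom, start, end, strand) tuple with 0-based,
    half-open coordinates. Requiring the same strand is an assumption of
    this sketch, not something stated in the paper.
    """
    # Group clusters by chromosome and strand so each site is only
    # compared against clusters it could possibly overlap.
    clusters_by_key = defaultdict(list)
    for chrom, start, end, strand in clip_clusters:
        clusters_by_key[(chrom, strand)].append((start, end))

    supported = []
    for site in predicted_sites:
        chrom, start, end, strand = site
        # Two half-open intervals overlap when each one starts before
        # the other one ends.
        if any(start < c_end and c_start < end
               for c_start, c_end in clusters_by_key[(chrom, strand)]):
            supported.append(site)
    return supported


# Toy coordinates invented for illustration only.
sites = [("chr1", 100, 122, "+"), ("chr1", 500, 522, "+"), ("chr2", 100, 122, "-")]
clusters = [("chr1", 110, 150, "+"), ("chr2", 100, 130, "+")]
print(clip_supported_sites(sites, clusters))
```

This linear scan over clusters is quadratic in the worst case. For genome-scale data, sorted intervals or an interval tree, as used by dedicated tools, would be the practical choice.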
Small non-coding RNA (sncRNA) transcripts were defined as those with ‘snRNA’, ‘snoRNA’, ‘rRNA’, ‘Mt_tRNA’, ‘Mt_rRNA’, ‘misc_RNA’ or ‘miRNA’ gene biotype. Pseudogene transcripts were defined as those with ‘polymorphic_pseudogene’, ‘pseudogene’, ‘IG_C_pseudogene’, ‘IG_J_pseudogene’, ‘IG_V_pseudogene’, ‘TR_V_pseudogene’ or ‘TR_J_pseudogene’ gene biotype. Mouse and Caenorhabditis elegans gene annotations were extracted from Ensembl Gene Release 72 and converted with LiftOver to mm9/mm10 and ce6/ce10, respectively. Protein-coding genes, lncRNAs, sncRNAs and pseudogenes were classified using a similar method. Human, mouse and C. elegans circRNA annotations were downloaded from circBase v0.1 (6). These transcripts were scanned to find conserved miRNA target sites using miRanda v3.3a with the ‘-strict’ parameter. The target sites that overlap with any entry of the aforementioned Ago CLIP clusters were considered as the CLIP-supported target sites.

Figure 1. A system-level overview of the starBase v2.0 core framework. A total of 108 data sets of CLIP-Seq experiments were compiled to obtain various RBP target sites. Interactions between miRNAs and target genes were predicted and used to construct miRNA-mediated ceRNA networks. Functional predictions of miRNAs and associated genes were achieved by enrichment analysis of 13 functional genomic annotations. All results generated by starBase were deposited in MySQL relational databases and displayed in the visual browser and web pages.
[Figure panels: Technologies (108 data sets from 37 studies): HITS-CLIP (47 data sets), PAR-CLIP (51 data sets), iCLIP (9 data sets), CLASH (1 data set). Functional annotations: Gene Ontology: (1) Biological Process, (2) Molecular Function, (3) Cell Component; Pathways: (4) KEGG Pathways, (5) PANTHER Pathways, (6) Biocarta Pathways, (7) Reactome Pathways, (8) All Canonical Pathways; MSigDB and other resources: (9) Cancer Gene Neighborhoods, (10) Oncogenic Signatures, (11) Chemical or Genetic Perturbations, (12) Disease Ontology, (13) Transcription Factor Targets.]

Identification of ceRNA pairs with hypergeometric test

A hypergeometric test (14) is executed for each ceRNA pair separately, which is defined by four parameters: (i) N is the total number of miRNAs used to predict targets; (ii) K is the number of miRNAs that interact with the chosen gene of interest; (iii) n is the number of miRNAs that interact with the candidate ceRNA of the chosen gene; and (iv) c is the common miRNA number between these two genes. The test calculates the P-value by using the following formula:

$$P = \sum_{i=c}^{\min(K,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

Multiple miRNAs belonging to the same family were combined into one, and the hypergeometric test counted every miRNA family only once, even if it had multiple binding sites at the same 3′-UTR of protein-coding genes or transcript of non-coding genes. All P-values were subject to false discovery rate (FDR) correction (32).

Enrichment analysis for functional terms

GO ontology data (33) for the NCBI RefSeq genes were downloaded from the NCBI ftp site. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (34) were downloaded from the KEGG database. The protein analysis through evolutionary relationships (PANTHER) pathways were downloaded from the PANTHER database (35). The Reactome (36) and other pathways were downloaded from the molecular signatures database (MSigDB) (37). Enrichment analysis for these pathways in the data set was determined using a hypergeometric test with Bonferroni and FDR correction (32).

Other RBP binding sites in annotated transcripts

The aforementioned RBP CLIP clusters were intersected with the coordinates of all annotated transcripts to find their RBP binding sites.

Other annotation data sets

All RefSeq genes were downloaded from the UCSC bioinformatics Web sites (38). Other known ncRNAs were downloaded from the Ensembl database (39) or the UCSC Web sites (38). The human (UCSC hg19), mouse (UCSC mm9/mm10) and C. elegans (UCSC ce6/ce10) genome sequences were downloaded from the UCSC bioinformatics Web sites (38).

DATABASE CONTENT

The genome-wide binding map of Ago and other RBPs

To depict a comprehensive binding map of Ago and other RBPs, we integrated 108 published CLIP-Seq data sets generated from various tissues or cell lines under different treatments in 37 independent studies (detailed in ‘Materials and Methods’, Supplementary Table S1). For the Ago protein, a total of 1 007 618, 26 833 and 4842 unique binding site clusters were compiled in human, mouse and C. elegans, respectively (Table 1). These clusters were used in the following analysis to obtain CLIP-supported miRNA target sites of high confidence. Millions of binding site clusters of 42 other RBPs were also obtained (Supplementary Table S1 and Table 1).

The annotation and identification of miRNA-mRNA and miRNA-ncRNA interactions

To inspect genome-wide interactions between miRNAs and their target genes, we retrieved the conserved miRNA target sites predicted by five algorithms (TargetScan, miRanda, Pictar2, PITA and RNA22) from public databases and intersected them with the aforementioned Ago CLIP clusters to obtain CLIP-supported sites. Using this approach, we characterized 500 000 interactions between 818 conserved miRNAs and 20 480 protein-coding genes.

We also investigated the potential regulatory relationships between miRNAs and non-coding RNAs. We performed conserved miRNA target site scanning on the transcripts of lncRNAs, sncRNAs, pseudogenes and circRNAs using miRanda, and filtered the resulting candidates with the previously described Ago CLIP clusters. Although fewer than the CLIP-supported miRNA-mRNA interactions, the thousands of CLIP-supported miRNA-ncRNA interactions suggested that miRNAs might regulate other ncRNAs as well.

The annotation and identification of miRNA-mediated ceRNA regulatory networks

To construct and characterize the miRNA-mediated ceRNA network, a workflow was developed to identify the ceRNA pairs (Figure 1). First, CLIP-supported miRNA-mRNA, miRNA-lncRNA, miRNA-circRNA and miRNA-pseudogene interactions were combined. Next, a hypergeometric test was used to predict ceRNA pairs among mRNAs, lncRNAs, circRNAs and pseudogenes. Finally, all ceRNA pairs with FDR < 0.05 were imported into a MySQL database and displayed in a web page. In this study, we identified approximately 10 000 ceRNA pairs from CLIP-supported miRNA target sites. Surprisingly, many nodes of ceRNA networks are lncRNAs, circRNAs and pseudogenes. Several experimentally validated ceRNAs were recaptured in starBase v2.0, e.g. PTEN ceRNAs: DCBLD2 (P < 0.001), JARID2 (P < 0.005), LRCH1 (P < 0.00005), TNRC6A (P < 0.00005) (12–14).

Table 1. The data sets that are incorporated into starBase v2.0

Species    | Experiments | RBPs | Cell lines/tissues | ABSs      | RBSs      | miRNA-mRNA | miRNA-ncRNA | ceRNA  | protein–RNA
Human      | 85          | 36   | 18                 | 1 007 618 | 8 206 884 | 423 975    | 35 459      | 11 439 | 242 017
Mouse      | 21          | 11   | 16                 | 26 833    | 1 857 199 | 64 749     | 234         | 829    | 51 542
C. elegans | 2           | 2    | 2                  | 4842      | 1360      | 12 883     | 140         | 2      | 411

These statistics show the numbers of sequencing experiments (CLIP-Seq), RNA-binding proteins (RBPs) covered in these experiments, cell lines or tissues used in these experiments, Ago binding sites (ABSs), other RNA-binding protein binding sites (RBSs), miRNA-mRNA interactions, miRNA-ncRNA interactions, ceRNA pairs and protein–RNA interactions that are incorporated into starBase. These data are from three organisms: human (hg19), mouse (mm9) and C. elegans (ce6).

WEB INTERFACE

The web-based exploration of miRNA-mRNA, miRNA-ncRNA and protein–RNA regulatory relationships

Multiple web interfaces are provided to display the three types of regulatory relationships. As an example, we use miRNA-mRNA interactions to introduce the platform. In the query page of miRNA-mRNA interactions, users can enter a gene symbol and select one miRNA to browse their relationships. The number of supporting experiments can be adjusted to control the stringency of the predictions. All relationships will be displayed in the results page if users do not submit a constraint. Once users click on a non-zero number in the table, more details are shown. For instance, users can click the target location to link to the deepView genome browser (17) and view data across the entire genome (Supplementary Figure S1). More information about the platform is given in the relevant web interfaces.

The web-based exploration of ceRNA regulatory networks

In the query page, users enter a gene symbol that is used for ceRNA prediction. The option ‘minimum common miRNA number’ denotes the minimum number of miRNAs shared by the input gene and its ceRNA candidates. In the results page, users can click on one of the tabular FDR values for details.

The web-based functional annotation of genes from miRNA-mediated regulatory networks

For miRNA function predictions, there are five options on the query page, and the option ‘Select one or multiple microRNAs’ is required. In contrast to the query options described earlier, it allows users to select one or more miRNAs in the drop-down list. The results page shows the enrichment analysis for 13 functional prediction categories. The running parameters, selected miRNA target genes and every outcome in the 13 categories of function prediction are available for users to download. For ceRNA function prediction, the query page also presents five options, and a gene symbol must be entered. The results page is similar to the miRNA function predictions page.

EXAMPLE APPLICATIONS

In the following section, example applications of starBase v2.0 are illustrated.

The targetome of hsa-miR-21-5p

Assume that we are interested in the targetome of hsa-miR-21-5p. Given the constraints requiring target sites to have at least one supporting experiment and to be predicted by at least three of the five programs (Supplementary Figure S2A), our platform returns 173 CLIP-supported hsa-miR-21-5p target sites in 155 protein-coding genes, among which ZNF367, RHOB and PELI1 rank as the top three by read numbers (Supplementary Figure S2B). These results coincide with data in an experimentally validated database named miRTarBase (40), suggesting that these genes are likely targeted by hsa-miR-21-5p.

The identification of ‘super-sponges’ of miRNAs

Inspired by the observation that CDR1as circRNA (6,7) acts as a miR-7 super-sponge that contains multiple target sites from the same miRNA at the same transcript or 3′-UTR, we tested whether the other classes of ncRNAs and protein-coding genes hosted in our database can also act as miRNA super-sponges. We can recapitulate the known CDR1as circRNA as a miR-7 super-sponge using the miRNA-circRNA interactions web page (Supplementary Figure S3A and B). The results page sorted by the number of miRNA target sites showed that CDR1as circRNA contains 52 miR-7 target sites that overlap with CLIP-Seq data (Supplementary Figure S3B). The same strategy was applied to search for potential super-sponges among mRNAs, lncRNAs and pseudogenes, resulting in tens of candidates, such as the XIST and HOXD-AS1 lncRNA genes and the ONECUT2 and CDK6 mRNAs (Supplementary Figure S3C).

ceRNAs of the oncogenes

Recently, the ceRNA hypothesis has been proposed (3) and efforts have been made to decipher the roles of ceRNA cross talk in regulating cancer-associated genes such as the tumor suppressor PTEN (12–14). Requiring a minimum common miRNA number of ten and an FDR threshold of 0.05 on the ‘ceRNA Network’ web page (Supplementary Figure S4A), our platform produced a ceRNA network involving PTEN and 123 genes, among which the published PTEN ceRNAs LRCH1, TNRC6A and SMAD5 (12–14) were recaptured (Supplementary Figure S4B).

We were also able to predict a batch of other cancer-associated genes that were entangled within the highly sophisticated networks of ceRNAs. For example, NFIB, an oncogene upregulated in small cell lung cancer (41) and estrogen receptor-negative breast cancer (42), was found in multiple ceRNA pairs with other well-described cancer-related genes, such as MLL and KDSR (Supplementary Figure S4C).

CONCLUSIONS

By analyzing a large set of Ago and RBP binding sites derived from all available CLIP-Seq experimental techniques (PAR-CLIP, HITS-CLIP, iCLIP, CLASH), we have shown extensive and complex RNA–RNA and protein–RNA interaction networks.

Compared with the previous version of starBase (v1.0) and other databases, the distinctive features of starBase v2.0 include the following: (i) starBase v2.0 is the first database that provides the miRNA-pseudogene interaction networks; (ii) starBase v2.0 drafts the first interaction maps between miRNAs and circRNAs; (iii) unlike other databases or tools (12,14,43) that predict ceRNA regulatory networks using computationally predicted miRNA targets, starBase v2.0 provides an enhanced resolution to determine ceRNA functional networks based on miRNA-target interactions overlapping with high-throughput CLIP-Seq data; (iv) starBase v2.0 provides the most comprehensive miRNA-lncRNA interactions to date; and (v) starBase v2.0 provides a variety of interfaces and graphic visualizations to facilitate analysis of the massive and heterogeneous CLIP-Seq, RBP binding sites, miRNA targets and ceRNA regulatory networks in normal tissues and cancer cells.

FUTURE DIRECTIONS

As CLIP-Seq technology is applied to a broader set of species, cell lines, tissues, conditions and RBPs, we will continuously maintain and update the database. starBase will continue to expand the storage space and improve the computer server performance for storing and analyzing these new data, and improve the database to accept new user data uploads. In addition, we intend to integrate the cancer genomics data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) into starBase to improve our understanding of miRNA-mediated regulatory networks in developmental, physiological and pathological processes.

AVAILABILITY

starBase v2.0 is freely available at http://starbase.sysu.edu.cn/. The starBase data files can be downloaded and used in accordance with the GNU Public License and the license of primary data sources.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Ministry of Science and Technology of China, National Basic Research Program [No 2011CB811300]; the National Natural Science Foundation of China [31230042, 30900820, 81070589, 31370791]; the funds from Guangdong Province [S2012010010510, S2013010012457]; Project of Science and Technology New Star in ZhuJiang Guangzhou city [2012J2200025]; Fundamental Research Funds for the Central Universities [2011330003161070]; China Postdoctoral Science Foundation [200902348]. This research is supported in part by the Guangdong Province Key Laboratory of Computational Science and the Guangdong Province Computational Science Innovative Research Team.

FUNDING

Funding for open access charge: Ministry of Science and Technology of China, National Basic Research Program [No. 2011CB811300].

Conflict of interest statement. None declared.

REFERENCES

1. Batista,P.J. and Chang,H.Y. (2013) Long noncoding RNAs: cellular address codes in development and disease. Cell, 152, 1298–1307.
2. Yates,L.A., Norbury,C.J. and Gilbert,R.J. (2013) The long and short of microRNA. Cell, 153, 516–519.
3. Salmena,L., Poliseno,L., Tay,Y., Kats,L. and Pandolfi,P.P. (2011) A ceRNA hypothesis: the rosetta stone of a hidden RNA language? Cell, 146, 353–358.
4. Poliseno,L., Salmena,L., Zhang,J., Carver,B., Haveman,W.J. and Pandolfi,P.P. (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 465, 1033–1038.
5. Konig,J., Zarnack,K., Luscombe,N.M. and Ule,J. (2011) Protein-RNA interactions: new genomic technologies and perspectives. Nat. Rev. Genet., 13, 77–83.
6. Memczak,S., Jens,M., Elefsinioti,A., Torti,F., Krueger,J., Rybak,A., Maier,L., Mackowiak,S.D., Gregersen,L.H., Munschauer,M. et al. (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature, 495, 333–338.
7. Hansen,T.B., Jensen,T.I., Clausen,B.H., Bramsen,J.B., Finsen,B., Damgaard,C.K. and Kjems,J. (2013) Natural RNA circles function as efficient microRNA sponges. Nature, 495, 384–388.
8. Gupta,R.A., Shah,N., Wang,K.C., Kim,J., Horlings,H.M., Wong,D.J., Tsai,M.C., Hung,T., Argani,P., Rinn,J.L. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature, 464, 1071–1076.
9. Khalil,A.M., Guttman,M., Huarte,M., Garber,M., Raj,A., Rivea Morales,D., Thomas,K., Presser,A., Bernstein,B.E., van Oudenaarden,A. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA, 106, 11667–11672.
10. Cesana,M., Cacchiarelli,D., Legnini,I., Santini,T., Sthandier,O., Chinappi,M., Tramontano,A. and Bozzoni,I. (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell, 147, 358–369.
11. Franco-Zorrilla,J.M., Valli,A., Todesco,M., Mateos,I., Puga,M.I., Rubio-Somoza,I., Leyva,A., Weigel,D., Garcia,J.A. and Paz-Ares,J. (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat. Genet., 39, 1033–1037.
12. Tay,Y., Kats,L., Salmena,L., Weiss,D., Tan,S.M., Ala,U., Karreth,F., Poliseno,L., Provero,P., Di Cunto,F. et al. (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell, 147, 344–357.
13. Karreth,F.A., Tay,Y., Perna,D., Ala,U., Tan,S.M., Rust,A.G., DeNicola,G., Webster,K.A., Weiss,D., Perez-Mancera,P.A. et al. (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell, 147, 382–395.
14. Sumazin,P., Yang,X., Chiu,H.S., Chung,W.J., Iyer,A., Llobet-Navas,D., Rajbhandari,P., Bansal,M., Guarnieri,P., Silva,J. et al. (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell, 147, 370–381.
15. Darnell,R.B. (2010) HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip. Rev. RNA, 1, 266–286.
16. Ascano,M., Hafner,M., Cekan,P., Gerstberger,S. and Tuschl,T. (2012) Identification of RNA-protein interaction networks using PAR-CLIP. Wiley Interdiscip. Rev. RNA, 3, 159–177.
17. Yang,J.H., Li,J.H., Shao,P., Zhou,H., Chen,Y.Q. and Qu,L.H. (2011) starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data. Nucleic Acids Res., 39, D202–D209.
18. Khorshid,M., Rodak,C. and Zavolan,M. (2011) CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins. Nucleic Acids Res., 39, D245–D252.
19. Anders,G., Mackowiak,S.D., Jens,M., Maaskola,J., Kuntzagk,A., Rajewsky,N., Landthaler,M. and Dieterich,C. (2012) doRiNA: a database of RNA interactions in post-transcriptional regulation. Nucleic Acids Res., 40, D180–D186.
20. Jalali,S., Bhartiya,D., Lalwani,M.K., Sivasubbu,S. and Scaria,V. (2013) Systematic transcriptome wide analysis of lncRNA-miRNA interactions. PLoS One, 8, e53823.
21. Paraskevopoulou,M.D., Georgakilas,G., Kostoulas,N., Reczko,M., Maragkakis,M., Dalamagas,T.M. and Hatzigeorgiou,A.G. (2013) DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res., 41, D239–D245.
22. Barrett,T., Wilhite,S.E., Ledoux,P., Evangelista,C., Kim,I.F., Tomashevsky,M., Marshall,K.A., Phillippy,K.H., Sherman,P.M., Holko,M. et al. (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res., 41, D991–D995.
23. Corcoran,D.L., Georgiev,S., Mukherjee,N., Gottwein,E., Skalsky,R.L., Keene,J.D. and Ohler,U. (2011) PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol., 12, R79.
24. Meyer,L.R., Zweig,A.S., Hinrichs,A.S., Karolchik,D., Kuhn,R.M., Wong,M., Sloan,C.A., Rosenbloom,K.R., Roe,G., Rhead,B. et al. (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res., 41, D64–D69.
25. Grimson,A., Farh,K.K., Johnston,W.K., Garrett-Engele,P., Lim,L.P. and Bartel,D.P. (2007) MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell, 27, 91–105.
26. Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157.
27. Betel,D., Koppal,A., Agius,P., Sander,C. and Leslie,C. (2010) Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol., 11, R90.
28. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007) The role of site accessibility in microRNA target recognition. Nat. Genet., 39, 1278–1284.
29. Miranda,K.C., Huynh,T., Tay,Y., Ang,Y.S., Tam,W.L., Thomson,A.M., Lim,B. and Rigoutsos,I. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126, 1203–1217.
30. Quinlan,A.R. and Hall,I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
31. Harrow,J., Frankish,A., Gonzalez,J.M., Tapanari,E., Diekhans,M., Kokocinski,F., Aken,B.L., Barrell,D., Zadissa,A., Searle,S. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res., 22, 1760–1774.
32. Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.
33. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.
34. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.
35. Mi,H., Lazareva-Ulitsky,B., Loo,R., Kejariwal,A., Vandergriff,J., Rabkin,S., Guo,N., Muruganujan,A., Doremieux,O., Campbell,M.J. et al. (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res., 33, D284–D288.
36. Matthews,L., Gopinath,G., Gillespie,M., Caudy,M., Croft,D., de Bono,B., Garapati,P., Hemish,J., Hermjakob,H., Jassal,B. et al. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res., 37, D619–D622.
37. Liberzon,A., Subramanian,A., Pinchback,R., Thorvaldsdottir,H., Tamayo,P. and Mesirov,J.P. (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27, 1739–1740.
38. Kuhn,R.M., Karolchik,D., Zweig,A.S., Wang,T., Smith,K.E., Rosenbloom,K.R., Rhead,B., Raney,B.J., Pohl,A., Pheasant,M. et al. (2009) The UCSC Genome Browser Database: update 2009. Nucleic Acids Res., 37, D755–D761.
39. Hubbard,T.J., Aken,B.L., Ayling,S., Ballester,B., Beal,K., Bragin,E., Brent,S., Chen,Y., Clapham,P., Clarke,L. et al. (2009) Ensembl 2009. Nucleic Acids Res., 37, D690–D697.
40. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. et al. (2011) miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res., 39, D163–D169.
41. Dooley,A.L., Winslow,M.M., Chiang,D.Y., Banerji,S., Stransky,N., Dayton,T.L., Snyder,E.L., Senna,S., Whittaker,C.A., Bronson,R.T. et al. (2011) Nuclear factor I/B is an oncogene in small cell lung cancer. Gene Dev., 25, 1470–1475.
42. Moon,H.G., Hwang,K.T., Kim,J.A., Kim,H.S., Lee,M.J., Jung,E.M., Ko,E., Han,W. and Noh,D.Y. (2011) NFIB is a potential target for estrogen receptor-negative breast cancers. Mol. Oncol., 5, 538–544.
43. Liu,K., Yan,Z., Li,Y. and Sun,Z. (2013) Linc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis. Bioinformatics, 29, 2221–2222.
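As a worked illustration of the hypergeometric test defined in Materials and Methods, the following sketch evaluates the P-value for one candidate ceRNA pair from N, K, n and c, and then applies a Benjamini-Hochberg adjustment, the FDR procedure of reference (32). It is a hedged sketch under stated assumptions, not the starBase implementation: the function names `cerna_pvalue` and `benjamini_hochberg` and all input numbers are hypothetical, and Python 3.8 or later is assumed for `math.comb`.

```python
from math import comb


def cerna_pvalue(N, K, n, c):
    """Upper-tail hypergeometric P-value for sharing at least c miRNAs.

    N: total number of miRNA families used to predict targets
    K: families targeting the gene of interest
    n: families targeting the candidate ceRNA
    c: families shared by the two genes
    """
    upper = min(K, n)
    # math.comb returns 0 when n - i exceeds N - K, so impossible
    # terms drop out of the sum without special handling.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(c, upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three candidate ceRNAs of one gene of interest.
candidates = {"geneA": (25, 12), "geneB": (40, 6), "geneC": (10, 1)}
N, K = 200, 30
raw = [cerna_pvalue(N, K, n, c) for n, c in candidates.values()]
for name, p, q in zip(candidates, raw, benjamini_hochberg(raw)):
    print(name, p, q, "ceRNA candidate" if q < 0.05 else "not significant")
```

Using exact integer binomial coefficients avoids floating-point underflow for small tail probabilities. For very large N, a log-space or library-based hypergeometric survival function would be a reasonable alternative.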

starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data


References (48)

Publisher
Oxford University Press
Copyright
© The Author(s) 2013. Published by Oxford University Press.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gkt1248
pmid
24297251

Abstract

Nucleic Acids Research, 2014, Vol. 42, Database issue, D92–D97
Published online 1 December 2013
doi:10.1093/nar/gkt1248

starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data

Jun-Hao Li, Shun Liu, Hui Zhou, Liang-Hu Qu* and Jian-Hua Yang*

RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, PR China

Received September 15, 2013; Revised October 28, 2013; Accepted November 9, 2013

*To whom correspondence should be addressed. Tel: +86 20 84112399; Fax: +86 20 84036551; Email: lssqlh@mail.sysu.edu.cn
Correspondence may also be addressed to Jian-Hua Yang. Tel: +86 20 84112517; Fax: +86 20 84036551; Email: yangjh7@mail.sysu.edu.cn

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Although microRNAs (miRNAs), other non-coding RNAs (ncRNAs) (e.g. lncRNAs, pseudogenes and circRNAs) and competing endogenous RNAs (ceRNAs) have been implicated in cell-fate determination and in various human diseases, surprisingly little is known about the regulatory interaction networks among the multiple classes of RNAs. In this study, we developed starBase v2.0 (http://starbase.sysu.edu.cn/) to systematically identify the RNA–RNA and protein–RNA interaction networks from 108 CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data sets generated by 37 independent studies. By analyzing millions of RNA-binding protein binding sites, we identified 9000 miRNA-circRNA, 16 000 miRNA-pseudogene and 285 000 protein–RNA regulatory relationships. Moreover, starBase v2.0 has been updated to provide the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date. We identified 10 000 ceRNA pairs from CLIP-supported miRNA target sites. By combining 13 functional genomic annotations, we developed miRFunction and ceRNAFunction web servers to predict the function of miRNAs and other ncRNAs from the miRNA-mediated regulatory networks. Finally, we developed interactive web implementations to provide visualization, analysis and downloading of the aforementioned large-scale data sets. This study will greatly expand our understanding of ncRNA functions and their coordinated regulatory networks.

INTRODUCTION

Eukaryotic genomes encode thousands of short and long non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs), pseudogenes and circular RNAs (circRNAs). These RNA molecules are emerging as key regulators of diverse cellular processes, including proliferation, apoptosis, differentiation and the cell cycle (1–7).

Although many studies that address these ncRNAs have focused on defining their protein-coding gene regulatory functions, increasing numbers of researchers are assessing the regulatory interactions between ncRNA classes, as well as the relationships between RNA-binding proteins (RBPs) and ncRNAs. Several well-characterized lncRNAs (e.g. HOTAIR) exert their functions cooperatively with RBPs (e.g. EZH2) in cancers (8,9). Multiple classes of ncRNAs (lncRNAs, circRNAs, pseudogenes) and protein-coding mRNAs function as key competing endogenous RNAs (ceRNAs) and 'super-sponges' to regulate the expression of mRNAs in plants and mammalian cells (4,6,7,10–14). However, the understanding of ceRNA mechanisms and their consequences is in its infancy, and further experimental evidence and large-scale bioinformatic efforts for ceRNAs are needed. Despite these intriguing studies of individual miRNA-ncRNA and protein–RNA interactions, generalizing these findings to thousands of RNAs remains a daunting challenge.

Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq, HITS-CLIP, PAR-CLIP, CLASH, iCLIP) provide powerful ways to identify biologically relevant miRNA-target and RBP–RNA interactions (5,15,16). The application of CLIP-Seq methods has reliably identified Argonaute (Ago) and other RBP binding sites (5,15,16). We and others have used CLIP-Seq data generated from HEK293 cells to characterize miRNA-mRNA and miRNA-lncRNA interactions (17–21). With the increasing amount of CLIP-Seq data available, there is a great need to integrate these large-scale data sets to explore the miRNA-pseudogene, miRNA-circRNA and protein–RNA interactions and to further construct ceRNA regulatory networks involving mRNAs and ncRNAs.

To facilitate the annotation, visualization, analysis and discovery of these interaction networks from large-scale CLIP-Seq data, we have updated starBase (17) to version 2.0 (starBase v2.0) (Figure 1). In starBase v2.0, we performed a large-scale integration of public RBP binding sites generated by high-throughput CLIP-Seq technology and provided the most comprehensive RBP data set for various cell types that is presently available. By analyzing millions of Ago and other RBP binding sites, we constructed the most comprehensive miRNA-lncRNA, miRNA-pseudogene, miRNA-circRNA, miRNA-mRNA and protein–RNA interaction networks.

[Figure 1 (diagram not reproduced). Panels: Technologies (108 data sets from 37 studies): HITS-CLIP (47 data sets), PAR-CLIP (51 data sets), iCLIP (9 data sets), CLASH (1 data set); peak identification and mapping to RBP binding sites; miRNA-target interactions (miRNA-mRNA, miRNA-lncRNA, miRNA-pseudogene, miRNA-circRNA); miRNA-mediated ceRNA networks built from common miRNAs; protein–RNA interactions; functional annotations scored by enrichment P-value and false discovery rate (FDR). The 13 annotation categories are Gene Ontology: (1) Biological Process, (2) Molecular Function, (3) Cell Component; Pathways: (4) KEGG Pathways, (5) PANTHER Pathways, (6) Biocarta Pathways, (7) Reactome Pathways, (8) All Canonical Pathways; MSigDB and other resources: (9) Cancer Gene Neighborhoods, (10) Oncogenic Signatures, (11) Chemical or Genetic Perturbations, (12) Disease Ontology, (13) Transcription Factor Targets.]

Figure 1. A system-level overview of the starBase v2.0 core framework. A total of 108 data sets of CLIP-Seq experiments were compiled to obtain various RBP target sites. Interactions between miRNAs and target genes were predicted and used to construct miRNA-mediated ceRNA networks. Functional predictions of miRNAs and associated genes were achieved by enrichment analysis of 13 functional genomic annotations. All results generated by starBase were deposited in MySQL relational databases and displayed in the visual browser and web pages.

MATERIALS AND METHODS

Integration of Ago and other RBP binding sites from published CLIP data

HITS-CLIP, PAR-CLIP, iCLIP and CLASH data were retrieved from the Gene Expression Omnibus (22), the supplementary data of original references or directly from authors on request (Supplementary Table S1). Ago PAR-CLIP raw data were preprocessed with the FASTX-Toolkit v0.0.13 and reanalyzed using PARalyzer v1.1 (23), whereas other CLIP-identified binding site clusters/peaks were used directly. All binding site coordinates were converted to hg19, mm9/mm10 and ce6/ce10 assemblies, respectively, by using the UCSC LiftOver Tool (24).

Ago CLIP-supported miRNA target prediction from public database

Conserved miRNA families were defined as those labeled with 'highly conserved' or 'conserved' in TargetScan Release 6.2 (25). miRNA IDs from miRBase Release 20 were used (26). Genomic coordinates of these conserved miRNA target sites predicted by TargetScan (25), miRanda/mirSVR (27), PITA (28), Pictar 2.0 (19) and RNA22 (29) were collected and converted to hg19, mm9/mm10 and ce6/ce10 assemblies, respectively, using LiftOver. The resulting coordinates were intersected with the previously described Ago CLIP clusters using BEDTools v2.16.2 (30). The target sites that overlap with any entry of the Ago CLIP clusters were considered as CLIP-supported sites.

MicroRNA target scanning in annotated transcripts

Human gene annotations were acquired from GENCODE v17 (31). Protein-coding transcripts were defined as those with 'protein_coding' gene biotype and 'protein_coding' transcript biotype. The lncRNA transcripts were defined as those with 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense', 'non_coding', 'sense_intronic' or 'sense_overlapping' gene biotype. Small non-coding RNA (sncRNA) transcripts were defined as those with 'snRNA', 'snoRNA', 'rRNA', 'Mt_tRNA', 'Mt_rRNA', 'misc_RNA' or 'miRNA' gene biotype. Pseudogene transcripts were defined as those with 'polymorphic_pseudogene', 'pseudogene', 'IG_C_pseudogene', 'IG_J_pseudogene', 'IG_V_pseudogene', 'TR_V_pseudogene' or 'TR_J_pseudogene' gene biotype. Mouse and Caenorhabditis elegans gene annotations were extracted from Ensembl Gene Release 72 and converted with LiftOver to mm9/mm10 and ce6/ce10, respectively. Protein-coding genes, lncRNAs, sncRNAs and pseudogenes were classified using a similar method. Human, mouse and C. elegans circRNA annotations were downloaded from circBase v0.1 (6).

These transcripts were scanned to find conserved miRNA target sites using miRanda v3.3a with the '-strict' parameter. The target sites that overlap with any entry of the aforementioned Ago CLIP clusters were considered as the CLIP-supported target sites.

Identification of ceRNA pairs with hypergeometric test

A hypergeometric test (14) is executed for each ceRNA pair separately, which is defined by four parameters: (i) N is the total number of miRNAs used to predict targets; (ii) K is the number of miRNAs that interact with the chosen gene of interest; (iii) n is the number of miRNAs that interact with the candidate ceRNA of the chosen gene; and (iv) c is the common miRNA number between these two genes. The test calculates the P-value by using the following formula:

$$P = \sum_{i=c}^{\min(K,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

Multiple miRNAs belonging to the same family were combined into one, and the hypergeometric test counted every miRNA family only once, even if it had multiple binding sites at the same 3′-UTR of protein-coding genes or transcript of non-coding genes. All P-values were subject to false discovery rate (FDR) correction (32).

Enrichment analysis for functional terms

GO ontology data (33) for the NCBI RefSeq genes were downloaded from the NCBI ftp site. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (34) were downloaded from the KEGG database. The protein analysis through evolutionary relationships (PANTHER) pathways were downloaded from the PANTHER database (35). The Reactome (36) and other pathways were downloaded from the molecular signatures database (MSigDB) (37). Enrichment analysis for these pathways in the data set was determined using a hypergeometric test with Bonferroni and FDR correction (32).

Other RBP binding sites in annotated transcripts

The aforementioned RBP CLIP clusters were intersected with the coordinates of all annotated transcripts to find their RBP binding sites.

Other annotation data sets

All RefSeq genes were downloaded from the UCSC bioinformatics Web sites (38). Other known ncRNAs were downloaded from the Ensembl database (39) or the UCSC Web sites (38). The human (UCSC hg19), mouse (UCSC mm9/mm10) and C. elegans (UCSC ce6/ce10) genome sequences were downloaded from the UCSC bioinformatics Web sites (38).

DATABASE CONTENT

The genome-wide binding map of Ago and other RBPs

To depict a comprehensive binding map of Ago and other RBPs, we integrated 108 published CLIP-Seq data sets generated from various tissues or cell lines under different treatments in 37 independent studies (detailed in 'Materials and Methods', Supplementary Table S1). For the Ago protein, a total of 1 007 618, 26 833 and 4842 unique binding site clusters were compiled in human, mouse and C. elegans, respectively (Table 1). These clusters were used in the following analysis to obtain CLIP-supported miRNA target sites of high confidence. Millions of binding site clusters of 42 other RBPs were also obtained (Supplementary Table S1 and Table 1).

Table 1. The data sets that are incorporated into starBase v2.0

Species    | Experiments | RBPs | Cell lines/tissues | ABSs      | RBSs      | miRNA-mRNA | miRNA-ncRNA | ceRNA  | protein–RNA
Human      | 85          | 36   | 18                 | 1 007 618 | 8 206 884 | 423 975    | 35 459      | 11 439 | 242 017
Mouse      | 21          | 11   | 16                 | 26 833    | 1 857 199 | 64 749     | 234         | 829    | 51 542
C. elegans | 2           | 2    | 2                  | 4842      | 1360      | 12 883     | 140         | 2      | 411

These statistics show the numbers of sequencing experiments (CLIP-Seq), RNA-binding proteins (RBPs) covered in these experiments, cell lines or tissues used in these experiments, Ago binding sites (ABSs), other RNA-binding protein binding sites (RBSs), miRNA-mRNA interactions, miRNA-ncRNA interactions, ceRNA pairs and protein–RNA interactions that are incorporated into starBase. These data are from three organisms: human (hg19), mouse (mm9) and C. elegans (ce6).

The annotation and identification of miRNA-mRNA and miRNA-ncRNA interactions

To inspect genome-wide interactions between miRNAs and their target genes, we retrieved the conserved miRNA target sites predicted by five algorithms (TargetScan, miRanda, Pictar2, PITA and RNA22) from public databases, which were intersected with the aforementioned Ago CLIP clusters to obtain CLIP-supported sites. Using this approach, we characterized 500 000 interactions between 818 conserved miRNAs and 20 480 protein-coding genes.

We also investigated the potential regulatory relationships between miRNAs and non-coding RNAs. We performed conserved miRNA target site scanning on the transcripts of lncRNAs, sncRNAs, pseudogenes and circRNAs using miRanda, and filtered the resulting candidates with the previously described Ago CLIP clusters. Although fewer than the CLIP-supported miRNA-mRNA interactions, the thousands of CLIP-supported miRNA-ncRNA interactions suggested that miRNAs might regulate other ncRNAs as well.

The annotation and identification of miRNA-mediated ceRNA regulatory networks

To construct and characterize the miRNA-mediated ceRNA network, a workflow was developed to identify the ceRNA pairs (Figure 1). First, CLIP-supported miRNA-mRNA, miRNA-lncRNA, miRNA-circRNA and miRNA-pseudogene interactions were combined. Next, a hypergeometric test was used to predict ceRNA pairs among mRNAs, lncRNAs, circRNAs and pseudogenes. Finally, all ceRNA pairs with FDR < 0.05 were imported into a MySQL database and displayed in a web page. In this study, we identified approximately 10 000 ceRNA pairs from CLIP-supported miRNA target sites. Surprisingly, many nodes of ceRNA networks are lncRNAs, circRNAs and pseudogenes. Several experimentally validated ceRNAs were recaptured in our starBase v2.0, e.g. PTEN ceRNAs: DCBLD2 (P < 0.001), JARID2 (P < 0.005), LRCH1 (P < 0.00005), TNRC6A (P < 0.00005) (12–14).

WEB INTERFACE

The web-based exploration of miRNA-mRNA, miRNA-ncRNA and protein–RNA regulatory relationships

Multiple web interfaces are provided to display three types of regulatory relationships. As an example, we explore miRNA-mRNA interactions to introduce the platform application. In the query page of miRNA-mRNA interactions, users can enter a gene symbol and select one miRNA to browse their relationships. The number of supporting experiments can be adjusted to control the stringency of the predictions. All relationships will be displayed in the results page if users do not submit a constraint. Once users click on a non-zero number in the table, more details are shown. For instance, users can click the target location to link to the deepView genome browser (17) and view data across the entire genome (Supplementary Figure S1). More information about the platform application is described in the relevant web interfaces.

The web-based exploration of ceRNA regulatory networks

In the query page, users enter a gene symbol that is used for ceRNA prediction. The option of 'minimum common miRNA number' denotes the minimum number of miRNAs shared by the input gene and its ceRNA candidates. In the results page, users can click on one of the tabular FDR values for details.

The web-based functional annotation of genes from miRNA-mediated regulatory networks

For miRNA function predictions, there are five options on the query page, and the option 'Select one or multiple microRNAs' is required. In contrast to the options described earlier, it allows users to select one or more miRNAs in the drop-down list. The results page shows the enrichment analysis for 13 functional prediction categories. The running parameters, selected miRNA target genes and every outcome in the 13 categories of function prediction are available for users to download. For ceRNA function prediction, the query page also presents users with five options, and a gene symbol must be entered. The results page is similar to the miRNA function predictions page.

EXAMPLE APPLICATIONS

In the following section, example applications of starBase v2.0 are illustrated.

The targetome of hsa-miR-21-5p

Assume that we are interested in the targetome of hsa-miR-21-5p. Given the constraints requiring target sites to have a number of supporting experiments no less than one and to be predicted by at least three of the five programs (Supplementary Figure S2A), our platform returns 173 CLIP-supported hsa-miR-21-5p target sites in 155 protein-coding genes, among which ZNF367, RHOB and PELI1 rank top three by read numbers (Supplementary Figure S2B). These results coincide with data in an experimentally validated database named miRTarBase (40), suggesting that these genes are likely targeted by hsa-miR-21-5p.

The identification of 'super-sponges' of miRNAs

Inspired by the observation that CDR1as circRNA (6,7) acts as a miR-7 super-sponge that contains multiple target sites from the same miRNA at the same transcript or 3′-UTR, we tested whether the other classes of ncRNAs and protein-coding genes hosted in our database can also act as miRNA super-sponges. We can recapitulate the known CDR1as circRNA as a miR-7 super-sponge using the miRNA-circRNA interactions web page (Supplementary Figure S3A and B). The results page sorted by the number of miRNA target sites showed that CDR1as circRNA contains 52 miR-7 target sites overlapped with CLIP-Seq data (Supplementary Figure S3B). The same strategy was applied to search potential super-sponges among mRNAs, lncRNAs and pseudogenes, resulting in tens of candidates, such as XIST and HOXD-AS1 lncRNA genes and ONECUT2 and CDK6 mRNAs (Supplementary Figure S3C).

ceRNAs of the oncogenes

Recently, the ceRNA hypothesis has been proposed (3) and efforts have been made to decipher the roles of ceRNA cross talk in regulating cancer-associated genes such as tumor suppressor PTEN (12–14). Requiring a minimum common miRNA number of ten and an FDR threshold of 0.05 on the 'ceRNA Network' web page (Supplementary Figure S4A), our platform produced a ceRNA network involving PTEN and 123 genes, among which the published PTEN ceRNAs LRCH1, TNRC6A and SMAD5 (12–14) were recaptured (Supplementary Figure S4B).

We were also able to predict a batch of other cancer-associated genes that were entangled within the highly sophisticated networks of ceRNAs. For example, NFIB, an oncogene upregulated in small cell lung cancer (41) and estrogen receptor-negative breast cancer (42), was found in multiple ceRNA pairs with other well-described cancer-related genes, such as MLL and KDSR (Supplementary Figure S4C).

CONCLUSIONS

By analyzing a large set of Ago and RBP binding sites derived from all available CLIP-Seq experimental techniques (PAR-CLIP, HITS-CLIP, iCLIP, CLASH), we have shown extensive and complex RNA–RNA and protein–RNA interaction networks.

Compared with the previous version of starBase (v1.0) and other databases, the distinctive features of starBase v2.0 include the following: (i) starBase v2.0 is the first database that provides the miRNA-pseudogene interaction networks; (ii) starBase v2.0 drafts the first interaction maps between miRNAs and circRNAs; (iii) unlike other databases or tools (12,14,43) that predict ceRNA regulatory networks using computationally predicted miRNA targets, starBase v2.0 provides an enhanced resolution to determine ceRNA functional networks based on miRNA-target interactions overlapping with high-throughput CLIP-Seq data; (iv) starBase v2.0 provides the most comprehensive miRNA-lncRNA interactions to date; and (v) starBase v2.0 provides a variety of interfaces and graphic visualizations to facilitate analysis of the massive and heterogeneous CLIP-Seq, RBP binding sites, miRNA targets and ceRNA regulatory networks in normal tissues and cancer cells.

FUTURE DIRECTIONS

As CLIP-Seq technology is applied to a broader set of species, cell lines, tissues, conditions and RBPs, we will continuously maintain and update the database. starBase will continue to expand the storage space and improve the computer server performance for storing and analyzing these new data, and improve the database to accept new user data uploads. In addition, we intend to integrate the cancer genomics data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) into starBase to improve our understanding of miRNA-mediated regulatory networks in developmental, physiological and pathological processes.

AVAILABILITY

starBase v2.0 is freely available at http://starbase.sysu.edu.cn/. The starBase data files can be downloaded and used in accordance with the GNU Public License and the license of primary data sources.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Ministry of Science and Technology of China, National Basic Research Program [No 2011CB811300]; the National Natural Science Foundation of China [31230042, 30900820, 81070589, 31370791]; the funds from Guangdong Province [S2012010010510, S2013010012457]; Project of Science and Technology New Star in ZhuJiang Guangzhou city [2012J2200025]; Fundamental Research Funds for the Central Universities [2011330003161070]; China Postdoctoral Science Foundation [200902348]. This research is supported in part by the Guangdong Province Key Laboratory of Computational Science and the Guangdong Province Computational Science Innovative Research Team.

FUNDING

Funding for open access charge: Ministry of Science and Technology of China, National Basic Research Program [No. 2011CB811300].

Conflict of interest statement. None declared.

REFERENCES

1. Batista,P.J. and Chang,H.Y. (2013) Long noncoding RNAs: cellular address codes in development and disease. Cell, 152, 1298–1307.
2. Yates,L.A., Norbury,C.J. and Gilbert,R.J. (2013) The long and short of microRNA. Cell, 153, 516–519.
3. Salmena,L., Poliseno,L., Tay,Y., Kats,L. and Pandolfi,P.P. (2011) A ceRNA hypothesis: the rosetta stone of a hidden RNA language? Cell, 146, 353–358.
4. Poliseno,L., Salmena,L., Zhang,J., Carver,B., Haveman,W.J. and Pandolfi,P.P. (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 465, 1033–1038.
5. Konig,J., Zarnack,K., Luscombe,N.M. and Ule,J. (2011) Protein-RNA interactions: new genomic technologies and perspectives. Nat. Rev. Genet., 13, 77–83.
6. Memczak,S., Jens,M., Elefsinioti,A., Torti,F., Krueger,J., Rybak,A., Maier,L., Mackowiak,S.D., Gregersen,L.H., Munschauer,M. et al. (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature, 495, 333–338.
7. Hansen,T.B., Jensen,T.I., Clausen,B.H., Bramsen,J.B., Finsen,B., Damgaard,C.K. and Kjems,J. (2013) Natural RNA circles function as efficient microRNA sponges. Nature, 495, 384–388.
8. Gupta,R.A., Shah,N., Wang,K.C., Kim,J., Horlings,H.M., Wong,D.J., Tsai,M.C., Hung,T., Argani,P., Rinn,J.L. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature, 464, 1071–1076.
9. Khalil,A.M., Guttman,M., Huarte,M., Garber,M., Raj,A., Rivea Morales,D., Thomas,K., Presser,A., Bernstein,B.E., van Oudenaarden,A. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA, 106, 11667–11672.
10. Cesana,M., Cacchiarelli,D., Legnini,I., Santini,T., Sthandier,O., Chinappi,M., Tramontano,A. and Bozzoni,I. (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell, 147, 358–369.
11. Franco-Zorrilla,J.M., Valli,A., Todesco,M., Mateos,I., Puga,M.I., Rubio-Somoza,I., Leyva,A., Weigel,D., Garcia,J.A. and Paz-Ares,J. (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat. Genet., 39, 1033–1037.
12. Tay,Y., Kats,L., Salmena,L., Weiss,D., Tan,S.M., Ala,U., Karreth,F., Poliseno,L., Provero,P., Di Cunto,F. et al. (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell, 147, 344–357.
13. Karreth,F.A., Tay,Y., Perna,D., Ala,U., Tan,S.M., Rust,A.G., DeNicola,G., Webster,K.A., Weiss,D., Perez-Mancera,P.A. et al. (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell, 147, 382–395.
14. Sumazin,P., Yang,X., Chiu,H.S., Chung,W.J., Iyer,A., Llobet-Navas,D., Rajbhandari,P., Bansal,M., Guarnieri,P., Silva,J. et al. (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell, 147, 370–381.
15. Darnell,R.B. (2010) HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip. Rev. RNA, 1, 266–286.
16. Ascano,M., Hafner,M., Cekan,P., Gerstberger,S. and Tuschl,T. (2012) Identification of RNA-protein interaction networks using PAR-CLIP. Wiley Interdiscip. Rev. RNA, 3, 159–177.
17. Yang,J.H., Li,J.H., Shao,P., Zhou,H., Chen,Y.Q. and Qu,L.H. (2011) starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data. Nucleic Acids Res., 39, D202–D209.
18. Khorshid,M., Rodak,C. and Zavolan,M. (2011) CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins. Nucleic Acids Res., 39, D245–D252.
19. Anders,G., Mackowiak,S.D., Jens,M., Maaskola,J., Kuntzagk,A., Rajewsky,N., Landthaler,M. and Dieterich,C. (2012) doRiNA: a database of RNA interactions in post-transcriptional regulation. Nucleic Acids Res., 40, D180–D186.
20. Jalali,S., Bhartiya,D., Lalwani,M.K., Sivasubbu,S. and Scaria,V. (2013) Systematic transcriptome wide analysis of lncRNA-miRNA interactions. PLoS One, 8, e53823.
21. Paraskevopoulou,M.D., Georgakilas,G., Kostoulas,N., Reczko,M., Maragkakis,M., Dalamagas,T.M. and Hatzigeorgiou,A.G. (2013) DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res., 41, D239–D245.
22. Barrett,T., Wilhite,S.E., Ledoux,P., Evangelista,C., Kim,I.F., Tomashevsky,M., Marshall,K.A., Phillippy,K.H., Sherman,P.M., Holko,M. et al. (2013) NCBI GEO: archive for functional genomics data sets - update. Nucleic Acids Res., 41, D991–D995.
23. Corcoran,D.L., Georgiev,S., Mukherjee,N., Gottwein,E., Skalsky,R.L., Keene,J.D. and Ohler,U. (2011) PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol., 12, R79.
24. Meyer,L.R., Zweig,A.S., Hinrichs,A.S., Karolchik,D., Kuhn,R.M., Wong,M., Sloan,C.A., Rosenbloom,K.R., Roe,G., Rhead,B. et al. (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res., 41, D64–D69.
25. Grimson,A., Farh,K.K., Johnston,W.K., Garrett-Engele,P., Lim,L.P. and Bartel,D.P. (2007) MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell, 27, 91–105.
26. Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157.
27. Betel,D., Koppal,A., Agius,P., Sander,C. and Leslie,C. (2010) Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol., 11, R90.
28. Kertesz,M., Iovino,N., Unnerstall,U., Gaul,U. and Segal,E. (2007) The role of site accessibility in microRNA target recognition. Nat. Genet., 39, 1278–1284.
29. Miranda,K.C., Huynh,T., Tay,Y., Ang,Y.S., Tam,W.L., Thomson,A.M., Lim,B. and Rigoutsos,I. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126, 1203–1217.
30. Quinlan,A.R. and Hall,I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
31. Harrow,J., Frankish,A., Gonzalez,J.M., Tapanari,E., Diekhans,M., Kokocinski,F., Aken,B.L., Barrell,D., Zadissa,A., Searle,S. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res., 22, 1760–1774.
32. Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.
33. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.
34. Kanehisa,M., Goto,S., Sato,Y., Furumichi,M. and Tanabe,M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res., 40, D109–D114.
35. Mi,H., Lazareva-Ulitsky,B., Loo,R., Kejariwal,A., Vandergriff,J., Rabkin,S., Guo,N., Muruganujan,A., Doremieux,O., Campbell,M.J. et al. (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res., 33, D284–D288.
36. Matthews,L., Gopinath,G., Gillespie,M., Caudy,M., Croft,D., de Bono,B., Garapati,P., Hemish,J., Hermjakob,H., Jassal,B. et al. (2009) Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res., 37, D619–D622.
37. Liberzon,A., Subramanian,A., Pinchback,R., Thorvaldsdottir,H., Tamayo,P. and Mesirov,J.P. (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27, 1739–1740.
38. Kuhn,R.M., Karolchik,D., Zweig,A.S., Wang,T., Smith,K.E., Rosenbloom,K.R., Rhead,B., Raney,B.J., Pohl,A., Pheasant,M. et al. (2009) The UCSC Genome Browser Database: update 2009. Nucleic Acids Res., 37, D755–D761.
39. Hubbard,T.J., Aken,B.L., Ayling,S., Ballester,B., Beal,K., Bragin,E., Brent,S., Chen,Y., Clapham,P., Clarke,L. et al. (2009) Ensembl 2009. Nucleic Acids Res., 37, D690–D697.
40. Hsu,S.D., Lin,F.M., Wu,W.Y., Liang,C., Huang,W.C., Chan,W.L., Tsai,W.T., Chen,G.Z., Lee,C.J., Chiu,C.M. et al. (2011) miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res., 39, D163–D169.
41. Dooley,A.L., Winslow,M.M., Chiang,D.Y., Banerji,S., Stransky,N., Dayton,T.L., Snyder,E.L., Senna,S., Whittaker,C.A., Bronson,R.T. et al. (2011) Nuclear factor I/B is an oncogene in small cell lung cancer. Genes Dev., 25, 1470–1475.
42. Moon,H.G., Hwang,K.T., Kim,J.A., Kim,H.S., Lee,M.J., Jung,E.M., Ko,E., Han,W. and Noh,D.Y. (2011) NFIB is a potential target for estrogen receptor-negative breast cancers. Mol. Oncol., 5, 538–544.
43. Liu,K., Yan,Z., Li,Y. and Sun,Z. (2013) Linc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis. Bioinformatics, 29, 2221–2222.
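
Worked example (illustrative). The ceRNA-pair test described under 'Identification of ceRNA pairs with hypergeometric test' can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it evaluates the upper-tail hypergeometric sum over the four parameters N, K, n and c exactly as defined in that section, then applies a Benjamini-Hochberg adjustment as one way to carry out the FDR correction (32). The function names, the toy gene-to-miRNA-family map and the value N = 20 are assumptions made purely for illustration.

```python
from itertools import combinations
from math import comb


def cerna_pvalue(N, K, n, c):
    """Upper-tail hypergeometric P-value: chance that two genes share at
    least c miRNA families, given N families in total, K targeting the
    gene of interest and n targeting the candidate ceRNA."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(c, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, pvalues[j] * m / rank)
        adjusted[j] = running_min
    return adjusted


# Hypothetical input: gene -> miRNA families with CLIP-supported target sites.
# Using sets means each family is counted once per gene, however many sites it has.
targets = {
    "GENE_A": {"miR-1", "miR-7", "miR-21", "miR-155"},
    "GENE_B": {"miR-7", "miR-21", "miR-155"},
    "GENE_C": {"miR-1", "miR-200"},
}
N = 20  # assumed total number of miRNA families used for target prediction

pairs = list(combinations(sorted(targets), 2))
pvals = [
    cerna_pvalue(N, len(targets[a]), len(targets[b]), len(targets[a] & targets[b]))
    for a, b in pairs
]
fdrs = benjamini_hochberg(pvals)

for (a, b), p, q in zip(pairs, pvals, fdrs):
    print(f"{a}\t{b}\tP={p:.4g}\tFDR={q:.4g}")
```

Pairs whose adjusted value falls below 0.05 would correspond to the FDR < 0.05 cut-off quoted in the text. Exact integer binomials from `math.comb` are used here to avoid floating-point underflow for large N; a statistics library would serve equally well.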

Journal

Nucleic Acids Research, Oxford University Press

Published: Jan 30, 2014
