LncPheDB: a genome-wide lncRNAs regulated phenotypes database in plants

aBIOTECH (2022) 3:169–177
https://doi.org/10.1007/s42994-022-00084-3
BRIEF COMMUNICATION

Danjing Lou (1), Fei Li (1), Jinyue Ge (1), Weiya Fan (1), Ziran Liu (2), Yanyan Wang (1), Jingfen Huang (1), Meng Xing (1), Wenlong Guo (1), Shizhuang Wang (1), Weihua Qiao (1,3), Zhenyun Han (1), Qian Qian (1,4,5), Qingwen Yang (1,3,&), Xiaoming Zheng (1,3,6,&)

(1) National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
(2) College of Life Science, Shenyang Normal University, Shenyang 110034, China
(3) National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572000, China
(4) State Key Laboratory of Rice Biology, China National Rice Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou 310006, China
(5) Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
(6) International Rice Research Institute, DAPO box 7777 Metro Manila, The Philippines

Received: 30 June 2022 / Accepted: 12 September 2022 / Published online: 5 October 2022

Abstract

LncPheDB (https://www.lncphedb.com/) is a systematic resource of genome-wide long non-coding RNAs (lncRNAs)-phenotypes associations for multiple species. It was established to display the genome-wide lncRNA annotations, target genes prediction, variant-trait associations, gene-phenotype correlations, lncRNA-phenotype correlations, and the similar non-coding regions of the queried sequence in multiple species. LncPheDB sorted out a total of 203,391 lncRNA sequences, 2000 phenotypes, and 120,271 variants of nine species (Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L.).
By exploring the relationship between lncRNAs and the genomic position of variants in genome-wide association analysis, a total of 68,862 lncRNAs were found to be related to the diversity of agronomic traits. More importantly, to facilitate the study of the functions of lncRNAs, we analyzed the possible target genes of lncRNAs, constructed a BLAST tool for studying similar fragments in all species, linked the pages of phenotypic studies related to lncRNAs that possess similar fragments, and constructed their regulatory networks. In addition, LncPheDB also provides a user-friendly interface, a genome visualization platform, and a multi-level and multi-modal data search engine. We believe that LncPheDB plays a crucial role in mining lncRNA-related plant data.

Keywords: LncRNA, GWAS, Phenotype, SNP, Plants

(&) Correspondence: yangqingwen@caas.cn (Q. Yang), zhengxiaoming@caas.cn (X. Zheng)
© The Author(s) 2022

INTRODUCTION

LncRNAs are a class of non-coding RNAs that are more than 200 nucleotides in length. Initially, this type of RNA was considered to be "junk" material in the genome. However, as the research continues, there is growing evidence that lncRNAs are key players in the growth and development, metabolism and regulatory processes in a variety of organisms, particularly in mammals and humans (Kopp and Mendell 2018; Kung et al. 2013; Morris and Mattick 2014; Sun et al. 2018; Uchida and Dimmeler 2015; Wu et al. 2017). However, the study of lncRNAs in plants remains in its infancy. Currently, it has been found in plants that lncRNAs not only play an important role in regulating growth and developmental processes such as growth hormone transport and signal transduction, but also play an important role in improving crop yield (Wang et al. 2018), leaf distortion (Liu et al. 2018), plant fertility (Fang et al. 2019; Zhao et al. 2018), fruit fertility (Fan et al. 2016) and other important agronomic traits. But the vast majority of lncRNA regulatory explorations with clear mechanisms are nowadays performed in Arabidopsis thaliana. Our understanding of the mechanisms regulating lncRNAs in crop species remains limited. In addition, in recent years, transcriptome data have been used to carry out a large number of lncRNA-related studies (Katayama et al. 2005; Osato et al. 2003; Terryn and Rouzé 2000; Wang et al. 2005; Zhang et al. 2006, 2014; Zhu and Deng 2012). Studies have shown that there are 32,397 lncRNAs in maize, 11,565 lncRNAs in rice, and 12,577 lncRNAs in soybean (Jin et al. 2021). It has also been revealed that lncRNAs are generally characterized by low expression, poor conservation among different species, and tissue specificity (Derrien et al. 2012; Cabili et al. 2011). These characteristics make the study of lncRNA functions a herculean task. At present, although a large number of lncRNAs have been identified through transcriptome research, the lncRNAs whose functions have been further verified are less than 1% (Quek et al. 2015). Furthermore, the genome-wide association study (GWAS) of multiple species revealed that 84% of trait-related variation loci are located in non-coding sequences (Cheetham et al. 2013). However, the non-coding regions in the genome lack annotations and other relevant information. This hinders our further research on the non-coding regions.

A lncRNA database is a very good tool to facilitate a detailed and accurate study of lncRNAs. In recent years, a total of 20 plant-related lncRNA databases have been established. They have averaged 530 citations since publication. But most of these databases provide the basic information of lncRNAs in species and target gene prediction according to transcriptome data. For instance, the PLncDB database (Jin et al. 2021) can provide basic information about various plants, such as lncRNA genome position, sequence, and structure, expression in tissues, and the query and visual display of gene regulation networks. However, the database can only perform a Basic Local Alignment Search Tool (BLAST) analysis of single species. The CANTATAdb 2.0 database (Szcześniak et al. 2019), which contains lncRNAs of plants and algae, leverages JBrowse, eFP Browser, EPexplorer, and other analysis tools to search for the maximum peptide length, maximum expression level, number of lncRNA exons, and other information of lncRNAs in species. The GreeNC database (Gallart et al. 2016) can extract the position, sequence, coding potential, folding energy, and other information of lncRNAs in various species; it can be used to perform a BLAST analysis of one or more species. Most of the databases constructed by researchers in the early days focused on some basic annotation information about the sequence and position of lncRNAs. However, they lacked comprehensive annotation information. In addition, very few databases could provide information about the correlation between lncRNAs and phenotypes or the similarity of lncRNAs among multiple species, or display the possible correlation between these similar fragments and phenotypes. The RiceLncPedia database (Zhang et al. 2021), a newly built database, has comprehensive annotation information of lncRNAs. For instance, the database collects multi-omics information, such as quantitative trait locus, GWAS, transposons, and variant sites (SNPs). However, it only shows the lncRNAs of rice, and no BLAST tool is available to study the similarity of lncRNAs among different species. Therefore, it is necessary to build a database that explores the similarity of lncRNAs in multiple species and combines lncRNAs with GWAS.

In this study, we built a database containing the lncRNA information of nine common crops, including Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L. The database provides information about the sequence and position of lncRNAs, the distribution of lncRNAs in the genome, the population variation of lncRNAs, and the phenotypic traits that may be regulated, among others. In addition, the database can also use the BLAST tool to investigate the conservation of target gene sequences in various species and the phenotypic conditions that may be regulated. Our database is designed to further improve the annotation information of lncRNAs in plants to further explore the possible functions of lncRNAs.

MATERIALS AND METHODS

Data collection and sorting

For the LncPheDB database, we selected nine important model plants (including Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L.) with great economic value and a high-quality reference genome. According to the data sequencing method and data sequencing depth, we extracted a total of 2324 RNA sequencing (RNA-Seq) datasets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra/) (Supplemental Table S1). Using the SRA toolkit (Version 2.8) under the Linux system, we first converted the extracted SRA file into Fastq format and trimmed the adapter sequences using Trim Galore (version 0.50) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to obtain clean data. HISAT2 (Kim et al. 2015) was used to align the clean data to the reference genome; afterward, the clean data were assembled with StringTie (Pertea et al. 2015). StringTie-merge was used to obtain the transcript set of each species. The transcripts were filtered out according to the following criteria: transcript length less than 200 base pairs and open reading frame greater than 120 amino acids. Finally, BLASTx was used to search the SWISS-PROT database to filter out the transcripts that may encode small peptides with the parameters -e 1.0e-4 -S 1. A comparison between the database and the Rfam database was performed to filter out tRNAs, rRNAs, sRNAs, and miRNAs. The transcripts were collected after the filtering. The CPC (Kong et al. 2007), CREMA (Simopoulos et al. 2018), PLEK (Li et al. 2014), and RNAplonc (Negri et al. 2019) programs were used to calculate the protein-coding ability of transcripts, and the transcripts called non-protein-coding by at least two of the programs were used as candidate lncRNAs (Fig. 1B). In addition, to enrich lncRNA types, we sorted out the lncRNA sequences of the nine species mentioned above in the RNAcentral Database (The et al. 2017) and the EVLncRNAs Databases (Zhou et al. 2018).

To extract comprehensive and high-quality information from published GWAS articles, we used the keywords "species" and "GWAS" to search for articles published in PubMed and we obtained 2227 relevant research articles that were published after 2009. Afterward, articles were selected if there were a large number of candidates for significant SNP-phenotype correlation analysis data, while articles with segmental and phenotypic correlation data or no SNP-phenotype correlation analysis data were removed. We found 497 articles with data that are significantly related to genome-wide variation loci and phenotypic traits. Finally, 421 articles were further screened according to the P-value (P < 10^-3) of significant GWAS data. In addition, the basic information of these articles is listed in Supplemental Table S2.

To link the lncRNA data with the GWAS result data, we used the BWA tool (version 0.7.17) to unify the SNPs from GWAS data in each species and the reference genome from lncRNA data in the same species into the same reference genome. Afterward, we first mapped the long segments according to the distance between SNPs (the distance between variant sites was shorter than the length of the region of linkage disequilibrium (LD)) (Supplemental Table S3), and then extended the mapped long segments according to the LD of each species; if the lncRNAs and genes are within the extended region, these lncRNAs are considered to regulate the corresponding phenotype and are associated with genes. At the same time, we also extended a single site in the GWAS results based on the length of the region of LD of each species, and used the positional relationship between the gene or lncRNA and the extended segment to determine the phenotypes that lncRNAs or genes may regulate (Guttman and Rinn 2012; Guttman et al. 2011; Huarte et al. 2010; Lee 2009; Martianov et al. 2007; Nagano et al. 2008; Rinn and Chang 2012; Sleutels et al. 2002).

Implementation

LncPheDB was implemented using PostgreSQL (https://www.postgresql.org; a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance) and Django development server (https://docs.djangoproject.com/en/2.2/intro/tutorial01/#the-development-server; a lightweight web server written purely in Python). Web user interfaces were developed using Django (https://www.djangoproject.com; a high-level Python web framework that encourages rapid development and clean, pragmatic design), HTML5, CSS3, AJAX (Asynchronous JavaScript and XML; a set of web development techniques used to create asynchronous applications without interfering with the display and behavior of the existing page), JQuery (a cross-platform and feature-rich JavaScript library; http://jquery.com, version 1.10.2), Vue (https://vuejs.org; the Progressive JavaScript Framework, version 2.6.14), layui (https://github.com/sentsin/layui/; a classic modular front-end UI framework), and Bootstrap (an open-source toolkit for developing web projects with HTML, CSS, and JS; https://getbootstrap.com, version 4.6.0). For dynamic genome visualization and analysis, JBrowse Genome Browser (a fast, scalable genome browser built completely with JavaScript and HTML5; https://jbrowse.org/jbrowse1.html, version 1.16.11) was adopted to generate interactive charts.

Fig. 1 Data processing workflow and outcomes of LncPheDB. A The nine species included in the database. B The data processing workflow of lncRNA and the curation process adopted by the GWAS is on the right. C Summary of the data contained in LncPheDB. D Database statistics in this study

RESULTS

GWAS revealed many genetic variants associated with phenotypes. Thousands of GWAS studies have revealed that 93% of common genetic variants associated with specific traits or diseases are located in non-coding regions (Finucane et al. 2015; Schaid et al. 2018). Of these, more than 90% of the variants were SNPs. In addition, the density of SNPs in lncRNA regions is similar to that in protein-coding regions. Some lncRNA intervals even have higher SNP densities than the genomic mean (Jin et al. 2011). SNP variants in lncRNA can affect mRNA expression through alternative splicing, localization, and stability of mRNA. Therefore, the association between lncRNA SNPs and phenotypes needs to be studied in depth. It has been shown that lncRNAs can influence complex traits at multiple levels of epigenetic regulation, transcriptional regulation, and post-transcriptional regulation (Zhang et al. 2018). To provide a comprehensive resource for linking lncRNAs to phenotypes, we first carried out RNA-seq analysis and sorted out the data of various non-coding region databases in RNAcentral and EVLncRNAs, obtaining a total of 203,391 lncRNA sequences. Precisely, 32,397, 32,192, 43,659, 8,741, 11,565, 25,884, 27,623, 12,577, and 8,753 lncRNAs were obtained for Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L., respectively. Based on the standard screening process, we integrated 2,000 important agronomic traits and 120,271 SNPs that have a significant effect on the phenotype of the nine species from the 421 articles. Among them, Oryza sativa L. and Zea mays L. have 764 and 573 traits, respectively, which together account for 66.85% of all traits, while Gossypium barbadense L. has the fewest traits, which account for 0.5%. Meanwhile, 68,862 lncRNA sequences that can regulate important agronomic traits were predicted (Table 1).

Table 1 Detail information about LncPheDB

Species                         Phenotype  Var     Publications  LncRNAs  lncRNAs (Phenotype)  Version
Zea mays L.                     573        71,058  151           32,397   28,164               B73_RefGen_v4
Gossypium barbadense L.         10         111     2             32,192   813                  GCA_008761655.1
Triticum aestivum L.            50         755     11            43,659   4773                 refseqv1.0
Lycopersicon esculentum Mille   132        787     9             8741     1212                 ITAG4.0
Oryza sativa L.                 764        23,690  117           11,565   8384                 MSU_osa1r7
Hordeum vulgare L.              17         750     6             25,884   5508                 version.1.0
Sorghum bicolor L.              250        17,855  57            27,623   16,431               GCF_000003195.3
Glycine max L.                  193        5129    66            12,577   3273                 GCF_000004515.5
Cucumis sativus L.              11         136     2             8753     304                  GCF_000004075.3

In addition, to make it easier and more efficient for users to use the data, we provide a web service interface, LncPheDB. LncPheDB provides a user-friendly interface, a visual platform and a variety of search options. The LncPheDB database mainly provides the reference genome information of nine species (the size of the reference genome, number of chromosomes, and number of protein-coding genes). Basic information regarding all lncRNAs and phenotype-related lncRNAs (e.g. species, lncRNA identity (ID), chromosome, start site, termination site, and positive or negative strand), as well as basic information of GWAS results (e.g. GWAS phenotypic traits, location of peak in genome, and P-value) is provided. Furthermore, LncPheDB also provides functional information on genes associated with lncRNAs and protein sequence information of genes in various species (by searching the SWISS-PROT database), and the regulatory network information of lncRNAs related to phenotypes (Fig. 2).

Fig. 2 Database contents and functions of LncPheDB

LncPheDB provides two search engines: the lncRNA search engine and the GWAS search engine. The lncRNA module provides comprehensive lncRNA-phenotype correlation data in each species, presented as tables. Each correlation record mainly includes phenotype-related lncRNA ID, species, chromosome position, lncRNA start and termination sites, positive or negative strand, regulated phenotype, peak position, P-value of phenotype-SNP correlation, mapped genes, and sequence of mapped genes. In this module, we merge adjacent significant SNPs whose distance is less than the species LD into a single association signal based on the LD decay of each species. The SNP with the minimum P value in a signal region was considered to be the lead SNP. Finally, the related lncRNA and mRNA were predicted according to the LD of each species. This module focuses on exploring the linkage among SNPs and the linkage between SNPs and lncRNA or mRNA. There are also more phenotypes highlighted in this module. For example, the SNPs 201,770,002 (P = 3.65E-59), 201,770,047 (P = 4.97E-07), and 201,770,048 (P = 3.65E-59) located on chromosome 2 are significantly associated with maize leaves, and these SNPs are located within the lncRNA URS0000D75A41_4577.4871 (201,769,823–201,770,124). So we speculate that lncRNA URS0000D75A41_4577.4871 may be associated with maize leaves. In addition, for lncRNAs of interest, users can use our database for in-depth exploration. For instance, for maize lncRNA EL0549, after selecting the maize species, if you enter lncRNA EL0549 and click "search", you can easily find information regarding the position of lncRNA EL0549, relevant GWAS information, and the information that EL0549 regulates maize's flour fiber content, proline content, breakdown viscosity, flour protein content, ear infructescence position, and maize kernels. To further determine the biological processes between lncRNA and traits, such as maize entrainment, protein content, and fiber concentration, among others, users can click "Function" to view the functional information of genes associated with lncRNAs. Meanwhile, users can also click "Sequence" to view the protein sequence of genes (Supplemental Fig. S1). By phenotype, lncRNA/Gene ID or GWAS locus input, the GWAS module can be used to obtain phenotype-associated genes or lncRNAs for each species, genome-wide variant loci significantly associated with phenotypes, correlation P values, etc. The correlation data for this module are mainly obtained based on the extension of individual variant loci, emphasizing the relative position between the variant loci and the lncRNA or gene. In the GWAS module, users can explore the phenotypes of their interest. For instance, the keyword "100 grain weight" can be used for maize (Supplemental Fig. S2). All search results can be downloaded in the form of a list. The combination of this lncRNA module and the GWAS module allows for a more comprehensive genome-wide prediction of phenotypic traits that may be regulated by lncRNAs or genes. Meanwhile, we also added the JBrowse genome browser, which allows users to intuitively search for the relative position distribution of lncRNAs and genes on chromosomes.

To study the sequence similarity, we designed a BLAST tool (version 2.12). By searching specific species in the whole database, the BLAST service enables users to search for similar lncRNA sequences. In the BLAST results, users can directly view the phenotypic traits related to lncRNAs with similar fragments by clicking the "Click here to search LncRNA: lncRNA ID" tab. To enable users to view a lncRNA and its regulated target genes clearly and concisely, we predicted the target genes of known and predicted lncRNAs by psRobot (Wu et al. 2012), psMimic (Wu et al. 2013) and IntaRNA (Mann et al. 2017), presented them in the form of regulatory networks, marked them with different colors, and set three buttons, which allow users to hide corresponding genes by clicking the corresponding buttons. In addition to downloading the information from the corresponding search page, users can also download the reference genome information for each species, the lncRNA fasta sequence files, lncRNA Potential Encoding File, lncRNA Expression File and the GFF files for database construction via the download page. Moreover, users can also download the GWAS information file (such as associated phenotypic information, SNP, P-value, and information about studies) and the gene GFF file of each species.

DISCUSSION

With the development of sequencing technology in the past few years, a large number of lncRNAs have been identified and great progress has been made in the study of lncRNAs in plants. However, compared with the lncRNAs in animals and humans, there is a very limited understanding of lncRNAs in plants, especially in terms of the mechanism of lncRNAs in regulating important agronomic traits and affecting the yield and quality of model plants (Heo et al. 2013; Liu et al. 2012; Mann et al. 2017; Xiao et al. 2009; Yang et al. 2014). With the deepening of research, some well-annotated databases, such as PLncDB V2.0 (Jin et al. 2021) and GreeNC (Gallart et al. 2016), have given comprehensive annotations to some basic information of lncRNAs, such as the position and sequence. Researchers have shifted their focus from identifying new lncRNAs to the functional research of lncRNAs. In recent years, researchers have investigated the functions of lncRNAs in plants. However, at present, the identified lncRNAs whose regulatory mechanism has been clarified are less than 1% (Quek et al. 2015). In addition, the research results of some lncRNAs provide a low reference value for the study of other lncRNAs due to the differences in types and functions of lncRNAs, which affect gene expression in a wide range at different levels. Therefore, researchers' understanding and research on lncRNAs are limited. At present, it is imperative to use a genome-wide database to investigate the relationship between lncRNAs and phenotypes and explore the potential regulatory mechanism of lncRNAs.

Compared with other plant lncRNA databases, LncPheDB focuses on exploring data resources about lncRNA-regulated phenotypes. Using standardized screening criteria, LncPheDB manually sorted a total of 203,391 lncRNA sequences, 2000 phenotypes, and 120,271 SNPs. Finally, it listed 68,862 lncRNA sequences that are associated with agronomic traits. According to a previous study, the lncRNA osa-eTM160 (a 688 bp long lncRNA transcribed between LOC_Os03g12815 and LOC_Os03g12820 of rice chromosome 3) has a role in regulating rice fertility and seed size by competitively binding OsmiR160 with OsARF18. In our database, the potential regulatory significance of lncRNA URS00008EDDE3_39947.4350 (also known as osa-eTM160) on rice seed fertility, days to flowering, seed weight, arsenic accumulation, germination rate and grain Mn concentration is also predicted, which further confirms the significance of our database. Moreover, users can use the lncRNA sequences they are investigating to conduct a BLAST comparison with all species in the data resource to identify the conserved lncRNA-regulated phenotypes. Furthermore, LncPheDB also provides users with convenient browsing and search services. Thus, users can search lncRNA correlations from various aspects, such as Gene ID, LncRNA ID, genome position, SNP, and phenotype. To help users explore the potential molecular regulatory mechanism of lncRNAs in complex traits, we summarized and sorted out the target gene prediction of lncRNAs and visually displayed it in the form of a regulation network. Users can hide or display the corresponding data by clicking different buttons.

As a future perspective, by focusing on the study of data resources regarding lncRNA-regulated phenotypes, we will add more lncRNA-related phenotypes for more species. In addition, since we found that the number of relevant studies was unexpectedly large when collecting and sorting out data, we will sort out more data regarding lncRNA-regulated phenotypes with clear regulatory mechanisms and predictions from existing studies and update the data resources in a timely manner. To further clarify the regulatory mechanism of lncRNAs, we will add more sequence information of miRNAs that are complementary to lncRNAs and increase the tissue-specific expression information of lncRNAs. Meanwhile, to enrich the transcriptome information of rice, we will add relevant transcriptome data in our research to facilitate scientific research and utilization. We also encourage all researchers to submit their relevant studies via the contact page. We believe that LncPheDB will provide assistance for the study of the functions of lncRNAs.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s42994-022-00084-3.

Acknowledgements We thank all the members who participated in the construction of this database. Thanks for the support of the Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization.

Authors' contributions Danjing Lou is involved in conceptualizing, writing and editing this manuscript. Xiaoming Zheng, Qingwen Yang, Qian Qian conceived the project. Danjing Lou, Fei Li, Jinyue Ge, Weiya Fan, Ziran Liu, Yanyan Wang, Jingfen Huang, Meng Xing, Wenlong Guo, Shizhuang Wang, Weihua Qiao and Zhenyun Han analysed the data.

Funding This work was supported by the National Key Research and Development Program of China (2021YFD1200101 to Z.X.M.), the National Natural Science Foundation of China (31670211 and 31970237 to Z.X.M.), Sanya Yazhou Bay Science and Technology City (SKJC-2020–02-001 to Z.X.M.), the Central Public-interest Scientific Institution Basal Research Fund (S2021ZD01 to Z.X.M.).

Data availability LncPheDB is freely available at https://www.lncphedb.com/.

Code availability This study involves database building code.

Declarations

Conflict of interest Author declares no conflicts of interests.

Ethical approval This manuscript is not involved in any animal experiments.

Consent to participate Necessary approval is obtained.

Consent for publication Necessary approval is obtained.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Gene Dev 25:1915–1927. https://doi.org/10.1101/gad.17446611

Cheetham SW, Gruhl F, Mattick JS, Dinger ME (2013) Long noncoding RNAs and the genetics of cancer. Brit J Cancer 108:2419–2425. https://doi.org/10.1038/bjc.2013.233

Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigó R (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789. https://doi.org/10.1101/gr.132159.111

Fan Y, Yang J, Mathioni SM, Yu J, Shen J, Yang X, Wang L, Zhang Q, Cai Z, Xu C, Li X, Xiao J, Meyers BC, Zhang Q (2016) PMS1T, producing phased small-interfering RNAs, regulates photoperiod-sensitive male sterility in rice. PNAS 113:15144–15149. https://doi.org/10.1073/pnas.1619159114

Fang J, Zhang F, Wang H, Wang W, Zhao F, Li Z, Sun C, Chen F, Xu F, Chang S, Wu L, Bu Q, Wang P, Xie J, Chen F, Huang X, Zhang Y, Zhu X, Han B, Deng X, Chu C (2019) Ef-cd locus shortens rice maturity duration without yield penalty. PNAS 116:18717–18722. https://doi.org/10.1073/pnas.1815030116

Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P, Anttila V, Xu H, Zang C, Farh K, Pipke S, Day FR, Consortium R, Purcell S, Stahl E, Lindstrom S, Perry JRB, Okada Y, Raychaudhuri S, Daly MJ, Patterson N, Neale BM, Price AL (2015) Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47:1228–1235. https://doi.org/10.1038/ng.3404

Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, Yang X, Amit I, Meissner A, Regev A, Rinn JL, Root DE, Lander ES (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477:295–300. https://doi.org/10.1038/nature10398

Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs. Nature 482:339–346. https://doi.org/10.1038/nature10887

Heo JB, Lee Y, Sung S (2013) Epigenetic regulation by long noncoding RNAs in plants. Chromosome Res 21:685–693. https://doi.org/10.1007/s10577-013-9392-6

Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D, Khalil AM, Zuk O, Amit I, Rabani M, Attardi LD, Regev A, Lander ES, Jacks T, Rinn JL (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142:409–419. https://doi.org/10.3410/f.5523957.5491055

Jin G, Sun J, Isaacs SD, Wiley KE, Kim ST, Chu LW, Zhang Z, Zhao H, Zheng SL, Isaacs WB, Xu J (2011) Human polymorphisms at long non-coding RNAs (lncRNAs) and association with prostate cancer risk. Carcinogenesis 32:1655–1659. https://doi.org/10.1093/carcin/bgr187

Jin J, Lu P, Xu Y, Li Z, Yu S, Liu J, Wang H, Chua N, Cao P (2021) PLncDB V2.0: a comprehensive encyclopedia of plant long noncoding RNAs. Nucleic Acids Res 49:D1489–D1495. https://doi.org/10.1093/nar/gkaa910

Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engström PG, Mizuno Y, Faghihi MA, Sandelin A, Chalk AM, Mottagui-Tabar S, Liang Z, Lenhard B, Wahlestedt C (2005) Antisense transcription in the mammalian transcriptome. Science 309:1564–1566. https://doi.org/10.1126/science.

Kopp F, Mendell JT (2018) Functional classification and experimental dissection of long noncoding RNAs. Cell 172:393–407. https://doi.org/10.1016/j.cell.2018.01.011

Kung JTY, Colognori D, Lee JT (2013) Long noncoding RNAs: past, present, and future. Genetics 193:651–669. https://doi.org/10.1534/genetics.112.146704

Lee JT (2009) Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Gene Dev 23:1831–1842. https://doi.org/10.1101/gad.1811209

Li A, Zhang J, Zhou Z (2014) PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinformatics 15:311. https://doi.org/10.1186/1471-2105-15-311

Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Chua N (2012) Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24:4333–4345. https://doi.org/10.1105/tpc.112.102855

Liu X, Li D, Zhang D, Yin D, Zhao Y, Ji C, Zhao X, Li X, He Q, Chen R, Hu S, Zhu L (2018) A novel antisense long noncoding RNA, TWISTED LEAF, maintains leaf blade flattening by regulating its associated sense R2R3-MYB gene in rice. New Phytol 218:774–788. https://doi.org/10.1111/nph.15023

Mann M, Wright PR, Backofen R (2017) IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions. Nucleic Acids Res 45:W435–W439. https://doi.org/10.1093/nar/gkx279

Martianov I, Ramadass A, Serra Barros A, Chow N, Akoulitchev A (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445:666–670. https://doi.org/10.1038/nature05519

Morris KV, Mattick JS (2014) The rise of regulatory RNA. Nat Rev Genet 15:423–437. https://doi.org/10.1038/nrg3722

Nagano T, Mitchell JA, Sanz LA, Pauler FM, Ferguson-Smith AC, Feil R, Fraser P (2008) The air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322:1717–1720. https://doi.org/10.1126/science.1163802

Negri TDC, Alves WAL, Bugatti PH, Saito PTM, Domingues DS, Paschoal AR (2019) Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants. Brief Bioinform 20:682–689. https://doi.org/10.1093/bib/bby034

Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Kawai J, Carninci P, Ohtomo Y, Murakami K, Matsubara K, Kikuchi S, Hayashizaki Y (2003) Antisense transcripts with rice full-length cDNAs. Genome Biol 5:R5. https://doi.org/10.1186/gb-2003-5-1-r5

Paytuví Gallart A, Hermoso Pulido A, Lagrán AMD, I, Sanseverino W, Aiese Cigliano R (2016) GREENC: a Wiki-based database of plant lncRNAs. Nucleic Acids Res 44:D1161–D1166. https://doi.org/10.1093/nar/gkv1215

Pertea M, Pertea GM, Antonescu CM, Chang T, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33:290–295. https://doi.org/10.1038/nbt.3122

Quek XC, Thomson DW, Maag JLV, Bartonicek N, Signal B, Clark MB, Gloss BS, Dinger ME (2015) lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 43:D168–D173. https://doi.org/10.1093/
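The LD-based linking step described under "Data collection and sorting" (extend each significant GWAS site by the species' LD length, then report lncRNAs or genes that fall in the extended region) can be illustrated with a minimal sketch. This is not the authors' code: the `Feature` class, the function names, the coordinates, and the 1000 bp LD length are all hypothetical, and the real per-species LD lengths are those of Supplemental Table S3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A lncRNA or gene on the unified reference genome (hypothetical record)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def ld_window(pos, ld_bp):
    """Extend a single GWAS site by the LD length on both sides."""
    return max(1, pos - ld_bp), pos + ld_bp


def features_near_snp(snp_chrom, snp_pos, ld_bp, features):
    """Return names of features overlapping the LD-extended window of one SNP."""
    lo, hi = ld_window(snp_pos, ld_bp)
    return [
        f.name
        for f in features
        if f.chrom == snp_chrom and f.start <= hi and f.end >= lo
    ]


# Illustrative data only; none of these coordinates come from the paper.
features = [
    Feature("lnc_A", "chr1", 1000, 1500),
    Feature("lnc_B", "chr1", 90000, 91000),
    Feature("lnc_C", "chr2", 1000, 1500),
]
hits = features_near_snp("chr1", 2000, 1000, features)
print(hits)
```

A linear scan is enough to show the positional rule; a genome-scale implementation would more likely use an interval index, which is a design choice the paper does not specify.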
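The transcript screening rules in "Data collection and sorting" (discard transcripts shorter than 200 bp or with an open reading frame longer than 120 amino acids, then keep transcripts called non-coding by at least two of CPC, CREMA, PLEK, and RNAplonc) can be sketched as below. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, and the sketch does not run the four programs themselves, it only combines their calls.

```python
def passes_basic_filters(length_nt, longest_orf_aa, min_len=200, max_orf_aa=120):
    """Keep transcripts of at least 200 nt whose longest ORF is at most 120 aa."""
    return length_nt >= min_len and longest_orf_aa <= max_orf_aa


def is_candidate_lncrna(noncoding_calls, min_votes=2):
    """noncoding_calls maps a tool name to True if that tool called the
    transcript non-protein-coding; at least min_votes tools must agree."""
    votes = sum(1 for called_noncoding in noncoding_calls.values() if called_noncoding)
    return votes >= min_votes


# Hypothetical calls for one transcript (not taken from the paper).
calls = {"CPC": True, "CREMA": False, "PLEK": True, "RNAplonc": False}
keep = passes_basic_filters(length_nt=850, longest_orf_aa=60) and is_candidate_lncrna(calls)
print(keep)
```

Treating the two basic criteria as independent reasons for removal is an interpretation of the text; the paper lists the criteria without stating how they combine.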
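The signal-merging rule of the lncRNA module (adjacent significant SNPs closer than the species LD form one association signal, and the SNP with the smallest P value is the lead SNP) can be checked against the maize chromosome 2 example reported in the text. The sketch below is an illustration, not the database code: the function names are hypothetical and the 1000 bp LD length is an assumed placeholder, since the paper gives the per-species values only in Supplemental Table S3.

```python
def merge_signals(snps, ld_bp):
    """Group (position, p_value) pairs on one chromosome into signals.
    A SNP joins the current signal if it lies less than ld_bp from the
    previous SNP in that signal; otherwise it starts a new signal."""
    signals = []
    for pos, p in sorted(snps):
        if signals and pos - signals[-1][-1][0] < ld_bp:
            signals[-1].append((pos, p))
        else:
            signals.append([(pos, p)])
    return signals


def lead_snp(signal):
    """Return the (position, p_value) pair with the smallest P value.
    Ties resolve to the first SNP by position."""
    return min(signal, key=lambda snp: snp[1])


# SNP positions and P values as quoted in the text (maize, chromosome 2).
maize_snps = [(201770002, 3.65e-59), (201770047, 4.97e-07), (201770048, 3.65e-59)]
# lncRNA URS0000D75A41_4577.4871 interval as quoted in the text.
lnc_start, lnc_end = 201769823, 201770124

signals = merge_signals(maize_snps, ld_bp=1000)  # ld_bp is an assumed value
lead = lead_snp(signals[0])
inside = all(lnc_start <= pos <= lnc_end for pos, _ in maize_snps)
print(len(signals), lead, inside)
```

With any LD length above 46 bp the three SNPs collapse into a single signal, and all three positions fall inside the quoted lncRNA interval, which is the positional evidence behind the speculation in the text.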
nar/gku988 1112009 Rinn JL, Chang HY (2012) Genome regulation by long noncoding Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced RNAs. Annu Rev Biochem 81:145–166. https://doi.org/10. aligner with low memory requirements. Nat Methods 1146/annurev-biochem-051410-092902 12:357–360. https://doi.org/10.1038/nmeth.3317 Schaid DJ, Chen W, Larson NB (2018) From genome-wide Kong L, Zhang Y, Ye Z, Liu X, Zhao S, Wei L, Gao G (2007) CPC: associations to candidate causal variants by statistical fine- assess the protein-coding potential of transcripts using mapping. Nat Rev Genet 19:491–504. https://doi.org/10. sequence features and support vector machine. Nucleic Acids 1038/s41576-018-0016-z Res 35:W345–W349. https://doi.org/10.1093/nar/gkm391 The Author(s) 2022 aBIOTECH (2022) 3:169–177 177 Simopoulos CMA, Weretilnyk EA, Golding GB (2018) Prediction of in plants. Plant Physiol 161:1875–1884. https://doi.org/10. plant lncRNA by ensemble machine learning classifiers. BMC 1104/pp.113.215962 Genomics 19:316. https://doi.org/10.1186/s12864-018- Wu H, Yang L, Chen L (2017) The diversity of long noncoding 4665-2 RNAs and their generation. Trends Genet 33:540–552. Sleutels F, Zwart R, Barlow DP (2002) The non-coding Air RNA is https://doi.org/10.1016/j.tig.2017.05.004 required for silencing autosomal imprinted genes. Nature Xiao B, Zhang X, Li Y, Tang Z, Yang S, Mu Y, Cui W, Ao H, Li K (2009) 415:810–813. https://doi.org/10.1038/415810a Identification, bioinformatic analysis and expression profiling Sun X, Zheng H, Sui N (2018) Regulation mechanism of long non- of candidate mRNA-like non-coding RNAs in Sus scrofa. coding RNA in plant response to stress. Biochem Bioph Res J Genet Genomics 36:695–702. https://doi.org/10.1016/ Co 503:402–407. https://doi.org/10.1016/j.bbrc.2018.07. 
S1673-8527(08)60162-9 072 Xu S, Dong Q, Deng M, Lin D, Xiao J, Cheng P, Xing L, Niu Y, Gao C, Szczes´niak MW, Bryzghalov O, Ciomborowska-Basheer J, Zhang W, Xu Y, Chong K (2021) The vernalization-induced Makałowska I (2019) CANTATAdb 2.0: Expanding the Collec- long non-coding RNA VAS functions with the transcription tion of Plant Long Noncoding RNAs. In: Chekanova JA, Wang factor TaRF2b to promote TaVRN1 expression for flowering HV (eds) Plant Long Non-Coding RNAs: Methods and Proto- in hexaploid wheat. Mol Plant 14:1525–1538. https://doi. cols. New York, NY, Springer, New York, pp 415–429 org/10.1016/j.molp.2021.05.026 Terryn N, Rouze´ P (2000) The sense of naturally transcribed Yang G, Lu X, Yuan L (2014) LncRNA: a link between RNA and antisense RNAs in plants. Trends Plant Sci 5:394–396. cancer. Biochim Biophys Acta Gene Regul Mech https://doi.org/10.1016/S1360-1385(00)01696-4 1839:1097–1109. https://doi.org/10.1016/j.bbagrm.2014. The RC, Petrov AI, Kay SJE, Kalvari I, Howe KL, Gray KA, Bruford 08.012 EA, Kersey PJ, Cochrane G, Finn RD, Bateman A, Kozomara A, Zhang Y, Liu XS, Liu Q, Wei L (2006) Genome-wide in silico Griffiths-Jones S, Frankish A, Zwieb CW, Lau BY, Williams KP, identification and analysis of cis natural antisense transcripts Chan PP, Lowe TM, Cannone JJ, Gutell R, Machnicka MA, (cis -NATs) in ten species. Nucleic Acids Res 34:3465–3475. Bujnicki JM, Yoshihama M, Kenmochi N, Chai B, Cole JR, https://doi.org/10.1093/nar/gkl473 Szymanski M, Karlowski WM, Wood V, Huala E, Berardini TZ, Zhang Y, Liao J, Li Z, Yu Y, Zhang J, Li Q, Qu L, Shu W, Chen Y (2014) Zhao Y, Chen R, Zhu W, Paraskevopoulou MD, Vlachos IS, Genome-wide screening and functional analysis identify a Hatzigeorgiou AG, Ma L, Zhang Z, Puetz J, Stadler PF, large number of long noncoding RNAs involved in the sexual McDonald D, Basu S, Fey P, Engel SR, Cherry JM, Volders P, reproduction of rice. Genome Biol 15:512. https://doi.org/10. 
Mestdagh P, Wower J, Clark MB, Quek XC, Dinger ME (2017) 1186/s13059-014-0512-1 RNAcentral: a comprehensive database of non-coding RNA Zhang Z, Xu Y, Yang F, Xiao B, Li G (2021) RiceLncPedia: a sequences. Nucleic Acids Res 45:D128–D134. https://doi. comprehensive database of rice long non-coding RNAs. Plant org/10.1093/nar/gkw1008 Biotechnol J 19:1492–1494. https://doi.org/10.1111/pbi. Uchida S, Dimmeler S (2015) Long noncoding RNAs in cardiovas- 13639 cular diseases. Circ Res 116:737–750. https://doi.org/10. Zhang Y, Tao Y, Liao Q (2018) Long noncoding RNA: a crosslink in 1161/CIRCRESAHA.116.302521 biological regulatory network. Brief Bioinformatics Wang X, Gaasterland T, Chua N (2005) Genome-wide prediction 19:930–945. https://doi.org/10.1093/bib/bbx042 and identification of cis-natural antisense transcripts in Zhao X, Li J, Lian B, Gu H, Li Y, Qi Y (2018) Global identification of Arabidopsis thaliana. Genome Biol 6:R30. https://doi.org/ Arabidopsis lncRNAs reveals the regulation of MAF4 by a 10.1186/gb-2005-6-4-r30 natural antisense RNA. NC 9:1–12 Wang Y, Luo X, Sun F, Hu J, Zha X, Su W, Yang J (2018) Zhou B, Zhao H, Yu J, Guo C, Dou X, Song F, Hu G, Cao Z, Qu Y, Yang Overexpressing lncRNA LAIR increases grain yield and Y, Zhou Y, Wang J (2018) EVLncRNAs: a manually curated regulates neighbouring gene cluster expression in rice. NC database for long non-coding RNAs validated by low- 9:1–9 throughput experiments. Nucleic Acids Res 46:D100–D105. Wu H, Ma Y, Chen T, Wang M, Wang X (2012) PsRobot: a web- https://doi.org/10.1093/nar/gkx677 based plant small RNA meta-analysis toolbox. Nucleic Acids Zhu D, Deng XW (2012) A non-coding RNA locus mediates Res 40:W22–W28. https://doi.org/10.1093/nar/gks554 environment-conditioned male sterility in rice. Cell Res Wu H, Wang Z, Wang M, Wang X (2013) Widespread long 22:791–792. 
https://doi.org/10.1038/cr.2012.43 noncoding RNAs as endogenous target mimics for microRNAs The Author(s) 2022 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png aBIOTECH Springer Journals

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
2662-1738
DOI
10.1007/s42994-022-00084-3

Abstract

aBIOTECH (2022) 3:169–177 https://doi.org/10.1007/s42994-022-00084-3

BRIEF COMMUNICATION

LncPheDB: a genome-wide lncRNAs regulated phenotypes database in plants

Danjing Lou (1), Fei Li (1), Jinyue Ge (1), Weiya Fan (1), Ziran Liu (2), Yanyan Wang (1), Jingfen Huang (1), Meng Xing (1), Wenlong Guo (1), Shizhuang Wang (1), Weihua Qiao (1,3), Zhenyun Han (1), Qian Qian (1,4,5), Qingwen Yang (1,3; corresponding author), Xiaoming Zheng (1,3,6; corresponding author)

1 National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2 College of Life Science, Shenyang Normal University, Shenyang 110034, China
3 National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572000, China
4 State Key Laboratory of Rice Biology, China National Rice Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou 310006, China
5 Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
6 International Rice Research Institute, DAPO box 7777 Metro Manila, The Philippines

Received: 30 June 2022 / Accepted: 12 September 2022 / Published online: 5 October 2022

Abstract

LncPheDB (https://www.lncphedb.com/) is a systematic resource of genome-wide long non-coding RNAs (lncRNAs)-phenotypes associations for multiple species. It was established to display the genome-wide lncRNA annotations, target genes prediction, variant-trait associations, gene-phenotype correlations, lncRNA-phenotype correlations, and the similar non-coding regions of the queried sequence in multiple species. LncPheDB sorted out a total of 203,391 lncRNA sequences, 2000 phenotypes, and 120,271 variants of nine species (Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L.).
By exploring the relationship between lncRNAs and the genomic positions of variants from genome-wide association analyses, a total of 68,862 lncRNAs were found to be related to the diversity of agronomic traits. More importantly, to facilitate the study of lncRNA functions, we analyzed the possible target genes of lncRNAs, constructed a BLAST tool for similar-fragment searches across all species, linked the pages of phenotypic studies related to lncRNAs that possess similar fragments, and constructed their regulatory networks. In addition, LncPheDB provides a user-friendly interface, a genome visualization platform, and a multi-level, multi-modal data search engine. We believe that LncPheDB plays a crucial role in mining lncRNA-related plant data.

Keywords LncRNA, GWAS, Phenotype, SNP, Plants

Correspondence: yangqingwen@caas.cn (Q. Yang), zhengxiaoming@caas.cn (X. Zheng)

INTRODUCTION

LncRNAs are a class of non-coding RNAs that are more than 200 nucleotides in length. Initially, this type of RNA was considered to be "junk" material in the genome. However, as research continues, there is growing evidence that lncRNAs are key players in growth and development, metabolism, and regulatory processes in a variety of organisms, particularly in mammals and humans (Kopp and Mendell 2018; Kung et al. 2013; Morris and Mattick 2014; Sun et al. 2018; Uchida and Dimmeler 2015; Wu et al. 2017). The study of lncRNAs in plants, however, remains in its infancy. In plants, lncRNAs have been found not only to play an important role in regulating growth and developmental processes such as growth hormone transport and signal transduction, but also to affect crop yield (Wang et al. 2018), leaf distortion (Liu et al. 2018), plant fertility (Fang et al. 2019; Zhao et al. 2018), fruit fertility (Fan et al. 2016), and other important agronomic traits. However, the vast majority of lncRNA regulatory studies with clear mechanisms are performed in Arabidopsis thaliana, and our understanding of the mechanisms by which lncRNAs act in crop species remains limited. In addition, in recent years, transcriptome data have been used to carry out a large number of lncRNA-related studies (Katayama et al. 2005; Osato et al. 2003; Terryn and Rouzé 2000; Wang et al. 2005; Zhang et al. 2006, 2014; Zhu and Deng 2012). Studies have shown that there are 32,397 lncRNAs in maize, 11,565 lncRNAs in rice, and 12,577 lncRNAs in soybean (Jin et al. 2021). It has also been revealed that lncRNAs are generally characterized by low expression, poor conservation among species, and tissue specificity (Derrien et al. 2012; Cabili et al. 2011). These characteristics make the study of lncRNA functions a herculean task. At present, although a large number of lncRNAs have been identified through transcriptome research, fewer than 1% have had their functions further verified (Quek et al. 2015). Furthermore, genome-wide association studies (GWAS) of multiple species revealed that 84% of trait-related variation loci are located in non-coding sequences (Cheetham et al. 2013). However, the non-coding regions of the genome lack annotations and other relevant information, which hinders further research on these regions.

A lncRNA database is a very good tool to facilitate a detailed and accurate study of lncRNAs. In recent years, a total of 20 plant-related lncRNA databases have been established, and they have averaged 530 citations since publication. Most of these databases provide basic information on lncRNAs in each species and target gene prediction based on transcriptome data. For instance, the PLncDB database (Jin et al. 2021) provides basic information about various plants, such as lncRNA genome position, sequence, and structure, expression in tissues, and the query and visual display of gene regulation networks. However, the database can only perform a Basic Local Alignment Search Tool (BLAST) analysis within a single species. The CANTATAdb 2.0 database (Szcześniak et al. 2019), which contains lncRNAs of plants and algae, leverages JBrowse, eFP Browser, EPexplorer, and other analysis tools to search for the maximum peptide length, maximum expression level, number of lncRNA exons, and other information on lncRNAs in each species. The GreeNC database (Gallart et al. 2016) can extract the position, sequence, coding potential, folding energy, and other information on lncRNAs in various species; it can be used to perform a BLAST analysis of one or more species. Most of the databases constructed in the early days focused on basic annotation of the sequence and position of lncRNAs and lacked comprehensive annotation information. In addition, very few databases could provide information about the correlation between lncRNAs and phenotypes or the similarity of lncRNAs among multiple species, or display the possible correlation between these similar fragments and phenotypes. The RiceLncPedia database (Zhang et al. 2021), a newly built database, has comprehensive annotation information on lncRNAs. For instance, it collects multi-omics information, such as quantitative trait loci, GWAS, transposons, and variant sites (SNPs). However, it only covers the lncRNAs of rice, and no BLAST tool is available to study the similarity of lncRNAs among different species. Therefore, it is necessary to build a database that explores the similarity of lncRNAs in multiple species and combines lncRNAs with GWAS.

In this study, we built a database containing the lncRNA information of nine common crops, including Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L. The database provides information about the sequence and position of lncRNAs, the distribution of lncRNAs in the genome, the population variation of lncRNAs, and the phenotypic traits that may be regulated, among others. In addition, the database can use the BLAST tool to investigate the conservation of target gene sequences in various species and the phenotypes that may be regulated. Our database is designed to improve the annotation information of lncRNAs in plants and to further explore the possible functions of lncRNAs.

MATERIALS AND METHODS

Data collection and sorting

For the LncPheDB database, we selected nine important model plants (Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L.) with great economic value and a high-quality reference genome. According to the sequencing method and sequencing depth, we extracted a total of 2324 RNA sequencing (RNA-Seq) datasets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra/) (Supplemental Table S1). Using the SRA toolkit (version 2.8) under Linux, we first converted the extracted SRA files into Fastq format and trimmed the adapter sequences using Trim Galore (version 0.50) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to obtain clean data. HISAT2 (Kim et al. 2015) was used to align the clean data to the reference genome; afterward, the aligned data were assembled with StringTie (Pertea et al. 2015). StringTie-merge was used to obtain the transcript set of each species. Transcripts were removed if they were shorter than 200 base pairs or contained an open reading frame longer than 120 amino acids. BLASTx was then used to search the SWISS-PROT database, with the parameters -e 1.0e-4 -S 1, to filter out the transcripts that may encode small peptides. The remaining transcripts were compared against the Rfam database to filter out tRNAs, rRNAs, sRNAs, and miRNAs, and the transcripts that passed this filtering were collected. The CPC (Kong et al. 2007), CREMA (Simopoulos et al. 2018), PLEK (Li et al. 2014), and RNAplonc (Negri et al. 2019) programs were used to calculate the protein-coding ability of transcripts, and the transcripts classified as non-protein-coding by at least two programs were used as candidate lncRNAs (Fig. 1B). In addition, to enrich the lncRNA types, we sorted out the lncRNA sequences of the nine species mentioned above from the RNAcentral database (The et al. 2017) and the EVLncRNAs database (Zhou et al. 2018).

To extract comprehensive and high-quality information from published GWAS articles, we used the keywords "species" and "GWAS" to search PubMed and obtained 2227 relevant research articles published after 2009. Articles were selected if they contained a large number of candidate significant SNP-phenotype correlation data, while articles with only segment-phenotype correlation data or without SNP-phenotype correlation data were removed. We found 497 articles with data significantly relating genome-wide variation loci to phenotypic traits. Finally, 421 articles were retained after screening by the P-value (P < 10⁻³) of significant GWAS data. The basic information of these articles is listed in Supplemental Table S2.

To link the lncRNA data with the GWAS results, we used the BWA tool (version 0.7.17) to place the SNPs from the GWAS data and the lncRNA data of each species on the same reference genome. Afterward, we first merged SNPs into long segments according to the distance between them (adjacent variant sites were merged when their distance was shorter than the length of the linkage disequilibrium (LD) region; Supplemental Table S3), and then extended the merged segments according to the LD of each species. If lncRNAs and genes fell within the extended region, these lncRNAs were considered to regulate the corresponding phenotype and to be associated with those genes. At the same time, we also extended each single significant site in the GWAS results by the LD length of the species and used the positional relationship between the gene or lncRNA and the extended segment to determine the phenotypes that lncRNAs or genes may regulate (Guttman and Rinn 2012; Guttman et al. 2011; Huarte et al. 2010; Lee 2009; Martianov et al. 2007; Nagano et al. 2008; Rinn and Chang 2012; Sleutels et al. 2002).

Implementation

LncPheDB was implemented using PostgreSQL (https://www.postgresql.org; a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance) and the Django development server (https://docs.djangoproject.com/en/2.2/intro/tutorial01/#the-development-server; a lightweight web server written purely in Python). Web user interfaces were developed using Django (https://www.djangoproject.com; a high-level Python web framework that encourages rapid development and clean, pragmatic design), HTML5, CSS3, AJAX (Asynchronous JavaScript and XML; a set of web development techniques used to create asynchronous applications without interfering with the display and behavior of the existing page), JQuery (a cross-platform and feature-rich JavaScript library; http://jquery.com, version 1.10.2), Vue (https://vuejs.org; the Progressive JavaScript Framework, version 2.6.14), layui (https://github.com/sentsin/layui/; a classic modular front-end UI framework), and Bootstrap (an open-source toolkit for developing web projects with HTML, CSS, and JS; https://getbootstrap.com, version 4.6.0). For dynamic genome visualization and analysis, JBrowse Genome Browser (a fast, scalable genome browser built completely with JavaScript and HTML5; https://jbrowse.org/jbrowse1.html, version 1.16.11) was adopted to generate interactive charts.

Fig. 1 Data processing workflow and outcomes of LncPheDB. A The nine species included in the database. B The data processing workflow of lncRNA; the curation process adopted for the GWAS data is on the right. C Summary of the data contained in LncPheDB. D Database statistics in this study

RESULTS

GWAS revealed many genetic variants associated with phenotypes. Thousands of GWAS studies have revealed that 93% of common genetic variants associated with specific traits or diseases are located in non-coding regions (Finucane et al. 2015; Schaid et al. 2018). Of these, more than 90% of the variants were SNPs. In addition, the density of SNPs in lncRNA regions is similar to that in protein-coding regions, and some lncRNA intervals even have higher SNP densities than the genomic mean (Jin et al. 2011). SNP variants in lncRNAs can affect mRNA expression through alternative splicing, localization, and stability of mRNA. Therefore, the association between lncRNA SNPs and phenotypes needs to be studied in depth. It has been shown that lncRNAs can influence complex traits at multiple levels, including epigenetic regulation, transcriptional regulation, and post-transcriptional regulation (Zhang et al. 2018). To provide a comprehensive resource for linking lncRNAs to phenotypes, we first carried out RNA-seq analysis and sorted out the data of various non-coding region databases in RNAcentral and EVLncRNAs, obtaining a total of 203,391 lncRNA sequences. Specifically, 32,397, 32,192, 43,659, 8,741, 11,565, 25,884, 27,623, 12,577, and 8,753 lncRNAs were obtained for Zea mays L., Gossypium barbadense L., Triticum aestivum L., Lycopersicon esculentum Mille, Oryza sativa L., Hordeum vulgare L., Sorghum bicolor L., Glycine max L., and Cucumis sativus L., respectively. Based on the standard screening process, we integrated 2,000 important agronomic traits and 120,271 SNPs that have a significant effect on the phenotype of the nine species from the 421 articles. Among them, Oryza sativa L. and Zea mays L. have 764 and 573 traits, respectively, which together account for 66.85% of all traits, while Gossypium barbadense L. has the fewest traits, accounting for 0.5%. Meanwhile, 68,862 lncRNA sequences that may regulate important agronomic traits were predicted (Table 1).

Table 1 Detailed information about LncPheDB

Species | Phenotypes | Variants | Publications | LncRNAs | LncRNAs (phenotype) | Genome version
Zea mays L. | 573 | 71,058 | 151 | 32,397 | 28,164 | B73_RefGen_v4
Gossypium barbadense L. | 10 | 111 | 2 | 32,192 | 813 | GCA_008761655.1
Triticum aestivum L. | 50 | 755 | 11 | 43,659 | 4,773 | refseqv1.0
Lycopersicon esculentum Mille | 132 | 787 | 9 | 8,741 | 1,212 | ITAG4.0
Oryza sativa L. | 764 | 23,690 | 117 | 11,565 | 8,384 | MSU_osa1r7
Hordeum vulgare L. | 17 | 750 | 6 | 25,884 | 5,508 | version.1.0
Sorghum bicolor L. | 250 | 17,855 | 57 | 27,623 | 16,431 | GCF_000003195.3
Glycine max L. | 193 | 5,129 | 66 | 12,577 | 3,273 | GCF_000004515.5
Cucumis sativus L. | 11 | 136 | 2 | 8,753 | 304 | GCF_000004075.3

In addition, to make it easier and more efficient for users to use the data, we provide a web service interface, LncPheDB. LncPheDB provides a user-friendly interface, a visual platform, and a variety of search options. The database mainly provides the reference genome information of the nine species (the size of the reference genome, number of chromosomes, and number of protein-coding genes). It also provides basic information on all lncRNAs and phenotype-related lncRNAs (e.g. species, lncRNA identity (ID), chromosome, start site, termination site, and strand), as well as basic information on GWAS results (e.g. GWAS phenotypic traits, location of the peak in the genome, and P-value). Furthermore, LncPheDB provides functional information on genes associated with lncRNAs, protein sequence information of genes in various species (by searching the SWISS-PROT database), and the regulatory network information of lncRNAs related to phenotypes (Fig. 2).

Fig. 2 Database contents and functions of LncPheDB

LncPheDB provides two search engines: the lncRNA search engine and the GWAS search engine. The lncRNA module provides comprehensive lncRNA-phenotype correlation data for each species, presented as tables. Each correlation record mainly includes the phenotype-related lncRNA ID, species, chromosome position, lncRNA start and termination sites, strand, regulated phenotype, peak position, P-value of the phenotype-SNP correlation, mapped genes, and sequence of the mapped genes. In this module, we merge adjacent significant SNPs whose distance is less than the species LD into a single association signal, based on the LD decay of each species. The SNP with the minimum P value in a signal region was considered to be the lead SNP. Finally, the related lncRNAs and mRNAs were predicted according to the LD of each species. This module focuses on exploring the linkage among SNPs and the linkage between SNPs and lncRNAs or mRNAs. More phenotypes are also highlighted in this module. For example, the SNPs 201,770,002 (P = 3.65E-59), 201,770,047 (P = 4.97E-07), and 201,770,048 (P = 3.65E-59) on chromosome 2 are significantly associated with maize leaves, and these SNPs are located within the lncRNA URS0000D75A41_4577.4871 (201,769,823–201,770,124). We therefore speculate that lncRNA URS0000D75A41_4577.4871 may be associated with maize leaves. In addition, users can use our database for in-depth exploration of lncRNAs of interest. For instance, for maize lncRNA EL0549, after selecting the maize species, entering lncRNA EL0549, and clicking "search", users can easily find the position of lncRNA EL0549, the relevant GWAS information, and the information that EL0549 regulates maize flour fiber content, proline content, breakdown viscosity, flour protein content, ear infructescence position, and maize kernels. To further determine the biological processes linking lncRNAs and traits, such as maize entrainment, protein content, and fiber concentration, among others, users can click "Function" to view the functional information of genes associated with lncRNAs. Users can also click "Sequence" to view the protein sequence of genes (Supplemental Fig. S1).

By phenotype, lncRNA/gene ID, or GWAS locus input, the GWAS module can be used to obtain phenotype-associated genes or lncRNAs for each species, genome-wide variant loci significantly associated with phenotypes, correlation P values, etc. The correlation data for this module are mainly obtained by extending individual variant loci, emphasizing the relative position between the variant loci and the lncRNA or gene. In the GWAS module, users can explore the phenotypes of their interest. For instance, the keyword "100 grain weight" can be used for maize (Supplemental Fig. S2). All search results can be downloaded in the form of a list. The combination of the lncRNA module and the GWAS module allows for a more comprehensive genome-wide prediction of phenotypic traits that may be regulated by lncRNAs or genes. We also added the JBrowse genome browser, which allows users to intuitively view the relative positions of lncRNAs and genes on chromosomes.

To study sequence similarity, we designed a BLAST tool (version 2.12). By searching specific species in the whole database, the BLAST service enables users to search for similar lncRNA sequences. In the BLAST results, users can directly view the phenotypic traits related to lncRNAs with similar fragments by clicking the "Click here to search LncRNA: lncRNA ID" tab. To enable users to view lncRNAs and their regulated target genes clearly and concisely, we predicted the target genes of known and predicted lncRNAs with psRobot (Wu et al. 2012), psMimic (Wu et al. 2013), and IntaRNA (Mann et al. 2017), presented them in the form of regulatory networks, marked them with different colors, and set three buttons that allow users to hide the corresponding genes. In addition to downloading the information from the corresponding search page, users can download the reference genome information for each species, the lncRNA fasta sequence files, the lncRNA Potential Encoding File, the lncRNA Expression File, and the GFF files for database construction via the download page. Users can also download the GWAS information file (such as associated phenotypic information, SNP, P-value, and information about studies) and the gene GFF file of each species.

DISCUSSION

With the development of sequencing technology in the past few years, a large number of lncRNAs have been identified and great progress has been made in the study of lncRNAs in plants. However, compared with lncRNAs in animals and humans, there is a very limited understanding of lncRNAs in plants, especially of the mechanisms by which lncRNAs regulate important agronomic traits and affect the yield and quality of model plants (Heo et al. 2013; Liu et al. 2012; Mann et al. 2017; Xiao et al. 2009; Yang et al. 2014). With the deepening of research, some well-annotated databases, such as PLncDB V2.0 (Jin et al. 2021) and GREENC (Gallart et al. 2016), have given comprehensive annotations of basic lncRNA information, such as position and sequence. Researchers have shifted their focus from identifying new lncRNAs to the functional research of lncRNAs. However, at present, the identified lncRNAs whose regulatory mechanism has been clarified account for less than 1% (Quek et al. 2015). In addition, the research results for some lncRNAs provide little reference value for the study of other lncRNAs because lncRNAs differ in type and function and affect gene expression broadly and at different levels. Researchers' understanding of lncRNAs therefore remains limited. At present, it is imperative to use a genome-wide database to investigate the relationship between lncRNAs and phenotypes and explore the potential regulatory mechanisms of lncRNAs.

Compared with other plant lncRNA databases, LncPheDB focuses on exploring data resources about lncRNA-regulated phenotypes. Using standardized screening criteria, LncPheDB manually sorted a total of 203,391 lncRNA sequences, 2000 phenotypes, and 120,271 SNPs. Finally, it listed 68,862 lncRNA sequences that are associated with agronomic traits. According to a previous study, the lncRNA osa-eTM160 (a 688 bp lncRNA transcribed between LOC_Os03g12815 and LOC_Os03g12820 on rice chromosome 3) regulates rice fertility and seed size by competitively binding OsmiR160 with OsARF18. In our database, lncRNA URS00008EDDE3_39947.4350 (also known as osa-eTM160) is predicted to have potential regulatory significance for rice seed fertility, days to flowering, seed weight, arsenic accumulation, germination rate, and grain Mn concentration, which further confirms the significance of our database. Moreover, users can use the lncRNA sequences they are investigating to conduct a BLAST comparison with all species in the data resource to identify conserved lncRNA-regulated phenotypes. Furthermore, LncPheDB also provides users with convenient browsing and search services.

Acknowledgements We thank all the members who participated in the construction of this database. Thanks for the support of the Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization.

Authors' contributions Danjing Lou is involved in conceptualizing, writing and editing this manuscript. Xiaoming Zheng, Qingwen Yang, Qian Qian conceived the project. Danjing Lou, Fei Li, Jinyue Ge, Weiya Fan, Ziran Liu, Yanyan Wang, Jingfen Huang, Meng Xing, Wenlong Guo, Shizhuang Wang, Weihua Qiao and Zhenyun Han analysed the data.

Funding This work was supported by the National Key Research and Development Program of China (2021YFD1200101 to Z.X.M.), the National Natural Science Foundation of China (31670211 and 31970237 to Z.X.M.), Sanya Yazhou Bay Science and Technology City (SKJC-2020-02-001 to Z.X.M.), the Central Public-interest Scientific Institution Basal Research Fund (S2021ZD01 to Z.X.M.).

Data availability LncPheDB is freely available at https://www.lncphedb.com/.

Code availability This study involves database building code.

Declarations
Thus, users can search lncRNAs correlation from various aspects, Conflict of interest Author declares no conflicts of interests. such as Gene ID, LncRNA ID, genome position, SNP, and phenotype. To help users explore the potential molec- Ethical approval This manuscript is not involved in any animal experiments. ular regulatory mechanism of lncRNAs in complex traits, we summarized and sorted out the target gene predic- Consent to participate Necessary approval is obtained. tion of lncRNAs and visually displayed it in the form of a regulation network. Users can hide or display the cor- Consent for publication Necessary approval is obtained. responding data by clicking different buttons. Open Access This article is licensed under a Creative Commons As a future perspective, by focusing on the study of Attribution 4.0 International License, which permits use, sharing, data resources regarding lncRNA-regulated phenotypes, adaptation, distribution and reproduction in any medium or for- we will add more lncRNA-related phenotypes for more mat, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons species. In addition, since we found that the number of licence, and indicate if changes were made. The images or other relevant studies was unexpectedly large when collecting third party material in this article are included in the article’s and sorting out data, we will sort out more data Creative Commons licence, unless indicated otherwise in a credit regarding lncRNA-regulated phenotypes with clear line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted regulatory mechanisms and predictions from existing by statutory regulation or exceeds the permitted use, you will studies and timely update the data resources. To further need to obtain permission directly from the copyright holder. 
To clarify the regulatory mechanism of lncRNAs, we will view a copy of this licence, visit http://creativecommons.org/ add more sequence information of miRNAs that are licenses/by/4.0/. complementary to lncRNAs and increase the tissue- specific expression information of lncRNAs. Meanwhile, to enrich the transcriptome information of rice, we will References add relevant transcriptome data in our research to facilitate scientific research and utilization. Notwith- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, standing, we also encourage all researchers to submit Rinn JL (2011) Integrative annotation of human large their relevant studies via the contact page. We believe intergenic noncoding RNAs reveals global properties and that LncPheDB will provide assistance for the study of specific subclasses. Gene Dev 25:1915–1927. https://doi. org/10.1101/gad.17446611 the functions of lncRNAs. Cheetham SW, Gruhl F, Mattick JS, Dinger ME (2013) Long noncoding RNAs and the genetics of cancer. Brit J Cancer Supplementary InformationThe online version contains 108:2419–2425. https://doi.org/10.1038/bjc.2013.233 supplementary material available at https://doi.org/10.1007/ Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, s42994-022-00084-3. Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, The Author(s) 2022 176 aBIOTECH (2022) 3:169–177 Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown Kopp F, Mendell JT (2018) Functional classification and experi- JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar mental dissection of long noncoding RNAs. Cell 172:393–407. R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigo´R https://doi.org/10.1016/j.cell.2018.01.011 (2012) The GENCODE v7 catalog of human long noncoding Kung JTY, Colognori D, Lee JT (2013) Long noncoding RNAs: past, RNAs: analysis of their gene structure, evolution, and present, and future. Genetics 193:651–669. https://doi.org/ expression. Genome Res 22:1775–1789. 
https://doi.org/10. 10.1534/genetics.112.146704 1101/gr.132159.111 Lee JT (2009) Lessons from X-chromosome inactivation: long Fan Y, Yang J, Mathioni SM, Yu J, Shen J, Yang X, Wang L, Zhang Q, ncRNA as guides and tethers to the epigenome. Gene Dev Cai Z, Xu C, Li X, Xiao J, Meyers BC, Zhang Q (2016) PMS1T, 23:1831–1842. https://doi.org/10.1101/gad.1811209 producing phased small-interfering RNAs, regulates photope- Li A, Zhang J, Zhou Z (2014) PLEK: a tool for predicting long non- riod-sensitive male sterility in rice. PNAS 113:15144–15149. coding RNAs and messenger RNAs based on an improved https://doi.org/10.1073/pnas.1619159114 k-mer scheme. BMC Bioinformatics 15:311. https://doi.org/ Fang J, ZhangF,WangH,WangW,ZhaoF,LiZ,Sun C, Chen F, Xu F, 10.1186/1471-2105-15-311 ChangS,WuL,Bu Q,WangP,Xie J, Chen F, HuangX,Zhang Y, Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Zhu X, Han B, Deng X, Chu C (2019) Ef-cd locus shortens rice Chua N (2012) Genome-wide analysis uncovers regulation of maturity duration without yield penalty. PNAS long intergenic noncoding RNAs in Arabidopsis. Plant Cell 116:18717–18722. https://doi.org/10.1073/pnas.1815030116 24:4333–4345. https://doi.org/10.1105/tpc.112.102855 Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P, Liu X, Li D, Zhang D, Yin D, Zhao Y, Ji C, Zhao X, Li X, He Q, Chen R, Anttila V, Xu H, Zang C, Farh K, Pipke S, Day FR, Consortium R, Hu S, Zhu L (2018) A novel antisense long noncoding RNA, Purcell S, Stahl E, Lindstrom S, Perry JRB, Okada Y, TWISTED LEAF, maintains leaf blade flattening by regulating Raychaudhuri S, Daly MJ, Patterson N, Neale BM, Price AL its associated sense R2R3-MYB gene in rice. New Phytol (2015) Partitioning heritability by functional annotation 218:774–788. https://doi.org/10.1111/nph.15023 using genome-wide association summary statistics. Nat Genet Mann M, Wright PR, Backofen R (2017) IntaRNA 2.0: enhanced 47:1228–1235. 
https://doi.org/10.1038/ng.3404 and customizable prediction of RNA-RNA interactions. Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson Nucleic Acids Res 45:W435–W439. https://doi.org/10. G, Young G, Lucas AB, Ach R, Bruhn L, Yang X, Amit I, Meissner 1093/nar/gkx279 A, Regev A, Rinn JL, Root DE, Lander ES (2011) lincRNAs act Martianov I, Ramadass A, Serra Barros A, Chow N, Akoulitchev A in the circuitry controlling pluripotency and differentiation. (2007) Repression of the human dihydrofolate reductase Nature 477:295–300. https://doi.org/10.1038/nature10398 gene by a non-coding interfering transcript. Nature Guttman M, Rinn JL (2012) Modular regulatory principles of large 445:666–670. https://doi.org/10.1038/nature05519 non-coding RNAs. Nature 482:339–346. https://doi.org/10. Morris KV, Mattick JS (2014) The rise of regulatory RNA. Nat Rev 1038/nature10887 Genet 15:423–437. https://doi.org/10.1038/nrg3722 Heo JB, Lee Y, Sung S (2013) Epigenetic regulation by long Nagano T, Mitchell JA, Sanz LA, Pauler FM, Ferguson-Smith AC, Feil noncoding RNAs in plants. Chromosome Res 21:685–693. R, Fraser P (2008) The air noncoding RNA epigenetically https://doi.org/10.1007/s10577-013-9392-6 silences transcription by targeting G9a to chromatin. Science Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzel- 322:1717–1720. https://doi.org/10.1126/science.1163802 mann-Broz D, Khalil AM, Zuk O, Amit I, Rabani M, Attardi LD, Negri TDC, Alves WAL, Bugatti PH, Saito PTM, Domingues DS, Regev A, Lander ES, Jacks T, Rinn JL (2010) A large intergenic Paschoal AR (2019) Pattern recognition analysis on long noncoding RNA induced by p53 mediates global gene noncoding RNAs: a tool for prediction in plants. Brief repression in the p53 response. Cell 142:409–419. https:// Bioinform 20:682–689. 
https://doi.org/10.1093/bib/bby034 doi.org/10.3410/f.5523957.5491055 Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Jin G, Sun J, Isaacs SD, Wiley KE, Kim ST, Chu LW, Zhang Z, Zhao H, Kawai J, Carninci P, Ohtomo Y, Murakami K, Matsubara K, Zheng SL, Isaacs WB, Xu J (2011) Human polymorphisms at Kikuchi S, Hayashizaki Y (2003) Antisense transcripts with long non-coding RNAs (lncRNAs) and association with rice full-length cDNAs. Genome Biol 5:R5. https://doi.org/10. prostate cancer risk. Carcinogenesis 32:1655–1659. https:// 1186/gb-2003-5-1-r5 doi.org/10.1093/carcin/bgr187 Paytuvı´ Gallart A, Hermoso Pulido A, Lagra´n AMD, I, Sanseverino Jin J, Lu P, Xu Y, Li Z, Yu S, Liu J, Wang H, Chua N, Cao P (2021) W, Aiese Cigliano R, (2016) GREENC: a Wiki-based database PLncDB V2.0: a comprehensive encyclopedia of plant long of plant lncRNAs. Nucleic Acids Res 44:D1161–D1166. noncoding RNAs. Nucleic Acids Res 49:D1489–D1495. https://doi.org/10.1093/nar/gkv1215 https://doi.org/10.1093/nar/gkaa910 Pertea M, Pertea GM, Antonescu CM, Chang T, Mendell JT, Salzberg Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, SL (2015) StringTie enables improved reconstruction of a Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki transcriptome from RNA-seq reads. Nat Biotechnol H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang 33:290–295. https://doi.org/10.1038/nbt.3122 KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Quek XC, Thomson DW, Maag JLV, Bartonicek N, Signal B, Clark MB, Engstro¨m PG, Mizuno Y, Faghihi MA, Sandelin A, Chalk AM, Gloss BS, Dinger ME (2015) lncRNAdb v2.0: expanding the Mottagui-Tabar S, Liang Z, Lenhard B, Wahlestedt C (2005) reference database for functional long noncoding RNAs. Antisense transcription in the mammalian transcriptome. Nucleic Acids Res 43:D168–D173. https://doi.org/10.1093/ Science 309:1564–1566. https://doi.org/10.1126/science. 
nar/gku988 1112009 Rinn JL, Chang HY (2012) Genome regulation by long noncoding Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced RNAs. Annu Rev Biochem 81:145–166. https://doi.org/10. aligner with low memory requirements. Nat Methods 1146/annurev-biochem-051410-092902 12:357–360. https://doi.org/10.1038/nmeth.3317 Schaid DJ, Chen W, Larson NB (2018) From genome-wide Kong L, Zhang Y, Ye Z, Liu X, Zhao S, Wei L, Gao G (2007) CPC: associations to candidate causal variants by statistical fine- assess the protein-coding potential of transcripts using mapping. Nat Rev Genet 19:491–504. https://doi.org/10. sequence features and support vector machine. Nucleic Acids 1038/s41576-018-0016-z Res 35:W345–W349. https://doi.org/10.1093/nar/gkm391 The Author(s) 2022 aBIOTECH (2022) 3:169–177 177 Simopoulos CMA, Weretilnyk EA, Golding GB (2018) Prediction of in plants. Plant Physiol 161:1875–1884. https://doi.org/10. plant lncRNA by ensemble machine learning classifiers. BMC 1104/pp.113.215962 Genomics 19:316. https://doi.org/10.1186/s12864-018- Wu H, Yang L, Chen L (2017) The diversity of long noncoding 4665-2 RNAs and their generation. Trends Genet 33:540–552. Sleutels F, Zwart R, Barlow DP (2002) The non-coding Air RNA is https://doi.org/10.1016/j.tig.2017.05.004 required for silencing autosomal imprinted genes. Nature Xiao B, Zhang X, Li Y, Tang Z, Yang S, Mu Y, Cui W, Ao H, Li K (2009) 415:810–813. https://doi.org/10.1038/415810a Identification, bioinformatic analysis and expression profiling Sun X, Zheng H, Sui N (2018) Regulation mechanism of long non- of candidate mRNA-like non-coding RNAs in Sus scrofa. coding RNA in plant response to stress. Biochem Bioph Res J Genet Genomics 36:695–702. https://doi.org/10.1016/ Co 503:402–407. https://doi.org/10.1016/j.bbrc.2018.07. 
S1673-8527(08)60162-9 072 Xu S, Dong Q, Deng M, Lin D, Xiao J, Cheng P, Xing L, Niu Y, Gao C, Szczes´niak MW, Bryzghalov O, Ciomborowska-Basheer J, Zhang W, Xu Y, Chong K (2021) The vernalization-induced Makałowska I (2019) CANTATAdb 2.0: Expanding the Collec- long non-coding RNA VAS functions with the transcription tion of Plant Long Noncoding RNAs. In: Chekanova JA, Wang factor TaRF2b to promote TaVRN1 expression for flowering HV (eds) Plant Long Non-Coding RNAs: Methods and Proto- in hexaploid wheat. Mol Plant 14:1525–1538. https://doi. cols. New York, NY, Springer, New York, pp 415–429 org/10.1016/j.molp.2021.05.026 Terryn N, Rouze´ P (2000) The sense of naturally transcribed Yang G, Lu X, Yuan L (2014) LncRNA: a link between RNA and antisense RNAs in plants. Trends Plant Sci 5:394–396. cancer. Biochim Biophys Acta Gene Regul Mech https://doi.org/10.1016/S1360-1385(00)01696-4 1839:1097–1109. https://doi.org/10.1016/j.bbagrm.2014. The RC, Petrov AI, Kay SJE, Kalvari I, Howe KL, Gray KA, Bruford 08.012 EA, Kersey PJ, Cochrane G, Finn RD, Bateman A, Kozomara A, Zhang Y, Liu XS, Liu Q, Wei L (2006) Genome-wide in silico Griffiths-Jones S, Frankish A, Zwieb CW, Lau BY, Williams KP, identification and analysis of cis natural antisense transcripts Chan PP, Lowe TM, Cannone JJ, Gutell R, Machnicka MA, (cis -NATs) in ten species. Nucleic Acids Res 34:3465–3475. Bujnicki JM, Yoshihama M, Kenmochi N, Chai B, Cole JR, https://doi.org/10.1093/nar/gkl473 Szymanski M, Karlowski WM, Wood V, Huala E, Berardini TZ, Zhang Y, Liao J, Li Z, Yu Y, Zhang J, Li Q, Qu L, Shu W, Chen Y (2014) Zhao Y, Chen R, Zhu W, Paraskevopoulou MD, Vlachos IS, Genome-wide screening and functional analysis identify a Hatzigeorgiou AG, Ma L, Zhang Z, Puetz J, Stadler PF, large number of long noncoding RNAs involved in the sexual McDonald D, Basu S, Fey P, Engel SR, Cherry JM, Volders P, reproduction of rice. Genome Biol 15:512. https://doi.org/10. 
Mestdagh P, Wower J, Clark MB, Quek XC, Dinger ME (2017) 1186/s13059-014-0512-1 RNAcentral: a comprehensive database of non-coding RNA Zhang Z, Xu Y, Yang F, Xiao B, Li G (2021) RiceLncPedia: a sequences. Nucleic Acids Res 45:D128–D134. https://doi. comprehensive database of rice long non-coding RNAs. Plant org/10.1093/nar/gkw1008 Biotechnol J 19:1492–1494. https://doi.org/10.1111/pbi. Uchida S, Dimmeler S (2015) Long noncoding RNAs in cardiovas- 13639 cular diseases. Circ Res 116:737–750. https://doi.org/10. Zhang Y, Tao Y, Liao Q (2018) Long noncoding RNA: a crosslink in 1161/CIRCRESAHA.116.302521 biological regulatory network. Brief Bioinformatics Wang X, Gaasterland T, Chua N (2005) Genome-wide prediction 19:930–945. https://doi.org/10.1093/bib/bbx042 and identification of cis-natural antisense transcripts in Zhao X, Li J, Lian B, Gu H, Li Y, Qi Y (2018) Global identification of Arabidopsis thaliana. Genome Biol 6:R30. https://doi.org/ Arabidopsis lncRNAs reveals the regulation of MAF4 by a 10.1186/gb-2005-6-4-r30 natural antisense RNA. NC 9:1–12 Wang Y, Luo X, Sun F, Hu J, Zha X, Su W, Yang J (2018) Zhou B, Zhao H, Yu J, Guo C, Dou X, Song F, Hu G, Cao Z, Qu Y, Yang Overexpressing lncRNA LAIR increases grain yield and Y, Zhou Y, Wang J (2018) EVLncRNAs: a manually curated regulates neighbouring gene cluster expression in rice. NC database for long non-coding RNAs validated by low- 9:1–9 throughput experiments. Nucleic Acids Res 46:D100–D105. Wu H, Ma Y, Chen T, Wang M, Wang X (2012) PsRobot: a web- https://doi.org/10.1093/nar/gkx677 based plant small RNA meta-analysis toolbox. Nucleic Acids Zhu D, Deng XW (2012) A non-coding RNA locus mediates Res 40:W22–W28. https://doi.org/10.1093/nar/gks554 environment-conditioned male sterility in rice. Cell Res Wu H, Wang Z, Wang M, Wang X (2013) Widespread long 22:791–792. https://doi.org/10.1038/cr.2012.43 noncoding RNAs as endogenous target mimics for microRNAs The Author(s) 2022

Journal

aBIOTECH, Springer Journals

Published: Sep 1, 2022

Keywords: LncRNA; GWAS; Phenotype; SNP; Plants

References