QPromoters: sequence based prediction of promoter strength in Saccharomyces cerevisiae

ALL LIFE 2023, VOL. 16, NO. 1, 2168304
https://doi.org/10.1080/26895293.2023.2168304

Devang Haresh Liya (a), Mirudula Elanchezhian (b), Mukulika Pahari (c), Nithishwer Mouroug Anand (a), Shivani Suresh (d), Nivedha Balaji (e) and Ashwin Kumar Jainarayanan (f,g)

(a) Department of Physical Sciences, Indian Institute of Science Education and Research, Mohali, India
(b) Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
(c) Department of Computer Engineering, Ramrao Adik Institute of Technology, DY Patil Deemed to be University, Navi Mumbai, India
(d) Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
(e) School of Biology and Environmental Sciences (SBES), University College Dublin, Dublin, Ireland
(f) Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
(g) Interdisciplinary Bioscience Doctoral Training Program and Exeter College, University of Oxford, Oxford, UK

ARTICLE HISTORY
Received 12 November 2021; Accepted 10 December 2022

KEYWORDS
Computational life sciences; bioinformatics and system biology

ABSTRACT

Promoters play a key role in influencing transcriptional regulation for fine-tuning the expression of genes. Heterologous promoter engineering has been a widely used concept to control the level of transcription in all model organisms. The strength of a promoter is mainly determined by its nucleotide composition. Many promoter libraries have been curated, but few have attempted to develop theoretical methods to predict the strength of promoters from their nucleotide sequence. Such theoretical methods are not only valuable in the design of promoters with specified strength but are also meaningful in understanding the mechanistic role of promoters in transcriptional regulation.
In this study, we present a theoretical model to describe the relationship between promoter strength and nucleotide sequence in Saccharomyces cerevisiae. We infer from our analysis that the −49 to 10 sequence with respect to the Transcription Start Site represents the minimal region that can be used to predict promoter strength. A website (https://qpromoters.com/) and a standalone tool (https://github.com/DevangLiya/QPromoters) are available to quickly quantify the strength of Saccharomyces cerevisiae promoters.

Author summary

Regulating gene expression is a crucial aspect of metabolic engineering and synthetic biology. Promoter engineering plays a vital part in modulating transcriptional capacity and hence controlling gene expression. While there are tools to identify promoter regions in the eukaryotic genome, there are no simple tools to predict the strength of promoters in eukaryotes. Previous studies have shown that there exists a relationship between the promoter strength and the natural log of the promoter score. We use this relationship to identify the minimal promoter region in Saccharomyces cerevisiae that can be used to predict the strength of promoters. We have used a set of 18 standard promoters whose strengths were experimentally determined in previous studies to verify our model. We were able to classify promoters into three broad classes, namely weak, moderate, and strong, with high confidence. We have also developed a website and an open-source script that can be utilized to quantify promoter strength in Saccharomyces cerevisiae and streamline the process of promoter design.

CONTACT Ashwin Kumar Jainarayanan, ashwin.jainarayanan@dtc.ox.ac.uk
Supplemental data for this article can be accessed at https://doi.org/10.1080/26895293.2023.2168304.
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Saccharomyces cerevisiae (S. cerevisiae), commonly known as brewer's yeast, is a widely used eukaryotic model organism in synthetic biology; it has applications in the production of biofuels, recombinant proteins and bulk chemicals (Nevoigt 2008; Tang et al. 2020). Promoters are basic transcriptional elements that play a key role in manipulating genetic and metabolic pathways in S. cerevisiae by the regulation of protein expression both quantitatively and temporally (Scalcinati et al. 2012; Latimer et al. 2014) and are one of the most crucial components of the yeast synthetic biology toolbox (Hubmann et al. 2014; Redden et al. 2015; Machens et al. 2017; Portela et al. 2017; Ottoz and Rudolf 2018; Decoene et al. 2019; Kotopka and Smolke 2020; Liu et al. 2020; Feng and Marchisio 2021). Promoters in S. cerevisiae have multiple components which together account for successful transcriptional regulation. The key components of a yeast promoter are an upstream activator sequence (UAS), an upstream repressor sequence (URS), a nucleosome-disfavoring sequence and a core promoter region.

The core promoter is the DNA sequence nearest to the start codon, which interacts with RNA polymerase II (pol-II) and other general transcriptional factors to form the pre-initiation complex (PIC) (Tang et al. 2020). The core region also contains the TATA box, the transcription start site (TSS), a PIC localization stretch and a TSS scanning region for pol-II (Lubliner et al. 2013). The binding of general transcription factor proteins and histones to the TATA box facilitates the subsequent binding of pol-II, which, along with several transcription factor proteins, constructs a transcription initiation complex that starts the mRNA synthesis from the TSS (Kanhere and Bansal 2005; Jiang and Pugh 2009). The nucleotide composition of different regions in the core promoter strongly influences the sensitivity of the promoter. Studies have shown that promoters with A/T- or T/C-rich PIC regions have higher sensitivities than promoters containing G/C-rich sequences (Lubliner et al. 2015). The position of the different regions of the core promoter is illustrated in Figure 1.

Figure 1. A schematic of promoter architecture in S. cerevisiae: The text in crimson (top) denotes the conditions necessary for high sensitivity, while the green text (bottom) denotes the conditions for lower sensitivity. The lengths of the different regions of the promoter are also given in bp. The jagged line at the bottom denotes the part of the promoter that is rigid in nature.

The UAS and URS are the regulatory components of a promoter and are located upstream to the core promoter region. UASs and URSs act as binding sites for transcription activators and repressors, respectively. The UAS enhances gene expression, provides additional stability, and plays a role in regulating the PIC formation process (West et al. 1984; Bitter et al. 1991). The UASs and URSs in S. cerevisiae are typically 10 bp long but can vary from 5 to 30 bp in length (Stewart et al. 2012).

The disfavoring nucleotide sequence is a stretch of DNA that decreases nucleosome occupancy to facilitate transcription (Struhl and Segal 2013). The poly(dA:dT) tract, a homopolymeric stretch of deoxyadenosine nucleotides, is a well-known nucleosome-disfavoring sequence commonly present in promoters (Workman 2006).

The structural properties of the promoter are vital for successful transcription. The flexibility of the promoter should be optimal to make sure that the binding sites are accessible and properly positioned to enable their recognition by transcriptional machinery (Jiang and Pugh 2009). In this regard, the bendability, or the propensity of each tri-nucleotide to bend, is essential (Kanhere and Bansal 2005). Existing studies indicate the presence of regions of low bendability about 100-200 bp upstream to the start codon (Miele et al. 2008) (illustrated in Figure 1 by a jagged line). Studies also indicate that the low bendability is caused by a combination of A/T richness and di- and tri-nucleotide composition (Akan and Deloukas 2008).

Promoters in S. cerevisiae can be either constitutive, meaning they are relatively unaffected by internal and external signals and maintain stable levels of transcription, or inducible, meaning they can initiate a drastic change in transcriptional levels in response to specific stimuli. These stimuli, called inducers, range from molecules such as metabolites, amino acids, and sugars to metal ions and environmental factors like pH and stress (Weinhandl et al. 2014; Gasser et al. 2015; Kim et al. 2015; Fischer et al. 2016; Rajkumar et al. 2016). Using endogenous promoters of S. cerevisiae for synthetic biology applications has disadvantages owing to an insufficiency of well-characterized promoters (Chen et al. 2018; Zhou et al. 2018). Thus, it is of utmost importance to characterize and quantify the strength of various S. cerevisiae promoters and create a database of the same.

Previous work has established a two-step approach for the quantitative prediction of the strength of promoters in Escherichia coli (E. coli), a prokaryotic model organism (Li and Zhang 2014; Bharanikumar et al. 2018). The linear relationship between the total promoter score and the promoter strength is well-established in E. coli (Berg and von Hippel 1987; Bharanikumar et al. 2018). In this study, we have presented a similar simplified model of the promoter strength in S. cerevisiae based on the promoter sequence.

We have done an extensive literature survey, and we observe that there are several established methods to quantify the strength of promoters (Rhodius and Mutalik 2010; Yada et al. 2011; Bharanikumar et al. 2018; Hayat et al. 2020; Zhao et al. 2020; Li et al. 2022; Zhao et al. 2022). We would like to highlight the fact that most of these methods deal with E. coli as the model organism. To the best of our knowledge, our study is the only one that deals with promoter sequences from eukaryotes.

Materials and methods

The core promoter sequences of 5117 promoters in S. cerevisiae were retrieved using the Sequence Retrieval Tool in the Eukaryotic Promoter Database (EPD) (Dreos et al. 2017). The core promoter sequence consisted of the −49 to 10 sequence with reference to the Transcription Start Site (TSS).

A Position Frequency Matrix (PFM) was generated from the motif of all 5117 promoter sequences. The PFM was then converted to a Position Weight Matrix (PWM) or Position-Specific Scoring Matrix (PSSM) using the functions from biopython (Cock et al. 2009). The motif landscape was visualized using WebLogo (Crooks et al. 2004). The resulting PSSM was then used to calculate the 'promoter score' for all the promoters in the downstream analysis.

Taking inspiration from the well-established linear relationship between the total promoter score and the promoter strength in E. coli (Berg and von Hippel 1987), we modeled the promoter strength using a linear model with the promoter score as

Promoter strength = C0 + C1 × (Promoter score)    (1)

Lee et al. have characterized the strength of 19 constitutive promoters in S. cerevisiae using three fluorescence markers (Venus, mRuby2, and mTurquoise2) (Lee et al. 2015). We have used 18 of these promoter strengths in this study. The sequence for the promoter pREV1 was not found in EPD, due to which we have dropped it from our analysis. The normalized fluorescence values folded over the background were obtained from the authors of Lee et al. (2015). The log of these values gives the promoter strength. These were further divided by the strength of the strongest promoter (pTDH3) from Lee et al. (2015). We note that this step does not alter the linear relationship that is being tested but merely acts as a scaling. These values finally constitute the result space or the 'Promoter strength' in Eq. 1. We have also used another set of five promoters from Decoene et al. (2019) as the training set to further test the robustness of our model. These fluorescence values were also subjected to the normalizations described above.

We define a 'segment score', which is simply the total score of a given segment of a given promoter as calculated from the PSSM. This score is then divided by the highest score (corresponding to pTDH3) to obtain the feature space or 'Promoter score' part of Eq. 1.

We then performed ordinary least squares (OLS) linear regression using the statsmodels package in Python. C0 and C1 were left as free parameters to obtain the best fit. The quality of fit was then assessed using reduced R-squared and the F-statistic. The significance of the model parameters was assessed using the t-statistic. We also fitted a linear model to the residuals to look for biases in the model. The SciPy, statsmodels and Seaborn packages were used to perform, visualize, and test the linear regression (Seabold and Perktold 2010).

Results and discussion

The 5117 native S. cerevisiae promoters from the EPD that were included in this study represent a diverse population of transcriptional regulators. Motif analysis on this set, shown in Figure 2, revealed that the promoters were diverse in terms of nucleotide composition. We see that the conservation along the entire promoter length is low, except for the TSS.

Figure 2. Motif of the −49 to 10 region: Motif logo generated from all 5117 S. cerevisiae promoters from the Eukaryotic Promoter Database. The motif is generated for the −49 to 10 sequence with respect to the Transcription Start Site.
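The scoring and fitting procedure described in Materials and methods can be sketched in a few lines of Python. This is an illustrative sketch, not the QPromoters code: the study used Biopython for the PSSM and statsmodels for the regression, whereas the version below uses only the standard library so that it runs anywhere. The function names (build_pssm, segment_score, fit_line), the pseudocount of 1, the uniform background frequency of 0.25 and the toy sequences and strengths are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sequences, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM from equal-length, aligned DNA sequences.

    Returns one {base: score} dict per position. The pseudocount and the
    uniform background are illustrative assumptions, not values from the paper.
    Sequences may only contain A, C, G and T.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        # Position frequency counts (the PFM column), with pseudocounts added.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in sequences:
            counts[seq[pos].upper()] += 1
        total = sum(counts.values())
        # Convert frequencies to log-odds against the background.
        pssm.append(
            {b: math.log2((counts[b] / total) / background) for b in ALPHABET}
        )
    return pssm


def segment_score(pssm, seq, start=0, end=None):
    """Sum the PSSM scores of seq over the half-open index range [start, end)."""
    end = len(seq) if end is None else end
    return sum(pssm[i][seq[i].upper()] for i in range(start, end))


def fit_line(x, y):
    """Ordinary least squares fit of y = c0 + c1 * x; returns (c0, c1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    ss_res = sum((b - (c0 + c1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return c0, c1, r_squared


if __name__ == "__main__":
    # Made-up 8-nt sequences; the study used 5117 sequences of 60 nt from EPD.
    library = ["TATAAAAG", "TATATAAG", "CATAAAAG", "TATAAACG", "GGTAAAAG"]
    pssm = build_pssm(library)

    reference = "TATAAAAG"  # hypothetical stand-in for pTDH3
    ref_score = segment_score(pssm, reference)

    # Hypothetical normalized strengths, not measured data.
    measured = {"TATAAAAG": 1.00, "TATATAAG": 0.80, "GGTAAAAG": 0.35}
    xs = [segment_score(pssm, s) / ref_score for s in measured]
    ys = list(measured.values())

    c0, c1, r2 = fit_line(xs, ys)
    print(f"C0={c0:.3f} C1={c1:.3f} R^2={r2:.3f}")
```

Keeping start and end as arguments of segment_score means the same function can score a sub-segment of the promoter: with index 0 standing for position −49, fixing start at 0 and increasing end one nucleotide at a time gives the family of segments needed to compare fits over progressively longer regions.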
Based on this, there are two possible regions or segments of the −49 to 10 sequence that can be considered for modeling the linear relationship in Eq. 1: (1) the highly conserved −9 to 1 segment, and (2) the −49 to X segment, where X varies from −48 to 10. Results of modelling the promoter strength using these segments are discussed in the following sections along with their possible biological implications.

−49 to X region

We sought to determine the shortest sequence that could model the relationship between experimental promoter strength and the segment scores. We first fix one end of such a segment at position −49 with respect to the TSS and add nucleotides towards the TSS until position 'X'. Scores of the segments thus obtained were used to perform the linear regression described by Eq. 1. The quality-of-fit estimators for each such regression are shown in Figure 3. It was observed that the quality of fit generally improves as more and more nucleotides are added. This is indicated by larger R-squared values and lower p-values for the F-statistic. A saturation is reached at X = −1, after which adding more nucleotides does not improve the quality of fit by an appreciable amount (Figures 3 and S2). This saturation is sustained until X = 10, indicating that the quality of fit from −49 to −1 and from −49 to 10 is mostly similar (Figures S4–S6). Since the −49 to 10 sequence was readily available in the EPD, further analysis focused on this 60 bp segment to ease the integration with the EPD. Figure 4 shows the plot of normalized −49 to 10 scores and normalized mRuby2 fluorescence along with the best fit model. We see that the residuals for this model are randomly distributed around 0, indicating that the errors are uncorrelated, and the quantile-quantile plot shows that the errors are normally distributed, thus validating all the assumptions for linear regression. Similar trends were observed using Venus and mTurquoise2 fluorescence, as seen in Figures S2 and S3, respectively.

Figure 3. Various fit statistics for the linear regression of segment scores against the mRuby2 fluorescence: One of the ends of the promoter is fixed at −49 and nucleotides are added on the other end towards the TSS. The values of R-squared, Adj. R-squared, and p-value for F-statistic are tabulated in Table S1. Similar plots for Venus and mTurquoise2 fluorescence are given in Fig S2.

Figure 4. Best fit model for the −49 to 10 segment: (A) Plot of normalized −49 to 10 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residuals obtained from the best fit model. (C) Quantile-quantile plot of residuals against normally distributed theoretical quantiles.

−9 to 1 region

Motif analysis (Figure 2) showed the −9 to 1 region to be the most conserved stretch. Previous work suggests that conserved sequences contribute significantly to the binding specificity of the promoter region (Berg and von Hippel 1987). As mentioned earlier, a high binding specificity is indicative of high promoter strength. Consequently, we sought to determine whether this hypothesis holds true for the promoters included in our analysis. The quality of fit observed in this case, however, is extremely poor, as seen in Figure 5. The p-value of the F-statistic is 0.54 for regression using mRuby2 strengths. Therefore, we cannot reject the null hypothesis that there is no relationship between the −9 to 1 score and promoter strength. This demonstrates that the above argument is not a dominant mechanism in defining the promoter strength.

Figure 5. Best fit model for the −9 to 1 segment: (A) Plot of normalized −9 to 1 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residuals obtained from the best fit model. (C) Quantile-quantile plot of residuals against normally distributed quantiles.

Values of model parameters

We found that the value of the intercept C0 in the model is close to zero for the best fit model and the value of the slope C1 was between 0.8 and 1.1. Figure 6 shows the best fit values of C0 and C1 for the different fluorescence datasets along with their errors. Values of C0 and C1 obtained using mTurquoise2 fluorescence were slightly different from those obtained using Venus or mRuby2 fluorescence. These differences likely stem from the stochastic gene expression and noisy fluorescence signals in the experiments. However, they agree with each other within error bars. The value of C1 using the Decoene et al. (2019) dataset was within the 1-sigma error bars of Venus and mRuby2 fluorescence. The value of C0 differs significantly for this dataset, but we note that this is likely due to different experimental conditions and does not change the relative strengths of other promoters (Figures S7 and S8). The mean values of C0 and C1 are 0.025 and 0.87, respectively.

Figure 6. Best fit values of model parameters: The best fit values of C0 and C1 obtained using different fluorescence data as a proxy for promoter strength. The black dot shows the mean value of C0 and C1 weighted by error bars. The data corresponding to this plot can be found in Table S2.

QPromoters software

We have developed an open-source, free-to-use standalone tool and a website that use our findings to predict the strength of S. cerevisiae promoters. The standalone tool can be found at https://github.com/DevangLiya/QPromoters, and the website can be found at https://qpromoters.com/. Users can either enter the −49 to 10 sequence of the engineered promoter or retrieve this sequence directly from EPD by entering the EPDnew ID of the promoter. The tool then outputs the promoter score, the promoter score normalized by the pTDH3 score, the promoter strength using the model described by Eq. 1, a plot showing the location of the user's promoter with reference to the 18 characterized promoters, and a histogram showing the location of the user's promoter with reference to all 5117 EPD promoters. An example of the figure returned by the program is shown in Figure 7.

Figure 7. A sample output from the QPromoters application: (A) The −49 to 10 score of the user's promoter is shown as a horizontal line on a plot of scores of the characterized promoters from Lee et al. (2015). (B) The arrow shows the score of the user's promoter in reference to the scores of all 5117 S. cerevisiae promoters in EPD.

Conclusion

We analyzed the core promoter region of −49 to 10, with respect to the TSS, to find a simple correlation between the score of this region and the experimental promoter activity. Analysis of the core promoter region revealed that there is a correlation between the promoter score and experimental promoter strength. Particularly, the −49 to 10 region of the core promoter was seen to be the best predictor of the promoter strength. We also observed a similar quality of fit between the −49 to −1 and −49 to 10 regions. The biological basis and significance of this sustained quality of fit need further investigation. In addition to these findings, we have developed an open-source, free-to-use tool to predict the promoter strength of unknown promoters in S. cerevisiae.

Using computational tools to determine the essentiality of genes and the strength of gene regulatory elements is of significant use in synthetic biology, as this tool will help in constructing recombinant circuits. This tool would also be helpful in experiments where fine-tuned regulation of gene expression is required and in studies involving transcription kinetics where characterizing promoter strength might be required. Consequently, these in-silico methods can precede and lower the risk of failure in in vivo experiments. Moreover, our web tool is useful in characterizing the strength of existing promoters in the EPD as well as predicting the strength of other engineered S. cerevisiae promoters on the basis of the promoter sequence.

Acknowledgments

The authors thank Shubham Kumar Sinha and Swaroopa Nakkeeran from the Indian Institute of Science Education and Research, Mohali, for their comments and suggestions throughout the work. AKJ is supported by Clarendon Fund (http://www.ox.ac.uk/clarendon/about), SKP scholarship, Exeter College (https://www.exeter.ox.ac.uk/wp-content/uploads/2019/09/SKP-2020.pdf) and UKRI-BBSRC grant BB/M011224/1, Oxford Interdisciplinary Bioscience DTP at the University of Oxford. DHL and NMA are supported by the INSPIRE scholarship (https://www.online-inspire.gov.in/). AKJ conceptualized and designed the project. DHL curated the data and worked on the formal analysis to quantify the promoter strength. SS helped in organizing and standardizing the datasets. NMA and NB did the promoter landscape analysis and visualizations. ME and SS mined the S. cerevisiae promoters' fluorescence data from published datasets and resources for validating the tool. MP designed the web implementation and provided technical assistance. ME helped in testing the tool and improving the user experience. AKJ, SS, DHL, and ME wrote the manuscript. All the authors have read and approved this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

The authors did not receive funding from any source for this work, which was carried out of scientific interest.

Data availability

The fluorescence data used in this work are openly available in the following publications: Lee et al. (2015) and Decoene et al. (2019) that issue datasets with DOIs. The standalone tool for predicting the promoter strength is open-source and available at https://github.com/DevangLiya/QPromoters. The online version of this tool can be accessed at https://qpromoters.com/.

ORCID

Nithishwer Mouroug Anand http://orcid.org/0000-0003-0852-7141

References

Akan P, Deloukas P. 2008. DNA sequence and structural properties as predictors of human and mouse promoters. Gene. 410:165–176. doi:10.1016/j.gene.2007.12.011.
Berg OG, von Hippel PH. 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. Journal of Molecular Biology. 193:723–743. doi:10.1016/0022-2836(87)90354-8.
Bharanikumar R, Premkumar KAR, Palaniappan A. 2018. PromoterPredict: sequence-based modelling of Escherichia coli sigma(70) promoter strength yields logarithmic dependence between promoter strength and sequence. PeerJ. 6:e5862. doi:10.7717/peerj.5862.
Bitter GA, Chang KK, Egan KM. 1991. A multi-component upstream activation sequence of the Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase gene promoter. Molecular and General Genetics MGG. 231:22–32. doi:10.1007/BF00293817.
Chen X, Gao C, Guo L, Hu G, Luo Q, Liu J, Nielsen J, Liu L. 2018. DCEO biotechnology: tools to design, construct, evaluate, and optimize the metabolic pathway for biosynthesis of chemicals. Chemical Reviews. 118:4–72. doi:10.1021/acs.chemrev.6b00804.
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423. doi:10.1093/bioinformatics/btp163.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Research. 14:1188–1190. doi:10.1101/gr.849004.
Decoene T, De Maeseneire SL, De Mey M. 2019. Modulating transcription through development of semi-synthetic yeast core promoters. PLOS ONE. 14:e0224476. doi:10.1371/journal.pone.0224476.
Dreos R, Ambrosini G, Groux R, Cavin Perier R, Bucher P. 2017. The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms. Nucleic Acids Research. 45:D51–D55. doi:10.1093/nar/gkw1069.
Feng X, Marchisio MA. 2021. Saccharomyces cerevisiae promoter engineering before and during the synthetic biology era. Biology (Basel). 10(6):504. doi:10.3390/biology10060
Fischer S, Engstler C, Procopio S, Becker T. 2016. EGFP-based evaluation of temperature inducible native promoters of industrial ale yeast by using a high throughput system. LWT - Food Science and Technology. 68:556–562. doi:10.1016/j.lwt.2015.12.020.
Gasser B, Steiger MG, Mattanovich D. 2015. Methanol regulated yeast promoters: production vehicles and toolbox for synthetic biology. Microbial Cell Factories. 14:196. doi:10.1186/s12934-015-0387-1.
Hayat M, Gul S, Chong KT. 2020. An intelligent computational model for prediction of promoters and their strength via natural language processing. Chemometrics and Intelligent Laboratory Systems. 202:104034. doi:10.1016/j.chemolab.2020.104034.
Hubmann G, Thevelein JM, Nevoigt E. 2014. Natural and modified promoters for tailored metabolic engineering of the yeast Saccharomyces cerevisiae. Methods in Molecular Biology. 1152:17–42. doi:10.1007/978-1-4939-0563-8_2.
Jiang C, Pugh BF. 2009. Nucleosome positioning and gene regulation: advances through genomics. Nature Reviews Genetics. 10:161–172. doi:10.1038/nrg2522.
Kanhere A, Bansal M. 2005. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Research. 33:3165–3175. doi:10.1093/nar/gki627.
Kim S, Lee K, Bae SJ, Hahn JS. 2015. Promoters inducible by aromatic amino acids and γ-aminobutyrate (GABA) for metabolic engineering applications in Saccharomyces cerevisiae. Applied Microbiology and Biotechnology. 99:2705–2714. doi:10.1007/s00253-014-6303-5.
Kotopka BJ, Smolke CD. 2020. Model-driven generation of artificial yeast promoters. Nat Commun. 11:2113. doi:10.1038/s41467-020-15977-4.
Latimer LN, et al. 2014. Employing a combinatorial expression approach to characterize xylose utilization in Saccharomyces cerevisiae. Metabolic Engineering. 25:20–29. doi:10.1016/j.ymben.2014.06.002.
Lee ME, DeLoache WC, Cervantes B, Dueber JE. 2015. A highly characterized yeast toolkit for modular, multipart assembly. ACS Synthetic Biology. 4:975–986. doi:10.1021/sb500366v.
Li H, Shi L, Gao W, Zhang Z, Zhang L, Zhao Y, Wang G. 2022. dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost. Methods. 215–222. doi:10.1016/j.ymeth.2022.01.001.
Li J, Zhang Y. 2014. Relationship between promoter sequence and its strength in gene expression. The European Physical Journal E. 37:44. doi:10.1140/epje/i2014-14044-y.
Liu R, Liu L, Li X, Liu D, Yuan Y. 2020. Engineering yeast artificial core promoter with designated base motifs. Microbial Cell Factories. 19:38. doi:10.1186/s12934-020-01305-4.
Lubliner S, et al. 2015. Core promoter sequence in yeast is a major determinant of expression level. Genome Research. 25:1008–1017. doi:10.1101/gr.188193.114.
Lubliner S, Keren L, Segal E. 2013. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Research. 41:5569–5581. doi:10.1093/nar/gkt256.
Machens F, Balazadeh S, Mueller-Roeber B, Messerschmidt K. 2017. Synthetic promoters and transcription factors for heterologous protein expression in Saccharomyces cerevisiae. Frontiers in Bioengineering and Biotechnology. 5:63. doi:10.3389/fbioe.2017.00063.
Miele V, Vaillant C, d'Aubenton-Carafa Y, Thermes C, Grange T. 2008. DNA physical properties determine nucleosome occupancy from yeast to fly. Nucleic Acids Research. 36:3746–3756. doi:10.1093/nar/gkn262.
Nevoigt E. 2008. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews. 72:379–412. doi:10.1128/MMBR.00025-07.
Ottoz DSM, Rudolf F. 2018. Constitutive and regulated promoters in yeast: how to design and make use of promoters in S. cerevisiae. Synthetic Biology. 107–130. doi:10.1002/9783527688104.ch6.
Portela RMC, Vogl T, Kniely C, Fischer JE, Oliveira R, Glieder A. 2017. Synthetic core promoters as universal parts for fine-tuning expression in different yeast species. ACS Synthetic Biology. 6:471–484. doi:10.1021/acssynbio.6b00178.
Rajkumar AS, Liu G, Bergenholm D, Arsovska D, Kristensen M, Nielsen J, Jensen MK, Keasling JD. 2016. Engineering of synthetic, stress-responsive yeast promoters. Nucleic Acids Research. 44:e136. doi:10.1093/nar/gkw553.
Redden H, Morse N, Alper HS. 2015. Editorial: Yeast synthetic biology: new tools to unlock cellular function. FEMS Yeast Research. 15:1–1. doi:10.1111/1567-1364.12188.
Rhodius VA, Mutalik VK. 2010. Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, σE. Proceedings of the National Academy of Sciences. 107:2854–2859. doi:10.1073/pnas.0915066107.
Scalcinati G, Knuf C, Partow S, Chen Y, Maury J, Schalk M, Daviet L, Nielsen J, Siewers V. 2012. Dynamic control of gene expression in Saccharomyces cerevisiae engineered for the production of plant sesquitepene α-santalene in a fed-batch mode. Metabolic Engineering. 14:91–103. doi:10.1016/j.ymben.2012.01.007.
Seabold S, Perktold J. 2010. Statsmodels: econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference 2010.
Stewart AJ, Hannenhalli S, Plotkin JB. 2012. Why transcription factor binding sites are ten nucleotides long. Genetics. 192:973–985. doi:10.1534/genetics.112.143370.
Struhl K, Segal E. 2013. Determinants of nucleosome positioning. Nature Structural & Molecular Biology. 20:267–273. doi:10.1038/nsmb.2506.
Tang H, Wu Y, Deng J, Chen N, Zheng Z, Wei Y, Luo X, Keasling JD, et al. 2020. Promoter architecture and promoter engineering in Saccharomyces cerevisiae. Metabolites. 10:320. doi:10.3390/metabo10080320.
Weinhandl K, Winkler M, Glieder A, Camattari A. 2014. Carbon source dependent promoters in yeasts. Microbial Cell Factories. 13:5. doi:10.1186/1475-2859-13-5.
West RW Jr., Yocum RR, Ptashne M. 1984. Saccharomyces cerevisiae GAL1-GAL10 divergent promoter region: location and function of the upstream activating sequence UASG. Molecular and Cellular Biology. 4:2467–2478. doi:10.1128/mcb.4.11.2467-2478.1984.
Workman JL. 2006. Nucleosome displacement in transcription. Genes & Development. 20:2009–2017. doi:10.1101/gad.1435706.
Yada T, et al. 2011. Linear regression models predicting strength of transcriptional activity of promoters. Genome Informatics. 25:53–60. doi:10.11234/gi.25.53.
Zhao M, Yuan Z, Wu L, Zhou S, Deng Y. 2022. Precise prediction of promoter strength based on a de novo synthetic promoter library coupled with machine learning. ACS Synthetic Biology. 11:92–102. doi:10.1021/acssynbio.1c00117.
Zhao M, Zhou S, Wu L, Deng Y. 2020. Model-driven promoter strength prediction based on a fine-tuned synthetic promoter library in Escherichia coli. bioRxiv. doi:10.1101/2020.06.25.170365.
Zhou Y, Li G, Dong J, Xing X-h, Dai J, Zhang C. 2018. MiYA, an efficient machine-learning workflow in conjunction with the YeastFab assembly strategy for combinatorial optimization of heterologous metabolic pathways in Saccharomyces cerevisiae. Metabolic Engineering. 47:294–302. doi:10.1016/j.ymben.2018.03.020.

QPromoters: sequence based prediction of promoter strength in Saccharomyces cerevisiae

Publisher
Taylor & Francis
Copyright
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
ISSN
2689-5307
eISSN
2689-5293
DOI
10.1080/26895293.2023.2168304

Abstract

ALL LIFE 2023, VOL. 16, NO. 1, 2168304
https://doi.org/10.1080/26895293.2023.2168304

QPromoters: sequence based prediction of promoter strength in Saccharomyces cerevisiae

Devang Haresh Liya (a), Mirudula Elanchezhian (b), Mukulika Pahari (c), Nithishwer Mouroug Anand (a), Shivani Suresh (d), Nivedha Balaji (e) and Ashwin Kumar Jainarayanan (f,g)

(a) Department of Physical Sciences, Indian Institute of Science Education and Research, Mohali, India
(b) Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
(c) Department of Computer Engineering, Ramrao Adik Institute of Technology, DY Patil Deemed to be University, Navi Mumbai, India
(d) Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
(e) School of Biology and Environmental Sciences (SBES), University College Dublin, Dublin, Ireland
(f) Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
(g) Interdisciplinary Bioscience Doctoral Training Program and Exeter College, University of Oxford, Oxford, UK

ARTICLE HISTORY
Received 12 November 2021; Accepted 10 December 2022

KEYWORDS
Computational life sciences; bioinformatics and system biology

ABSTRACT
Promoters play a key role in influencing transcriptional regulation for fine-tuning the expression of genes. Heterologous promoter engineering has been a widely used concept to control the level of transcription in all model organisms. The strength of a promoter is mainly determined by its nucleotide composition. Many promoter libraries have been curated, but few have attempted to develop theoretical methods to predict the strength of promoters from their nucleotide sequence. Such theoretical methods are not only valuable in the design of promoters with specified strength but are also meaningful in understanding the mechanistic role of promoters in transcriptional regulation.
In this study, we present a theoretical model to describe the relationship between promoter strength and nucleotide sequence in Saccharomyces cerevisiae. We infer from our analysis that the −49–10 sequence with respect to the Transcription Start Site represents the minimal region that can be used to predict promoter strength. A website, https://qpromoters.com/, and a standalone tool, https://github.com/DevangLiya/QPromoters, are available to quickly quantify the strength of Saccharomyces cerevisiae promoters.

Author summary

Regulating gene expression is a crucial aspect of metabolic engineering and synthetic biology. Promoter engineering plays a vital part in modulating transcriptional capacity and hence controlling gene expression. While there are tools to identify promoter regions in the eukaryotic genome, there are no simple tools to predict the strength of promoters in eukaryotes. Previous studies have shown that there exists a relationship between the promoter strength and the natural log of the promoter score. We use this relationship to identify the minimal promoter region in Saccharomyces cerevisiae that can be used to predict the strength of promoters. We have used a set of 18 standard promoters whose strengths were experimentally determined in previous studies to verify our model. We were able to classify promoters into three broad classes, namely weak, moderate, and strong, with high confidence. We have also developed a website and an open-source script that can be utilized to quantify promoter strength in Saccharomyces cerevisiae and streamline the process of promoter design.

CONTACT Ashwin Kumar Jainarayanan ashwin.jainarayanan@dtc.ox.ac.uk
Supplemental data for this article can be accessed here. https://doi.org/10.1080/26895293.2023.2168304
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Saccharomyces cerevisiae (S. cerevisiae), commonly known as brewer's yeast, is a widely used eukaryotic model organism in synthetic biology; it has applications in the production of biofuels, recombinant proteins and bulk chemicals (Nevoigt 2008; Tang et al. 2020). Promoters are basic transcriptional elements that play a key role in manipulating genetic and metabolic pathways in S. cerevisiae by regulating protein expression both quantitatively and temporally (Scalcinati et al. 2012; Latimer et al. 2014), and they are one of the most crucial components of the yeast synthetic biology toolbox (Hubmann et al. 2014; Redden et al. 2015; Machens et al. 2017; Portela et al. 2017; Ottoz and Rudolf 2018; Decoene et al. 2019; Kotopka and Smolke 2020; Liu et al. 2020; Feng and Marchisio 2021). Promoters in S. cerevisiae have multiple components which together account for successful transcriptional regulation. The key components of a yeast promoter are an upstream activator sequence (UAS), an upstream repressor sequence (URS), a nucleosome-disfavoring sequence and a core promoter region.

The core promoter is the DNA sequence nearest to the start codon, which interacts with RNA polymerase II (pol-II) and other general transcriptional factors to form the pre-initiation complex (PIC) (Tang et al. 2020). The core region also contains the TATA box, the transcription start site (TSS), a PIC localization stretch and a TSS scanning region for pol-II (Lubliner et al. 2013). The binding of general transcription factor proteins and histones to the TATA box facilitates the subsequent binding of pol-II, which, along with several transcription factor proteins, constructs a transcription initiation complex that starts mRNA synthesis from the TSS (Kanhere and Bansal 2005; Jiang and Pugh 2009). The nucleotide composition of different regions in the core promoter strongly influences the sensitivity of the promoter. Studies have shown that promoters with A/T- or T/C-rich PIC regions have higher sensitivities than promoters containing G/C-rich sequences (Lubliner et al. 2015). The position of the different regions of the core promoter is illustrated in Figure 1.

The UAS and URS are the regulatory components of a promoter and are located upstream of the core promoter region. UASs and URSs act as binding sites for transcription activators and repressors, respectively. The UAS enhances gene expression, provides additional stability, and plays a role in regulating the PIC formation process (West et al. 1984; Bitter et al. 1991). The UASs and URSs in S. cerevisiae are typically 10 bp long but can vary from 5 to 30 bp in length (Stewart et al. 2012).

The disfavoring nucleotide sequence is a stretch of DNA that decreases nucleosome occupancy to facilitate transcription (Struhl and Segal 2013). The poly(dA:dT) tract, a homopolymeric stretch of deoxyadenosine nucleotides, is a well-known nucleosome-disfavoring sequence commonly present in promoters (Workman 2006).

The structural properties of the promoter are vital for successful transcription. The flexibility of the promoter should be optimal to make sure that the binding sites are accessible and properly positioned to enable their recognition by the transcriptional machinery (Jiang and Pugh 2009). In this regard, the bendability, or the propensity of each tri-nucleotide to bend, is essential (Kanhere and Bansal 2005). Existing studies indicate the presence of regions of low bendability about 100-200 bp upstream of the start codon (Miele et al. 2008) (illustrated in Figure 1 by a jagged line). Studies also indicate that the low bendability is caused by a combination of A/T richness and di- and tri-nucleotide composition (Akan and Deloukas 2008).

Figure 1. A schematic of promoter architecture in S. cerevisiae: The text in crimson (top) denotes the conditions necessary for high sensitivity, while the green text (bottom) denotes the conditions for lower sensitivity. The lengths of the different regions of the promoter are also given in bp. The jagged line at the bottom denotes the part of the promoter that is rigid in nature.

Promoters in S. cerevisiae can be either constitutive, meaning they are relatively unaffected by internal and external signals and maintain stable levels of transcription, or inducible, meaning they can initiate a drastic change in transcriptional levels in response to specific stimuli. These stimuli, called inducers, range from molecules such as metabolites, amino acids, and sugars to metal ions and environmental factors like pH and stress (Weinhandl et al. 2014; Gasser et al. 2015; Kim et al. 2015; Fischer et al. 2016; Rajkumar et al. 2016). Using endogenous promoters of S. cerevisiae for synthetic biology applications has disadvantages owing to an insufficiency of well-characterized promoters (Chen et al. 2018; Zhou et al. 2018). Thus, it is of utmost importance to characterize and quantify the strength of various S. cerevisiae promoters and create a database of the same.

Previous work has established a two-step approach for the quantitative prediction of the strength of promoters in Escherichia coli (E. coli), a prokaryotic model organism (Li and Zhang 2014; Bharanikumar et al. 2018). The linear relationship between the total promoter score and the promoter strength is well-established in E. coli (Berg and von Hippel 1987; Bharanikumar et al. 2018). In this study, we present a similar simplified model of the promoter strength in S. cerevisiae based on the promoter sequence.

We have done an extensive literature survey, and we observe that there are several established methods to quantify the strength of promoters (Rhodius and Mutalik 2010; Yada et al. 2011; Bharanikumar et al. 2018; Hayat et al. 2020; Zhao et al. 2020; Li et al. 2022; Zhao et al. 2022). Most of these methods deal with E. coli as the model organism. To the best of our knowledge, our study is the only one that deals with promoter sequences associated with eukaryotes.

Materials and methods

The core promoter sequences of 5117 promoters in S. cerevisiae were retrieved using the Sequence Retrieval Tool in the Eukaryotic Promoter Database (EPD) (Dreos et al. 2017). The core promoter sequence consisted of the −49–10 sequence with reference to the Transcription Start Site (TSS).

A Position Frequency Matrix (PFM) was generated from the motif of all 5117 promoter sequences. The PFM was then converted to a Position Weight Matrix (PWM), or Position-Specific Scoring Matrix (PSSM), using the functions from Biopython (Cock et al. 2009). The motif landscape was visualized using WebLogo (Crooks et al. 2004). The resulting PSSM was then used to calculate the 'promoter score' for all the promoters in the downstream analysis.

Taking inspiration from the well-established linear relationship between the total promoter score and the promoter strength in E. coli (Berg and von Hippel 1987), we modeled the promoter strength using a linear model with the promoter score as

Promoter strength = C0 + C1 × (Promoter score)    (Eq. 1)

Lee et al. (2015) characterized the strength of 19 constitutive promoters in S. cerevisiae using three fluorescence markers (Venus, mRuby2, and mTurquoise2). We have used 18 of these promoter strengths in this study. The sequence for the promoter pREV1 was not found in EPD, so we dropped it from our analysis. The normalized fluorescence values folded over the background were obtained from the authors of Lee et al. (2015). The log of these values gives the promoter strength. These were further divided by the strength of the strongest promoter (pTDH3) from Lee et al. (2015). We note that this step does not alter the linear relationship that is being tested but merely acts as a scaling. These values finally constitute the result space, or the 'Promoter strength' in Eq. 1. We also used another set of five promoters from Decoene et al. (2019) as the training set to further test the robustness of our model. These fluorescence values were also subjected to the normalizations described above.

We define a 'segment score', which is simply the total score of a given segment of a given promoter as calculated from the PSSM. This score is then divided by the highest score (corresponding to pTDH3) to obtain the feature space, or the 'Promoter score' part of Eq. 1.

We then performed ordinary least squares (OLS) linear regression using the statsmodels package in Python. C0 and C1 were left as free parameters to obtain the best fit. The quality of fit was then assessed using the adjusted R-squared and the F-statistic. The significance of the model parameters was assessed using the t-statistic. We also fitted a linear model to the residuals to look for biases in the model. The SciPy, statsmodels and Seaborn packages were used to perform, visualize, and test the linear regression (Seabold and Perktold 2010).

Results and discussion

The 5117 native S. cerevisiae promoters from the EPD that were included in this study represent a diverse population of transcriptional regulators. Motif analysis on this set, shown in Figure 2, revealed that the promoters were diverse in terms of nucleotide composition. We see that the conservation along the entire promoter length is low, except for the TSS.

Figure 2. Motif of −49–10 region: Motif logo generated from all 5117 S. cerevisiae promoters from the Eukaryotic Promoter Database. The motif is generated for the −49–10 sequence with respect to the Transcription Start Site.
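The scoring and fitting steps described in Materials and methods can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses Biopython for the PSSM and statsmodels for the regression, whereas the sketch below uses only the Python standard library. The function names, the pseudocount of 0.5, the uniform background of 0.25, and the toy sequences and strengths are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def position_frequency_matrix(seqs):
    """Count each base at each position of equal-length sequences."""
    length = len(seqs[0])
    pfm = [{b: 0 for b in BASES} for _ in range(length)]
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            pfm[i][base] += 1
    return pfm

def pssm_from_pfm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds scores (assumed uniform background)."""
    pssm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({b: math.log2((column[b] + pseudocount) / total / background)
                     for b in BASES})
    return pssm

def segment_score(pssm, seq, start=0, end=None):
    """Sum PSSM scores over seq[start:end] (0-based, end exclusive)."""
    end = len(seq) if end is None else end
    return sum(pssm[i][seq[i].upper()] for i in range(start, end))

def fit_line(x, y):
    """Closed-form least squares for y = c0 + c1 * x; returns (c0, c1, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    ss_res = sum((yi - (c0 + c1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return c0, c1, 1 - ss_res / ss_tot

# Hypothetical toy data: 6-bp "promoters" and made-up normalized strengths.
# Real input would be 60-bp EPD sequences and log fluorescence values.
toy_promoters = {"pA": "TATAAA", "pB": "TATAAG", "pC": "TCTAAG", "pD": "GCGCGC"}
toy_strength = {"pA": 1.00, "pB": 0.85, "pC": 0.60, "pD": 0.10}

pssm = pssm_from_pfm(position_frequency_matrix(list(toy_promoters.values())))
raw = {name: segment_score(pssm, seq) for name, seq in toy_promoters.items()}
reference = max(raw.values())  # the paper divides by the pTDH3 score instead
scores = [raw[name] / reference for name in toy_promoters]
strengths = [toy_strength[name] for name in toy_promoters]

c0, c1, r2 = fit_line(scores, strengths)
print(f"C0={c0:.3f}  C1={c1:.3f}  R^2={r2:.3f}")
```

Passing `start` and `end` to `segment_score` restricts the sum to a sub-segment, which is how a scan over segments of increasing length could be set up under the same assumptions.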
Based on this conservation pattern, two segments of the −49–10 sequence can be considered for modeling the linear relationship in Eq. 1: (1) the highly conserved −9–1 segment, and (2) the −49 to X segment, where X varies from −48 to 10. Results of modelling the promoter strength using these segments are discussed in the following sections along with their possible biological implications.

−49 to X region

We sought to determine the shortest sequence that could model the relationship between experimental promoter strength and the segment scores. We first fix one end of such a segment at the −49 position with respect to the TSS and add nucleotides towards the TSS until position 'X'. Scores of the segments thus obtained were used to perform the linear regression described by Eq. 1. The quality-of-fit estimators for each such regression are shown in Figure 3. The quality of fit generally improves as more nucleotides are added, as indicated by larger R-squared values and lower p-values for the F-statistic. Saturation is reached at X = −1, after which adding more nucleotides does not improve the quality of fit by an appreciable amount, as shown in Figures 3 and S2. This saturation is sustained until X = 10, indicating that the quality of fit from −49 to −1 and −49–10 is mostly similar (Figures S4–S6). Since the −49–10 sequence was readily available in the EPD, further analysis focused on this 60 bp segment to ease the integration with the EPD. Figure 4 shows the plot of normalized −49–10 scores and normalized mRuby2 fluorescence along with the best fit model. We see that the residuals for this model are randomly distributed around 0, indicating that the errors are uncorrelated, and the quantile-quantile plot shows that the errors are normally distributed, thus validating all the assumptions for linear regression. Similar trends were observed using Venus and mTurquoise2 fluorescence, as seen in Figures S2 and S3, respectively.

Figure 3. Various fit statistics for the linear regression of segment scores against the mRuby2 fluorescence: One of the ends of the promoter is fixed at −49 and nucleotides are added on the other end towards the TSS. The values of R-squared, Adj. R-squared, and p-value for F-statistic are tabulated in Table S1. Similar plots for Venus and mTurquoise2 fluorescence are given in Fig S2.

Figure 4. Best fit model for −49–10 segment: (A) Plot of normalized −49–10 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residuals obtained from the best fit model. (C) Quantile-Quantile plot of residuals against normally distributed theoretical quantiles.

−9–1 region

Motif analysis (Figure 2) showed the −9–1 region to be the most conserved stretch. Previous work suggests that conserved sequences contribute significantly to the binding specificity of the promoter region (Berg and von Hippel 1987). As mentioned earlier, a high binding specificity is indicative of high promoter strength. Consequently, we sought to determine whether this hypothesis holds true for the promoters included in our analysis. The quality of fit observed in this case, however, is extremely poor, as seen in Figure 5. The p-value of the F-statistic is 0.54 for the regression using mRuby2 strengths. Therefore, we cannot reject the null hypothesis that there is no relationship between the −9–1 score and promoter strength. This demonstrates that the above argument is not a dominant mechanism in defining the promoter strength.

Figure 5. Best fit model for −9–1 segment: (A) Plot of normalized −9–1 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residuals obtained from the best fit model. (C) Quantile-Quantile plot of residuals against normally distributed quantiles.

Values of model parameters

We found that the value of the intercept C0 is close to zero for the best fit model and the value of the slope C1 was between 0.8 and 1.1. Figure 6 shows the best fit values of C0 and C1 for the different fluorescence markers along with their errors. Values of C0 and C1 obtained using mTurquoise2 fluorescence were slightly different from those obtained using Venus or mRuby2 fluorescence. These differences likely stem from stochastic gene expression and noisy fluorescence signals in the experiments. However, they agree with each other within error bars. The value of C1 using the Decoene et al. (2019) dataset was within the 1-sigma error bars of Venus and mRuby2 fluorescence. The value of C0 differs significantly for this dataset, but we note that this is likely due to different experimental conditions and does not change the relative strengths of other promoters (Figures S7 and S8). The mean values of C0 and C1 are 0.025 and 0.87, respectively.

Figure 6. Best fit values of model parameters: The best fit values of C0 and C1 obtained using different fluorescence data as a proxy for promoter strength. The black dot shows the mean value of C0 and C1 weighted by error bars. The data corresponding to this plot can be found in Table S2.

QPromoters software

We have developed an open-source, free-to-use standalone tool and a website that use our findings to predict the strength of S. cerevisiae promoters. The standalone tool can be found at https://github.com/DevangLiya/QPromoters, and the website can be found at https://qpromoters.com/. Users can either enter the −49–10 sequence of the engineered promoter or retrieve this sequence directly from EPD by entering the EPDnew ID of the promoter. The tool then outputs the promoter score, the promoter score normalized by the pTDH3 score, the promoter strength using the model described by Eq. 1, a plot showing the location of the user's promoter with reference to the 18 characterized promoters, and a histogram showing the location of the user's promoter with reference to all 5117 EPD promoters. An example of the figure returned by the program is shown in Figure 7.

Figure 7. A sample output from QPromoters application: (A) The −49–10 score of the user's promoter is shown as a horizontal line on a plot of scores of the characterized promoters from Lee et al. (2015). (B) Arrow shows the score of the user's promoter in reference to scores of all 5117 S. cerevisiae promoters in EPD.

Conclusion

We analyzed the core promoter region of −49–10, with respect to the TSS, to find a simple correlation between the score of this region and the experimental promoter activity. Analysis of the core promoter region revealed that there is a correlation between the promoter score and experimental promoter strength. In particular, the −49–10 region of the core promoter was seen to be the best predictor of the promoter strength. We also observed a similar quality of fit between the −49 to −1 and −49–10 regions. The biological basis and significance of this sustained quality of fit need further investigation. In addition to these findings, we have developed an open-source, free-to-use tool to predict the promoter strength of unknown promoters in S. cerevisiae.

Computational tools that determine the essentiality of genes and the strength of gene regulatory elements are of significant use in synthetic biology, and this tool will help in constructing recombinant circuits. It would also be helpful in experiments where fine-tuned regulation of gene expression is required and in studies involving transcription kinetics where characterizing promoter strength might be required. Consequently, these in-silico methods can precede in vivo experiments and lower their risk of failure. Moreover, our web tool is useful in characterizing the strength of existing promoters in the EPD as well as predicting the strength of other engineered S. cerevisiae promoters on the basis of the promoter sequence.

Acknowledgments

The authors thank Shubham Kumar Sinha and Swaroopa Nakkeeran from the Indian Institute of Science Education and Research, Mohali, for their comments and suggestions throughout the work. AKJ is supported by Clarendon Fund (http://www.ox.ac.uk/clarendon/about), SKP scholarship, Exeter College (https://www.exeter.ox.ac.uk/wp-content/uploads/2019/09/SKP-2020.pdf) and UKRI-BBSRC grant BB/M011224/1, Oxford Interdisciplinary Bioscience DTP at the University of Oxford. DHL and NMA are supported by the INSPIRE scholarship (https://www.online-inspire.gov.in/). AKJ conceptualized and designed the project. DHL curated the data and worked on the formal analysis to quantify the promoter strength. SS helped in organizing and standardizing the datasets. NMA and NB did the promoter landscape analysis and visualizations. ME and SS mined the S. cerevisiae promoters' fluorescence data from published datasets and resources for validating the tool. MP designed the web implementation and provided technical assistance. ME helped in testing the tool and improving the user experience. AKJ, SS, DHL, and ME wrote the manuscript. All the authors have read and approved this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

The authors did not receive funding from any source for this work, which was carried out of scientific interest.

Data availability

The fluorescence data used in this work are openly available in the following publications: Lee et al. (2015) and Decoene et al. (2019) that issue datasets with DOIs. The standalone tool for predicting the promoter strength is open-source and available at https://github.com/DevangLiya/QPromoters. The online version of this tool can be accessed at https://qpromoters.com/.

ORCID

Nithishwer Mouroug Anand http://orcid.org/0000-0003-0852-7141

References

Akan P, Deloukas P. 2008. DNA sequence and structural properties as predictors of human and mouse promoters. Gene. 410:165–176. doi:10.1016/j.gene.2007.12.011.
Berg OG, von Hippel PH. 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. Journal of Molecular Biology. 193:723–743. doi:10.1016/0022-2836(87)90354-8.
Bharanikumar R, Premkumar KAR, Palaniappan A. 2018. PromoterPredict: sequence-based modelling of Escherichia coli sigma(70) promoter strength yields logarithmic dependence between promoter strength and sequence. PeerJ. 6:e5862. doi:10.7717/peerj.5862.
Bitter GA, Chang KK, Egan KM. 1991. A multi-component upstream activation sequence of the Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase gene promoter. Molecular and General Genetics MGG. 231:22–32. doi:10.1007/BF00293817.
Chen X, Gao C, Guo L, Hu G, Luo Q, Liu J, Nielsen J, Liu L. 2018. DCEO biotechnology: tools to design, construct, evaluate, and optimize the metabolic pathway for biosynthesis of chemicals. Chemical Reviews. 118:4–72. doi:10.1021/acs.chemrev.6b00804.
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423. doi:10.1093/bioinformatics/btp163.
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Research. 14:1188–1190. doi:10.1101/gr.849004.
Decoene T, De Maeseneire SL, De Mey M. 2019. Modulating transcription through development of semi-synthetic yeast core promoters. PLOS ONE. 14:e0224476. doi:10.1371/journal.pone.0224476.
Dreos R, Ambrosini G, Groux R, Cavin Perier R, Bucher P. 2017. The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms. Nucleic Acids Research. 45:D51–D55. doi:10.1093/nar/gkw1069.
Feng X, Marchisio MA. 2021. Saccharomyces cerevisiae promoter engineering before and during the synthetic biology era. Biology (Basel). 10(6):504. doi:10.3390/biology10060
Fischer S, Engstler C, Procopio S, Becker T. 2016. EGFP-based evaluation of temperature inducible native promoters of industrial ale yeast by using a high throughput system. LWT - Food Science and Technology. 68:556–562. doi:10.1016/j.lwt.2015.12.020.
Gasser B, Steiger MG, Mattanovich D. 2015. Methanol regulated yeast promoters: production vehicles and toolbox for synthetic biology. Microbial Cell Factories. 14:196. doi:10.1186/s12934-015-0387-1.
Hayat M, Gul S, Chong KT. 2020. An intelligent computational model for prediction of promoters and their strength via natural language processing. Chemometrics and Intelligent Laboratory Systems. 202:104034. doi:10.1016/j.chemolab.2020.104034.
Hubmann G, Thevelein JM, Nevoigt E. 2014. Natural and modified promoters for tailored metabolic engineering of the yeast Saccharomyces cerevisiae. Methods Molecular Biology. 1152:17–42. doi:10.1007/978-1-4939-0563-8_2.
Jiang C, Pugh BF. 2009. Nucleosome positioning and gene regulation: advances through genomics. Nature Reviews Genetics. 10:161–172. doi:10.1038/nrg2522.
Kanhere A, Bansal M. 2005. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Research. 33:3165–3175. doi:10.1093/nar/gki627.
Kim S, Lee K, Bae SJ, Hahn JS. 2015. Promoters inducible by aromatic amino acids and γ-aminobutyrate (GABA) for metabolic engineering applications in Saccharomyces cerevisiae. Applied Microbiology and Biotechnology. 99:2705–2714. doi:10.1007/s00253-014-6303-5.
Kotopka BJ, Smolke CD. 2020. Model-driven generation of artificial yeast promoters. Nat Commun. 11:2113. doi:10.1038/s41467-020-15977-4.
Latimer LN, et al. 2014. Employing a combinatorial expression approach to characterize xylose utilization in Saccharomyces cerevisiae. Metabolic Engineering. 25:20–29. doi:10.1016/j.ymben.2014.06.002.
Lee ME, DeLoache WC, Cervantes B, Dueber JE. 2015. A highly characterized yeast toolkit for modular, multipart assembly. ACS Synthetic Biology. 4:975–986. doi:10.1021/sb500366v.
Li H, Shi L, Gao W, Zhang Z, Zhang L, Zhao Y, Wang G. 2022. dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost. Methods. 215–222. doi:10.1016/j.ymeth.2022.01.001.
Li J, Zhang Y. 2014. Relationship between promoter sequence and its strength in gene expression. The European Physical Journal E. 37:44. doi:10.1140/epje/i2014-14044-y.
Liu R, Liu L, Li X, Liu D, Yuan Y. 2020. Engineering yeast artificial core promoter with designated base motifs. Microbial Cell Factories. 19:38. doi:10.1186/s12934-020-01305-4.
Lubliner S, et al. 2015. Core promoter sequence in yeast is a major determinant of expression level. Genome Research. 25:1008–1017. doi:10.1101/gr.188193.114.
Lubliner S, Keren L, Segal E. 2013. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Research. 41:5569–5581. doi:10.1093/nar/gkt256.
Machens F, Balazadeh S, Mueller-Roeber B, Messerschmidt K. 2017. Synthetic promoters and transcription factors for heterologous protein expression in Saccharomyces cerevisiae. Frontiers in Bioengineering and Biotechnology. 5:63. doi:10.3389/fbioe.2017.00063.
Miele V, Vaillant C, d'Aubenton-Carafa Y, Thermes C, Grange T. 2008. DNA physical properties determine nucleosome occupancy from yeast to fly. Nucleic Acids Research. 36:3746–3756. doi:10.1093/nar/gkn262.
Nevoigt E. 2008. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews. 72:379–412. doi:10.1128/MMBR.00025-07.
Ottoz DSM, Rudolf F. 2018. Constitutive and regulated promoters in yeast: how to design and make use of promoters in S. cerevisiae. Synthetic Biology. 107–130. doi:10.1002/9783527688104.ch6.
Portela RMC, Vogl T, Kniely C, Fischer JE, Oliveira R, Glieder A. 2017. Synthetic core promoters as universal parts for fine-tuning expression in different yeast species. ACS Synthetic Biology. 6:471–484. doi:10.1021/acssynbio.6b00178.
Rajkumar AS, Liu G, Bergenholm D, Arsovska D, Kristensen M, Nielsen J, Jensen MK, Keasling JD. 2016. Engineering of synthetic, stress-responsive yeast promoters. Nucleic Acids Research. 44:e136. doi:10.1093/nar/gkw553.
Redden H, Morse N, Alper HS. 2015. Editorial: Yeast synthetic biology: new tools to unlock cellular function. FEMS Yeast Research. 15:1–1. doi:10.1111/1567-1364.12188.
Rhodius Virgil A, Mutalik Vivek K. 2010. Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, σE. Proceedings of the National Academy of Sciences. 107:2854–2859. doi:10.1073/pnas.0915066107.
Scalcinati G, Knuf C, Partow S, Chen Y, Maury J, Schalk M, Daviet L, Nielsen J, Siewers V. 2012. Dynamic control of gene expression in Saccharomyces cerevisiae engineered for the production of plant sesquitepene α-santalene in a fed-batch mode. Metabolic Engineering. 14:91–103. doi:10.1016/j.ymben.2012.01.007.
Seabold S, Perktold J. 2010. Statsmodels: econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference 2010.
Stewart AJ, Hannenhalli S, Plotkin JB. 2012. Why transcription factor binding sites are ten nucleotides long. Genetics. 192:973–985. doi:10.1534/genetics.112.143370.
Struhl K, Segal E. 2013. Determinants of nucleosome positioning. Nature Structural & Molecular Biology. 20:267–273. doi:10.1038/nsmb.2506.
Tang H, Wu Y, Deng J, Chen N, Zheng Z, Wei Y, Luo X, Keasling JD, et al. 2020. Promoter architecture and promoter engineering in Saccharomyces cerevisiae. Metabolites. 10:320. doi:10.3390/metabo10080320.
Weinhandl K, Winkler M, Glieder A, Camattari A. 2014. Carbon source dependent promoters in yeasts. Microbial Cell Factories. 13:5. doi:10.1186/1475-2859-13-5.
West RW Jr., Yocum RR, Ptashne M. 1984. Saccharomyces cerevisiae GAL1-GAL10 divergent promoter region: location and function of the upstream activating sequence UASG. Molecular and Cellular Biology. 4:2467–2478. doi:10.1128/mcb.4.11.2467-2478.1984.
Workman JL. 2006. Nucleosome displacement in transcription. Genes & Development. 20:2009–2017. doi:10.1101/gad.1435706.
Yada T, et al. 2011. Linear regression models predicting strength of transcriptional activity of promoters. Genome Informatics. 25:53–60. doi:10.11234/gi.25.53.
Zhao M, Yuan Z, Wu L, Zhou S, Deng Y. 2022. Precise prediction of promoter strength based on a de novo synthetic promoter library coupled with machine learning. ACS Synthetic Biology. 11:92–102. doi:10.1021/acssynbio.1c00117.
Zhao M, Zhou S, Wu L, Deng Y. 2020. Model-driven promoter strength prediction based on a fine-tuned synthetic promoter library in Escherichia coli. bioRxiv. doi:10.1101/2020.06.25.170365.
Zhou Y, Li G, Dong J, Xing X-h, Dai J, Zhang C. 2018. MiYA, an efficient machine-learning workflow in conjunction with the YeastFab assembly strategy for combinatorial optimization of heterologous metabolic pathways in Saccharomyces cerevisiae. Metabolic Engineering. 47:294–302. doi:10.1016/j.ymben.2018.03.020.

Journal

All Life, Taylor & Francis

Published: Dec 31, 2023

Keywords: Computational life sciences; bioinformatics and system biology

References