Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

The genomic and bulked segregant analysis of Curcuma alismatifolia revealed its diverse bract pigmentation

The genomic and bulked segregant analysis of Curcuma alismatifolia revealed its diverse bract... aBIOTECH (2022) 3:178–196 https://doi.org/10.1007/s42994-022-00081-6 aBIOTECH RESEARCH ARTICLE The genomic and bulked segregant analysis of Curcuma alismatifolia revealed its diverse bract pigmentation 1 2 1 1 1 Xuezhu Liao , Yuanjun Ye , Xiaoni Zhang , Dan Peng , Mengmeng Hou , 1 2 3 4 2 Gaofei Fu , Jianjun Tan , Jianli Zhao , Rihong Jiang , Yechun Xu , 2 5 6 7 Jinmei Liu , Jinliang Yang , Wusheng Liu , Luke R. Tembrock , 2& 1,8& Genfa Zhu , Zhiqiang Wu Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China Guangdong Provincial Key Lab of Ornamental Plant Germplasm Innovation and Utilization, Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology, Yunnan University, Kunming 650504, China Guangxi Engineering and Technology Research Center for Woody Spices, Guangxi Key Laboratory for Cultivation and Utilization of Special Non-Timber Forest Crops, Guangxi Forestry Research Institute, Nanning 530002, China Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA Department of Horticultural Science, North Carolina State University, Raleigh, NC 27607, USA Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA Kunpeng Institute of Modern Agriculture at Foshan, Foshan 528200, China Received: 8 July 2022 / Accepted: 5 September 2022 / Published online: 6 October 2022 Abstract Compared with most flowers where the showy part comprises specialized leaves (petals) directly subtending the reproductive structures, most Zingiberaceae species produce showy ‘‘flowers’’ through modifications of leaves (bracts) subtending the true flowers throughout an inflorescence. Curcuma alismatifolia, belonging to the Zingiberaceae family, a plant species originating from Southeast Asia, has become increasingly popular in the flower market worldwide because of its varied and esthetically pleasing bracts produced in different cultivars. Here, we present the chromosome-scale genome assembly of C. alismatifolia ‘‘Chiang Mai Pink’’ and explore the underlying mechanisms of bract pig- mentation. Comparative genomic analysis revealed C. alismatifolia contains a residual signal of whole- genome duplication. Duplicated genes, including pigment-related genes, exhibit functional and struc- tural differentiation resulting in diverse bract colors among C. alismatifolia cultivars. In addition, we identified the key genes that produce different colored bracts in C. alismatifolia, such as F3 5’H, DFR, ANS and several transcription factors for anthocyanin synthesis, as well as chlH and CAO in the chlorophyll synthesis pathway by conducting transcriptomic analysis, bulked segregant analysis using both DNA and RNA data, and population genomic analysis. This work provides data for understanding the mechanism of bract pigmentation and will accelerate breeding in developing novel cultivars with Xuezhu Liao, Yuanjun Ye, Xiaoni Zhang have contributed equally to this work. & Correspondence: genfazhu@163.com (G. Zhu), wuzhiqiang@- caas.cn (Z. Wu) The Author(s) 2022 aBIOTECH (2022) 3:178–196 179 richly colored bracts in C. alismatifolia and related species. It is also important to understand the variation in the evolution of the Zingiberaceae family. Keywords Anthocyanin synthesis, Siam tulip, Floriculture, Zingiberaceae, Genome evolution INTRODUCTION development (Fukai and Udomdee 2005). As the most important ornamental trait of C. alismatifolia, bract Zingiberaceae is a monocotyledonous angiosperm lin- color (Fig. 1) has begun to attract increasing attention eage family that contains several important crops, such among breeders and researchers. However, the mecha- as ginger (Zingiber officinale), cardamom (Elettaria nism underlying bract color formation remains cardamomum), and turmeric (Curcuma longa). In addi- unknown. tion to the important economic value, the special flower Although there are very limited studies on antho- structure and the complex habitat of Zingiberaceae have cyanins in bracts, anthocyanins on plant color have been led to the evolution of many unique reproductive modes well studied, given the importance of certain pigments and pollination mechanisms (Sun et al. 2011; Wang in plant metabolism. The compounds involved in plant et al. 2004). Therefore, Zingiberaceae plays an impor- coloration (in addition to chlorophyll) are flavonoids tant role in the study of plant phylogeny and evolution (including anthocyanins), carotenoids, and betaine of reproductive systems. As the most challenging genus (Grotewold 2006). Anthocyanins are important sec- in Zingiberaceae, the classification of Curcuma is pla- ondary metabolites of plants and have important bio- gued by polyploid speciation and homoploid logical functions such as antioxidant and antibacterial hybridization and the division of the genus has always effects and attracting insects for pollination (Winkel- been a controversial issue owing to molecular and Shirley 2001). In addition to their antioxidant and morphological conflicts (Za´veska´ et al. 2012). However, antibacterial activities, anthocyanins have been used as Curcuma species including C. alismatifolia are of great food colorants and pharmaceutical feedstock (Khoo importance because of their long history of use as et al. 2017;Xu etal. 2017). The biosynthesis of antho- medicines, including C. alismatifolia (Akter et al. 2008; cyanins is catalyzed by several structural genes, such as 0 0 Taheri et al. 2019). In addition to their pharmacological chalcone synthase (CHS), and flavonoid-3 ,5 -hydroxylase 0 0 properties, this genus also contains diverse ornamental (F3 5 H), dihydroflavonol 4-reductase (DFR), and antho- species with showy bracteate inflorescences they pro- cyanidin synthase (ANS) (Belwal et al. 2020; Harborne duce. Thus, it is important to decode the underlying and Williams 2000; Mol et al. 1998) and is also affected genetic basis of these traits in those ornamental species. by transcription factors, such as MYB, bHLH, and WD40. Curcuma alismatifolia Gagnep is a tropical species Several subgroups of these transcription factors have native to Cambodia, Laos, and Thailand and is com- been shown to be involved in anthocyanin synthesis, monly known as Siam tulip. Their ‘‘flowers’’ consist of a including subgroup 4–7, 44, and 79 of MYB regulators series of large colorful bract-subtending flowers in a (Wu et al. 2022), subgroup IIIf, IIId ? e, and IVd of spike inflorescence (Fig. 1). It is a very popular orna- bHLH regulators (Xie et al. 2012; Zhao et al. mental cut or potted flower in China and Southeast Asia 2018, 2019, 2020), and WD40 proteins TRANSPARENT because of its distinctive inflorescence, colorful bracts, TESTA GLABRA1 (TTG1) homologs (Baudry et al. 2004; and a long flowering period that lasts from May to Belwal et al. 2020; Gonzalez et al. 2008). In the antho- November during the high-temperature season. This cyanin synthesis pathway of Arabidopsis, the MYB genes extended period of flowering is particularly attractive interact with the bHLH transcription factors and the among floriculturists, as it provides a longer production WD40 protein family to form the MYB–bHLH–WD40 window than most cut flowers. Among the many culti- (MBW) complex to regulate the formation and accu- vars of Siam tulip, ‘‘Chiang Mai Pink’’ is the most popular mulation of anthocyanins (Dubos et al. 2010; Yan et al. variety with broad market prospects (Fukai and 2021). In a previous qualitative and quantitative anal- Udomdee 2005;Lu 2007; Mao et al. 2018; Taheri et al. ysis of pigments in the pink bracts of C. alismatifolia, 2012, 2014). At present, studies on the traits of C. anthocyanidin malvidin 3-rutinoside was identified as alismatifolia have mainly focused on bract color the main pigment (Nakayama et al. 2000). The CHS and (Koshioka et al. 2015), vase life (Kjonboon and Kan- DFR genes have been cloned from C. alismatifolia, and layanarat 2005), inflorescence and flower initiation and the magenta color of the petals in DFR transgenic plants The Author(s) 2022 180 aBIOTECH (2022) 3:178–196 Fig. 1 Structure of the C. alismatifolia genome. The innermost circle shows the diversity of bract pigmentation and inflorescence morphology. A GC content. B rRNA distribution. C tRNA distribution. D SSR distribution. E LTR distribution. F 16 chromosomes was more brilliant than that of the petals from wild type magnesium chelatase subunit H (chlH), and chloro- (Chanapan et al. 2017; Petchang et al. 2017). Moreover, phyllide a oxygenase (CAO) (Wang et al. 2020). There- 0 0 F3 5 H was also identified as a key gene controlling fore, to clarify the formation of each color in C. anthocyanin synthesis in C. alismatifolia ‘‘Dutch Red’’ by alismatifolia bracts, the biosynthetic pathways of dif- transcriptome analysis (Li et al. 2022). In addition, most ferent pigments must be elucidated. Such knowledge of the inner whorl bracts of C. alismatifolia have green will improve cultivar development through gene editing tips with chlorophyll deposition and red pigmentation of gene targets in the pigment pathways. under them, which gives a desiccated appearance, thus At present, the traditional method for developing reducing the ornamental value (Ding et al. 2021). new varieties of C. alismatifolia is through hybrid Chlorophyll is a pigment that provides plants their breeding, which is costly and time intensive (Ke et al. characteristic green color and is mainly composed of 2020). Moreover, previous research on C. alismatifolia chlorophylls a and b. Chlorophyll metabolism can be has mainly focused on the development of molecular divided into three main steps, i.e., chlorophyll synthesis, markers without the availability of a complete genome chlorophyll cycle, and chlorophyll degradation, with reported for this species. A high-quality C. alismatifolia each step being mediated by a series of important genome assembly will accelerate the development of enzymes such as glutamyl-tRNA reductase (HemA), new cultivars with desired traits. It will also improve The Author(s) 2022 aBIOTECH (2022) 3:178–196 181 the characterization of wild germplasms and aid in the assembly of C. alismatifolia was supported by the high discovery of novel genotypes and the identification of mapping rates of 97.44% (MGI) and 99.07% (HiFi) diversity hotspots for species conversation. Thus, it is (Supplementary Table 2). The high level of complete- urgent to compile a high-quality genome for C. alis- ness of this assembly was also verified by a BUSCO matifolia to accelerate evolutionary studies as well as (Simao et al. 2015) score of 96.53% (Supplementary precision breeding and genome editing in the genus Fig. 5) and a CEGMA (Parra et al. 2007) score of 95.16% Curcuma. (Supplementary Fig. 6). The long terminal repeat (LTR) Currently, only three genomes from Zingiberaceae assembly index (LAI) (Ou et al. 2018) score was 26.38 have been published, including Alpinia nigra (Ranavat (Supplementary Fig. 7, Supplementary Table 3). These et al. 2021), Z. officinale (Cheng et al. 2021b; Li et al. statistics suggest that the C. alismatifolia ‘‘Chiang Mai 2021), and C. longa (Chakraborty et al. 2021), of which Pink’’ genome is a high-quality genome. C. longa is only a draft genome. Here, we present the From the complete genome, 1,172,133 repeat units of chromosome-scale assembly of C. alismatifolia, the first different types were predicted, accounting for 75.84% genome in Curcuma. The genome of C. alismatifolia was (753,914,943 bp) of the total genome size (Supple- determined using a combination of high-accuracy long- mentary Table 4). The long terminal repeats (LTR) read PacBio HiFi and proximity ligation Hi-C data. In accounted for the highest proportion of the genome total, a genome of 994.07 Mb was assembled with (52.60%, Supplementary Table 4), among which the 95.25% of contigs anchored to 16 chromosomes, with super families Copia (31.79%) and Gypsy (20.81%) an N50 of 57.51 Mb. We examined the patterns of dominated (Supplementary Table 4). A burst in LTR whole-genome duplications as well as other duplication proliferation was inferred to have occurred 2.5 mya types and found that tandem and other small-scale gene (Supplementary Fig. 8), with the genomic location of duplications were important in the divergence of C. LTRs concentrated away from the SSR hotspots (Fig. 1). alismatifolia color morphs. In addition, we identified key We identified 57,534 protein-coding genes (Supple- genes involved in the anthocyanin and chlorophyll mentary Table 5) with a BUSCO score of 90.7% (Sup- metabolism pathways in C. alismatifolia bracts that plementary Table 6) and the gene numbers and repeat underlie this coloration. The publication of this refer- sequences in line with C. longa (Chakraborty et al. 2021) ence genome and the genetic mechanisms controlling and the published transcriptome of C. alismatifolia the color of C. alismatifolia bracts provide a valuable (Taheri et al. 2019) (Supplementary Table 5). In addi- resource for the development of novel cultivars as well tion, the total length of all genes (exons ? introns) as increasing our understanding of the evolution of showed a similar pattern to that of other published inflorescences among the monocots and providing an monocot genomes (Supplementary Figs. 9, 10, 11 and important genomic resource for clarifying the complex 12). Up to 92.35% of the protein-coding genes have phylogenetic relationships in Zingiberaceae. been annotated with KEGG, GO, NR, and other databases (Supplementary Table 7). In total, 9417 rRNAs, 8641 snRNAs, 1151 tRNAs, and 202 miRNAs were also RESULTS annotated (Fig. 1, Supplementary Table 8). The GC content of the entire genome was approximately 43% C. alismatifolia genome assembly and annotation (Fig. 1). There was a high GC content (* 55%) region in chromosomes 13 (LG13) and 16 (LG16). These GC- The genome size of C. alismatifolia ‘‘Chiang Mai Pink’’ enriched regions are the locations of the 45S ribosomal was estimated to be 1.10 Gb and the heterozygosity was RNA, which are consistent with the high GC content of found to be 1.7% using 87.45 Gb of MGI-SEQ 2000 the internal transcribed spacers (ITS) for these genes in survey data (Supplementary Figs. 1, 3, 4 and Table 1). banana (Hribova et al. 2011). We also found that 5S This is slightly higher than the reported genome size of rRNA genes were enriched in chromosome 1 (LG01), 998.5 Mb estimated by flow cytometry in a previous while tRNA-genes were enriched in chromosome 8 report (Mao et al. 2020). Then, 30.35 Gb of PacBio cir- (LG08) (Fig. 1). cular consensus sequence (CCS) reads were used for assembly, and 95.25% of the sequences were anchored Comparative genomics and whole-genome to the 16 chromosomes by combining 110.73 Gb of Hi-C duplication (WGD) event data, which was consistent with the expected number of chromosomes (2n = 32) (Leong-Skornickova et al. To clarify the phylogenetic position of C. alismatifolia 2007), resulting in a genome of 994.07 Mb size (Fig. 1; and provide a general framework for understanding its Supplementary Fig. 2). The high fidelity of the genome genomic structure, genes of 217 single-copy gene The Author(s) 2022 182 aBIOTECH (2022) 3:178–196 families from 15 species, including C. alismatifolia,13 Fig. 16). This result was similar to that of Z. officinale, other monocotyledonous species, and the outgroup but slightly different from the results of M. acuminata grape (Vitis vinifera) were used for phylogenetic analy- and the dicotyledonous Rhododendron (Yang et al. 2020) sis. C. alismatifolia was inferred to have diverged from (Supplementary Fig. 16), wherein only PD and TD ginger (Z. officinale) around * 11.9 mya (Fig. 2A). A showed this pattern. Thus, these three types of dupli- total of 9,019 gene families were shared by species in cated genes in C. alismatifolia and Z. officinale had more the Zingiberales (Fig. 2B, C, Supplementary Tables 9 and rapid sequence divergence with stronger positive 10), among which C. alismatifolia had 1490 species- selection than the WGD and TRD genes. KEGG enrich- specific gene families (Fig. 2A). In the genome of C. ment of the five types of duplicated genes also showed alismatifolia, 2,102 gene families were found to be that these genes seem to be divided into two main expanding, of which 120 expanded significantly categories, among which DSD, PD, and TD genes were (Fig. 2A). KEGG enrichment results showed that these enriched in one group for monoterpenoid biosynthesis expanded gene families were mainly related to envi- and flavonoid biosynthesis, which might be related to ronmental adaptation, such as plant–pathogen interac- species-specific differentiation. In comparison, the WGD tion, steroid hormone biosynthesis, and terpenoid and TRD genes in another group were associated with backbone biosynthesis (Supplementary Figs. 13 and more conserved functions such as circadian rhythm and 14). plant hormone signal transduction (Fig. 3B and Sup- In the speciation and divergence of angiosperm lin- plementary Fig. 17). eages, WGD events are an important source of molecu- In addition, the function of replicated genes could be lar diversity (Wu et al. 2020). The distributions of Ks influenced by epigenetic processes such as DNA and substitution rate of fourfold synonymous (degen- methylation, which has been shown to influence gene erative) third-codon transversion (4dtv) sites of gene expression in certain taxa (Dyson and Goodisman pairs in the collinear blocks of C. alismatifolia and Z. 2020). We examined the distribution of the five dupli- officinale indicated that both species have a shared Ks cated gene types on chromosomes and found that genes peak (Fig. 2D). This result was consistent with a pre- of the TRD and WGD types were more concentrated at vious report that Z. officinale has a shared WGD in chromosome ends with an opposite positional trend to Zingiberaceae (Cheng et al. 2021b). This WGD event was LTR distributions, whereas DSD, PD, and TD genes, also supported by the collinear relationships between C. especially DSD genes, showed a higher overlapping alismatifolia and Z. officinale, which had a 2:2 syntenic distribution with LTRs (Supplementary Fig. 18). The depth pattern shown by JCVI (Fig. 2E, F) and was fur- 5mC methylation of CG contexts detected by Nanopolish ther verified by the 1:2 pattern of collinearity between software showed that the distribution of methylation C. alismatifolia and Amborella (Supplementary Fig. 15). coincided with the distribution of LTRs and DSDs This result indicates that, compared with Amborella,a (Supplementary Fig. 18). Considering that TEs are WGD event occurred in both C. alismatifolia and Z. known to promote gene replication (Bayer et al. 2020) officinale (Fig. 2E, F). We conducted the above analyses and angiosperm chromosome remodeling (Douglas and and reported validated evidence of an obvious WGD Futuyma 2017), and that both TEs and methylation can event in the C. alismatifolia genome. affect gene expression, such as cytosine DNA methyla- tion, which regulates TE silencing, imprinting, and gene Gene duplication events contribute expression (Bourque et al. 2018; Liu et al. 2018; Wang to the diversity of C. alismatifolia et al. 2019), we assessed their distribution in the gen- ome. We hypothesized that TE and exon methylation To investigate whether other gene duplication events might affect the distribution and expression of different played an important role in the evolution of diverse classes of repetitive genes and contribute to their dif- bract pigmentation, we identified five types of dupli- ferent evolutionary fates. Given this, we analyzed TE cated genes: 26,466 dispersed duplicates (DSD), 1777 insertions and methylation in exons, introns, and 1 kb proximal duplicates (PD), 2052 tandem duplicates (TD), regions [the length of intergenic regions was longer 14,159 transposed duplicates (TRD), and 11,120 from than 1 kb for most genes (Supplementary Fig. 19)] of whole-genome duplicates (WGD) (Fig. 3A, Supplemen- genes upstream and downstream in the five types of tary Tables 11 and 12). We compared the Ks and Ka/Ks duplicated genes. As expected, more DSD, PD, and TD distributions from different types of duplicated genes genes had TE insertions and higher methylation levels and found that the DSD, PD, and TD gene pairs had than the WGD and TRD genes (Fig. 3C, D, Supplemen- higher Ka/Ks ratios and smaller Ks values than those of tary Tables 13 and 14), especially in their exons and the other two types of duplicated genes (Supplementary upstream and downstream regions (Fig. 3C and The Author(s) 2022 aBIOTECH (2022) 3:178–196 183 The Author(s) 2022 184 aBIOTECH (2022) 3:178–196 bFig. 2 Evolution of the C. alismatifolia genome and gene families. a total of 3347 upregulated and 3350 downregulated A Phylogenetic tree constructed using maximum likelihood based genes in QMF (Fig. 4B, Supplementary Table 15). KEGG on the concatenation of single-copy nuclear genes. B The distri- enrichment analysis showed that these DEGs were bution of orthogroups in each species. C Venn diagram of shared enriched in the phenylpropanoid and flavonoid biosyn- and unique gene families in Zingiberales species. D The distribu- tion frequencies of synonymous substitutions (Ks) and substitu- thesis pathways (Fig. 4C), both of which are upstream of tions of 4dtv sites. E Synteny patterns between C. alismatifolia and anthocyanin synthesis. Z. officinale. F The 2:2 syntenic depth pattern between C. Based on previous reports (Belwal et al. 2020; Fer- alismatifolia and Z. officinale reyra et al. 2012) and our genome annotations, we analyzed the expression levels of all genes in the Supplementary Fig. 20). In addition, genes with TEs in anthocyanin synthesis pathway in different samples their exons had lower relative expression levels (Sup- (Belwal et al. 2020; Ferreyra et al. 2012) (Supplemen- plementary Fig. 21). Moreover, TEs in the exons and the tary Fig. 26, Supplementary Table 16). We found that upstream and downstream regions had a higher degree the tandem duplicated (TD) gene F3 5’H (gene13778)in of methylation than that in the other gene portions QMF was highly expressed in SeR at the S3 and S4 (Supplementary Fig. 22), which is consistent with a stages, but lowly expressed at the S1 and S2 stages. This previous study (Zhang et al. 2018). However, to date, gene was minimally expressed in the different tissues of there have been no general conclusions regarding the XCX (Fig. 4D, Supplementary Table 11). In addition, the effect of methylation in exon/intron/upstream/down- genes DFR (gene25158), ANS (gene437), and BZ1 stream regions on gene expression (Zhang et al. 2018). (gene16173) were also significantly downregulated in We found that the methylation levels in these four XCX (Fig. 4D), among which DFR and ANS had a higher regions were approximately equivalent (Fig. 3D). Com- expression level and their expression patterns were pared to the WGD and TRD genes, the DSD, PD, and TD similar to the bract color accumulation in the pink genes had lower expression in multiple tissues and at bracts. In addition, transcription factors were identified different developmental stages (Supplementary Fig. 23). using iTAK (Supplementary Fig. 27 and Supplementary Therefore, the results showed that the higher the degree Tables 17 and 18). The subgroups of 4–7, 44, and 79 of of methylation, the lower is the gene expression. The MYB regulators (Wu et al. 2022), IIIf, IIId ? e, and IVd expressions of DSD, PD, and TD genes were more of bHLH regulators (Xie et al. 2012; Zhao et al. affected by methylation than WGD and TRD (Fig. 3E). 2018, 2019, 2020) and WD40 protein TTG1 homologs DSD, PD, and TD genes also had longer intron lengths (Baudry et al. 2004; Belwal et al. 2020; Gonzalez et al. and relatively higher TE content than the WGD and TRD 2008) have been reported to be involved in anthocyanin genes (Fig. 3C and Supplementary Fig. 24). synthesis; these most important subgroups of MYB, bHLH, and WD40 genes (Supplementary Figs. 28 and Bract pigmentation genes in C. alismatifolia 29, Supplementary Table 19) were identified by a phy- logenetic analysis based on homologs from A. thaliana To investigate the mechanisms underlying the impor- and SG79 in other species (Wu et al. 2022). On this tant trait of bract color in C. alismatifolia, a stable all- basis, a total of 12 candidate regulators were screened white bract morph, two cultivars of C. alismatifolia, i.e., out (Supplementary Fig. 30A, Supplementary Table 20) ‘‘Country Snow’’ (XCX) and ‘‘Chiang Mai Pink’’ (QMF) and further narrowed down to eight genes according to were selected to conduct comparative transcriptomic the weighted correlation network analysis (WGCNA) analyses. Light microscopic observation showed that the (Supplementary Fig. 30B and 31). Finally, ten genes, color of the inner whorl bract tips of QMF was a mix of including DFR, ANS,2 MYB (SG7 subgroup of MYB: red and green colors (Supplementary Fig. 25), and the gene39947, SG44 subgroup of MYB: gene20923), and 6 anthocyanin content of its inner whorl bract bases at S4 bHLH (IVd subgroup of bHLH: gene46097, IIId ? e stage was 0.2338 mg/g fresh weight (FW). Tissue subgroup of bHLH: gene29974, gene45529, gene40971, samples were collected from the outer all-green bract gene51401, and IIIf subgroup of bHLH: gene32335) (Br), tip (SeG), and base (SeR) of the inner whorl genes were verified by qRT-PCR. The qRT-PCR results ornamental bract at four developmental stages, i.e., showed that the expression patterns of these genes including the early stage of bract initiation (S1), tip were consistent with the bract coloring period in the coloring stage (S2), base color transition stage (S3), and RNA-seq results (Supplementary Fig. 30c). Therefore, color change completion stage (S4), and used for tran- we believe that these genes play a crucial role in scriptomic analyses (Fig. 4A). anthocyanin synthesis and are also closely related to the We identified the differentially expressed genes formation of white bracts, because the qRT-PCR results (DEGs) of SeR at S4 in the QMF vs XCX, which contained The Author(s) 2022 aBIOTECH (2022) 3:178–196 185 Fig. 3 Gene duplication and evolution. A Genes derived from different modes of duplication in four different species. The gene types are whole-genome duplication (WGD), tandem duplication (TD), proximal duplication (PD), transposed duplication (TRD), and dispersed duplication (DSD). B Top five enriched pathways for each duplicated type in C. alismatifolia based on the KEGG analysis. C Percentage of different duplicated gene types which contain TEs in an exon, intron, and 1 kb upstream or downstream sequence from each CDS. D Percentage of cytosine methylation in different duplicated gene types in an exon, intron, and 1 kb upstream or downstream sequence from each CDS. E The relationship between methylation and gene expression among different types of duplicated genes The Author(s) 2022 186 aBIOTECH (2022) 3:178–196 showed that these genes were not expressed in the SeR sesquiterpenoid and monoterpenoid synthesis, were of white bracts. downregulated as the flowers developed, implying that Moreover, population structure analysis based on 56 this kind of aroma will fade with full flowering; how- C. alismatifolia cultivars showed that these samples ever, more evidence is needed for other aromatic sub- were mainly divided into two groups, among which the stances (Supplementary Fig. 35B–C). inner bracts of samples in Group 2 possessed a similar morphology and color in the outer bracts, while there Bulked segregant analysis (BSA) isolates the key were differences between the inner and outer bracts of genes related to bract color of C. alismatifolia samples in Group 1 (Supplementary Fig. 32A). Principal component analysis (PCA) results also showed differ- To further understand the molecular mechanism of entiation between these two groups (Supplementary bract pigmentation, we sequenced two bulked popula- Fig. 32B). To further verify the differences in bract color tions with different bract colors (pink and red), each at the population level, one population with red inner consisting of 50 offspring from an F1 population con- whorl bract bases (SeR) (13 individuals) and one with taining 985 individuals from a cross between the red white or green SeR(14 individuals) were selected to line (C. alismatifolia ‘‘Scarlet’’ (JL), female parent with calculate Fst, and the results of the red and non-red red bract) and pink line (C. alismatifolia ‘‘Dawn’’ (LM), populations further confirmed that 8 of the 14 candi- male parent with pink bract), to a depth of date genes (gene25158, gene437, gene318, gene39947, 50 9 (Fig. 5A). DNA and RNA from four samples gene29974, gene45529, gene32394, and gene32335) [P1(LM), P2(JL), S1(LM), and S2(JL)] were sequenced were differentiated at the population level (Supple- for BSA and bulked segregant RNA-seq (BSR) analysis. mentary Fig. 32C). We identified SNPs between the two parental lines and A previous study reported that the inner whorl bract computed the SNP index for the red line and pink line tips (SeG) are green in C. alismatifolia, which is a phe- bulked populations as well as their differences (G’ value) nomenon of chlorophyll deposition (Ding et al. 2021) using a 1000 kb sliding window with a step size of 10 kb, (Fig. 4A). The results of WGCNA, a widely used tran- as described by Mansfeld and Grumet (2018). Three scriptome analysis method (Langfelder and Horvath genomic regions containing 1547 genes contributing to 2008), revealed one chlH (gene55716) and one CAO bract color, with a confidence threshold exceeding 95% (gene49893) gene in chlorophyll synthesis as the hub were identified (Supplementary Table 23). Similar results genes in the module most associated with the green trait were obtained in the BSR analysis (Supplementary (Supplementary Figs. 33 and 34, Supplementary Fig. 36 and Supplementary Table 24). Based on annota- Table 22). To understand the molecular mechanism of tion information (Supplementary Table 25), we identified chlorophyll synthesis, we analyzed the expression pat- 31 genes that were potentially related to anthocyanin terns of all chlorophyll synthesis and metabolic pathway synthesis (Fig. 5B–D, Supplementary Table 26). Among 0 0 genes in different tissues (Fu et al. 2021; Wang et al. them, three candidate genes, including one F3 5 H gene 2020) (Fig. 4E, Supplementary Table 21). We found that (gene13778) and two MYB genes (SG14: gene13102 and the downstream genes of the TD gene chlH (gene55716, SG4: gene14458) in QTL1, were identified for bract pig- gene55715) were minimally expressed in the white mentation based on gene expression, single nucleotide samples (Fig. 4E), suggesting that the chlH is a key gene variants and structural variations (Fig. 5D, Supplemen- controlling chlorophyll synthesis in C. alismatifolia bracts. tary Table 25). The promoter of F3 5’H gene had a 29 bp In addition to bract color, floral scent is an important heterozygous region 94 bp upstream of the start codon in economic trait of ornamental plants. Previous observa- LM, and this same region had zero sequencing depth in JL 0 0 tion showed that different cultivars had different floral (Fig. 5E). In addition, the second intron of F3 5 H gene scents, and the ‘‘true’’ flowers of C. alismatifolia were the showed an abnormal length with a heterozygous deletion main sources of floral scents rather than its colorful of approximately 16.8 kb in length according to the gen- bracts. Therefore, GC–MS was used to analyze the floral ome annotation of QMF (Supplementary Fig. 37a). For scents originating from the flowers of C. alismatifolia the two MYB genes, the allele frequencies of the SNP in ‘‘Chiang Mai Pink’’ and revealed its volatile compounds, gene13102 promoter at LG03_61315110 and sesquiterpenoids and monoterpenoids, associated with LG03_61315135 in S1 of LM and S2 of JL were different. floral scent, since the duplicated genes were mainly The SNP at LG03_61315110 in S1 of LM was C/A (55%/ enriched in monoterpenoid biosynthesis (Supplemen- 45%), whereas in S2 of JL pool was C (100%). The SNP at tary Fig. 35A). Further analysis of the transcriptomes at LG03_61315135 in S1 of LM was A/T (53%/47%), different flower developmental stages revealed that whereas in S2 of JL pool, it was A (100%) (Supplementary TPS-a and TPS-b genes, which are mainly related to Fig. 38). The allele of the SNP in gene14458’exonat The Author(s) 2022 aBIOTECH (2022) 3:178–196 187 The Author(s) 2022 188 aBIOTECH (2022) 3:178–196 bFig. 4 The anthocyanin and chlorophyll biosynthetic pathway with an impact on the diversity of C. alismatifolia that genes identified by RNA-seq in the all-white bract morph C. promotes the coloration of bracts. Gene duplication alismatifolia and C. alismatifolia ‘‘Chiang Mai Pink’’. A Locations of plays an important role in evolutionary adaptation by tissue samples for RNA-seq. Br (outer all-green bract), SeG (inner providing ‘‘new’’ genetic material to the genome (Dyson whorl bract tips), and SeR (inner whorl bract base) of C. alismatifolia ‘‘Chiang Mai Pink’’ (QMF) at different developmental and Goodisman 2020), and the WGD events are regar- stages and a white morph C. alismatifolia ‘‘Country Snow’’ (XCX) at ded as an important source of such diversity (Wu et al. S4 (bar of S1: 0.2 cm, bar of S2: 0.3 cm, bar of S3: 0.4 cm, bar of 2020). It is now clear that the genomes of extant seed S4: 1.5 cm). B KEGG enrichment of DEGs in the QMF SeR vs XCX plants and angiosperms have undergone multiple WGD SeR at S4 stage. C Anthocyanin biosynthetic pathway in C. alismatifolia. Heatmaps show the FPKM with Log2 transformation events and share an ancient polyploid ancestor. In of genes in SeR of QMF at S1–S4 stages and XCX at the S4 stage. addition, some angiosperms have undergone repeated Enzyme abbreviations: PAL phenylalanine ammonium lyase, C4H independent whole-genome duplication events in recent cinnamate-4-hydroxylase, 4CL 4-coumaroyl-CoA synthase, CHS times. For example, the banana (Musa acuminata) has chalcone synthase, CHI chalcone isomerase, F3H flavanone 3-hy- 0 0 0 0 0 0 droxylase, F3 H flavonoid-3 -hydroxylase, F3 5 H flavonoid-3 ,5 - three independent WGD events (D’Hont et al. 2012). hydroxylase, DFR dihydroflavonol 4-reductase, ANS anthocyanidin Here, by identifying the collinear blocks of C. alismati- synthase, BZ1 anthocyanidin 3-O-glucosyltransferase. D Chloro- folia and Z. officinale, we found that C. alismatifolia and phyll biosynthetic and degradation pathway in C. alismatifolia. Z. officinale have a 2:2 collinear relationship (Fig. 2E, F), Enzyme abbreviations: HemA glutamyl-tRNA reductase, HemL glutamate-1-semialdehyde 2,1-aminomutase, HemB porphobilino- together with the Ks distribution, suggesting that one gen synthase, HemC hydroxymethylbilane synthase, HemD uro- WGD event occurred in the ancestor of these two spe- porphyrinogen-III synthase, HemE uroporphyrinogen cies (Fig. 2D–F). The WGD events in monocots are decarboxylase, HemF coproporphyrinogen III oxidase, HemY complex (Jiao et al. 2014;Wuetal. 2020). Previous protoporphyrinogen/coproporphyrinogen III oxidase, chlH mag- nesium chelatase subunit H, chlM, magnesium-protoporphyrin studies have also indicated that ancestors of Z. officinale O-methyltransferase, chlE magnesium-protoporphyrin IX mono- may have undergone multiple WGD events (Cheng et al. methyl ester, por protochlorophyllide reductase, DVR divinyl 2021b;Li etal. 2021). We further confirmed these WGD chlorophyllide a 8-vinyl-reductase, CAO chlorophyllide a oxyge- events with the fitted Ks results of WGDI for C. alis- nase, chlG chlorophyll/bacteriochlorophyll a synthase, NOL chloro- phyll(ide) b reductase, HCAR, 7-hydroxymethyl chlorophyll a matifolia and the corresponding Ks distribution of C. reductase, CLH chlorophyllase, SGR magnesium dechelatase PAO alismatifolia with Amborella (Supplementary Figs. 40 pheophorbide a oxygenase, RCCR red chlorophyll catabolite and 41), combined with the weak signals in the median reductase. E Expression heatmaps of genes in the chlorophyll Ks distribution of collinear blocks of the Z. officinale biosynthetic and degradation pathway paralogous gene pairs in WGDI collinear blocks (Sup- plementary Fig. 42). From this, it appears that both C. LG03_69765180 was A/G (63%/38%), whereas in the S2 alismatifolia and Z. officinale have an additional WGD of JL pool, it was A (100%), implying that the structural weak signal, which was further supported by the results variations in these genes may be responsible for the red of Tree2GD, suggesting WGD should have occurred once or pink bract color (Supplementary Fig. 39). Further- before the divergence of Zingiberaceae and Musaceae as more, two MYB genes, four bHLH genes, seven WRKY well as once before the divergence of C. alismatifolia and genes, and two WD40 genes were found to be differen- Z. officinale (Supplementary Fig. 43 and 44). Sampling tially expressed in P2 vs. P1 and S2 vs. S1 based on the of more species in this group is needed to support the RNA data of P1(LM), P2(JL), S1(LM), and S2(JL) (Sup- inferred additional WGD event. Based on the results of plementary Tables 27 and 28). Among the four bHLH TreePL and MCMCtree, the most recent WGD event genes (gene21747, gene41577, gene46097, and occurred after the divergence of C. alismatifolia and Z. gene55364), gene55364 belongs to the IIId ? e subgroup officinale ancestors from ancestors of Musa and before of bHLH, while gene46097 belongs to the IVd subgroup of the divergence of C. alismatifolia with Z. officinale bHLH (Supplementary Table 19), which have been pre- (Fig. 2A and Supplementary Fig. 45). viously reported to be involved in the biosynthesis of anthocyanins. Gene duplication as a key to the evolution of diverse bract pigmentation DISCUSSION Five duplicated gene types were identified and com- Monocotyledons have more complex WGD events pared in C. alismatifolia, Z. officinale, and M. acuminata (Fig. 3A). In previous studies of the dicotyledonous Here, we present a high-quality chromosome-scale plant Rhododendron, TDs and PDs were found to con- genome assembly of Curcuma species and identify an tribute to an increase in the ratio of enzymatic genes in obvious WGD event, as well as other duplicated genes, the anthocyanin biosynthesis pathway, suggesting that The Author(s) 2022 aBIOTECH (2022) 3:178–196 189 Fig. 5 BSA-seq reveals the F3 5’H gene as a candidate gene responsible for red bract pigmentation of C. alismatifolia. A Left: C. alismatifolia ‘‘LM’’; right: C. alismatifolia ‘‘JL’’. Bar: 3 cm. B Genome-wide G value for allele frequency of SNPs between F1 hybrids S1 (LM) and S2 (JL) pools. C Anthocyanin biosynthesis related genes in BSA signal regions. D Expression heatmap of 31 anthocyanin biosynthesis- related genes from BSA signal regions. E Gene structure of candidate gene according to genome annotation of QMF and sequence depth in parents LM (P1) and JL (P2), and F1 hybrids LM (S1) and JL (S2) TDs and PDs are important in the evolution of flower that has been transposed (Supplementary Fig. 46). color diversity (Yang et al. 2020). In our study, we found Overall, TD, PD, and DSD can be regarded as a general that the TD and PD genes of Zingiberales were signifi- class and appear to be involved in the evolution of color cantly enriched in the upstream steps of the antho- diversity in C. alismatifolia bracts. Here, we also found cyanin synthesis pathway. In terms of TE content, that there was no significant difference in the methyla- degree of methylation, Ks, and gene expression, TD, PD, tion levels between exons, introns, or upstream or and DSD were more similar to WGD or TRD genes downstream regions of different categories of dupli- (Fig. 3C–E, Supplementary Figs. 16, 17, 18, 19, 20, 21, cated genes (Fig. 3D). However, WGD and TRD genes 22, 23 and 24). The Venn diagram of these five types of had lower levels of methylation and higher gene genes shows that TD and PD share 117 genes, whereas expression (Fig. 3D and Supplementary Fig. 23) and WGD and TRD share 4639 genes, indicating that Dup- were involved in growth and development processes Gen_finder (Qiao et al. 2019) may not be able to clearly (Fig. 3B and Supplementary Fig. 17), suggesting that discern duplicates of these types in all cases or that genes retained after WGD or TRD duplication are more terms used to define these duplicate types can overlap. often related to conserved functions. This result is For instance, no such overlap was found among the consistent with previous studies, where the retention other groups, indicating that some duplicates likely and loss of repetitive genes after WGDs are not random, fulfilled the criteria for two categories, such as a WGD but have a bias for genes related to signal transduction, The Author(s) 2022 190 aBIOTECH (2022) 3:178–196 transcription factors, and genes related to development. Zingiberales; C. alismatifolia had 8 F3 5’H genes, Z. 0 0 The existence of gene retention bias may result from the officinale had 6 F3 5 H genes, and M. acuminata had 4 functional divergence of genes related to adaptation to F3 5’H genes according to gene annotations, among 0 0 novel environments (Wu et al. 2020). However, how which the F3 5 H gene in Z. officinale had a length of DNA methylation affects the evolution and retention of 12,746 bp with 1040 amino acids (gene13778 had 957 these two types of duplicated genes requires further amino acids), as well as six exons, and its longest intron study. had a length of 8,702 bp (Supplementary Fig. 37B, C). The phylogenetic tree showed that gene13778 also 0 0 A complex regulatory mechanism controls clustered with F3 5 H genes in Petunia (Supplementary the color diversity in C. alismatifolia bracts Fig. 37B), which also confirmed the identity of this gene. However, considering the complexity of this region, 0 0 0 Here, we also identified that F3 5’H, DFR, and ANS are where multiple F3 5 H genes repeat in tandem, and the key genes in the anthocyanin biosynthesis pathway, and limitations of existing gene structure prediction soft- transcriptome analysis revealed that DFR and ANS play ware, we believe that more experimental evidence is 0 0 a more critical role, and the role of DFR gene in C. needed in the future to verify the structure of F3 5 H alismatifolia has also been verified (Petchang et al. gene and reveal the differences in the expression levels 2017). Several regulatory factors were identified to be of different transcripts, even the divergence within C. closely related to DFR and ANS, among which the SG7 alismatifolia cultivars. In summary, the key genes subgroup of MYB gene39947 (Supplementary Fig. 47) involved in anthocyanin synthesis pathways were had a stronger correlation with DFR and ANS, and qRT- identified, and resequencing analyses found that genetic PCR results verified that it has a very similar expression differentiation was associated with different bract color trend as DFR and ANS.A TRANSPARENT TESTA 8 (TT8) groups, concluding that color formation in C. alismati- homologous gene, gene32335, was also identified (Sup- folia bracts is a complex process, which also needs to be plementary Fig. 48). TT8 is considered sufficient for the verified by further studies. expression of DFR and ANS genes and is reported to be one of the key regulators of anthocyanin production in many plant species (Yan et al. 2021). In addition, pre- CONCLUSIONS vious reports revealed that LcbHLH92a and LcbHLH92b in sheepgrass are involved in anthocyanin and proan- Our study found that different categories of duplicated thocyanidin synthesis (Zhao et al. 2019). In our study, genes in C. alismatifolia genome were diversified in an AtBHLH92 homologous gene, gene46097 (Supple- function by the duplicate type. This includes the evo- mentary Fig. 48), was simultaneously identified by BSA lution of different colored bracts of C. alismatifolia and transcriptome analysis in our study, suggesting a through the tandem duplication of genes and subse- crucial role. Therefore, we believe that there is a com- quent changes in gene structure. We identified the key 0 0 plex mechanism controls bract color formation. WGCNA anthocyanin synthesis genes DFR, ANS and F3 5 H and has predicted a regulatory network, which also provides chlorophyll synthesis genes chlH and CAO in the for- a basis for subsequent experimental verification. In mation of bract color and inferred the potential contri- addition, candidate gene F3 5’H (gene13778), which is a bution of individual members of the transcription factor flavonoid-3’,5’-hydroxylase, is regarded as a member of gene families. These results provide a basis for further the cytochrome P450 family and is a crucial enzyme identification of gene function in C. alismatifolia and required for producing blue or purple flowers was related species. In conclusion, the reference genome of identified (Hopkins and Rausher 2011; Shimada et al. C. alismatifolia presented in this study provides a key 1999). Our study also showed that the F3 5’H gene resource for further studies and development of novel (gene13778) has a long length in C. alismatifolia ‘‘Chiang cultivars through marker-assisted breeding and genome Mai Pink’’ (Supplementary Fig. 37A), mainly due to the editing. second intron being over 20 kb (verified by mapping HiFi reads) as well as a 16.8 kb heterozygous deletion (Supplementary Fig. 37A). Previous studies have found MATERIALS AND METHODS that plant introns are generally shorter than those of animals, even in species with large genomes (Jin 2007), Plant materials making the F3 5’H gene (gene13778)in C. alismatifolia an extreme outlier to this general pattern. We also found The C. alismatifolia cultivar ‘‘Chiang Mai Pink’’ was 0 0 that the copy number of F3 5 H gene varied in selected for whole-genome sequencing and assembly. The Author(s) 2022 aBIOTECH (2022) 3:178–196 191 The cultivar was planted in the greenhouse at Shenzhen above for further analysis. LACHESIS (Burton et al. Institute of Agricultural Genomics, Chinese Academy of 2013)(https://github.com/shendurelab/LACHESIS) Agricultural Sciences. Leaf tissue was used for whole- was used to further aggregate, sequence, and locate genome sequencing, while flowers, leaves, and young scaffolds onto the chromosomes. Finally, the errors of stems were subjected to RNA sequencing (RNA-seq) to placement and orientation were corrected with manual support genome annotation and analyze of gene adjustment. The final chromosome anchor rate was expression levels (Supplementary File 1). 95.25%. Library preparation and sequencing Genomic evaluation We extracted the DNA from leaves by following the The integrity of the assembled genome was assessed by procedures of Qiagen Genomic DNA kit. According to the BUSCO v4.0.5 (Simao et al. 2015) based on single-copy standard protocol of PacBio, 20 kb preparation solution homologous genes in the OrthoDB database was used to obtain the SMRTbell target size library embryophyta_odb10. CEGMA v2 (Parra et al. 2007)was (Pacific Biosciences, CA, USA), and then the HiFi data also used to predict the genome completeness based on was generated with CCS software (https://github.com/ its database. HISAT2 v2.2.1 (Kim et al. 2019) was used PacificBiosciences/ccs). Genomic DNA (1–1.5 lg) was to map the RNA-seq data from flowers, pedicels, and randomly interrupted into 200–400 bp fragments and leaves, BWA v0.7.17-r1188 (Li and Durbin 2010)was sequenced on the MGI-SEQ 2000 platform. The total used to map the MGI data, and minimap2 v2.21-r1071 RNA of all sample materials were extracted with RNA- (Pertea et al. 2016) was used to map the HiFi data to prep pure Plant Kit (TIANGEN), and the RNA sequenc- genome to calculate the mapping rate. ing was carried out by MGI-SEQ 2000 sequencing platform. The Hi-C library was constructed with DpnII Repetitive sequence annotation restriction enzyme and sequenced on MGI-SEQ T7. The Fastp v0.19.4 was used to perform quality control (Chen The genome repeat sequence annotation was conducted et al. 2018). by using the Extensive de novo TE Annotator (EDTA) v1.9.4 (Su et al. 2021), a toolkit for de novo annotating K-mer analysis and genome assembly TEs in whole-genome datasets. The LAI and insertion time were calculated by using LTR_retriever v2.9.0 (Ou The Jellyfish v2.3.0 program (Marcais and Kingsford et al. 2018; Ou and Jiang 2018), with a substitution rate 2011) and Kmerfreq (Liu et al. 2013) were used to of default 1.3e-8 Ks/year. conduct the k-mer analysis by using the MGI data to estimate the genome size Genomes were assembled by Gene structure and function annotation using 30.35 Gb of high-quality HiFi reads using hifiasm v0.12 software with default parameters (Cheng et al. Trinity v2.8.5 (Grabherr et al. 2011) was used to 2021a)(https://github.com/chhylp123/hifiasm), lead- assemble the transcripts for predicting genes. Then the ing to a 1.22 Gb preliminary assembled genome. Using transcript-based predictions was conducted with PASA the quality-controlled MGI data, the genome was pol- v2.4.1 (Haas et al. 2003). We also performed homology ished using Nextpolish v1.2.4 software (Hu et al. 2020) predictions by using the protein sequences of M. bal- for four iterations, and the corrected genome was bisiana (NCBI, GCA_004837865.1), Z. mays (Phytozome compared with the nr/nt database (NCBI) to remove V13, v4), O. sativa (Phytozome V13, v7.0), and Z. offici- possible sequences originating from biological contam- nale (NCBI, GCA_018446385.1), and mapped them to ination (such as endophytes). the genome of C. alismatifolia, with these homology annotation results being input to Augustus v3.3.3 Hi-C scaffolding (Stanke et al. 2008) for training. The genes from PASA v2.4.1 (Haas et al. 2003) were further used to train the A total of 109.71 Gb clean paired-end reads generated GlimmerHMM v3.0.4 (Majoros et al. 2004), SNAP from Fastp v0.19.4 (Chen et al. 2018) were mapped to v2006-07–28 (Korf 2004) and Augustus v3.3.3 (Stanke the 1.19 Gb preliminary assembled genome by using et al. 2008) software to get results of de novo gene bowtie2 v2.3.2 (Langmead and Salzberg 2012), then the prediction. We used Evidencemodeler v1.1.1 (Haas et al. unique mapped reads were obtained. HiC-Pro v2.8.1 2008) to integrate all above evidence, and the results (Servant et al. 2015) identified and retained valid were re-trained using PASA v2.4.1 (Haas et al. 2003) for interactive paired reads from unique reads described one final round of gene annotation. The Author(s) 2022 192 aBIOTECH (2022) 3:178–196 Blastp v2.9 (ftp://ftp.ncbi.nlm.nih.gov/blast/execu C. alismatifolia. We identified five types of duplicated tables/blast?/LATEST/) was used to annotate the gene genes in C. alismatifolia, Z officinale, and M. acuminata functions by comparing the protein sequences of C. by utilizing the DupGen_finder (Qiao et al. 2019) soft- alismatifolia with Swiss-Prot (uniprot_sprot), NR ware with default parameters. WGDI (https://github. (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz) com/SunPengChuan/wgdi) was used to verify the WGD and KOG (https://ftp.ncbi.nih.gov/pub/COG/KOG/) results. The program JCVI v1.1.18 (Tang et al. 2008)was databases with feature assignments made based on the used to further analyze the collinearity of C. alismatifolia best hits. Interpro annotation and GO annotation were and Z. officinale. performed by interproscan-5.21–60.0 (Blum et al. 2021). The KAAS server (https://www.genome.jp/ Methylation analysis kegg/kaas) was used to identify the KEGG pathway. For a more complete and comprehensive functional anno- Genomic DNA (2 lg) was obtained for ONT (Oxford tation, we employed egg-mapper v2.0.1 (Huerta-Cepas Nanopore Technology) library preparations and then et al. 2017), to further perform the functional annota- sequenced on the ONT PromethION sequencer. The call- tion based on eggNOG v5.0 (Huerta-Cepas et al. 2019). methylation module of Nanopolish v0.13.2 (https:// In addition, tRNAscan-SE v2.0.8 (Chan et al. 2021)was github.com/jts/nanopolish) was used to analyze used to annotate tRNA, barrnap v0.9 (https://github. 5-methylcytosine in the CG context in the genome based com/tseemann/barrnap) was used to annotate rRNA on Fast5 files, then the results were filtered according to and cmscan v1.1.2 from the software suite Infernal the condition of methylated_frequency C 0.5. (Nawrocki and Eddy 2013) was used to annotate other ncRNAs based on the Rfam database (Kalvari et al. Transcriptome analyses and gene co-expression 2021). networks Orthologous gene family identification, The clean RNA-seq reads from different sample were phylogenetic analysis, estimation of divergence mapped to the C. alismatifolia genome with HISAT2 time, and expansion and contraction of gene 2.2.1 (Kim et al. 2019), and StringTie v2.1.6 (Pertea family expansion et al. 2016) was used to calculate the FPKM of genes in each sample with Log2 transformation, while using the By using OrthoFinder v2.5.2 (https://github.com/davi Log2FPKM C 0.5 cutoff for the gene in at least one demms/OrthoFinder), we obtained orthogroups for 15 sample to ensure that the gene was expressed. Fea- species (Supplementary Table 5), and the protein tureCounts v2.0.1 (Liao et al. 2014) was used to calcu- sequences of 217 single-copy orthogroups were late the counts of each gene in each sample, then the obtained. The MAFFT v7.490 software with default differentially expressed genes (DEGs) of QMF SeR vs settings (Katoh and Standley 2013) was used to per- XCX SeR at the S4 period were analyzed by utilizing form sequence alignments for each single-copy gene DESeq2 v1.34.0 software (Love et al. 2014). The fol- family and the alignments were converted to a nucleo- lowing FC value range was used as the criterion for tide matrix by pal2nal v14 (Suyama et al. 2006). Under selecting DEGs: |log2FoldChange|C 1.5, adjusted the GTRGAMMA model, phylogenetic analysis was con- P value B 0.01. Based on the FPKM of genes, the R structed with RAxML v8.2.12 (Stamatakis 2014). The package WGCNA v1.70-3 (Langfelder and Horvath MCMCTree program of the PAML v4.9 (Yang 2007)was 2008) was used to build the co-expression network. further applied for estimation of species divergence Primers designed for qRT-PCR use were tested and lis- time based on three soft bounds at three nodes (Cheng ted in Supplementary Table 29. The Applied Biosys- TM TM TM et al. 2021b). The expansion and contraction analyses tems PowerUp SYBR Green Master Mix (Thermo was performed on the basis of the dated phylogeny tree Fisher Scientific, US) was used for qRT-PCR with a CFX TM and the homologous gene families from 15 species using Connect Real-Time System (BIO-RAD, US). Two bio- the CAFE v4.21 program (Bie et al. 2006). GO and KEGG logical repeats and two technical repeats were carried enrichment analyses were then performed with genes in out for each gene, and the relative expression level was -DDCT significantly expanded families. calculated through the comparative 2 method. Duplicated gene identification and WGD analysis GO and KEGG enrichment To study the size evolution of the C. alismatifolia gen- Based on the results from eggnog-mapper v2.0.1 ome, we identified whole-genome duplication events in (Huerta-Cepas et al. 2017, 2019) software, the protein The Author(s) 2022 aBIOTECH (2022) 3:178–196 193 gratefully acknowledge Daniel B Sloan (Colorado State University) sequences of M. acuminata, Z. officinale and C. alismat- and the personnel of the Wu laboratory for help with providing ifolia were functionally annotated, and the GO and KEGG suggestions and revising the manuscript. annotation results of the genes were extracted. With the help of the R package AnnotationForge v1.36.0 (https:// Author contributions ZW and GZ designed the experiments. YY, bioconductor.org/packages/AnnotationForge/). The JT, JZ, RJ, YX, and JL built the hybrid population. ZW, JY, YY, XZ, WL, LT, and XL conceived and interpreted the data. XL, DP, and XZ clusterProfiler v4.2.1 (Wu et al. 2021) program was performed the bioinformatic analysis. XL, HM, and FG performed used for GO and KEGG enrichment analysis. The visu- the experiments. XL wrote the first manuscript. ZW, JY, WL, LT, and alization of enrichment results was generated with R GZ revised the manuscript, which all authors edited and approved. package ggplot2 (https://github.com/tidyverse/ ggplot2). Data availability The genome sequence data including HiFi, ONT, and Hi-C data were deposited in the Genome Warehouse in National Genomics Data Center (NGDC), under accession number Bulked segregant analysis CRA006523, while the whole-genome assembly was also depos- ited under accession number GWHBHOP00000000 that is publicly The C. alismatifolia ‘‘Scarlet’’ (JL, red line) and C. alis- accessible at https://ngdc.cncb.ac.cn/gwh. matifolia ‘‘Dawn’’ (LM, pink line) used for BSA were Declarations planted in the Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences. Conflict of interest The authors declare that they have no con- The individual plants of JL and LM grown under natural flict of interest. conditions were used for crossbreeding to obtain F1 Open Access This article is licensed under a Creative Commons hybrid populations. The segregation ratio was 502 (with Attribution 4.0 International License, which permits use, sharing, the same red bract as JL): 483 (with the same pink bract adaptation, distribution and reproduction in any medium or for- as LM). Sequencing of the extracted DNA and RNA from mat, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons parental LM (P1) and JL (P2), F1 hybrids LM (50 indi- licence, and indicate if changes were made. The images or other viduals mixed, S1) and JL (50 individuals mixed, S2) was third party material in this article are included in the article’s carried on an Illumina NovaSeq 6000 platform. Data Creative Commons licence, unless indicated otherwise in a credit filtering and quality control were performed by Fastp line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted v0.19.4 (Chen et al. 2018). BWA v0.7.17-r1188 (Li and by statutory regulation or exceeds the permitted use, you will Durbin 2010) and STAR v2.7.9a (Dobin et al. 2013) were need to obtain permission directly from the copyright holder. To used to map DNA and RNA data to the genome, sepa- view a copy of this licence, visit http://creativecommons.org/ rately, then GATK v4.2.2.0 (DePristo et al. 2011)was licenses/by/4.0/. used for SNP calling. Finally, BSA and BSR analysis was performed using the R package QTLseqr v0.7.5.2 (Mansfeld and Grumet 2018). The quantitative method References of transcript is the same as described in transcriptome analyses, and the R package edgeR v3.36.0 (Robinson Akter R, Hasan R, Siddiqua SA et al (2008) Evaluation of analgesic et al. 2010) was used to analyze the differentially and antioxidant potential of the leaves of Curcuma alismat- expressed genes of S2 vs S1 and P2 vs P1, using the ifolia Gagnep. Stamford J Pharm Sci 1:3–9. https://doi.org/10. following FC value range as the criteria for selecting 3329/sjps.v1i1.1779 Baudry A, Heim MA, Dubreucq B et al (2004) TT2, TT8, and TTG1 DEGs: |logFC|C 1, FDR B 0.01. synergistically specify the expression of BANYULS and A detailed description of the above analysis, as well proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant as the methods used for other analyses is listed in J 39:366–380. https://doi.org/10.1111/j.1365-313X.2004. Supplementary File 1. 02138.x Bayer PE, Golicz AA, Scheben A et al (2020) Plant pan-genomes Supplementary InformationThe online version contains are the new reference. Nat Plants 6:1389–1389. https://doi. supplementary material available at https://doi.org/10.1007/ org/10.1038/s41477-020-00776-y s42994-022-00081-6. Belwal T, Singh G, Jeandet P et al (2020) Anthocyanins, multi- functional natural products of industrial relevance: recent Acknowledgements These studies were supported by the start- biotechnological advances. Biotechnol Adv 43:107600. up fund of the Chinese Academy of Agricultural Sciences Elite https://doi.org/10.1016/j.biotechadv.2020.107600 Youth Program, Kunpeng Institute of Modern Agriculture at Bie TD, Cristianini N, Demuth JP et al (2006) CAFE: a computa- Foshan to Zhiqiang Wu, as well as supported by the opening tional tool for the study of gene family evolution. Bioinfor- project of Laboratory of Ecology and Evolutionary Biology from matics 22:1269–1271. https://doi.org/10.1093/ Yunnan University and Shenzhen Zhongnonghuadu Ecological bioinformatics/btl097 Technology Co., Ltd. (R20012) to Z.W., and the USDA National Blum M, Chang HY, Chuguransky S et al (2021) The InterPro Institute of Food and Agriculture Hatch project 02685 to W.L. We protein families and domains database: 20 years. Nucleic The Author(s) 2022 194 aBIOTECH (2022) 3:178–196 Acids Res 49:344–354. https://doi.org/10.1093/nar/ (Zingiberaceae). Jpn J Trop Agric 49:14–20. https://doi.org/ gkaa977 10.11248/JSTA1957.49.14 Bourque G, Burns KH, Gehring M et al (2018) Ten things you Gonzalez A, Zhao M, Leavitt JM et al (2008) Regulation of the should know about transposable elements. Genome Biol anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb 19:199. https://doi.org/10.1186/s13059-018-1577-z transcriptional complex in Arabidopsis seedlings. Plant J Burton JN, Adey A, Patwardhan RP et al (2013) Chromosome-scale 53:814–827. https://doi.org/10.1111/j.1365-313X.2007. scaffolding of de novo genome assemblies based on chro- 03373.x matin interactions. Nat Biotechnol 31:1119–1125. https:// Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length doi.org/10.1038/nbt.2727 transcriptome assembly from RNA-Seq data without a refer- Chakraborty A, Mahajan S, Jaiswal SK et al (2021) Genome ence genome. Nat Biotechnol 29:644-U130. https://doi.org/ sequencing of turmeric provides evolutionary insights into its 10.1038/nbt.1883 medicinal properties. Commun Biol 4:1193. https://doi.org/ Grotewold E (2006) The genetics and biochemistry of floral 10.1038/s42003-021-02720-y pigments. Annu Rev Plant Biol 57:761–780. https://doi.org/ Chan PP, Lin BY, Mak AJ et al (2021) tRNAscan-SE 2.0: improved 10.1146/annurev.arplant.57.032905.105248 detection and functional classification of transfer RNA genes. Haas BJ, Delcher AL, Mount SM et al (2003) Improving the Nucleic Acids Res 49:9077–9096. https://doi.org/10.1093/ Arabidopsis genome annotation using maximal transcript nar/gkab688 alignment assemblies. Nucleic Acids Res 31:5654–5666. Chanapan S, Tontiworachai B, Deewatthanawong R et al (2017) https://doi.org/10.1093/nar/gkg770 Cloning and sequence analysis of chalcone synthase gene in Haas BJ, Salzberg SL, Zhu W et al (2008) Automated eukaryotic Curcuma alismatifolia. Acta Hortic. https://doi.org/10. gene structure annotation using EVidenceModeler and the 17660/ActaHortic.2017.1167.43 program to assemble spliced alignments. Genome Biol 9:R7. Chen SF, Zhou YQ, Chen YR et al (2018) fastp: an ultra-fast all-in- https://doi.org/10.1186/gb-2008-9-1-r7 one FASTQ preprocessor. Bioinformatics 34:884–890. Harborne JB, Williams CA (2000) Advances in flavonoid research https://doi.org/10.1093/bioinformatics/bty560 since 1992. Phytochemistry 55:481–504. https://doi.org/10. Cheng HY, Concepcion GT, Feng XW et al (2021a) Haplotype- 1016/S0031-9422(00)00235-1 resolved de novo assembly using phased assembly graphs Hopkins R, Rausher MD (2011) Identification of two genes causing with hifiasm. Nat Methods 18:170–175. https://doi.org/10. reinforcement in the Texas wildflower Phlox drummondii. 1038/s41592-020-01056-5 Nature 469:411–414. https://doi.org/10.1038/nature09641 Cheng SP, Jia KH, Liu H et al (2021b) Haplotype-resolved genome Hribova E, Cizkova J, Christelova P et al (2011) The ITS1–5.8S- assembly and allele-specific gene expression in cultivated ITS2 sequence region in the Musaceae: structure, diversity ginger. Hortic Res 8:188. https://doi.org/10.1038/s41438- and use in molecular phylogeny. PLoS ONE 6:e17863. 021-00599-8 https://doi.org/10.1371/journal.pone.0017863 DePristo MA, Banks E, Poplin R et al (2011) A framework for Hu J, Fan JP, Sun ZY et al (2020) NextPolish: a fast and efficient variation discovery and genotyping using next-generation genome polishing tool for long-read assembly. Bioinformatics DNA sequencing data. Nat Genet 43:491–498. https://doi. 36:2253–2255. https://doi.org/10.1093/bioinformatics/ org/10.1038/ng.806 btz891 D’Hont A, Denoeud F, Aury JM et al (2012) The banana (Musa Huerta-Cepas J, Forslund K, Coelho LP et al (2017) Fast genome- acuminata) genome and the evolution of monocotyledonous wide functional annotation through orthology assignment by plants. Nature 488:213–219. https://doi.org/10.1038/ eggNOG-Mapper. Mol Biol Evol 34:2115–2122. https://doi. nature11241 org/10.1093/molbev/msx148 Ding HQ, Mao LH, Hu W et al (2021) Cloning and bioinformatics Huerta-Cepas J, Szklarczyk D, Heller D et al (2019) eggNOG 5.0: a analysis of chlorophyll degrading gene PPH from Curcuma hierarchical, functionally and phylogenetically annotated alismatifolia. Mol Plant Breed 19:2521–2526. https://doi. orthology resource based on 5090 organisms and 2502 org/10.13271/j.mpb.019.002521 viruses. Nucleic Acids Res 47:309–314. https://doi.org/10. Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast 1093/nar/gky1085 universal RNA-seq aligner. Bioinformatics 29:15–21. https:// Jiao YN, Li JP, Tang HB et al (2014) Integrated syntenic and doi.org/10.1093/bioinformatics/bts635 phylogenomic analyses reveal an ancient genome duplication Douglas J, Futuyma MK (2017) Evolution. Sinauer Associates, in monocots. Plant Cell 26:2792–2802. https://doi.org/10. Stamford 1105/tpc.114.127597 Dubos C, Stracke R, Grotewold E et al (2010) MYB transcription Jin G (2007) Research of plant intron evolution pattern. Fujian factors in Arabidopsis. Trends Plant Sci 15:573–581. https:// Agriculture and Forestry University doi.org/10.1016/j.tplants.2010.06.005 Kalvari I, Nawrocki EP, Ontiveros-Palacios N et al (2021) Rfam 14: Dyson CJ, Goodisman MAD (2020) Gene duplication in the expanded coverage of metagenomic, viral and microRNA honeybee: patterns of DNA methylation, gene expression, families. Nucleic Acids Res 49:D192–D200. https://doi.org/ and genomic environment. Mol Biol Evol 37:2322–2331. 10.1093/nar/gkaa1047 https://doi.org/10.1093/molbev/msaa088 Katoh K, Standley DM (2013) MAFFT multiple sequence alignment Ferreyra MLF, Rius SP, Casati P (2012) Flavonoids: biosynthesis, software version 7: improvements in performance and biological functions, and biotechnological applications. Front usability. Mol Biol Evol 30:772–780. https://doi.org/10. Plant Sci 3:222. https://doi.org/10.3389/fpls.2012.00222 1093/molbev/mst010 Fu HS, Zeng T, Zhao YY et al (2021) Identification of chlorophyll Ke LJ, Yu HW, Peng FT et al (2020) Preliminary report on hybrid metabolism- and photosynthesis-related genes regulating breeding of Curcuma alsimatifolia. J Minnan Normal Univ green flower color in Chrysanthemum by integrative tran- 33:62–66. https://doi.org/10.16007/j.cnki.issn2095-7122. scriptome and weighted correlation network analyses. Genes 2020.04.010 12:449. https://doi.org/10.3390/genes12030449 Khoo HE, Azlan A, Tang ST et al (2017) Anthocyanidins and Fukai S, Udomdee W (2005) Inflorescence and flower initiation anthocyanins: colored pigments as food, pharmaceutical and development in Curcuma alismatifolia Gagnep ingredients, and the potential health benefits. Food Nutr The Author(s) 2022 aBIOTECH (2022) 3:178–196 195 Res 61:1–21. https://doi.org/10.1080/16546628.2017. Mao LH, Liu JX, Ding HQ et al (2018) Microsatellite characteri- 1361779 zation analysis and primers design of the whole transcrip- Kim D, Paggi JM, Park C et al (2019) Graph-based genome tome of Curcuma alismatifolia. Mol Plant Breed alignment and genotyping with HISAT2 and HISAT-genotype. 16:7408–7414. https://doi.org/10.13271/j.mpb.016.007407 Nat Biotechnol 37:907–915. https://doi.org/10.1038/ Mao LH, Jin L, Ding HQ et al (2020) Estimation of genome size s41587-019-0201-4 analysis of C. alismatifolia ‘Chiang Mai Pink.’ J Zhejiang Agric Kjonboon T, Kanlayanarat S (2005) Effects of gibberellic acid on Sci 61:2066–2073. https://doi.org/10.16178/j.issn.0528- the vase life of cut patumma (Curcuma alismatifolia Gagnep.) 9017.20201034 ‘‘Chaing Mai’’ flowers. Acta Hortic 673:525–529. https://doi. Marcais G, Kingsford C (2011) A fast, lock-free approach for org/10.17660/ActaHortic.2005.673.70 efficient parallel counting of occurrences of k-mers. Bioinfor- Korf I (2004) Gene finding in novel genomes. BMC Bioinform 5:59. matics 27:764–770. https://doi.org/10.1093/bioinfor https://doi.org/10.1186/1471-2105-5-59 matics/btr011 Koshioka M, Umegaki N, Boontiang K et al (2015) Anthocyanins in Mol J, Grotewold E, Koes R (1998) How genes paint flowers and the bracts of Curcuma species and relationship of the species seeds. Trends Plant Sci 3:212–217. https://doi.org/10.1016/ based on anthocyanin composition. Nat Prod Commun S1360-1385(98)01242-4 10:453–456. https://doi.org/10.1177/1934578X1501000 Nakayama M, Roh MS, Uchida K et al (2000) Malvidin 3-rutinoside 320 as the pigment responsible for bract color in Curcuma Langfelder P, Horvath S (2008) WGCNA: an R package for alismatifolia. Biosci Biotech Bioch 64:1093–1095. https://doi. weighted correlation network analysis. BMC Bioinform org/10.1271/bbb.64.1093 9:559. https://doi.org/10.1186/1471-2105-9-559 Nawrocki EP, Eddy SR (2013) Infernal 1.1: 100-fold faster RNA Langmead B, Salzberg SL (2012) Fast gapped-read alignment with homology searches. Bioinformatics 29:2933–2935. https:// Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/ doi.org/10.1093/bioinformatics/btt509 Nmeth.1923 Ou SJ, Jiang N (2018) LTR_retriever: a highly accurate and Leong-Skornickova J, Sida O, Jarolimova V et al (2007) Chromo- sensitive program for identification of long terminal repeat some numbers and genome size variation in Indian species of retrotransposons. Plant Physiol 176:1410–1422. https://doi. Curcuma (Zingiberaceae). Ann Bot 100:505–526. https://doi. org/10.1104/pp.17.01310 org/10.1093/aob/mcm144 Ou SJ, Chen JF, Jiang N (2018) Assessing genome assembly quality Li H, Durbin R (2010) Fast and accurate long-read alignment with using the LTR Assembly Index (LAI). Nucleic Acids Res Burrows-Wheeler transform. Bioinformatics 26:589–595. 46:e126. https://doi.org/10.1093/nar/gky730 https://doi.org/10.1093/bioinformatics/btp698 Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately Li HL, Wu L, Dong ZM et al (2021) Haplotype-resolved genome of annotate core genes in eukaryotic genomes. Bioinformatics diploid ginger (Zingiber officinale) and its unique gingerol 23:1061–1067. https://doi.org/10.1093/bioinformatics/ biosynthetic pathway. Hortic Res 8:189. https://doi.org/10. btm071 1038/s41438-021-00700-1 Pertea M, Kim D, Pertea GM et al (2016) Transcript-level Li YY, Chen XH, Yu H et al (2022) Comparative RNA-Seq analysis to expression analysis of RNA-seq experiments with HISAT, understand anthocyanin biosynthesis and regulations in StringTie and Ballgown. Nat Protoc 11:1650–1667. https:// Curcuma alismatifolia. Folia Hortic 34:1–19. https://doi.org/ doi.org/10.1038/nprot.2016.095 10.2478/fhort-2022-0007 Petchang R, Buddharak P, Chundet R et al (2017) Cloning of DFR Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general gene in Curcuma alismatifolia ‘‘Chiang Mai Pink’’ and purpose program for assigning sequence reads to genomic Agrobacterium-mediated transformation. Re J Biotechnol features. Bioinformatics 30:923–930. https://doi.org/10. 12:1–6 1093/bioinformatics/btt656 Qiao X, Li QH, Yin H et al (2019) Gene duplication and evolution in Liu BH, Shi YJ, Yuan JY et al (2013) Estimation of genomic recurring polyploidization-diploidization cycles in plants. characteristics by analyzing k-mer frequency in de novo Genome Biol 20:38. https://doi.org/10.1186/s13059-019- genome projects. arXiv:1308.2012. https://doi.org/10. 1650-2 48550/arXiv.1308.2012 Ranavat S, Becher H, Newman MF et al (2021) A draft genome of Liu GF, Xia YD, Liu TK et al (2018) The DNA methylome and the ginger species alpinia nigra and new insights into the association of differentially methylated regions with differ- genetic basis of flexistyly. Genes 12:1297. https://doi.org/10. ential gene expression during heat stress in Brassica rapa. Int 3390/genes12091297 J Mol Sci 19:1414. https://doi.org/10.3390/ijms19051414 Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Biocon- Love MI, Huber W, Anders S (2014) Moderated estimation of fold ductor package for differential expression analysis of digital change and dispersion for RNA-seq data with DESeq2. gene expression data. Bioinformatics 26:139–140. https:// Genome Biol 15:550. https://doi.org/10.1186/s13059-014- doi.org/10.1093/bioinformatics/btp616 0550-8 Servant N, Varoquaux N, Lajoie BR et al (2015) HiC-Pro: an Lu LM (2007) Fujian tropical cut flower industry and develop- optimized and flexible pipeline for Hi-C data processing. ment strategies. Chin Agric Sci Bull 23:434–438. https://doi. Genome Biol 16:259. https://doi.org/10.1186/s13059-015- org/10.3969/j.issn.1000-6850.2007.03.096 0831-x Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and Shimada Y, Nakano-Shimada R, Ohbayashi M et al (1999) GlimmerHMM: two open source ab initio eukaryotic gene- Expression of chimeric P450 genes encoding flavonoid-3 ’,5 finders. Bioinformatics 20:2878–2879. https://doi.org/10. ’-hydroxylase in transgenic tobacco and petunia plants. FEBS 1093/bioinformatics/bth315 Lett 461:241–245. https://doi.org/10.1016/S0014- Mansfeld BN, Grumet R (2018) QTLseqr: an R package for bulk 5793(99)01425-8 segregant analysis with next-generation sequencing. Plant Simao FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: Genome 11:1–5. https://doi.org/10.3835/plantgenome2018. assessing genome assembly and annotation completeness 01.0006 with single-copy orthologs. Bioinformatics 31:3210–3212. https://doi.org/10.1093/bioinformatics/btv351 The Author(s) 2022 196 aBIOTECH (2022) 3:178–196 Stamatakis A (2014) RAxML version 8: a tool for phylogenetic Wu SD, Han BC, Jiao YN (2020) Genetic contribution of pale- analysis and post-analysis of large phylogenies. Bioinformat- opolyploidy to adaptive evolution in angiosperms. Mol Plant ics 30:1312–1313. https://doi.org/10.1093/bioinformatics/ 13:59–71. https://doi.org/10.1016/j.molp.2019.10.012 btu033 Wu TZ, Hu EQ, Xu SB et al (2021) clusterProfiler 4.0: a universal Stanke M, Diekhans M, Baertsch R et al (2008) Using native and enrichment tool for interpreting omics data. Innovation syntenically mapped cDNA alignments to improve de novo 2:100141. https://doi.org/10.1016/j.xinn.2021.100141 gene finding. Bioinformatics 24:637–644. https://doi.org/10. Wu Y, Wen J, Xia Y et al (2022) Evolution and functional 1093/bioinformatics/btn013 diversification of R2R3-MYB transcription factors in plants. Su WJ, Ou SJ, Hufford MB et al (2021) A tutorial of EDTA: extensive Hortic Res 9:uhac058. https://doi.org/10.1093/hr/uhac058 de novo TE annotator. Plant Transposable Elem 2250:55–67. Xie XB, Li S, Zhang RF et al (2012) The bHLH transcription factor https://doi.org/10.1007/978-1-0716-1134-0_4 MdbHLH3 promotes anthocyanin accumulation and fruit Sun S, Zhang DY, Ives A et al (2011) Why do stigmas move in a colouration in response to low temperature in apples. Plant flexistylous plant? J Evol Biol 24:497–504. https://doi.org/ Cell Environ 35:1884–1897. https://doi.org/10.1111/j.1365- 10.1111/j.1420-9101.2010.02181.x 3040.2012.02523.x Suyama M, Torrents D, Bork P (2006) PAL2NAL: robust conver- Xu ZH, Mahmood K, Rothstein SJ (2017) ROS induces anthocyanin sion of protein sequence alignments into the corresponding production via late biosynthetic genes and anthocyanin codon alignments. Nucleic Acids Res 34:609–612. https:// deficiency confers the hypersensitivity to ROS-generating doi.org/10.1093/nar/gkl315 stresses in Arabidopsis. Plant Cell Physiol 58:1364–1377. Taheri S, Abdullah TL, Abdullah NAP et al (2012) Genetic https://doi.org/10.1093/pcp/pcx073 relationships among five varieties of Curcuma alismatifolia Yan HL, Pei XN, Zhang H et al (2021) MYB-mediated regulation of (Zingiberaceae) based on ISSR markers. Genet Mol Res anthocyanin biosynthesis. Int J Mol Sci 22:3103. https://doi. 11:3069–3076. https://doi.org/10.4238/2012.August.31.4 org/10.3390/ijms22063103 Taheri S, Abdullah TL, Abdullah NAP et al (2014) Assessing the Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum genetic relationships of Curcuma alismatifolia varieties using likelihood. Mol Biol Evol 24:1586–1591. https://doi.org/10. simple sequence repeat markers. Genet Mol Res 1093/molbev/msm088 13:7339–7346. https://doi.org/10.4238/2014.September.5. Yang FS, Nie S, Liu H et al (2020) Chromosome-level genome 12 assembly of a parent species of widely cultivated azaleas. Nat Taheri S, Abdullah TL, Rafii MY et al (2019) De novo assembly of Commun 11:5269. https://doi.org/10.1038/s41467-020- transcriptomes, mining, and development of novel EST-SSR 18771-4 markers in Curcuma alismatifolia (Zingiberaceae family) Za´veska´ E, Fer T, Sı´da O et al (2012) Phylogeny of Curcuma through Illumina sequencing. Sci Rep 9:3047. https://doi. (Zingiberaceae) based on plastid and nuclear sequences: org/10.1038/s41598-019-39944-2 proposal of the new subgenus Ecomata. Taxon 61:747–763. Tang HB, Bowers JE, Wang XY et al (2008) Synteny and collinearity https://doi.org/10.1002/tax.614004 in plant genomes. Science 320:486–488. https://doi.org/10. Zhang HM, Lang ZB, Zhu JK (2018) Dynamics and function of DNA 1126/science.1153917 methylation in plants. Nat Rev Mol Cell Biol 19:489–506. Wang Y, Zhang D, Renner SS et al (2004) Botany: a new self- https://doi.org/10.1038/s41580-018-0016-z pollination mechanism. Nature 431:39–40. https://doi.org/ Zhao F, Li G, Hu P et al (2018) Identification of basic/helix-loop- 10.1038/431039b helix transcription factors reveals candidate genes involved Wang L, Shi Y, Chang XJ et al (2019) DNA methylome analysis in anthocyanin biosynthesis from the strawberry white-flesh provides evidence that the expansion of the tea genome is mutant. Sci Rep 8:2721. https://doi.org/10.1038/s41598- linked to TE bursts. Plant Biotechnol J 17:826–835. https:// 018-21136-z doi.org/10.1111/pbi.13018 Zhao P, Li X, Jia J et al (2019) bHLH92 from sheepgrass acts as a Wang M, Chen L, Liang ZJ et al (2020) Metabolome and negative regulator of anthocyanin/proanthocyandin accumu- transcriptome analyses reveal chlorophyll and anthocyanin lation and influences seed dormancy. J Exp Bot 70:269–284. metabolism pathway associated with cucumber fruit skin https://doi.org/10.1093/jxb/ery335 color. BMC Plant Biol 20:386. https://doi.org/10.1186/ Zhao R, Song X, Yang N et al (2020) Expression of the subgroup IIIf s12870-020-02597-9 bHLH transcription factor CpbHLH1 from Chimonanthus Winkel-Shirley B (2001) Flavonoid biosynthesis. A colorful model praecox (L.) in transgenic model plants inhibits anthocyanin for genetics, biochemistry, cell biology, and biotechnology. accumulation. Plant Cell Rep 39:891–907. https://doi.org/10. Plant Physiol 126:485–493. https://doi.org/10.1104/pp.126. 1007/s00299-020-02537-9 2.485 The Author(s) 2022 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png aBIOTECH Springer Journals

The genomic and bulked segregant analysis of Curcuma alismatifolia revealed its diverse bract pigmentation

Loading next page...
 
/lp/springer-journals/the-genomic-and-bulked-segregant-analysis-of-curcuma-alismatifolia-ffq6jAIkQ0
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
2662-1738
DOI
10.1007/s42994-022-00081-6
Publisher site
See Article on Publisher Site

Abstract

aBIOTECH (2022) 3:178–196 https://doi.org/10.1007/s42994-022-00081-6 aBIOTECH RESEARCH ARTICLE The genomic and bulked segregant analysis of Curcuma alismatifolia revealed its diverse bract pigmentation 1 2 1 1 1 Xuezhu Liao , Yuanjun Ye , Xiaoni Zhang , Dan Peng , Mengmeng Hou , 1 2 3 4 2 Gaofei Fu , Jianjun Tan , Jianli Zhao , Rihong Jiang , Yechun Xu , 2 5 6 7 Jinmei Liu , Jinliang Yang , Wusheng Liu , Luke R. Tembrock , 2& 1,8& Genfa Zhu , Zhiqiang Wu Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China Guangdong Provincial Key Lab of Ornamental Plant Germplasm Innovation and Utilization, Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology, Yunnan University, Kunming 650504, China Guangxi Engineering and Technology Research Center for Woody Spices, Guangxi Key Laboratory for Cultivation and Utilization of Special Non-Timber Forest Crops, Guangxi Forestry Research Institute, Nanning 530002, China Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA Department of Horticultural Science, North Carolina State University, Raleigh, NC 27607, USA Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA Kunpeng Institute of Modern Agriculture at Foshan, Foshan 528200, China Received: 8 July 2022 / Accepted: 5 September 2022 / Published online: 6 October 2022 Abstract Compared with most flowers where the showy part comprises specialized leaves (petals) directly subtending the reproductive structures, most Zingiberaceae species produce showy ‘‘flowers’’ through modifications of leaves (bracts) subtending the true flowers throughout an inflorescence. Curcuma alismatifolia, belonging to the Zingiberaceae family, a plant species originating from Southeast Asia, has become increasingly popular in the flower market worldwide because of its varied and esthetically pleasing bracts produced in different cultivars. Here, we present the chromosome-scale genome assembly of C. alismatifolia ‘‘Chiang Mai Pink’’ and explore the underlying mechanisms of bract pig- mentation. Comparative genomic analysis revealed C. alismatifolia contains a residual signal of whole- genome duplication. Duplicated genes, including pigment-related genes, exhibit functional and struc- tural differentiation resulting in diverse bract colors among C. alismatifolia cultivars. In addition, we identified the key genes that produce different colored bracts in C. alismatifolia, such as F3 5’H, DFR, ANS and several transcription factors for anthocyanin synthesis, as well as chlH and CAO in the chlorophyll synthesis pathway by conducting transcriptomic analysis, bulked segregant analysis using both DNA and RNA data, and population genomic analysis. This work provides data for understanding the mechanism of bract pigmentation and will accelerate breeding in developing novel cultivars with Xuezhu Liao, Yuanjun Ye, Xiaoni Zhang have contributed equally to this work. & Correspondence: genfazhu@163.com (G. Zhu), wuzhiqiang@- caas.cn (Z. Wu) The Author(s) 2022 aBIOTECH (2022) 3:178–196 179 richly colored bracts in C. alismatifolia and related species. It is also important to understand the variation in the evolution of the Zingiberaceae family. Keywords Anthocyanin synthesis, Siam tulip, Floriculture, Zingiberaceae, Genome evolution INTRODUCTION development (Fukai and Udomdee 2005). As the most important ornamental trait of C. alismatifolia, bract Zingiberaceae is a monocotyledonous angiosperm lin- color (Fig. 1) has begun to attract increasing attention eage family that contains several important crops, such among breeders and researchers. However, the mecha- as ginger (Zingiber officinale), cardamom (Elettaria nism underlying bract color formation remains cardamomum), and turmeric (Curcuma longa). In addi- unknown. tion to the important economic value, the special flower Although there are very limited studies on antho- structure and the complex habitat of Zingiberaceae have cyanins in bracts, anthocyanins on plant color have been led to the evolution of many unique reproductive modes well studied, given the importance of certain pigments and pollination mechanisms (Sun et al. 2011; Wang in plant metabolism. The compounds involved in plant et al. 2004). Therefore, Zingiberaceae plays an impor- coloration (in addition to chlorophyll) are flavonoids tant role in the study of plant phylogeny and evolution (including anthocyanins), carotenoids, and betaine of reproductive systems. As the most challenging genus (Grotewold 2006). Anthocyanins are important sec- in Zingiberaceae, the classification of Curcuma is pla- ondary metabolites of plants and have important bio- gued by polyploid speciation and homoploid logical functions such as antioxidant and antibacterial hybridization and the division of the genus has always effects and attracting insects for pollination (Winkel- been a controversial issue owing to molecular and Shirley 2001). In addition to their antioxidant and morphological conflicts (Za´veska´ et al. 2012). However, antibacterial activities, anthocyanins have been used as Curcuma species including C. alismatifolia are of great food colorants and pharmaceutical feedstock (Khoo importance because of their long history of use as et al. 2017;Xu etal. 2017). The biosynthesis of antho- medicines, including C. alismatifolia (Akter et al. 2008; cyanins is catalyzed by several structural genes, such as 0 0 Taheri et al. 2019). In addition to their pharmacological chalcone synthase (CHS), and flavonoid-3 ,5 -hydroxylase 0 0 properties, this genus also contains diverse ornamental (F3 5 H), dihydroflavonol 4-reductase (DFR), and antho- species with showy bracteate inflorescences they pro- cyanidin synthase (ANS) (Belwal et al. 2020; Harborne duce. Thus, it is important to decode the underlying and Williams 2000; Mol et al. 1998) and is also affected genetic basis of these traits in those ornamental species. by transcription factors, such as MYB, bHLH, and WD40. Curcuma alismatifolia Gagnep is a tropical species Several subgroups of these transcription factors have native to Cambodia, Laos, and Thailand and is com- been shown to be involved in anthocyanin synthesis, monly known as Siam tulip. Their ‘‘flowers’’ consist of a including subgroup 4–7, 44, and 79 of MYB regulators series of large colorful bract-subtending flowers in a (Wu et al. 2022), subgroup IIIf, IIId ? e, and IVd of spike inflorescence (Fig. 1). It is a very popular orna- bHLH regulators (Xie et al. 2012; Zhao et al. mental cut or potted flower in China and Southeast Asia 2018, 2019, 2020), and WD40 proteins TRANSPARENT because of its distinctive inflorescence, colorful bracts, TESTA GLABRA1 (TTG1) homologs (Baudry et al. 2004; and a long flowering period that lasts from May to Belwal et al. 2020; Gonzalez et al. 2008). In the antho- November during the high-temperature season. This cyanin synthesis pathway of Arabidopsis, the MYB genes extended period of flowering is particularly attractive interact with the bHLH transcription factors and the among floriculturists, as it provides a longer production WD40 protein family to form the MYB–bHLH–WD40 window than most cut flowers. Among the many culti- (MBW) complex to regulate the formation and accu- vars of Siam tulip, ‘‘Chiang Mai Pink’’ is the most popular mulation of anthocyanins (Dubos et al. 2010; Yan et al. variety with broad market prospects (Fukai and 2021). In a previous qualitative and quantitative anal- Udomdee 2005;Lu 2007; Mao et al. 2018; Taheri et al. ysis of pigments in the pink bracts of C. alismatifolia, 2012, 2014). At present, studies on the traits of C. anthocyanidin malvidin 3-rutinoside was identified as alismatifolia have mainly focused on bract color the main pigment (Nakayama et al. 2000). The CHS and (Koshioka et al. 2015), vase life (Kjonboon and Kan- DFR genes have been cloned from C. alismatifolia, and layanarat 2005), inflorescence and flower initiation and the magenta color of the petals in DFR transgenic plants The Author(s) 2022 180 aBIOTECH (2022) 3:178–196 Fig. 1 Structure of the C. alismatifolia genome. The innermost circle shows the diversity of bract pigmentation and inflorescence morphology. A GC content. B rRNA distribution. C tRNA distribution. D SSR distribution. E LTR distribution. F 16 chromosomes was more brilliant than that of the petals from wild type magnesium chelatase subunit H (chlH), and chloro- (Chanapan et al. 2017; Petchang et al. 2017). Moreover, phyllide a oxygenase (CAO) (Wang et al. 2020). There- 0 0 F3 5 H was also identified as a key gene controlling fore, to clarify the formation of each color in C. anthocyanin synthesis in C. alismatifolia ‘‘Dutch Red’’ by alismatifolia bracts, the biosynthetic pathways of dif- transcriptome analysis (Li et al. 2022). In addition, most ferent pigments must be elucidated. Such knowledge of the inner whorl bracts of C. alismatifolia have green will improve cultivar development through gene editing tips with chlorophyll deposition and red pigmentation of gene targets in the pigment pathways. under them, which gives a desiccated appearance, thus At present, the traditional method for developing reducing the ornamental value (Ding et al. 2021). new varieties of C. alismatifolia is through hybrid Chlorophyll is a pigment that provides plants their breeding, which is costly and time intensive (Ke et al. characteristic green color and is mainly composed of 2020). Moreover, previous research on C. alismatifolia chlorophylls a and b. Chlorophyll metabolism can be has mainly focused on the development of molecular divided into three main steps, i.e., chlorophyll synthesis, markers without the availability of a complete genome chlorophyll cycle, and chlorophyll degradation, with reported for this species. A high-quality C. alismatifolia each step being mediated by a series of important genome assembly will accelerate the development of enzymes such as glutamyl-tRNA reductase (HemA), new cultivars with desired traits. It will also improve The Author(s) 2022 aBIOTECH (2022) 3:178–196 181 the characterization of wild germplasms and aid in the assembly of C. alismatifolia was supported by the high discovery of novel genotypes and the identification of mapping rates of 97.44% (MGI) and 99.07% (HiFi) diversity hotspots for species conversation. Thus, it is (Supplementary Table 2). The high level of complete- urgent to compile a high-quality genome for C. alis- ness of this assembly was also verified by a BUSCO matifolia to accelerate evolutionary studies as well as (Simao et al. 2015) score of 96.53% (Supplementary precision breeding and genome editing in the genus Fig. 5) and a CEGMA (Parra et al. 2007) score of 95.16% Curcuma. (Supplementary Fig. 6). The long terminal repeat (LTR) Currently, only three genomes from Zingiberaceae assembly index (LAI) (Ou et al. 2018) score was 26.38 have been published, including Alpinia nigra (Ranavat (Supplementary Fig. 7, Supplementary Table 3). These et al. 2021), Z. officinale (Cheng et al. 2021b; Li et al. statistics suggest that the C. alismatifolia ‘‘Chiang Mai 2021), and C. longa (Chakraborty et al. 2021), of which Pink’’ genome is a high-quality genome. C. longa is only a draft genome. Here, we present the From the complete genome, 1,172,133 repeat units of chromosome-scale assembly of C. alismatifolia, the first different types were predicted, accounting for 75.84% genome in Curcuma. The genome of C. alismatifolia was (753,914,943 bp) of the total genome size (Supple- determined using a combination of high-accuracy long- mentary Table 4). The long terminal repeats (LTR) read PacBio HiFi and proximity ligation Hi-C data. In accounted for the highest proportion of the genome total, a genome of 994.07 Mb was assembled with (52.60%, Supplementary Table 4), among which the 95.25% of contigs anchored to 16 chromosomes, with super families Copia (31.79%) and Gypsy (20.81%) an N50 of 57.51 Mb. We examined the patterns of dominated (Supplementary Table 4). A burst in LTR whole-genome duplications as well as other duplication proliferation was inferred to have occurred 2.5 mya types and found that tandem and other small-scale gene (Supplementary Fig. 8), with the genomic location of duplications were important in the divergence of C. LTRs concentrated away from the SSR hotspots (Fig. 1). alismatifolia color morphs. In addition, we identified key We identified 57,534 protein-coding genes (Supple- genes involved in the anthocyanin and chlorophyll mentary Table 5) with a BUSCO score of 90.7% (Sup- metabolism pathways in C. alismatifolia bracts that plementary Table 6) and the gene numbers and repeat underlie this coloration. The publication of this refer- sequences in line with C. longa (Chakraborty et al. 2021) ence genome and the genetic mechanisms controlling and the published transcriptome of C. alismatifolia the color of C. alismatifolia bracts provide a valuable (Taheri et al. 2019) (Supplementary Table 5). In addi- resource for the development of novel cultivars as well tion, the total length of all genes (exons ? introns) as increasing our understanding of the evolution of showed a similar pattern to that of other published inflorescences among the monocots and providing an monocot genomes (Supplementary Figs. 9, 10, 11 and important genomic resource for clarifying the complex 12). Up to 92.35% of the protein-coding genes have phylogenetic relationships in Zingiberaceae. been annotated with KEGG, GO, NR, and other databases (Supplementary Table 7). In total, 9417 rRNAs, 8641 snRNAs, 1151 tRNAs, and 202 miRNAs were also RESULTS annotated (Fig. 1, Supplementary Table 8). The GC content of the entire genome was approximately 43% C. alismatifolia genome assembly and annotation (Fig. 1). There was a high GC content (* 55%) region in chromosomes 13 (LG13) and 16 (LG16). These GC- The genome size of C. alismatifolia ‘‘Chiang Mai Pink’’ enriched regions are the locations of the 45S ribosomal was estimated to be 1.10 Gb and the heterozygosity was RNA, which are consistent with the high GC content of found to be 1.7% using 87.45 Gb of MGI-SEQ 2000 the internal transcribed spacers (ITS) for these genes in survey data (Supplementary Figs. 1, 3, 4 and Table 1). banana (Hribova et al. 2011). We also found that 5S This is slightly higher than the reported genome size of rRNA genes were enriched in chromosome 1 (LG01), 998.5 Mb estimated by flow cytometry in a previous while tRNA-genes were enriched in chromosome 8 report (Mao et al. 2020). Then, 30.35 Gb of PacBio cir- (LG08) (Fig. 1). cular consensus sequence (CCS) reads were used for assembly, and 95.25% of the sequences were anchored Comparative genomics and whole-genome to the 16 chromosomes by combining 110.73 Gb of Hi-C duplication (WGD) event data, which was consistent with the expected number of chromosomes (2n = 32) (Leong-Skornickova et al. To clarify the phylogenetic position of C. alismatifolia 2007), resulting in a genome of 994.07 Mb size (Fig. 1; and provide a general framework for understanding its Supplementary Fig. 2). The high fidelity of the genome genomic structure, genes of 217 single-copy gene The Author(s) 2022 182 aBIOTECH (2022) 3:178–196 families from 15 species, including C. alismatifolia,13 Fig. 16). This result was similar to that of Z. officinale, other monocotyledonous species, and the outgroup but slightly different from the results of M. acuminata grape (Vitis vinifera) were used for phylogenetic analy- and the dicotyledonous Rhododendron (Yang et al. 2020) sis. C. alismatifolia was inferred to have diverged from (Supplementary Fig. 16), wherein only PD and TD ginger (Z. officinale) around * 11.9 mya (Fig. 2A). A showed this pattern. Thus, these three types of dupli- total of 9,019 gene families were shared by species in cated genes in C. alismatifolia and Z. officinale had more the Zingiberales (Fig. 2B, C, Supplementary Tables 9 and rapid sequence divergence with stronger positive 10), among which C. alismatifolia had 1490 species- selection than the WGD and TRD genes. KEGG enrich- specific gene families (Fig. 2A). In the genome of C. ment of the five types of duplicated genes also showed alismatifolia, 2,102 gene families were found to be that these genes seem to be divided into two main expanding, of which 120 expanded significantly categories, among which DSD, PD, and TD genes were (Fig. 2A). KEGG enrichment results showed that these enriched in one group for monoterpenoid biosynthesis expanded gene families were mainly related to envi- and flavonoid biosynthesis, which might be related to ronmental adaptation, such as plant–pathogen interac- species-specific differentiation. In comparison, the WGD tion, steroid hormone biosynthesis, and terpenoid and TRD genes in another group were associated with backbone biosynthesis (Supplementary Figs. 13 and more conserved functions such as circadian rhythm and 14). plant hormone signal transduction (Fig. 3B and Sup- In the speciation and divergence of angiosperm lin- plementary Fig. 17). eages, WGD events are an important source of molecu- In addition, the function of replicated genes could be lar diversity (Wu et al. 2020). The distributions of Ks influenced by epigenetic processes such as DNA and substitution rate of fourfold synonymous (degen- methylation, which has been shown to influence gene erative) third-codon transversion (4dtv) sites of gene expression in certain taxa (Dyson and Goodisman pairs in the collinear blocks of C. alismatifolia and Z. 2020). We examined the distribution of the five dupli- officinale indicated that both species have a shared Ks cated gene types on chromosomes and found that genes peak (Fig. 2D). This result was consistent with a pre- of the TRD and WGD types were more concentrated at vious report that Z. officinale has a shared WGD in chromosome ends with an opposite positional trend to Zingiberaceae (Cheng et al. 2021b). This WGD event was LTR distributions, whereas DSD, PD, and TD genes, also supported by the collinear relationships between C. especially DSD genes, showed a higher overlapping alismatifolia and Z. officinale, which had a 2:2 syntenic distribution with LTRs (Supplementary Fig. 18). The depth pattern shown by JCVI (Fig. 2E, F) and was fur- 5mC methylation of CG contexts detected by Nanopolish ther verified by the 1:2 pattern of collinearity between software showed that the distribution of methylation C. alismatifolia and Amborella (Supplementary Fig. 15). coincided with the distribution of LTRs and DSDs This result indicates that, compared with Amborella,a (Supplementary Fig. 18). Considering that TEs are WGD event occurred in both C. alismatifolia and Z. known to promote gene replication (Bayer et al. 2020) officinale (Fig. 2E, F). We conducted the above analyses and angiosperm chromosome remodeling (Douglas and and reported validated evidence of an obvious WGD Futuyma 2017), and that both TEs and methylation can event in the C. alismatifolia genome. affect gene expression, such as cytosine DNA methyla- tion, which regulates TE silencing, imprinting, and gene Gene duplication events contribute expression (Bourque et al. 2018; Liu et al. 2018; Wang to the diversity of C. alismatifolia et al. 2019), we assessed their distribution in the gen- ome. We hypothesized that TE and exon methylation To investigate whether other gene duplication events might affect the distribution and expression of different played an important role in the evolution of diverse classes of repetitive genes and contribute to their dif- bract pigmentation, we identified five types of dupli- ferent evolutionary fates. Given this, we analyzed TE cated genes: 26,466 dispersed duplicates (DSD), 1777 insertions and methylation in exons, introns, and 1 kb proximal duplicates (PD), 2052 tandem duplicates (TD), regions [the length of intergenic regions was longer 14,159 transposed duplicates (TRD), and 11,120 from than 1 kb for most genes (Supplementary Fig. 19)] of whole-genome duplicates (WGD) (Fig. 3A, Supplemen- genes upstream and downstream in the five types of tary Tables 11 and 12). We compared the Ks and Ka/Ks duplicated genes. As expected, more DSD, PD, and TD distributions from different types of duplicated genes genes had TE insertions and higher methylation levels and found that the DSD, PD, and TD gene pairs had than the WGD and TRD genes (Fig. 3C, D, Supplemen- higher Ka/Ks ratios and smaller Ks values than those of tary Tables 13 and 14), especially in their exons and the other two types of duplicated genes (Supplementary upstream and downstream regions (Fig. 3C and The Author(s) 2022 aBIOTECH (2022) 3:178–196 183 The Author(s) 2022 184 aBIOTECH (2022) 3:178–196 bFig. 2 Evolution of the C. alismatifolia genome and gene families. a total of 3347 upregulated and 3350 downregulated A Phylogenetic tree constructed using maximum likelihood based genes in QMF (Fig. 4B, Supplementary Table 15). KEGG on the concatenation of single-copy nuclear genes. B The distri- enrichment analysis showed that these DEGs were bution of orthogroups in each species. C Venn diagram of shared enriched in the phenylpropanoid and flavonoid biosyn- and unique gene families in Zingiberales species. D The distribu- tion frequencies of synonymous substitutions (Ks) and substitu- thesis pathways (Fig. 4C), both of which are upstream of tions of 4dtv sites. E Synteny patterns between C. alismatifolia and anthocyanin synthesis. Z. officinale. F The 2:2 syntenic depth pattern between C. Based on previous reports (Belwal et al. 2020; Fer- alismatifolia and Z. officinale reyra et al. 2012) and our genome annotations, we analyzed the expression levels of all genes in the Supplementary Fig. 20). In addition, genes with TEs in anthocyanin synthesis pathway in different samples their exons had lower relative expression levels (Sup- (Belwal et al. 2020; Ferreyra et al. 2012) (Supplemen- plementary Fig. 21). Moreover, TEs in the exons and the tary Fig. 26, Supplementary Table 16). We found that upstream and downstream regions had a higher degree the tandem duplicated (TD) gene F3 5’H (gene13778)in of methylation than that in the other gene portions QMF was highly expressed in SeR at the S3 and S4 (Supplementary Fig. 22), which is consistent with a stages, but lowly expressed at the S1 and S2 stages. This previous study (Zhang et al. 2018). However, to date, gene was minimally expressed in the different tissues of there have been no general conclusions regarding the XCX (Fig. 4D, Supplementary Table 11). In addition, the effect of methylation in exon/intron/upstream/down- genes DFR (gene25158), ANS (gene437), and BZ1 stream regions on gene expression (Zhang et al. 2018). (gene16173) were also significantly downregulated in We found that the methylation levels in these four XCX (Fig. 4D), among which DFR and ANS had a higher regions were approximately equivalent (Fig. 3D). Com- expression level and their expression patterns were pared to the WGD and TRD genes, the DSD, PD, and TD similar to the bract color accumulation in the pink genes had lower expression in multiple tissues and at bracts. In addition, transcription factors were identified different developmental stages (Supplementary Fig. 23). using iTAK (Supplementary Fig. 27 and Supplementary Therefore, the results showed that the higher the degree Tables 17 and 18). The subgroups of 4–7, 44, and 79 of of methylation, the lower is the gene expression. The MYB regulators (Wu et al. 2022), IIIf, IIId ? e, and IVd expressions of DSD, PD, and TD genes were more of bHLH regulators (Xie et al. 2012; Zhao et al. affected by methylation than WGD and TRD (Fig. 3E). 2018, 2019, 2020) and WD40 protein TTG1 homologs DSD, PD, and TD genes also had longer intron lengths (Baudry et al. 2004; Belwal et al. 2020; Gonzalez et al. and relatively higher TE content than the WGD and TRD 2008) have been reported to be involved in anthocyanin genes (Fig. 3C and Supplementary Fig. 24). synthesis; these most important subgroups of MYB, bHLH, and WD40 genes (Supplementary Figs. 28 and Bract pigmentation genes in C. alismatifolia 29, Supplementary Table 19) were identified by a phy- logenetic analysis based on homologs from A. thaliana To investigate the mechanisms underlying the impor- and SG79 in other species (Wu et al. 2022). On this tant trait of bract color in C. alismatifolia, a stable all- basis, a total of 12 candidate regulators were screened white bract morph, two cultivars of C. alismatifolia, i.e., out (Supplementary Fig. 30A, Supplementary Table 20) ‘‘Country Snow’’ (XCX) and ‘‘Chiang Mai Pink’’ (QMF) and further narrowed down to eight genes according to were selected to conduct comparative transcriptomic the weighted correlation network analysis (WGCNA) analyses. Light microscopic observation showed that the (Supplementary Fig. 30B and 31). Finally, ten genes, color of the inner whorl bract tips of QMF was a mix of including DFR, ANS,2 MYB (SG7 subgroup of MYB: red and green colors (Supplementary Fig. 25), and the gene39947, SG44 subgroup of MYB: gene20923), and 6 anthocyanin content of its inner whorl bract bases at S4 bHLH (IVd subgroup of bHLH: gene46097, IIId ? e stage was 0.2338 mg/g fresh weight (FW). Tissue subgroup of bHLH: gene29974, gene45529, gene40971, samples were collected from the outer all-green bract gene51401, and IIIf subgroup of bHLH: gene32335) (Br), tip (SeG), and base (SeR) of the inner whorl genes were verified by qRT-PCR. The qRT-PCR results ornamental bract at four developmental stages, i.e., showed that the expression patterns of these genes including the early stage of bract initiation (S1), tip were consistent with the bract coloring period in the coloring stage (S2), base color transition stage (S3), and RNA-seq results (Supplementary Fig. 30c). Therefore, color change completion stage (S4), and used for tran- we believe that these genes play a crucial role in scriptomic analyses (Fig. 4A). anthocyanin synthesis and are also closely related to the We identified the differentially expressed genes formation of white bracts, because the qRT-PCR results (DEGs) of SeR at S4 in the QMF vs XCX, which contained The Author(s) 2022 aBIOTECH (2022) 3:178–196 185 Fig. 3 Gene duplication and evolution. A Genes derived from different modes of duplication in four different species. The gene types are whole-genome duplication (WGD), tandem duplication (TD), proximal duplication (PD), transposed duplication (TRD), and dispersed duplication (DSD). B Top five enriched pathways for each duplicated type in C. alismatifolia based on the KEGG analysis. C Percentage of different duplicated gene types which contain TEs in an exon, intron, and 1 kb upstream or downstream sequence from each CDS. D Percentage of cytosine methylation in different duplicated gene types in an exon, intron, and 1 kb upstream or downstream sequence from each CDS. E The relationship between methylation and gene expression among different types of duplicated genes The Author(s) 2022 186 aBIOTECH (2022) 3:178–196 showed that these genes were not expressed in the SeR sesquiterpenoid and monoterpenoid synthesis, were of white bracts. downregulated as the flowers developed, implying that Moreover, population structure analysis based on 56 this kind of aroma will fade with full flowering; how- C. alismatifolia cultivars showed that these samples ever, more evidence is needed for other aromatic sub- were mainly divided into two groups, among which the stances (Supplementary Fig. 35B–C). inner bracts of samples in Group 2 possessed a similar morphology and color in the outer bracts, while there Bulked segregant analysis (BSA) isolates the key were differences between the inner and outer bracts of genes related to bract color of C. alismatifolia samples in Group 1 (Supplementary Fig. 32A). Principal component analysis (PCA) results also showed differ- To further understand the molecular mechanism of entiation between these two groups (Supplementary bract pigmentation, we sequenced two bulked popula- Fig. 32B). To further verify the differences in bract color tions with different bract colors (pink and red), each at the population level, one population with red inner consisting of 50 offspring from an F1 population con- whorl bract bases (SeR) (13 individuals) and one with taining 985 individuals from a cross between the red white or green SeR(14 individuals) were selected to line (C. alismatifolia ‘‘Scarlet’’ (JL), female parent with calculate Fst, and the results of the red and non-red red bract) and pink line (C. alismatifolia ‘‘Dawn’’ (LM), populations further confirmed that 8 of the 14 candi- male parent with pink bract), to a depth of date genes (gene25158, gene437, gene318, gene39947, 50 9 (Fig. 5A). DNA and RNA from four samples gene29974, gene45529, gene32394, and gene32335) [P1(LM), P2(JL), S1(LM), and S2(JL)] were sequenced were differentiated at the population level (Supple- for BSA and bulked segregant RNA-seq (BSR) analysis. mentary Fig. 32C). We identified SNPs between the two parental lines and A previous study reported that the inner whorl bract computed the SNP index for the red line and pink line tips (SeG) are green in C. alismatifolia, which is a phe- bulked populations as well as their differences (G’ value) nomenon of chlorophyll deposition (Ding et al. 2021) using a 1000 kb sliding window with a step size of 10 kb, (Fig. 4A). The results of WGCNA, a widely used tran- as described by Mansfeld and Grumet (2018). Three scriptome analysis method (Langfelder and Horvath genomic regions containing 1547 genes contributing to 2008), revealed one chlH (gene55716) and one CAO bract color, with a confidence threshold exceeding 95% (gene49893) gene in chlorophyll synthesis as the hub were identified (Supplementary Table 23). Similar results genes in the module most associated with the green trait were obtained in the BSR analysis (Supplementary (Supplementary Figs. 33 and 34, Supplementary Fig. 36 and Supplementary Table 24). Based on annota- Table 22). To understand the molecular mechanism of tion information (Supplementary Table 25), we identified chlorophyll synthesis, we analyzed the expression pat- 31 genes that were potentially related to anthocyanin terns of all chlorophyll synthesis and metabolic pathway synthesis (Fig. 5B–D, Supplementary Table 26). Among 0 0 genes in different tissues (Fu et al. 2021; Wang et al. them, three candidate genes, including one F3 5 H gene 2020) (Fig. 4E, Supplementary Table 21). We found that (gene13778) and two MYB genes (SG14: gene13102 and the downstream genes of the TD gene chlH (gene55716, SG4: gene14458) in QTL1, were identified for bract pig- gene55715) were minimally expressed in the white mentation based on gene expression, single nucleotide samples (Fig. 4E), suggesting that the chlH is a key gene variants and structural variations (Fig. 5D, Supplemen- controlling chlorophyll synthesis in C. alismatifolia bracts. tary Table 25). The promoter of F3 5’H gene had a 29 bp In addition to bract color, floral scent is an important heterozygous region 94 bp upstream of the start codon in economic trait of ornamental plants. Previous observa- LM, and this same region had zero sequencing depth in JL 0 0 tion showed that different cultivars had different floral (Fig. 5E). In addition, the second intron of F3 5 H gene scents, and the ‘‘true’’ flowers of C. alismatifolia were the showed an abnormal length with a heterozygous deletion main sources of floral scents rather than its colorful of approximately 16.8 kb in length according to the gen- bracts. Therefore, GC–MS was used to analyze the floral ome annotation of QMF (Supplementary Fig. 37a). For scents originating from the flowers of C. alismatifolia the two MYB genes, the allele frequencies of the SNP in ‘‘Chiang Mai Pink’’ and revealed its volatile compounds, gene13102 promoter at LG03_61315110 and sesquiterpenoids and monoterpenoids, associated with LG03_61315135 in S1 of LM and S2 of JL were different. floral scent, since the duplicated genes were mainly The SNP at LG03_61315110 in S1 of LM was C/A (55%/ enriched in monoterpenoid biosynthesis (Supplemen- 45%), whereas in S2 of JL pool was C (100%). The SNP at tary Fig. 35A). Further analysis of the transcriptomes at LG03_61315135 in S1 of LM was A/T (53%/47%), different flower developmental stages revealed that whereas in S2 of JL pool, it was A (100%) (Supplementary TPS-a and TPS-b genes, which are mainly related to Fig. 38). The allele of the SNP in gene14458’exonat The Author(s) 2022 aBIOTECH (2022) 3:178–196 187 The Author(s) 2022 188 aBIOTECH (2022) 3:178–196 bFig. 4 The anthocyanin and chlorophyll biosynthetic pathway with an impact on the diversity of C. alismatifolia that genes identified by RNA-seq in the all-white bract morph C. promotes the coloration of bracts. Gene duplication alismatifolia and C. alismatifolia ‘‘Chiang Mai Pink’’. A Locations of plays an important role in evolutionary adaptation by tissue samples for RNA-seq. Br (outer all-green bract), SeG (inner providing ‘‘new’’ genetic material to the genome (Dyson whorl bract tips), and SeR (inner whorl bract base) of C. alismatifolia ‘‘Chiang Mai Pink’’ (QMF) at different developmental and Goodisman 2020), and the WGD events are regar- stages and a white morph C. alismatifolia ‘‘Country Snow’’ (XCX) at ded as an important source of such diversity (Wu et al. S4 (bar of S1: 0.2 cm, bar of S2: 0.3 cm, bar of S3: 0.4 cm, bar of 2020). It is now clear that the genomes of extant seed S4: 1.5 cm). B KEGG enrichment of DEGs in the QMF SeR vs XCX plants and angiosperms have undergone multiple WGD SeR at S4 stage. C Anthocyanin biosynthetic pathway in C. alismatifolia. Heatmaps show the FPKM with Log2 transformation events and share an ancient polyploid ancestor. In of genes in SeR of QMF at S1–S4 stages and XCX at the S4 stage. addition, some angiosperms have undergone repeated Enzyme abbreviations: PAL phenylalanine ammonium lyase, C4H independent whole-genome duplication events in recent cinnamate-4-hydroxylase, 4CL 4-coumaroyl-CoA synthase, CHS times. For example, the banana (Musa acuminata) has chalcone synthase, CHI chalcone isomerase, F3H flavanone 3-hy- 0 0 0 0 0 0 droxylase, F3 H flavonoid-3 -hydroxylase, F3 5 H flavonoid-3 ,5 - three independent WGD events (D’Hont et al. 2012). hydroxylase, DFR dihydroflavonol 4-reductase, ANS anthocyanidin Here, by identifying the collinear blocks of C. alismati- synthase, BZ1 anthocyanidin 3-O-glucosyltransferase. D Chloro- folia and Z. officinale, we found that C. alismatifolia and phyll biosynthetic and degradation pathway in C. alismatifolia. Z. officinale have a 2:2 collinear relationship (Fig. 2E, F), Enzyme abbreviations: HemA glutamyl-tRNA reductase, HemL glutamate-1-semialdehyde 2,1-aminomutase, HemB porphobilino- together with the Ks distribution, suggesting that one gen synthase, HemC hydroxymethylbilane synthase, HemD uro- WGD event occurred in the ancestor of these two spe- porphyrinogen-III synthase, HemE uroporphyrinogen cies (Fig. 2D–F). The WGD events in monocots are decarboxylase, HemF coproporphyrinogen III oxidase, HemY complex (Jiao et al. 2014;Wuetal. 2020). Previous protoporphyrinogen/coproporphyrinogen III oxidase, chlH mag- nesium chelatase subunit H, chlM, magnesium-protoporphyrin studies have also indicated that ancestors of Z. officinale O-methyltransferase, chlE magnesium-protoporphyrin IX mono- may have undergone multiple WGD events (Cheng et al. methyl ester, por protochlorophyllide reductase, DVR divinyl 2021b;Li etal. 2021). We further confirmed these WGD chlorophyllide a 8-vinyl-reductase, CAO chlorophyllide a oxyge- events with the fitted Ks results of WGDI for C. alis- nase, chlG chlorophyll/bacteriochlorophyll a synthase, NOL chloro- phyll(ide) b reductase, HCAR, 7-hydroxymethyl chlorophyll a matifolia and the corresponding Ks distribution of C. reductase, CLH chlorophyllase, SGR magnesium dechelatase PAO alismatifolia with Amborella (Supplementary Figs. 40 pheophorbide a oxygenase, RCCR red chlorophyll catabolite and 41), combined with the weak signals in the median reductase. E Expression heatmaps of genes in the chlorophyll Ks distribution of collinear blocks of the Z. officinale biosynthetic and degradation pathway paralogous gene pairs in WGDI collinear blocks (Sup- plementary Fig. 42). From this, it appears that both C. LG03_69765180 was A/G (63%/38%), whereas in the S2 alismatifolia and Z. officinale have an additional WGD of JL pool, it was A (100%), implying that the structural weak signal, which was further supported by the results variations in these genes may be responsible for the red of Tree2GD, suggesting WGD should have occurred once or pink bract color (Supplementary Fig. 39). Further- before the divergence of Zingiberaceae and Musaceae as more, two MYB genes, four bHLH genes, seven WRKY well as once before the divergence of C. alismatifolia and genes, and two WD40 genes were found to be differen- Z. officinale (Supplementary Fig. 43 and 44). Sampling tially expressed in P2 vs. P1 and S2 vs. S1 based on the of more species in this group is needed to support the RNA data of P1(LM), P2(JL), S1(LM), and S2(JL) (Sup- inferred additional WGD event. Based on the results of plementary Tables 27 and 28). Among the four bHLH TreePL and MCMCtree, the most recent WGD event genes (gene21747, gene41577, gene46097, and occurred after the divergence of C. alismatifolia and Z. gene55364), gene55364 belongs to the IIId ? e subgroup officinale ancestors from ancestors of Musa and before of bHLH, while gene46097 belongs to the IVd subgroup of the divergence of C. alismatifolia with Z. officinale bHLH (Supplementary Table 19), which have been pre- (Fig. 2A and Supplementary Fig. 45). viously reported to be involved in the biosynthesis of anthocyanins. Gene duplication as a key to the evolution of diverse bract pigmentation DISCUSSION Five duplicated gene types were identified and com- Monocotyledons have more complex WGD events pared in C. alismatifolia, Z. officinale, and M. acuminata (Fig. 3A). In previous studies of the dicotyledonous Here, we present a high-quality chromosome-scale plant Rhododendron, TDs and PDs were found to con- genome assembly of Curcuma species and identify an tribute to an increase in the ratio of enzymatic genes in obvious WGD event, as well as other duplicated genes, the anthocyanin biosynthesis pathway, suggesting that The Author(s) 2022 aBIOTECH (2022) 3:178–196 189 Fig. 5 BSA-seq reveals the F3 5’H gene as a candidate gene responsible for red bract pigmentation of C. alismatifolia. A Left: C. alismatifolia ‘‘LM’’; right: C. alismatifolia ‘‘JL’’. Bar: 3 cm. B Genome-wide G value for allele frequency of SNPs between F1 hybrids S1 (LM) and S2 (JL) pools. C Anthocyanin biosynthesis related genes in BSA signal regions. D Expression heatmap of 31 anthocyanin biosynthesis- related genes from BSA signal regions. E Gene structure of candidate gene according to genome annotation of QMF and sequence depth in parents LM (P1) and JL (P2), and F1 hybrids LM (S1) and JL (S2) TDs and PDs are important in the evolution of flower that has been transposed (Supplementary Fig. 46). color diversity (Yang et al. 2020). In our study, we found Overall, TD, PD, and DSD can be regarded as a general that the TD and PD genes of Zingiberales were signifi- class and appear to be involved in the evolution of color cantly enriched in the upstream steps of the antho- diversity in C. alismatifolia bracts. Here, we also found cyanin synthesis pathway. In terms of TE content, that there was no significant difference in the methyla- degree of methylation, Ks, and gene expression, TD, PD, tion levels between exons, introns, or upstream or and DSD were more similar to WGD or TRD genes downstream regions of different categories of dupli- (Fig. 3C–E, Supplementary Figs. 16, 17, 18, 19, 20, 21, cated genes (Fig. 3D). However, WGD and TRD genes 22, 23 and 24). The Venn diagram of these five types of had lower levels of methylation and higher gene genes shows that TD and PD share 117 genes, whereas expression (Fig. 3D and Supplementary Fig. 23) and WGD and TRD share 4639 genes, indicating that Dup- were involved in growth and development processes Gen_finder (Qiao et al. 2019) may not be able to clearly (Fig. 3B and Supplementary Fig. 17), suggesting that discern duplicates of these types in all cases or that genes retained after WGD or TRD duplication are more terms used to define these duplicate types can overlap. often related to conserved functions. This result is For instance, no such overlap was found among the consistent with previous studies, where the retention other groups, indicating that some duplicates likely and loss of repetitive genes after WGDs are not random, fulfilled the criteria for two categories, such as a WGD but have a bias for genes related to signal transduction, The Author(s) 2022 190 aBIOTECH (2022) 3:178–196 transcription factors, and genes related to development. Zingiberales; C. alismatifolia had 8 F3 5’H genes, Z. 0 0 The existence of gene retention bias may result from the officinale had 6 F3 5 H genes, and M. acuminata had 4 functional divergence of genes related to adaptation to F3 5’H genes according to gene annotations, among 0 0 novel environments (Wu et al. 2020). However, how which the F3 5 H gene in Z. officinale had a length of DNA methylation affects the evolution and retention of 12,746 bp with 1040 amino acids (gene13778 had 957 these two types of duplicated genes requires further amino acids), as well as six exons, and its longest intron study. had a length of 8,702 bp (Supplementary Fig. 37B, C). The phylogenetic tree showed that gene13778 also 0 0 A complex regulatory mechanism controls clustered with F3 5 H genes in Petunia (Supplementary the color diversity in C. alismatifolia bracts Fig. 37B), which also confirmed the identity of this gene. However, considering the complexity of this region, 0 0 0 Here, we also identified that F3 5’H, DFR, and ANS are where multiple F3 5 H genes repeat in tandem, and the key genes in the anthocyanin biosynthesis pathway, and limitations of existing gene structure prediction soft- transcriptome analysis revealed that DFR and ANS play ware, we believe that more experimental evidence is 0 0 a more critical role, and the role of DFR gene in C. needed in the future to verify the structure of F3 5 H alismatifolia has also been verified (Petchang et al. gene and reveal the differences in the expression levels 2017). Several regulatory factors were identified to be of different transcripts, even the divergence within C. closely related to DFR and ANS, among which the SG7 alismatifolia cultivars. In summary, the key genes subgroup of MYB gene39947 (Supplementary Fig. 47) involved in anthocyanin synthesis pathways were had a stronger correlation with DFR and ANS, and qRT- identified, and resequencing analyses found that genetic PCR results verified that it has a very similar expression differentiation was associated with different bract color trend as DFR and ANS.A TRANSPARENT TESTA 8 (TT8) groups, concluding that color formation in C. alismati- homologous gene, gene32335, was also identified (Sup- folia bracts is a complex process, which also needs to be plementary Fig. 48). TT8 is considered sufficient for the verified by further studies. expression of DFR and ANS genes and is reported to be one of the key regulators of anthocyanin production in many plant species (Yan et al. 2021). In addition, pre- CONCLUSIONS vious reports revealed that LcbHLH92a and LcbHLH92b in sheepgrass are involved in anthocyanin and proan- Our study found that different categories of duplicated thocyanidin synthesis (Zhao et al. 2019). In our study, genes in C. alismatifolia genome were diversified in an AtBHLH92 homologous gene, gene46097 (Supple- function by the duplicate type. This includes the evo- mentary Fig. 48), was simultaneously identified by BSA lution of different colored bracts of C. alismatifolia and transcriptome analysis in our study, suggesting a through the tandem duplication of genes and subse- crucial role. Therefore, we believe that there is a com- quent changes in gene structure. We identified the key 0 0 plex mechanism controls bract color formation. WGCNA anthocyanin synthesis genes DFR, ANS and F3 5 H and has predicted a regulatory network, which also provides chlorophyll synthesis genes chlH and CAO in the for- a basis for subsequent experimental verification. In mation of bract color and inferred the potential contri- addition, candidate gene F3 5’H (gene13778), which is a bution of individual members of the transcription factor flavonoid-3’,5’-hydroxylase, is regarded as a member of gene families. These results provide a basis for further the cytochrome P450 family and is a crucial enzyme identification of gene function in C. alismatifolia and required for producing blue or purple flowers was related species. In conclusion, the reference genome of identified (Hopkins and Rausher 2011; Shimada et al. C. alismatifolia presented in this study provides a key 1999). Our study also showed that the F3 5’H gene resource for further studies and development of novel (gene13778) has a long length in C. alismatifolia ‘‘Chiang cultivars through marker-assisted breeding and genome Mai Pink’’ (Supplementary Fig. 37A), mainly due to the editing. second intron being over 20 kb (verified by mapping HiFi reads) as well as a 16.8 kb heterozygous deletion (Supplementary Fig. 37A). Previous studies have found MATERIALS AND METHODS that plant introns are generally shorter than those of animals, even in species with large genomes (Jin 2007), Plant materials making the F3 5’H gene (gene13778)in C. alismatifolia an extreme outlier to this general pattern. We also found The C. alismatifolia cultivar ‘‘Chiang Mai Pink’’ was 0 0 that the copy number of F3 5 H gene varied in selected for whole-genome sequencing and assembly. The Author(s) 2022 aBIOTECH (2022) 3:178–196 191 The cultivar was planted in the greenhouse at Shenzhen above for further analysis. LACHESIS (Burton et al. Institute of Agricultural Genomics, Chinese Academy of 2013)(https://github.com/shendurelab/LACHESIS) Agricultural Sciences. Leaf tissue was used for whole- was used to further aggregate, sequence, and locate genome sequencing, while flowers, leaves, and young scaffolds onto the chromosomes. Finally, the errors of stems were subjected to RNA sequencing (RNA-seq) to placement and orientation were corrected with manual support genome annotation and analyze of gene adjustment. The final chromosome anchor rate was expression levels (Supplementary File 1). 95.25%. Library preparation and sequencing Genomic evaluation We extracted the DNA from leaves by following the The integrity of the assembled genome was assessed by procedures of Qiagen Genomic DNA kit. According to the BUSCO v4.0.5 (Simao et al. 2015) based on single-copy standard protocol of PacBio, 20 kb preparation solution homologous genes in the OrthoDB database was used to obtain the SMRTbell target size library embryophyta_odb10. CEGMA v2 (Parra et al. 2007)was (Pacific Biosciences, CA, USA), and then the HiFi data also used to predict the genome completeness based on was generated with CCS software (https://github.com/ its database. HISAT2 v2.2.1 (Kim et al. 2019) was used PacificBiosciences/ccs). Genomic DNA (1–1.5 lg) was to map the RNA-seq data from flowers, pedicels, and randomly interrupted into 200–400 bp fragments and leaves, BWA v0.7.17-r1188 (Li and Durbin 2010)was sequenced on the MGI-SEQ 2000 platform. The total used to map the MGI data, and minimap2 v2.21-r1071 RNA of all sample materials were extracted with RNA- (Pertea et al. 2016) was used to map the HiFi data to prep pure Plant Kit (TIANGEN), and the RNA sequenc- genome to calculate the mapping rate. ing was carried out by MGI-SEQ 2000 sequencing platform. The Hi-C library was constructed with DpnII Repetitive sequence annotation restriction enzyme and sequenced on MGI-SEQ T7. The Fastp v0.19.4 was used to perform quality control (Chen The genome repeat sequence annotation was conducted et al. 2018). by using the Extensive de novo TE Annotator (EDTA) v1.9.4 (Su et al. 2021), a toolkit for de novo annotating K-mer analysis and genome assembly TEs in whole-genome datasets. The LAI and insertion time were calculated by using LTR_retriever v2.9.0 (Ou The Jellyfish v2.3.0 program (Marcais and Kingsford et al. 2018; Ou and Jiang 2018), with a substitution rate 2011) and Kmerfreq (Liu et al. 2013) were used to of default 1.3e-8 Ks/year. conduct the k-mer analysis by using the MGI data to estimate the genome size Genomes were assembled by Gene structure and function annotation using 30.35 Gb of high-quality HiFi reads using hifiasm v0.12 software with default parameters (Cheng et al. Trinity v2.8.5 (Grabherr et al. 2011) was used to 2021a)(https://github.com/chhylp123/hifiasm), lead- assemble the transcripts for predicting genes. Then the ing to a 1.22 Gb preliminary assembled genome. Using transcript-based predictions was conducted with PASA the quality-controlled MGI data, the genome was pol- v2.4.1 (Haas et al. 2003). We also performed homology ished using Nextpolish v1.2.4 software (Hu et al. 2020) predictions by using the protein sequences of M. bal- for four iterations, and the corrected genome was bisiana (NCBI, GCA_004837865.1), Z. mays (Phytozome compared with the nr/nt database (NCBI) to remove V13, v4), O. sativa (Phytozome V13, v7.0), and Z. offici- possible sequences originating from biological contam- nale (NCBI, GCA_018446385.1), and mapped them to ination (such as endophytes). the genome of C. alismatifolia, with these homology annotation results being input to Augustus v3.3.3 Hi-C scaffolding (Stanke et al. 2008) for training. The genes from PASA v2.4.1 (Haas et al. 2003) were further used to train the A total of 109.71 Gb clean paired-end reads generated GlimmerHMM v3.0.4 (Majoros et al. 2004), SNAP from Fastp v0.19.4 (Chen et al. 2018) were mapped to v2006-07–28 (Korf 2004) and Augustus v3.3.3 (Stanke the 1.19 Gb preliminary assembled genome by using et al. 2008) software to get results of de novo gene bowtie2 v2.3.2 (Langmead and Salzberg 2012), then the prediction. We used Evidencemodeler v1.1.1 (Haas et al. unique mapped reads were obtained. HiC-Pro v2.8.1 2008) to integrate all above evidence, and the results (Servant et al. 2015) identified and retained valid were re-trained using PASA v2.4.1 (Haas et al. 2003) for interactive paired reads from unique reads described one final round of gene annotation. The Author(s) 2022 192 aBIOTECH (2022) 3:178–196 Blastp v2.9 (ftp://ftp.ncbi.nlm.nih.gov/blast/execu C. alismatifolia. We identified five types of duplicated tables/blast?/LATEST/) was used to annotate the gene genes in C. alismatifolia, Z officinale, and M. acuminata functions by comparing the protein sequences of C. by utilizing the DupGen_finder (Qiao et al. 2019) soft- alismatifolia with Swiss-Prot (uniprot_sprot), NR ware with default parameters. WGDI (https://github. (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz) com/SunPengChuan/wgdi) was used to verify the WGD and KOG (https://ftp.ncbi.nih.gov/pub/COG/KOG/) results. The program JCVI v1.1.18 (Tang et al. 2008)was databases with feature assignments made based on the used to further analyze the collinearity of C. alismatifolia best hits. Interpro annotation and GO annotation were and Z. officinale. performed by interproscan-5.21–60.0 (Blum et al. 2021). The KAAS server (https://www.genome.jp/ Methylation analysis kegg/kaas) was used to identify the KEGG pathway. For a more complete and comprehensive functional anno- Genomic DNA (2 lg) was obtained for ONT (Oxford tation, we employed egg-mapper v2.0.1 (Huerta-Cepas Nanopore Technology) library preparations and then et al. 2017), to further perform the functional annota- sequenced on the ONT PromethION sequencer. The call- tion based on eggNOG v5.0 (Huerta-Cepas et al. 2019). methylation module of Nanopolish v0.13.2 (https:// In addition, tRNAscan-SE v2.0.8 (Chan et al. 2021)was github.com/jts/nanopolish) was used to analyze used to annotate tRNA, barrnap v0.9 (https://github. 5-methylcytosine in the CG context in the genome based com/tseemann/barrnap) was used to annotate rRNA on Fast5 files, then the results were filtered according to and cmscan v1.1.2 from the software suite Infernal the condition of methylated_frequency C 0.5. (Nawrocki and Eddy 2013) was used to annotate other ncRNAs based on the Rfam database (Kalvari et al. Transcriptome analyses and gene co-expression 2021). networks Orthologous gene family identification, The clean RNA-seq reads from different sample were phylogenetic analysis, estimation of divergence mapped to the C. alismatifolia genome with HISAT2 time, and expansion and contraction of gene 2.2.1 (Kim et al. 2019), and StringTie v2.1.6 (Pertea family expansion et al. 2016) was used to calculate the FPKM of genes in each sample with Log2 transformation, while using the By using OrthoFinder v2.5.2 (https://github.com/davi Log2FPKM C 0.5 cutoff for the gene in at least one demms/OrthoFinder), we obtained orthogroups for 15 sample to ensure that the gene was expressed. Fea- species (Supplementary Table 5), and the protein tureCounts v2.0.1 (Liao et al. 2014) was used to calcu- sequences of 217 single-copy orthogroups were late the counts of each gene in each sample, then the obtained. The MAFFT v7.490 software with default differentially expressed genes (DEGs) of QMF SeR vs settings (Katoh and Standley 2013) was used to per- XCX SeR at the S4 period were analyzed by utilizing form sequence alignments for each single-copy gene DESeq2 v1.34.0 software (Love et al. 2014). The fol- family and the alignments were converted to a nucleo- lowing FC value range was used as the criterion for tide matrix by pal2nal v14 (Suyama et al. 2006). Under selecting DEGs: |log2FoldChange|C 1.5, adjusted the GTRGAMMA model, phylogenetic analysis was con- P value B 0.01. Based on the FPKM of genes, the R structed with RAxML v8.2.12 (Stamatakis 2014). The package WGCNA v1.70-3 (Langfelder and Horvath MCMCTree program of the PAML v4.9 (Yang 2007)was 2008) was used to build the co-expression network. further applied for estimation of species divergence Primers designed for qRT-PCR use were tested and lis- time based on three soft bounds at three nodes (Cheng ted in Supplementary Table 29. The Applied Biosys- TM TM TM et al. 2021b). The expansion and contraction analyses tems PowerUp SYBR Green Master Mix (Thermo was performed on the basis of the dated phylogeny tree Fisher Scientific, US) was used for qRT-PCR with a CFX TM and the homologous gene families from 15 species using Connect Real-Time System (BIO-RAD, US). Two bio- the CAFE v4.21 program (Bie et al. 2006). GO and KEGG logical repeats and two technical repeats were carried enrichment analyses were then performed with genes in out for each gene, and the relative expression level was -DDCT significantly expanded families. calculated through the comparative 2 method. Duplicated gene identification and WGD analysis GO and KEGG enrichment To study the size evolution of the C. alismatifolia gen- Based on the results from eggnog-mapper v2.0.1 ome, we identified whole-genome duplication events in (Huerta-Cepas et al. 2017, 2019) software, the protein The Author(s) 2022 aBIOTECH (2022) 3:178–196 193 gratefully acknowledge Daniel B Sloan (Colorado State University) sequences of M. acuminata, Z. officinale and C. alismat- and the personnel of the Wu laboratory for help with providing ifolia were functionally annotated, and the GO and KEGG suggestions and revising the manuscript. annotation results of the genes were extracted. With the help of the R package AnnotationForge v1.36.0 (https:// Author contributions ZW and GZ designed the experiments. YY, bioconductor.org/packages/AnnotationForge/). The JT, JZ, RJ, YX, and JL built the hybrid population. ZW, JY, YY, XZ, WL, LT, and XL conceived and interpreted the data. XL, DP, and XZ clusterProfiler v4.2.1 (Wu et al. 2021) program was performed the bioinformatic analysis. XL, HM, and FG performed used for GO and KEGG enrichment analysis. The visu- the experiments. XL wrote the first manuscript. ZW, JY, WL, LT, and alization of enrichment results was generated with R GZ revised the manuscript, which all authors edited and approved. package ggplot2 (https://github.com/tidyverse/ ggplot2). Data availability The genome sequence data including HiFi, ONT, and Hi-C data were deposited in the Genome Warehouse in National Genomics Data Center (NGDC), under accession number Bulked segregant analysis CRA006523, while the whole-genome assembly was also depos- ited under accession number GWHBHOP00000000 that is publicly The C. alismatifolia ‘‘Scarlet’’ (JL, red line) and C. alis- accessible at https://ngdc.cncb.ac.cn/gwh. matifolia ‘‘Dawn’’ (LM, pink line) used for BSA were Declarations planted in the Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences. Conflict of interest The authors declare that they have no con- The individual plants of JL and LM grown under natural flict of interest. conditions were used for crossbreeding to obtain F1 Open Access This article is licensed under a Creative Commons hybrid populations. The segregation ratio was 502 (with Attribution 4.0 International License, which permits use, sharing, the same red bract as JL): 483 (with the same pink bract adaptation, distribution and reproduction in any medium or for- as LM). Sequencing of the extracted DNA and RNA from mat, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons parental LM (P1) and JL (P2), F1 hybrids LM (50 indi- licence, and indicate if changes were made. The images or other viduals mixed, S1) and JL (50 individuals mixed, S2) was third party material in this article are included in the article’s carried on an Illumina NovaSeq 6000 platform. Data Creative Commons licence, unless indicated otherwise in a credit filtering and quality control were performed by Fastp line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted v0.19.4 (Chen et al. 2018). BWA v0.7.17-r1188 (Li and by statutory regulation or exceeds the permitted use, you will Durbin 2010) and STAR v2.7.9a (Dobin et al. 2013) were need to obtain permission directly from the copyright holder. To used to map DNA and RNA data to the genome, sepa- view a copy of this licence, visit http://creativecommons.org/ rately, then GATK v4.2.2.0 (DePristo et al. 2011)was licenses/by/4.0/. used for SNP calling. Finally, BSA and BSR analysis was performed using the R package QTLseqr v0.7.5.2 (Mansfeld and Grumet 2018). The quantitative method References of transcript is the same as described in transcriptome analyses, and the R package edgeR v3.36.0 (Robinson Akter R, Hasan R, Siddiqua SA et al (2008) Evaluation of analgesic et al. 2010) was used to analyze the differentially and antioxidant potential of the leaves of Curcuma alismat- expressed genes of S2 vs S1 and P2 vs P1, using the ifolia Gagnep. Stamford J Pharm Sci 1:3–9. https://doi.org/10. following FC value range as the criteria for selecting 3329/sjps.v1i1.1779 Baudry A, Heim MA, Dubreucq B et al (2004) TT2, TT8, and TTG1 DEGs: |logFC|C 1, FDR B 0.01. synergistically specify the expression of BANYULS and A detailed description of the above analysis, as well proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant as the methods used for other analyses is listed in J 39:366–380. https://doi.org/10.1111/j.1365-313X.2004. Supplementary File 1. 02138.x Bayer PE, Golicz AA, Scheben A et al (2020) Plant pan-genomes Supplementary InformationThe online version contains are the new reference. Nat Plants 6:1389–1389. https://doi. supplementary material available at https://doi.org/10.1007/ org/10.1038/s41477-020-00776-y s42994-022-00081-6. Belwal T, Singh G, Jeandet P et al (2020) Anthocyanins, multi- functional natural products of industrial relevance: recent Acknowledgements These studies were supported by the start- biotechnological advances. Biotechnol Adv 43:107600. up fund of the Chinese Academy of Agricultural Sciences Elite https://doi.org/10.1016/j.biotechadv.2020.107600 Youth Program, Kunpeng Institute of Modern Agriculture at Bie TD, Cristianini N, Demuth JP et al (2006) CAFE: a computa- Foshan to Zhiqiang Wu, as well as supported by the opening tional tool for the study of gene family evolution. Bioinfor- project of Laboratory of Ecology and Evolutionary Biology from matics 22:1269–1271. https://doi.org/10.1093/ Yunnan University and Shenzhen Zhongnonghuadu Ecological bioinformatics/btl097 Technology Co., Ltd. (R20012) to Z.W., and the USDA National Blum M, Chang HY, Chuguransky S et al (2021) The InterPro Institute of Food and Agriculture Hatch project 02685 to W.L. We protein families and domains database: 20 years. Nucleic The Author(s) 2022 194 aBIOTECH (2022) 3:178–196 Acids Res 49:344–354. https://doi.org/10.1093/nar/ (Zingiberaceae). Jpn J Trop Agric 49:14–20. https://doi.org/ gkaa977 10.11248/JSTA1957.49.14 Bourque G, Burns KH, Gehring M et al (2018) Ten things you Gonzalez A, Zhao M, Leavitt JM et al (2008) Regulation of the should know about transposable elements. Genome Biol anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb 19:199. https://doi.org/10.1186/s13059-018-1577-z transcriptional complex in Arabidopsis seedlings. Plant J Burton JN, Adey A, Patwardhan RP et al (2013) Chromosome-scale 53:814–827. https://doi.org/10.1111/j.1365-313X.2007. scaffolding of de novo genome assemblies based on chro- 03373.x matin interactions. Nat Biotechnol 31:1119–1125. https:// Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length doi.org/10.1038/nbt.2727 transcriptome assembly from RNA-Seq data without a refer- Chakraborty A, Mahajan S, Jaiswal SK et al (2021) Genome ence genome. Nat Biotechnol 29:644-U130. https://doi.org/ sequencing of turmeric provides evolutionary insights into its 10.1038/nbt.1883 medicinal properties. Commun Biol 4:1193. https://doi.org/ Grotewold E (2006) The genetics and biochemistry of floral 10.1038/s42003-021-02720-y pigments. Annu Rev Plant Biol 57:761–780. https://doi.org/ Chan PP, Lin BY, Mak AJ et al (2021) tRNAscan-SE 2.0: improved 10.1146/annurev.arplant.57.032905.105248 detection and functional classification of transfer RNA genes. Haas BJ, Delcher AL, Mount SM et al (2003) Improving the Nucleic Acids Res 49:9077–9096. https://doi.org/10.1093/ Arabidopsis genome annotation using maximal transcript nar/gkab688 alignment assemblies. Nucleic Acids Res 31:5654–5666. Chanapan S, Tontiworachai B, Deewatthanawong R et al (2017) https://doi.org/10.1093/nar/gkg770 Cloning and sequence analysis of chalcone synthase gene in Haas BJ, Salzberg SL, Zhu W et al (2008) Automated eukaryotic Curcuma alismatifolia. Acta Hortic. https://doi.org/10. gene structure annotation using EVidenceModeler and the 17660/ActaHortic.2017.1167.43 program to assemble spliced alignments. Genome Biol 9:R7. Chen SF, Zhou YQ, Chen YR et al (2018) fastp: an ultra-fast all-in- https://doi.org/10.1186/gb-2008-9-1-r7 one FASTQ preprocessor. Bioinformatics 34:884–890. Harborne JB, Williams CA (2000) Advances in flavonoid research https://doi.org/10.1093/bioinformatics/bty560 since 1992. Phytochemistry 55:481–504. https://doi.org/10. Cheng HY, Concepcion GT, Feng XW et al (2021a) Haplotype- 1016/S0031-9422(00)00235-1 resolved de novo assembly using phased assembly graphs Hopkins R, Rausher MD (2011) Identification of two genes causing with hifiasm. Nat Methods 18:170–175. https://doi.org/10. reinforcement in the Texas wildflower Phlox drummondii. 1038/s41592-020-01056-5 Nature 469:411–414. https://doi.org/10.1038/nature09641 Cheng SP, Jia KH, Liu H et al (2021b) Haplotype-resolved genome Hribova E, Cizkova J, Christelova P et al (2011) The ITS1–5.8S- assembly and allele-specific gene expression in cultivated ITS2 sequence region in the Musaceae: structure, diversity ginger. Hortic Res 8:188. https://doi.org/10.1038/s41438- and use in molecular phylogeny. PLoS ONE 6:e17863. 021-00599-8 https://doi.org/10.1371/journal.pone.0017863 DePristo MA, Banks E, Poplin R et al (2011) A framework for Hu J, Fan JP, Sun ZY et al (2020) NextPolish: a fast and efficient variation discovery and genotyping using next-generation genome polishing tool for long-read assembly. Bioinformatics DNA sequencing data. Nat Genet 43:491–498. https://doi. 36:2253–2255. https://doi.org/10.1093/bioinformatics/ org/10.1038/ng.806 btz891 D’Hont A, Denoeud F, Aury JM et al (2012) The banana (Musa Huerta-Cepas J, Forslund K, Coelho LP et al (2017) Fast genome- acuminata) genome and the evolution of monocotyledonous wide functional annotation through orthology assignment by plants. Nature 488:213–219. https://doi.org/10.1038/ eggNOG-Mapper. Mol Biol Evol 34:2115–2122. https://doi. nature11241 org/10.1093/molbev/msx148 Ding HQ, Mao LH, Hu W et al (2021) Cloning and bioinformatics Huerta-Cepas J, Szklarczyk D, Heller D et al (2019) eggNOG 5.0: a analysis of chlorophyll degrading gene PPH from Curcuma hierarchical, functionally and phylogenetically annotated alismatifolia. Mol Plant Breed 19:2521–2526. https://doi. orthology resource based on 5090 organisms and 2502 org/10.13271/j.mpb.019.002521 viruses. Nucleic Acids Res 47:309–314. https://doi.org/10. Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast 1093/nar/gky1085 universal RNA-seq aligner. Bioinformatics 29:15–21. https:// Jiao YN, Li JP, Tang HB et al (2014) Integrated syntenic and doi.org/10.1093/bioinformatics/bts635 phylogenomic analyses reveal an ancient genome duplication Douglas J, Futuyma MK (2017) Evolution. Sinauer Associates, in monocots. Plant Cell 26:2792–2802. https://doi.org/10. Stamford 1105/tpc.114.127597 Dubos C, Stracke R, Grotewold E et al (2010) MYB transcription Jin G (2007) Research of plant intron evolution pattern. Fujian factors in Arabidopsis. Trends Plant Sci 15:573–581. https:// Agriculture and Forestry University doi.org/10.1016/j.tplants.2010.06.005 Kalvari I, Nawrocki EP, Ontiveros-Palacios N et al (2021) Rfam 14: Dyson CJ, Goodisman MAD (2020) Gene duplication in the expanded coverage of metagenomic, viral and microRNA honeybee: patterns of DNA methylation, gene expression, families. Nucleic Acids Res 49:D192–D200. https://doi.org/ and genomic environment. Mol Biol Evol 37:2322–2331. 10.1093/nar/gkaa1047 https://doi.org/10.1093/molbev/msaa088 Katoh K, Standley DM (2013) MAFFT multiple sequence alignment Ferreyra MLF, Rius SP, Casati P (2012) Flavonoids: biosynthesis, software version 7: improvements in performance and biological functions, and biotechnological applications. Front usability. Mol Biol Evol 30:772–780. https://doi.org/10. Plant Sci 3:222. https://doi.org/10.3389/fpls.2012.00222 1093/molbev/mst010 Fu HS, Zeng T, Zhao YY et al (2021) Identification of chlorophyll Ke LJ, Yu HW, Peng FT et al (2020) Preliminary report on hybrid metabolism- and photosynthesis-related genes regulating breeding of Curcuma alsimatifolia. J Minnan Normal Univ green flower color in Chrysanthemum by integrative tran- 33:62–66. https://doi.org/10.16007/j.cnki.issn2095-7122. scriptome and weighted correlation network analyses. Genes 2020.04.010 12:449. https://doi.org/10.3390/genes12030449 Khoo HE, Azlan A, Tang ST et al (2017) Anthocyanidins and Fukai S, Udomdee W (2005) Inflorescence and flower initiation anthocyanins: colored pigments as food, pharmaceutical and development in Curcuma alismatifolia Gagnep ingredients, and the potential health benefits. Food Nutr The Author(s) 2022 aBIOTECH (2022) 3:178–196 195 Res 61:1–21. https://doi.org/10.1080/16546628.2017. Mao LH, Liu JX, Ding HQ et al (2018) Microsatellite characteri- 1361779 zation analysis and primers design of the whole transcrip- Kim D, Paggi JM, Park C et al (2019) Graph-based genome tome of Curcuma alismatifolia. Mol Plant Breed alignment and genotyping with HISAT2 and HISAT-genotype. 16:7408–7414. https://doi.org/10.13271/j.mpb.016.007407 Nat Biotechnol 37:907–915. https://doi.org/10.1038/ Mao LH, Jin L, Ding HQ et al (2020) Estimation of genome size s41587-019-0201-4 analysis of C. alismatifolia ‘Chiang Mai Pink.’ J Zhejiang Agric Kjonboon T, Kanlayanarat S (2005) Effects of gibberellic acid on Sci 61:2066–2073. https://doi.org/10.16178/j.issn.0528- the vase life of cut patumma (Curcuma alismatifolia Gagnep.) 9017.20201034 ‘‘Chaing Mai’’ flowers. Acta Hortic 673:525–529. https://doi. Marcais G, Kingsford C (2011) A fast, lock-free approach for org/10.17660/ActaHortic.2005.673.70 efficient parallel counting of occurrences of k-mers. Bioinfor- Korf I (2004) Gene finding in novel genomes. BMC Bioinform 5:59. matics 27:764–770. https://doi.org/10.1093/bioinfor https://doi.org/10.1186/1471-2105-5-59 matics/btr011 Koshioka M, Umegaki N, Boontiang K et al (2015) Anthocyanins in Mol J, Grotewold E, Koes R (1998) How genes paint flowers and the bracts of Curcuma species and relationship of the species seeds. Trends Plant Sci 3:212–217. https://doi.org/10.1016/ based on anthocyanin composition. Nat Prod Commun S1360-1385(98)01242-4 10:453–456. https://doi.org/10.1177/1934578X1501000 Nakayama M, Roh MS, Uchida K et al (2000) Malvidin 3-rutinoside 320 as the pigment responsible for bract color in Curcuma Langfelder P, Horvath S (2008) WGCNA: an R package for alismatifolia. Biosci Biotech Bioch 64:1093–1095. https://doi. weighted correlation network analysis. BMC Bioinform org/10.1271/bbb.64.1093 9:559. https://doi.org/10.1186/1471-2105-9-559 Nawrocki EP, Eddy SR (2013) Infernal 1.1: 100-fold faster RNA Langmead B, Salzberg SL (2012) Fast gapped-read alignment with homology searches. Bioinformatics 29:2933–2935. https:// Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/ doi.org/10.1093/bioinformatics/btt509 Nmeth.1923 Ou SJ, Jiang N (2018) LTR_retriever: a highly accurate and Leong-Skornickova J, Sida O, Jarolimova V et al (2007) Chromo- sensitive program for identification of long terminal repeat some numbers and genome size variation in Indian species of retrotransposons. Plant Physiol 176:1410–1422. https://doi. Curcuma (Zingiberaceae). Ann Bot 100:505–526. https://doi. org/10.1104/pp.17.01310 org/10.1093/aob/mcm144 Ou SJ, Chen JF, Jiang N (2018) Assessing genome assembly quality Li H, Durbin R (2010) Fast and accurate long-read alignment with using the LTR Assembly Index (LAI). Nucleic Acids Res Burrows-Wheeler transform. Bioinformatics 26:589–595. 46:e126. https://doi.org/10.1093/nar/gky730 https://doi.org/10.1093/bioinformatics/btp698 Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately Li HL, Wu L, Dong ZM et al (2021) Haplotype-resolved genome of annotate core genes in eukaryotic genomes. Bioinformatics diploid ginger (Zingiber officinale) and its unique gingerol 23:1061–1067. https://doi.org/10.1093/bioinformatics/ biosynthetic pathway. Hortic Res 8:189. https://doi.org/10. btm071 1038/s41438-021-00700-1 Pertea M, Kim D, Pertea GM et al (2016) Transcript-level Li YY, Chen XH, Yu H et al (2022) Comparative RNA-Seq analysis to expression analysis of RNA-seq experiments with HISAT, understand anthocyanin biosynthesis and regulations in StringTie and Ballgown. Nat Protoc 11:1650–1667. https:// Curcuma alismatifolia. Folia Hortic 34:1–19. https://doi.org/ doi.org/10.1038/nprot.2016.095 10.2478/fhort-2022-0007 Petchang R, Buddharak P, Chundet R et al (2017) Cloning of DFR Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general gene in Curcuma alismatifolia ‘‘Chiang Mai Pink’’ and purpose program for assigning sequence reads to genomic Agrobacterium-mediated transformation. Re J Biotechnol features. Bioinformatics 30:923–930. https://doi.org/10. 12:1–6 1093/bioinformatics/btt656 Qiao X, Li QH, Yin H et al (2019) Gene duplication and evolution in Liu BH, Shi YJ, Yuan JY et al (2013) Estimation of genomic recurring polyploidization-diploidization cycles in plants. characteristics by analyzing k-mer frequency in de novo Genome Biol 20:38. https://doi.org/10.1186/s13059-019- genome projects. arXiv:1308.2012. https://doi.org/10. 1650-2 48550/arXiv.1308.2012 Ranavat S, Becher H, Newman MF et al (2021) A draft genome of Liu GF, Xia YD, Liu TK et al (2018) The DNA methylome and the ginger species alpinia nigra and new insights into the association of differentially methylated regions with differ- genetic basis of flexistyly. Genes 12:1297. https://doi.org/10. ential gene expression during heat stress in Brassica rapa. Int 3390/genes12091297 J Mol Sci 19:1414. https://doi.org/10.3390/ijms19051414 Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Biocon- Love MI, Huber W, Anders S (2014) Moderated estimation of fold ductor package for differential expression analysis of digital change and dispersion for RNA-seq data with DESeq2. gene expression data. Bioinformatics 26:139–140. https:// Genome Biol 15:550. https://doi.org/10.1186/s13059-014- doi.org/10.1093/bioinformatics/btp616 0550-8 Servant N, Varoquaux N, Lajoie BR et al (2015) HiC-Pro: an Lu LM (2007) Fujian tropical cut flower industry and develop- optimized and flexible pipeline for Hi-C data processing. ment strategies. Chin Agric Sci Bull 23:434–438. https://doi. Genome Biol 16:259. https://doi.org/10.1186/s13059-015- org/10.3969/j.issn.1000-6850.2007.03.096 0831-x Majoros WH, Pertea M, Salzberg SL (2004) TigrScan and Shimada Y, Nakano-Shimada R, Ohbayashi M et al (1999) GlimmerHMM: two open source ab initio eukaryotic gene- Expression of chimeric P450 genes encoding flavonoid-3 ’,5 finders. Bioinformatics 20:2878–2879. https://doi.org/10. ’-hydroxylase in transgenic tobacco and petunia plants. FEBS 1093/bioinformatics/bth315 Lett 461:241–245. https://doi.org/10.1016/S0014- Mansfeld BN, Grumet R (2018) QTLseqr: an R package for bulk 5793(99)01425-8 segregant analysis with next-generation sequencing. Plant Simao FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: Genome 11:1–5. https://doi.org/10.3835/plantgenome2018. assessing genome assembly and annotation completeness 01.0006 with single-copy orthologs. Bioinformatics 31:3210–3212. https://doi.org/10.1093/bioinformatics/btv351 The Author(s) 2022 196 aBIOTECH (2022) 3:178–196 Stamatakis A (2014) RAxML version 8: a tool for phylogenetic Wu SD, Han BC, Jiao YN (2020) Genetic contribution of pale- analysis and post-analysis of large phylogenies. Bioinformat- opolyploidy to adaptive evolution in angiosperms. Mol Plant ics 30:1312–1313. https://doi.org/10.1093/bioinformatics/ 13:59–71. https://doi.org/10.1016/j.molp.2019.10.012 btu033 Wu TZ, Hu EQ, Xu SB et al (2021) clusterProfiler 4.0: a universal Stanke M, Diekhans M, Baertsch R et al (2008) Using native and enrichment tool for interpreting omics data. Innovation syntenically mapped cDNA alignments to improve de novo 2:100141. https://doi.org/10.1016/j.xinn.2021.100141 gene finding. Bioinformatics 24:637–644. https://doi.org/10. Wu Y, Wen J, Xia Y et al (2022) Evolution and functional 1093/bioinformatics/btn013 diversification of R2R3-MYB transcription factors in plants. Su WJ, Ou SJ, Hufford MB et al (2021) A tutorial of EDTA: extensive Hortic Res 9:uhac058. https://doi.org/10.1093/hr/uhac058 de novo TE annotator. Plant Transposable Elem 2250:55–67. Xie XB, Li S, Zhang RF et al (2012) The bHLH transcription factor https://doi.org/10.1007/978-1-0716-1134-0_4 MdbHLH3 promotes anthocyanin accumulation and fruit Sun S, Zhang DY, Ives A et al (2011) Why do stigmas move in a colouration in response to low temperature in apples. Plant flexistylous plant? J Evol Biol 24:497–504. https://doi.org/ Cell Environ 35:1884–1897. https://doi.org/10.1111/j.1365- 10.1111/j.1420-9101.2010.02181.x 3040.2012.02523.x Suyama M, Torrents D, Bork P (2006) PAL2NAL: robust conver- Xu ZH, Mahmood K, Rothstein SJ (2017) ROS induces anthocyanin sion of protein sequence alignments into the corresponding production via late biosynthetic genes and anthocyanin codon alignments. Nucleic Acids Res 34:609–612. https:// deficiency confers the hypersensitivity to ROS-generating doi.org/10.1093/nar/gkl315 stresses in Arabidopsis. Plant Cell Physiol 58:1364–1377. Taheri S, Abdullah TL, Abdullah NAP et al (2012) Genetic https://doi.org/10.1093/pcp/pcx073 relationships among five varieties of Curcuma alismatifolia Yan HL, Pei XN, Zhang H et al (2021) MYB-mediated regulation of (Zingiberaceae) based on ISSR markers. Genet Mol Res anthocyanin biosynthesis. Int J Mol Sci 22:3103. https://doi. 11:3069–3076. https://doi.org/10.4238/2012.August.31.4 org/10.3390/ijms22063103 Taheri S, Abdullah TL, Abdullah NAP et al (2014) Assessing the Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum genetic relationships of Curcuma alismatifolia varieties using likelihood. Mol Biol Evol 24:1586–1591. https://doi.org/10. simple sequence repeat markers. Genet Mol Res 1093/molbev/msm088 13:7339–7346. https://doi.org/10.4238/2014.September.5. Yang FS, Nie S, Liu H et al (2020) Chromosome-level genome 12 assembly of a parent species of widely cultivated azaleas. Nat Taheri S, Abdullah TL, Rafii MY et al (2019) De novo assembly of Commun 11:5269. https://doi.org/10.1038/s41467-020- transcriptomes, mining, and development of novel EST-SSR 18771-4 markers in Curcuma alismatifolia (Zingiberaceae family) Za´veska´ E, Fer T, Sı´da O et al (2012) Phylogeny of Curcuma through Illumina sequencing. Sci Rep 9:3047. https://doi. (Zingiberaceae) based on plastid and nuclear sequences: org/10.1038/s41598-019-39944-2 proposal of the new subgenus Ecomata. Taxon 61:747–763. Tang HB, Bowers JE, Wang XY et al (2008) Synteny and collinearity https://doi.org/10.1002/tax.614004 in plant genomes. Science 320:486–488. https://doi.org/10. Zhang HM, Lang ZB, Zhu JK (2018) Dynamics and function of DNA 1126/science.1153917 methylation in plants. Nat Rev Mol Cell Biol 19:489–506. Wang Y, Zhang D, Renner SS et al (2004) Botany: a new self- https://doi.org/10.1038/s41580-018-0016-z pollination mechanism. Nature 431:39–40. https://doi.org/ Zhao F, Li G, Hu P et al (2018) Identification of basic/helix-loop- 10.1038/431039b helix transcription factors reveals candidate genes involved Wang L, Shi Y, Chang XJ et al (2019) DNA methylome analysis in anthocyanin biosynthesis from the strawberry white-flesh provides evidence that the expansion of the tea genome is mutant. Sci Rep 8:2721. https://doi.org/10.1038/s41598- linked to TE bursts. Plant Biotechnol J 17:826–835. https:// 018-21136-z doi.org/10.1111/pbi.13018 Zhao P, Li X, Jia J et al (2019) bHLH92 from sheepgrass acts as a Wang M, Chen L, Liang ZJ et al (2020) Metabolome and negative regulator of anthocyanin/proanthocyandin accumu- transcriptome analyses reveal chlorophyll and anthocyanin lation and influences seed dormancy. J Exp Bot 70:269–284. metabolism pathway associated with cucumber fruit skin https://doi.org/10.1093/jxb/ery335 color. BMC Plant Biol 20:386. https://doi.org/10.1186/ Zhao R, Song X, Yang N et al (2020) Expression of the subgroup IIIf s12870-020-02597-9 bHLH transcription factor CpbHLH1 from Chimonanthus Winkel-Shirley B (2001) Flavonoid biosynthesis. A colorful model praecox (L.) in transgenic model plants inhibits anthocyanin for genetics, biochemistry, cell biology, and biotechnology. accumulation. Plant Cell Rep 39:891–907. https://doi.org/10. Plant Physiol 126:485–493. https://doi.org/10.1104/pp.126. 1007/s00299-020-02537-9 2.485 The Author(s) 2022

Journal

aBIOTECHSpringer Journals

Published: Sep 1, 2022

Keywords: Anthocyanin synthesis; Siam tulip; Floriculture; Zingiberaceae; Genome evolution

References