Conservation genomic investigation of an endangered conifer,Thuja sutchuenensis,reveals low genetic diversity but also low genetic load

2024-03-09 10:19TongzhouToRichrdMilneJilingLiHengYngShiyngWngSihnChenKngshnMo
植物多样性 2024年1期

Tongzhou To ,Richrd I.Milne ,Jiling Li ,Heng Yng ,Shiyng Wng ,Sihn Chen ,Kngshn Mo ,c,*

a Key Laboratory for Bio-Resource and Eco-Environment of Ministry of Education & Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station,College of Life Science,Sichuan University,Chengdu,China

b Institute of Molecular Plant Science,School of Biological Science,University of Edinburgh,Edinburgh EH9 3BF,UK

c College of Science,Tibet University,Lhasa 850000,Xizang Autonomous Region,PR China

Keywords: Sichuan Arborvitae Genetic load Deleterious mutations Demographic history Conservation genomics

ABSTRACT Endangered species generally have small populations with low genetic diversity and a high genetic load.Thuja sutchuenensis is an endangered conifer endemic to southwestern China.It was once considered extinct in the wild,but in 1999 was rediscovered.However,little is known about its genetic load.We collected 67 individuals from five wild,isolated T.sutchuenensis populations,and used 636,151 SNPs to analyze the level of genetic diversity and genetic load in T.sutchuenensis to delineate the conservation units of T.sutchuenensis,based on whole transcriptome sequencing data,as well as target capture sequencing data.We found that populations of T.sutchuenensis could be divided into three groups.These groups had low levels genetic diversity and were moderately genetically differentiated.Our findings also indicate that T.sutchuenensis suffered two severe bottlenecks around the Last Glaciation Period and Last Glacial Maximum.Among Thuja species,T.sutchuenensis presented the lowest genetic load and hence might have purged deleterious mutations efficiently through purifying selection.However,distribution of fitness effects analysis indicated a high extinction risk for T.sutchuenensis.Multiple lines of evidence identified three management units for T.sutchuenensis.Although T.sutchuenensis possesses a low genetic load,low genetic diversity,suboptimal fitness,and anthropogenic pressures all present an extinction risk for this rare conifer.This might also hold true for many endangered plant species in the mountains all over the world.

1.Introduction

The conservation of threatened species is crucial in a time of global biodiversity loss (Lamoreux et al.,2006; Chau et al.,2013).Threatened species often possess small and highly fragmented populations,making them susceptible to extinction due to increasing anthropogenic pressures and stochastic environmental and demographic events (Lande,1993; Wilson,1996).Small populations typically exhibit low levels of genetic diversity,which limits their ability to adapt to environmental changes and other challenges due to a scarcity of adaptive alleles (Frankham et al.,2002; Jordan et al.,2019).In addition,fragmentation hinders or prevents gene flow between populations (Lange et al.,2010;Mantyka-pringle et al.,2012),causing small and isolated populations to accumulate deleterious alleles through inbreeding,resulting in a reduction in individual and mean population fitness(Bertorelle et al.,2022).This increase in genetic load reduces population fitness,manifesting as lower growth rates due to decreased resistance to fungal infections,maladaptation to harmful environments,and reduced seed viability (Schoen et al.,1998;Hedrick and Garcia-Dorado,2016; Ma et al.,2022).Moreover,in species that have recently become rare due to human activity,increased genetic drift in small populations leads to faster fixation of deleterious mutations,ultimately reducing the probability that populations will survive(Lynch et al.,1993;Robinson et al.,2022).

The rapid advance of high-throughput sequencing technology has made investigating deleterious mutations,inbreeding as well as other factors increasingly feasible and robust (Allendorf et al.,2010; Manners et al.,2013; Palkopoulou et al.,2015; Bortoluzzi et al.,2020).Conservation genomic approaches have provided valuable insights into the accumulation or purging of deleterious mutations in small,endangered wild populations of non-model organisms,such as mountain gorillas (Xue et al.,2015),Alpine ibex(Grossen et al.,2020),Iberian lynx(Kleinman-Ruiz et al.,2022),Acer yangbiense(Ma et al.,2022),Rhododendron griersonianum(Ma et al.,2021) andPopulus ilicifolia(Chen et al.,2020).However,few studies have focused on endangered conifers.

The Sichuan Arborvitae,Thuja sutchuenensisFranch.,is an endangered conifer endemic to limestone cliffs,ridges,or steep slopes at 800-2100 m above sea level in northwestern Chongqing and the Daba Mountains in the eastern Sichuan Basin (Xiang et al.,2002;Cui et al.,2015;Tang et al.,2015).It is a tertiary relict species with a complex evolutionary history,and was once listed as extinct in the wild by IUCN but was rediscovered in 1999 (Xiang et al.,2002).Previous work surveyed its demographic history and level of genetic diversity using chloroplast simple sequence repeats (cpSSR)and restriction site-associated DNA sequencing (RAD-seq) data(Yao et al.,2021),as well as traditional molecular markers (e.g.inter-simple sequence repeat markers and single-copy nuclear loci;Liu et al.,2013; Qin et al.,2021).However,we have no knowledge regarding the level of genetic load inT.sutchuenensis.Note that genomic resequencing of conifers presents a particular challenge because of their very large genomes,which contains numerous and varied ancient repetitive sequences (Borthakur et al.,2022; Wei et al.,2022).In contrast,transcriptome sequencing (RNA-seq)technology provides ample coverage of the genome for conservation genomic studies and is ideally suited for analyzing deleterious alleles due to its focus on coding (exon) regions (Primmer,2009;Angeloni et al.,2011; Li et al.,2020; Miao et al.,2021; Yang et al.,2022).Therefore,RNA-seq is a powerful approach to disentangle the demographic history,genetic diversity,degree of inbreeding,and intensity of purging deleterious mutations in rare conifer species such asT.sutchuenensis.

T.sutchuenensishas experienced recent human-induced decline,such as intensive logging and habitat loss(Qin et al.,2017,2021).If reduced fitness results in decreased population size,which in turn increases inbreeding and genetic load,further exacerbating genetic diversity loss,this creates a destructive feedback loop known as the extinction vortex(Godwin et al.,2020).Consequently,conservation and management efforts should be based on robust evaluation of genetic diversity,inbreeding coefficients,and genetic load in small populations(Grossen et al.,2020;Bertorelle et al.,2022;Willi et al.,2022; Xie et al.,2022).Hence,in the present study,we have conducted a conservation genomic survey ofT.sutchuenensisusing high-throughput sequencing data.Our objectives are to(a)evaluate the genetic load and potential extinction risk ofT.sutchuenensis;(b)estimate the genetic diversity and inbreeding,and unravel the population structure ofT.sutchuenensis; (c) investigate the demographic history and gene flow among populations ofT.sutchuenensis; and (d) develop more targeted conservation measures forT.sutchuenensis.These findings are critical for forecasting the fate ofT.sutchuenensispopulations and improving the effectiveness of current conservation programs.Moreover,these findings will contribute to our understanding of genetic load and extinction risk of endangered plant species.

2.Materials and methods

2.1.Sample collection and RNA sequencing

We sampled 67 individuals from five wild populations across the distribution region ofT.sutchuenensis,plus eightThuja plicataindividuals from Lushan Botanical Garden,Chinese Academy of Sciences (Fig.1; Tables 1 and S1).Fresh leaves were kept below -80°C before extraction.The RNA library for RNA-seq was prepared using standard protocols (Grabherr et al.,2011).Specifically,mRNA was purified from total RNA using polyT and then fragmented into 300-350 bp fragments.The first strand cDNA was reverse-transcribed using fragmented RNA and dNTPs(dATP,dTTP,dCTP,and dGTP),and second strand cDNA synthesis was subsequently performed.The remaining overhangs of double-strand cDNA were converted into blunt ends via polymerase activities.After adenylation of 3’ends of DNA fragments,sequencing adaptors were ligated to the cDNA and the library fragments were purified.The template was enriched by PCR,and the PCR products were purified to obtain the final library.After library preparation and pooling of different samples,the samples underwent Illumina sequencing (Berry Genomics).The raw sequence data reported in this paper has been deposited in the Genome Sequence Archive(Chen et al.,2021)in the National Genomics Data Center(Xue et al.,2022),China National Center for Bioinformation/Beijing Institute of Genomics,Chinese Academy of Sciences (GSA: CRA009439 and CRA007034),which are publicly accessible at https://ngdc.cncb.ac.cn/gsa.Additional transcriptome data of fourThujaspecies andThujopsis dolabratawas gathered from Li et al.(2021) (Table S2).

Fig.1.Phylogenetic and population genetic analyses of Thuja sutchuenensis based on SNP variation.(a) Map showing the geographic distribution of sampling locations for T.sutchuenensis populations.(b) Population structure plots with K = 2 and 3.The x-axis shows the different individuals of populations; the y-axis quantifies the proportion of an individual's variation from inferred ancestral populations.(c) A maximum likelihood phylogenetic tree.Abbreviations: Toc,Thuja occidentalis; Tko,Thuja koraiensis; Tpli,Thuja plicata; SRR,Thuja standishii; Tdo,Thujopsis dolabrata.(d) Principal component analysis (PCA) plot of the first two components.

2.2.Reference assembly and SNP calling

To obtain a high-quality reference,we built a reference transcriptome forT.sutchuenensisby assembling five sets of RNA-seq datade novofrom five populations in TRINITY v.2.8.4 (Grabherr et al.,2011).The individual transcriptome was assembled in TRINITY v.2.8.4 with default parameters,and only the longest isoform of each gene was retained for downstream analyses.We used ORTHOFINDER v.2.3.12 (Emms and Kelly,2015) to identify orthologous groups,and added the unassigned sequences to form the final reference sequence.

Assembly statistics were generated using the trinitystats.pl script within Trinity's toolkit.To obtain high-quality contigs,we used BLASTN v.2.7.1 (Altschul et al.,1997) to align sequences to the microbial genome database (http://mbgd.genome.ad.jp/).Sequences with more than 90%similarity were removed.Redundant sequences were removed using CD-HIT-EST v.4.6.8(Fu et al.,2012).To evaluate the transcriptome completeness,we employed BUSCO v.5.4.4(Manni et al.,2021) with default parameters,using the embryophyte conserved gene dataset(embryophyta_odb10)as the query database.Gene annotation information was predicted by TRANSDECODER v.4.6.1(http://github.com/TransDecoder/TransDecoder/wiki/).

To obtain high-quality reads,TRIMMOMATIC v.0.36(Bolger et al.,2014)was used to filter Illumina raw reads using the following parameters: “LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5”.For each individual,the quality-filtered reads were mapped to the reference transcriptome using STAR v.2.7.9(Dobin et al.,2013)with default parameters.SAMTOOLS v.1.2(Li et al.,2009) was used to convert Sequence Alignment/Map (SAM) files to Binary Alignment/Map (BAM) files and sort BAM files.Duplicate reads were marked and removed using Picard v.2.18.11 (http://broadinstitute.github.io/picard/).The HaplotypeCaller and GenotypeGVCFs modules from the Genome Analysis Toolkit v.4.2.0(GATK)(DePristo et al.,2011) were used to call single nucleotide polymorphisms (SNPs)following the best practices workflow (DePristo et al.,2011).SNPs were filtered based on the GATK recommendations for hard filtering(e.g.,QD <2.0; FS >30,Dataset 1; Fig.S1).To ensure high-quality data,we removed SNPs with genotype depths below 5,a minor allele frequency below 0.05,a significant deviation from Hardy-Weinberg equilibrium(p<0.001)or those containing more than 20% missing data at all SNP sites,hereafter sites (Dataset 2;Fig.S1),using BCFTOOLS v.1.6 (Narasimhan et al.,2016a) and VCFTOOLS v.0.1.15 (Danecek et al.,2011).

2.3.Population genetic and phylogenetic analyses

We used VCFTOOLS v.0.1.15 to calculate the values of Tajima'sDand fixation statistics(FST).As a lack of information on missing and invariant sites may cause bias when estimating nucleotide diversity(π),we used PIXY v.1.2.7.Beta1(Korunes and Samuk,2021)and 10-kb windows to calculate these genetic parameters on Dataset1,including invariant sites.Meanwhile,we used the R packages ggplot2 (Villanueva and Chen,2019) and ggstatsplot(Patil,2021) to plot and perform statistical analyses.Subsequent analyses were based on Dataset 2.Population structure was analyzed to assign individuals to variable numbers (K= 1-5) of ancestors using the program ADMIXTURE v.1.30 (Alexander and Lange,2011).We used VCFTOOLS and PLINK v.1.90 (Purcell et al.,2007) to convert data.To complement the results of ADMIXTURE,we conducted principal component analysis (PCA) on all individuals to examine genetic structure or relatedness using the SMARTPCA module in the software EIGENSOFT v.6.1.3(Price et al.,2006).

Table 1 Location information of the sampled Thuja sutchuenensis populations.

In our phylogenetic analysis,we used a Python script (https://github.com/edgardomortiz/vcf2phylip) to convert the SNPs database to concatenated sequences for all individuals.IQTREE v.1.6.12(Nguyen et al.,2015)was used to construct a maximum likelihood phylogenetic tree forT.sutchuenensiswith default parameters usingThujopsis dolabrataand an additional four species inThujaas the outgroups.Bootstrap values were calculated based on 1000 replicates.The phylogenetic tree was visualized using FigTree v.1.4.3(http://tree.bio.ed.ac.uk/software/figtree/).

We used the KING program v.2.2.5(Manichaikul et al.,2010) to estimate genetic relatedness (duplicate,1st-degree relative pairs,2nd-degree relative pairs,3rd-degree relative pairs,and unrelated pairs)between individuals of three groups ofT.sutchuenensiswith the-kinshipflag.We also calculated the inbreeding coefficients(FIS)in three groups and the species on a whole using the -hetflag in VCFTOOLS.

2.4.Demographic history inference

Since the accuracy of demographic inferences is affected by mutation rate(μ),we made calculations based on a fossil-calibrated approach‘μ=D*g/2T’(Kondrashov and Crow,1993),whereDis the observed frequency of pairwise differences between the two species,Tis the estimated divergence time,andgis the estimated generation time.The generation time was set at 50 years following Qin et al.(2021),and the divergence time betweenT.sutchuenensisandT.dolabratawas set at 62.68 Mya,according to previous studies(Li et al.,2021; Qin et al.,2021).VCF2Dis (https://github.com/BGIshenzhen/VCF2Dis) was used to calculateDand the mean difference value,and generate a pairwise differences matrix.Based on the observed frequency of pairwise differences and divergence time,we estimated a mutation rate of 1.19 × 10-8per site per generation to infer the demographic history.Global max-likelihood estimates were derived from 100 independent runs,with 100,000 coalescent simulations and 40 likelihood maximization algorithm cycles.The Akaike's information criteria (AIC) (Akaike,1998) was used to assess the relative fit of each model,and the best fit with the lowest AIC was chosen.The 95% confidence intervals (CIs) for the parameters from the best model were constructed using 200 parametric bootstrap replicates by simulating SFS from the maximum composite likelihood estimates.

PCA and ADMIXTURE identified three groups ofT.sutchuenensis.We usedfastsimcoal2software (Excoffier et al.,2013) to estimate divergence time and gene flow between these groups,based on the site frequency spectrum(SFS).Demographic scenarios to be tested were established on the basis of the phylogenetic and population structure analyses.First,to minimize the effects induced by selection,we used SNPs (Dataset 3; Fig.S1) at those four-fold synonymous sites (4DTV,i.e.,codons in which any base at the third position is translated into the same amino acid)that had no missing data across all individuals.We employed the iTools module in the ReSeqTools toolkit (https://github.com/BGI-shenzhen/Reseqtools)with the parameter “Fatools getCdsPep -4DSite” to extract 4DTV sites and used a custom script to generate 0-fold and 4-fold data.We used a Perl script to generate a folded two-dimensional site frequency spectrum for each of the three groups,inferred based on 4DTV sites (Ru et al.,2018).

In addition,we used STAIRWAY PLOT 2 (Liu and Fu,2020) to analyze historical changes in effective population size (Ne) for the three groups ofT.sutchuenensis(Dataset 3).Two hundred subsamples comprising 67%of all sites were created,the median value of the estimation was used as a final output,and the 95%confidence interval of each value was calculated.

Because both demographic methods mentioned above failed to adequately account for the most recent changes in population size,we further estimated the contemporaryNeforT.sutchuenensis.To achieve this,we used an LD (linkage disequilibrium) model estimation method with a random mating and a critical value of 0.02 that was implemented in NEESTIMATE v.2.1 (Do et al.,2014) (Data set 2).Meanwhile,we used an approximate Bayesian computation(ABC)model to estimate contemporaryNethat was implemented in ONESAMP v.1.2 (Tallmon et al.,2008) (Data set 2).

2.5.Characterization of deleterious mutations

To compare genetic loads betweenT.sutchuenensisand its congeners,the target capture sequencing data from Li et al.(2021;Table S2) was used to map to the assembled reference transcriptome,using BWA v.0.7.17 (Li and Durbin,2009) with the default parameters.This target capture sequencing data set covered three individuals of eachThujaspecies (15 individuals; Table S2),and captured nearly the entire exome.We used one individual fromThujopsis dolabrataas the outgroup.To obtain reliable genotypes from RNA sequencing reads,we followed the genotype calling procedure as described above.To avoid false positive results due to missing data,we applied VCFTOOLS to remove INDEL (insertiondeletion)sites,and BCFTOOLS to retain no missing sites(Dataset 4;Fig.S1)We excluded sites where the individuals contained multiple alleles.Finally,a total of 298,719 high-quality SNPs was retained.The ancestral state for each polymorphic site was inferred with outgroup sequences (Thujopsis dolabrata) using the program ESTSFS v.2.0.3 (Keightley and Jackson,2018).

We estimated the selection pressure upon each species by calculating the nucleotide diversity (π) at nonsynonymous (zerofold) and synonymous (four-fold,i.e.,codons in which any base at the third position is translated into the same amino acid)sites(i.e.,πnand πs,respectively) and the ratio of πn/πsusing PIXY v.1.2.7.Beta1 (Korunes and Samuk,2021) and 10-kb windows.

To reduce the influence of algorithm preference on the evaluation of deleterious mutations,we used three different methods to evaluate the SNPs database.SIFT (sorting intolerant from tolerant)was used to predict whether amino acid substitution (A,T,C,G)affects the protein function energy algorithm (Ng and Henikoff,2003; Kumar et al.,2009; Sim et al.,2012).SIFT4G targeted the reference sequence by constructing a target base according to the conservation of homologous sequences of multiple species.Because of the score database of histone coding sequences,the harmfulness of nonsynonymous mutations in species can be quickly evaluated(Vaser et al.,2016).Default settings recommended by the authors were used.We first downloaded the UniRef90 protein database,following the developer's recommendations.Based on this database,we used make-SIFT-DB-all.pl script to construct the reference ofT.sutchuenensisSIFT scoring database,and then used SIFT4-G_Annotator.jar to build the respective nonsynonymous SNPs based on annotating the database.Any SNP mutation with a score<0.05 in the SIFT database was defined as a deleterious mutation site(henceforth denoted as a DEL site).

Grantham score was one of the earliest algorithms developed to predict the amino acid effect of deleterious mutations (Grantham,1974).The score varied from 0 to 215,with lower scores reflecting substitutions between amino acids that were more similar in composition,polarity,and molecular volume,whereas higher scores reflect dissimilarity (Grantham,1974).We used the Grantham scoring matrix to score each nonsynonymous SNP,and to identify those SNPs with a Grantham score (GS) ≥150 which was marked as Radical in the script as moderately harmful mutation sites (Li et al.,1984) (hereafter termed ‘Radical sites’).

We defined any deleterious mutation identified by the above two methods as a moderate and slightly deleterious mutation.However,mutations that affect the transcription of the entire gene(stop gained,stop lost,start lost,or start gained) were highly deleterious mutations and were henceforth termed Loss of Function (LOF) mutations.To assess the accumulation of highly deleterious mutations in different species and groups,we used SNPEFF v.3.4 (Cingolani et al.,2012) to assign SNPs to different mutational types by comparing the reference and gene annotation information with parameter “-eff”.The variants were grouped in three categories:(1) LOF; (2) nonsynonymous; and (3) synonymous.

To further investigate differences in genetic load between species,we sampled three individuals per species,and analyzed each species pair by selecting a single individual from each species.We then counted the number of derived mutations at each site that was variable in one individual but not in the other.From this data,we calculated theRXYratio,which represents the difference between two populations' relative abundance of derived allele frequencies for a specific category of sites (Do et al.,2015).TheRXYratio was calculated independently for each category of mutations: Radical,DEL,and LOF.In each instance,RXYvalues greater than 1 or less than 1 signified that the ability to purge deleterious mutations was stronger in species X or Y,respectively (Do et al.,2015).We employed this approach to search for evidence of purifying selection.For detailed information on calculating the various types of genetic load,please refer to the Supplementary data.

2.6.Estimates of the distribution of fitness effects

The distribution of fitness effects(DFE)from a new mutation can be independently modeled for deleterious,neutral,and beneficial mutations.For our reference transcriptome,we counted the number of polymorphic sites of each frequency class(synonymous and nonsynonymous sites,Data set 5; Fig.S1) to generate an unfolded SFS for five species and three groups.Briefly,variants in coding regions were annotated as synonymous or nonsynonymous and polarized using derived versus ancestral alleles.We inferred the distribution of fitness effects for new mutations using POLYDFE v.2(Tataru et al.,2017; Tataru and Bataillon,2019; Dutheil,2020).We chose the default model,which modeled DFEs as a mixture of gamma and exponential distributions.The fitting procedure itself was run with the basin-hopping algorithm with up to 10 iterations to avoid local maxima.Using the provided postprocessing R code,the models were ranked using the Akaike-information criterion(Akaike,1998),and parameter estimates were calculated as the AICweighted average.We divided the deleterious portion of the DFE in sixNes (effective population size multiply by selection coefficient)classes(-∞to-1000,-1000 to-100,-100 to-10,-10 to-1,-1 to 0 and 0 to+∞).In addition,we used the selection coefficient(s)to create subsets of polymorphisms that were beneficial(Nes >0),or deleterious (Nes < 0),with the latter further subdivided into differentlevelsofseverity,i.e.-1<Nes≤0,-10 <Nes ≤-1,-100 <Nes ≤-10,-1000 <Nes ≤-100,and(Nes ≤-1000),the last of which we defined as highly deleterious.

3.Results

3.1.Transcriptome sequencing and de novo assembly

The finalized reference transcriptome ofT.sutchuenensiscomprised 60,102 unigenes,with an average length of 812 bp and a contig N50 of 2425 bp (Fig.S2; Table S3).The Benchmarking Universal Single-Copy Orthologs (BUSCOs) assessment indicated a quality assessment score of 90.1% and a total of 1454 complete BUSCOs.After eliminating low-quality reads,we acquired approximately 216 Gb of data from the 67 samples analyzed.

3.2.Population genetic and phylogenetic analyses

After hard-filtering through the GATK pipeline,we obtained 3,045,724 SNPs.Of these,636,151 SNPs were retained for population genetic and phylogenetic analyses.Phylogenetic analysis of transcriptome-wide SNPs suggested that individuals ofT.sutchuenensiswere clustered into three separate monophyletic clades.Principal component analysis(PCA)identified three groups along PC1 (variance explained = 15.86%) and PC2 (variance explained = 12.42%) (Fig.1d).ADMIXTURE indicated that the population genetic structure was relatively weak (Fig.S3).WhenK= 3,the three identified groups clearly conformed with the results from PCA(Fig.1b),although the shared ancestry of the groups might have caused non-monophyly in two out of three groups.The range of population level pairwiseFSTwas between 0.0392 and 0.0729,suggesting a low to moderate level of genetic differentiation(Yang et al.,2022),especially for TH4 or TH5 between the other three populations (Table 2).Nevertheless,genetic differentiationbetween the three groups was moderate (FST>0.05) (Table S4).These results indicated that we could divide all sampled population into three groups,henceforth termed groups G1,G2 and G3,and that these groups comprise populations TH1+TH2+TH3,TH4,and TH5,respectively.

Table 2 Genetic distances (FST values,below diagonal) between Thuja sutchuenensis populations.

The level of nucleotide diversity was similar among the fiveT.sutchuenensispopulations,or three groups(Tables S5 and S6).All average π (standard deviation,SD) was 0.00186 (0.00111) across 10-kb windows,which is lower than the result 0.00272 (0.0122)from Shalev et al.(2022) (transcriptome data vs.target capture sequencing data).Additionally,the values of Tajima'sDwere positive in each population and group,which indicates that all populations and groups might have experienced population bottleneck(s) and/or balanced selection in their evolutionary history(Table 3 and S7).

Table 3 Population genetic statistics of Thuja sutchuenensis populations.

ForT.sutchuenensispopulations overall,the inbreeding coefficient (FIS,0.0209) (Table S8) that we estimated was much lower than previous study using 112 individuals showed that inbreeding coefficient in theT.plicatarange-wide population was 0.331(Shalev et al.,2022).Although sampling size (67 vs.112) and data type (transcriptome data vs.target capture sequencing data) may have affected the estimate,mating system might have played a major role sinceT.plicatacan be self-compatible (Shalev et al.,2022).For each of the three groups our data distinguished,FISscores were highest in G3 and lowest in G2(Table S8).In addition,the G1 group possessed the highest value of average kinship coefficient (0.06; Table S9).We also calculated duplicate,1st-degree,2nd-degree,and 3rd-degree relationships in pairs of individuals,and we found that there was a higher proportion of close relationships in G1 group (0.82; Table S9).

3.3.Demographic history and gene flow

The STAIRWAY PLOT 2 analysis showed that all groups had suffered a severe bottleneck effect during the Last Glaciation Period between around 110 and 10 thousand years ago (Kya) (Fig.2b).Following a period of continuous population decline,theNeofT.sutchuenensisstayed relatively stable from around 10 Kya onwards (Fig.2b).

Fig.2.Demographic history of Thuja sutchuenensis.The light gray areas represent different glaciation events during the Pleistocene (LGP: Last Glaciation Period,LGM: Last Glaciation Maximum).(a)Schematic of demographic scenario modeled in fastsimcoal2.Estimated effective population sizes(Ne)and divergence times are indicated.The numbers next to the arrows denote the per-generation migration rate between populations.(b)Changes in effective population size(Ne)over time in T.sutchuenensis inferred by Stairway Plot method using a generation time of 50 years and an autosomal mutation rate(μ)of 1.19×10-8 per base pair per generation.Thick lines represent the median and thin light lines the 95% pseudo-CI defined by the 2.5% and 97.5% estimations of the SFS analysis.

Furthermore,we set eleven models to estimate gene flow among the threeT.sutchuenensisgroups (G1,G2,G3) using coalescent-based simulations infastsimcoal2.Among the elevencandidate models employed,the best-fitting model (with minimum Akaike's weight value) contained 11 parameters (e.g.,ANC:common ancestor effective populations size,TDIV: divergence time,NPOP: effective population size of each group,MIG: gene flow;Table 4,the number 0,1,2 denotes group 1,2,3,respectively),including six separate periods of gene flow (Tables 4 and S10;Fig.S4).The result showed that gene flow had been continuous between the three groups since they diverged,with gene flow from G3 into other groups being greater than that into G3.Specifically,gene flow from G3 into G1 was ~65.8-fold that in the reversedirection; the gene flow from G3 into G2 was 1.6-fold that in the reverse direction (Fig.2a).

Table 4 Inferred demographic parameters(including 95%confidence intervals)for the bestfitting fastsimcoal2 model,which is shown in Fig.2b.

From 92 Kya onwards (95% CIs: 80.95-267.95 Kya),the three groups seemed to have a stable effective population size (Fig.2b).Moreover,the effective population size of G2(Ne=15,905,95%CIs:14,498-21,790) was the smallest of the three groups (Table 4).Meanwhile,the estimated contemporaryNeforT.sutchuenensiswas estimated to be 138.8 (95% CIs: 90.1-269.6) using NEESTIMATE v.2.1.The ratio of contemporaryNe(138.8)to the census population size (7000) was calculated to be ~0.02.At the same time,the contemporaryNeofT.sutchuenensis(138.8) was about half that ofT.plicata,which was estimated to be 270.3(transcriptome data vs.target capture sequencing data)(Shalev et al.,2022).We also used NeABC as implemented in the ONESAMP v.1.2 to estimate the effective population size.However,the results demonstrated that the meanNewas 45.5(95%CIs:40.3-55.9),which was 67.2%lower than the estimate derived from NEESTIMATE (138.8).

3.4.Detection of deleterious mutations

According to the nearly neutral theory(Ohta,1992),the efficacy of purifying selection depends on the effective population size(Ne).Thus,the πn/πsratio is directly and negatively related toNe.Moreover,a small population size could result in a low level of genetic diversity and/or selection efficacy,so that the extent of genetic load can be reflected by the ratio of nonsynonymous versus synonymous nucleotide diversity,and hence correspondingly the ratio of 0-fold and 4-fold degenerate positions in protein-coding sequences.We used 15 individuals ofThujafrom a previous study(five species,three individuals per species; Li et al.,2021) to estimate πnand πs.Our analysis indicated thatT.sutchuenensishad relatively higher nucleotide diversity compared with other species in 0-fold degenerate sites and the value of θπon 4-fold degeneration sites ofT.sutchuenensiswas higher than that of the other 4 species(p<0.001;the Kruskal-Wallis rank-sum test)(Figs.S5 and S6).Likewise,the πn/πsratio ofT.sutchuenensiswas lower than that of other species (Table S11),suggesting a higher selection efficacy withinT.sutchuenensisthan in its congeners.Therefore,relative to otherThujaspecies,T.sutchuenensishas presumably purged more deleterious mutations and possesses a lower genetic load,which was consistent with our results of πn/πsratio,πnand πsvalues(Figs.S5 and S6; Table S11).However,our result of the πn/πsratio(0.478) ofT.sutchuenensiswas slightly higher than that found in recent work onT.plicata(π0/π4ratio of 0.33) (both are target capture sequencing data) (Shalev et al.,2022).

To explore the accumulation of genetic load in each species,we used a total of 258,271 (T.occidentalis),261,714 (T.koraiensis),259,889 (T.plicata),267,209 (T.standishii),and 271,524(T.sutchuenensis)SNPs from protein-coding regions to evaluate the accumulation and purging of deleterious mutations(Table S12).The SNPEFF program annotated 5196 Loss of Function (LOF) SNPs fromT.sutchuenensis,and similar numbers from the other species: 5147 fromT.occidentalis,5151 fromT.koraiensis,5158 fromT.plicata,and 5201 fromT.standishii(Table S12).The respective totals of Radical and DEL SNP mutations identified for each species by Grantham score and SIFT4G were 7487 and 5519 (T.occidentalis),7557 and 5525 (T.koraiensis),7499 and 5516 (T.plicata),7601 and 5564(T.standishii),and 7673 and 5586(T.sutchuenensis)(Table S12).

The relative number of derived alleles(RXY),which was used to estimate genetic load (Xue et al.,2015; Narasimhan et al.,2016b),indicated that the proportion of all three kinds of deleterious mutations were lower inT.sutchuenensisthan in any other congeneric species(Fig.3a).This suggested a purging of genetic load,including both high and low impact deleterious alleles,inT.sutchuenensis.Among the three groups inT.sutchuenensis,RXYfound the lowest proportion of LOF,Radical and DEL mutations in G1,but the highest level of LOF and Radical mutations in G3 (Fig.3b).

Fig.3.The relative number of derived alleles at LOF(red),Radical(orange),and DEL(yellow)sites are frequent in one population and not another.The horizontal line of 1 indicates nondifference in a relative number of derived alleles.(a)Abbreviations:Toc,Thuja occidentalis;Tko,T.koraiensis;Tpli,T.plicata;Tst,T.standishii;Tsu,T.sutchuenensis.(b)Note:G1 includes TH1,TH2,and TH3; G2,TH4; G3,TH5.

3.5.The distribution of fitness effects across species and groups

Analysis of population genetic polymorphism offered a perspective on sequence evolution over more recent timescales,notably via the distribution of fitness effects(DFEs)for segregating derived alleles (Eyre-Walker and Keightley,2007; Tataru et al.,2017).DFE analyses revealed extraordinary differences betweenT.sutchuenensisand the other species (Fig.4a).WithinT.sutchuenensis,the G3 group possessed the most deleterious polymorphisms (Nes <-1000); no beneficial polymorphisms (Nes>0) were detected in the G2 or G3 groups (Fig.4b).

4.Discussion

4.1.Low genetic diversity in three identified groups of Thuja sutchuenensis

The genetic diversity of small populations tends to be low due to genetic drift and inbreeding (Keller and Waller,2002; Kohn et al.,2006; Li et al.,2012; Kim et al.,2022).Therefore,the genetic diversity of rare and endangered species with a narrow distribution is often lower than that of its more widespread relatives(Clegg et al.,1989;Yang et al.,2018;De Kort et al.,2021).Here,genetic diversity withinT.sutchuenensiswas estimated using population-level transcriptome sequencing data.The nucleotide genetic diversity(0.00186) ofT.sutchuenensiswas found to be relatively low compared with that of other threatened conifers such asCupressus chengiana(0.0077) (Li et al.,2020),Cupressus gigantea(0.0027)(Yang et al.,2022),Juniperus erectopatens(0.00222)but higher than that ofJ.microsperma(0.00129) (Miao et al.,2021).This is consistent with two previous surveys based on six nuclear loci(Qin et al.,2021) and RAD-seq ofT.sutchuenensis(Yao et al.,2021).

Our phylogenetic,PCA,and ADMIXTURE analyses suggest there are three distinct groups inT.sutchuenensis,two comprising single populations and the other (G1) comprising the remaining three.PairwiseFSTvalues indicate moderate levels of genetic differentiation among these three groups (between 0.0570 and 0.0608;Table 2 and S4),which is higher than the reported pairwiseFSTvalue between the two management units (MUs) of YTR and NR(0.0488)ofC.giganteabased on the same data type(RNA-seq;Yang et al.,2022).Hence,there are three groups inT.sutchuenensisthat form naturalMUs.

4.2.Demographic history and gene flow explain the genetic diversity pattern of Thuja sutchuenensis

Our reconstructed demographic history of the threeT.sutchuenensisgroups suggested that two groups might have experienced two bottleneck events.The first bottleneck for G1 and G2 occurred around 1 Mya (Fig.2b),with a steep drop in G1 but gradual in G2,coinciding with the Xixiabangma Glaciation(1.2-0.8 Mya)on the Qinghai-Tibet Plateau(Zheng et al.,2002).The second bottleneck occurred ~26.5 Kya in G1 and slightly earlier in G2,around the start of the Last Glacial Maximum (LGM,26-19 Kya)(Fig.2) (Clark et al.,2009; Cui et al.,2011; Qin et al.,2017).Conversely,group G3 contracted gradually from ~1 Mya until ~5 Kya,becoming the smallest group from ~200 Kya onwards.Hence the Quaternary climate oscillations affected this group differently.From the early Holocene (~10 Kya),G1 and G2 were stable in size,whereas G3 was stable in size from ~5 Kya (Fig.2a).Hence,the three groups (G1,G2,G3) exhibited distinct evolutionary histories(Fig.2).Previous findings based on extended Bayesian skyline plots suggested thatT.sutchuenensispopulation size has not changed over time (Qin et al.,2021),although this approach predicts effective population size with less certainty (von et al.,2022).BecauseMUsare based on the demographic independence of populations(Palsbøll et al.,2007),we argue that our demographic reconstruction provides further support for the division of theT.sutchuenensispopulation into three separateMUs.

Fossil evidence from the late Pliocene in Shanxi Province,northern China(Cui et al.,2015),ca.700 km to the northeast of the current distribution ofT.sutchuenensisin the Daba Mountains,southwestern China,indicates that this species once had a wider distribution in China.According to thefastsimcoal2analysis,the ancestral effective population size of the species at approximately 92 Kya was around 137,993(95%CIs:134,790-142,845).This is 8.7 to 2.6 times larger than that of the present-day groups and 997.8 times greater than the contemporary effective population size (as inferred by NEESTIMATE,which is 138.8) of this species.This suggests that the species as a whole has experienced strong bottlenecks and habitat fragmentation in the past.

In general,the effective population size (Ne) is much smaller than the census population size(Nc)(Nei et al.,1975;Charlesworth,2009).TheNcofT.sutchuenensiswas approximately 7000(https://www.iucnredlist.org/species/32378/2816862,last accessed on 31 March 2023),much smaller than the historicalNeof any of the three groups as estimated byfastsimcoal2(>7000) (Fig.2a).This finding suggests that the effective population size may have a hysteresis effect and will continue to decline.Meanwhile,given the estimated contemporaryNeof 138.8 (NEESTIMATE) or 45.5 (NeABC),and the extremely low ratio of the estimated contemporaryNe(0.02 or 0.006)to the census population size(7000),it is possible that there was a recent reduction in the effective population size ofT.sutchuenensis(Turner et al.,2006).

One possible explanation for this could be thatT.sutchuenensishas experienced very recent population contractions due to human impacts such as habitat destruction and logging,but these are too recent in terms of generations to have created a bottleneck effect so far.However,if no action is taken,the bottleneck effect will lead to a further decline ofNe,increasing future extinction risk of the species.

Ourfastsimcoal2analyses suggested high levels of historical gene flow between groups.Gene flow was higher from G1 or G3 to G2 than the other way,yet these were similar in magnitude.However,gene flow from G1 to G3 was 66 times smaller than the other way(Fig.2a,Table 4).Previous studies based on nuclear loci detected similar extensive gene flow among populations ofT.sutchuenensis(Qin et al.,202).It has been demonstrated that anthropogenic impacts are directly related to the loss of biodiversity and the mass extinction of species(Ceballos et al.,2015).In the case ofT.sutchuenensis,trees have long been harvested by local residents for such diverse products as Buddhist beads,carvings,and furniture(Qin et al.,2017).This issue,plus habitat destruction and fragmentation,are likely to be the main causes of a reduction in itsNeand genetic diversity.These,plus the additional threat of climate change,are likely to lead to continued population shrinkage,which will ultimately lead to extinction unless suitable conservation and management measures are taken.

4.3.Low genetic load yet high extinction risk of Thuja sutchuenensis

Endangered species generally suffer from a high genetic load(Díez-del-Molino et al.,2018;Bertorelle et al.,2022;Xie et al.,2022).The accumulation of deleterious mutations leads to a high genetic load (Agrawal and Whitlock,2012; Chen et al.,2020) and can put small populations at risk of extinction (Lande,1994; Lynch et al.,1995; Dussex et al.,2021; Khan et al.,2021).Given enough time,even a very small population may adapt to and survive through high levels of homozygosity (Qiao et al.,2010; Dussex et al.,2021; Khan et al.,2021); nevertheless,this does not apply to species that were driven to the verge of extinction by recent human activity.

Based on the relatively lower θπ(0-fold)/θπ(4-fold) ratio ofT.sutchuenensiscompared to other species inThuja(Table S11),it is therefore likely to have a lower genetic load,which indicates a more robust efficiency in purifying selection,and a lower probability of extinction (Chen et al.,2017,2019; Stewart et al.,2017).Likewise,a relatively lower ratio ofRXYcompared to the other congeneric species,indicates thatT.sutchuenensishas been more successful in purging deleterious mutations.The efficiency of purifying selection depends on the population size and life-history traits (Agrawal and Whitlock,2012; Chen et al.,2017,2022),but also how long a period it has been acting.Long-lived plants with small population size can persist despite a high genetic load(Klekowski,1988).Our analyses suggest that the long-lived conifer species,T.sutchuenensis,has suffered severe population declines during its evolutionary history,and its population size became relatively small since at least 5 Kya.Therefore,likeTupistra pingbianensis(Qiao et al.,2010),it has been naturally rare for a long time,allowing ample time for both genetic drift and purging selection to act.In support of this,we found that the majority of detected deleterious alleles withinT.sutchuenensispopulation are heterozygous(Table S13),consistent with purifying selection acting more strongly when the deleterious recessives are expressed (Zhu et al.,2022).These results suggest that some endangered trees with smallNetend to purge deleterious mutations whether their impacts are high,mild,or low.

Populations with smallNeare expected to deviate from their optimal fitness,due to a stronger effect of stochastic events,as these populations are less able to generate new polymorphisms that are neutral or nearly neutral.However,the likelihood of beneficial or deleterious polymorphisms increases in these populations (Galtier,2016).DFE scores indicate thatT.sutchuenensishad the highest proportion of both beneficial and deleterious polymorphisms among the fiveThujaspecies (Fig.4a),but the lowest proportion of neutral to nearly neutral polymorphisms.This indicates thatT.sutchuenensishas suboptimal fitness,elevating its extinction risk (Robinson et al.,2019).

4.4.Conservation implications

It is critical to identify conservation units (CUs) for biodiversity protection,which makes managers understand the population units of endangered species,such as (a) evolutionarily significant units (ESUs),which represent different lineages that have experienced independent evolutionary history without gene flow; (b)management units (MUs),that means demographically distinct populations; and (c) adaptive units (AUs),which indicate differentiation in adaptability to landscape and climate (Funk et al.,2012;Yang et al.,2022).Our analyses have indicated thatT.sutchuenensiscomprises three distinct groups.We recommend that these groups be treated as independentMUs.The effective population size is lowest in G2,and highest in G1(Fig.2a),whereas the proportion of highly deleterious mutations is highest in G3,followed by G2 and G1 (Fig.3b; Table S14).Nucleotide diversity was higher in G1 and G2 but lower in G3 (Table S5).To maximize the efficacy of conservation measures,we recommend prioritizing efforts in the following order: G3 >G2 >G1.

AlthoughT.sutchuenensisappears to have persisted successfully in small populations for more than 10,000 years,two factors are critical in determining whether it will continue to do so.The first is whether the degree of recent anthropogenic shrinkage,on top of past natural population decline,has been enough to trigger a new round of genetic drift,genetic diversity loss,further reducing fitness,and potentially sending it towards an extinction vortex(Lynch et al.,1993; Agrawal and Whitlock,2012; Teixeira and Huber,2021).The second is how it will respond to climate change,given that its reduced genetic diversity would likely reduce its ability to adapt to new environments.Therefore,conservation strategies and measures,including habitat conservation,collecting its seeds as germplasm(Guo et al.,2019;Zhang et al.,2021),enhancing connectivity of fragmented habitats,artificial assistance of migration between populations,increasing of local population size by artificial plantation etc.,should be seriously considered to minimize the extinction risk ofT.sutchuenensis.CurrentlyT.sutchuenensisis listed as ‘Endangered’in theIUCN Red List(Yang et al.,2013)and“Class I”on theNational Protection List of Wild Plants in China(Fu and Chin,1992).Conservation efforts,including evaluating its conservation status,implementingin situpreservation through the establishment of nature reserves (e.g.,Xuebaoshan National Nature Reserve),and propagating seedlings from artificial cuttings or seeds,have already been undertaken.Future research employing whole genome sequencing and resequencing to investigate adaptive variation and promote selective breeding should be encouraged to support the sustainable conservation of this endangered conifer (Borthakur et al.,2022; Zhao et al.,2023).In short,our study provides new insight as well as recommendations to the conservation of rare tree species,which,in combination with existing conservation approaches,will facilitate the survival ofT.sutchuenensis.

Author contributions

K.S.M.designed and supervised this study.T.Z.T.and J.L.L.managed the fieldwork and collected the materials.T.Z.T.,J.L.L.,H.Y.,S.H.C.and S.Y.W.performed the data analysis.T.Z.T,R.I.M,J.L.L and K.S.M.prepared the manuscript.All authors read and approved the manuscript.

Data availability

The data used for the analyses in this manuscript are available from the National Genomics Data Center: https://ngdc.cncb.ac.cn/gsa/s/h3Pn1TTa and https://ngdc.cncb.ac.cn/gsa/s/HZ98AJ87.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financially supported by National Natural Science Foundation of China (grant No.U20A2080,31622015),the Institutional Research Fund from Sichuan University(2021SCUNL102) and Fundamental Research Fund for the Central Universities of China (SCU 2021D006,SCU 2022D003).

Appendix A.Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2023.06.005.