Occurrence of D-amino acids in natural products

2023-12-29 13:42 Daniel Armstrong and Alain Berthod
Natural Products and Bioprospecting, 2023, Issue 6

Daniel W. Armstrong and Alain Berthod

Abstract Since the standard genetic code contains 61 triplet codons of three bases for the 20 proteinogenic L-amino acids (AAs), no D-AA should be found in natural products. This is not what is observed in the living world. D-AAs are found in numerous natural compounds produced by bacteria, algae, fungi, or marine animals, and even vertebrates. A review of the literature indicated the existence of at least 132 peptide natural compounds in which D-AAs are an essential part of the structure. All compounds are listed, numbered and described herein. The two biosynthetic routes leading to the presence of D-AAs in natural products are described: non-ribosomal peptide synthesis (NRPS), and ribosomally synthesized and post-translationally modified peptide (RiPP) synthesis. The methods used to identify the AA chirality within naturally occurring peptides are briefly discussed. The biological activity of an all-L synthetic peptide is most often completely different from that of the D-containing natural compound. Analyzing the selected natural compounds showed that D-Ala, D-Val, D-Leu and D-Ser are the most commonly encountered D-AAs, closely followed by the non-proteinogenic D-allo-Thr. D-Lys and D-Met were the least prevalent D-AAs in naturally occurring compounds.

Keywords D-amino acid, Chirality, Biogenesis, Natural products

1 Introduction

Amino acids (AAs) are very simple chemicals containing both an amine and a carboxylic acid group [1]. In alpha-AAs, the amine and carboxylic acid groups are attached to the same carbon atom. Hydrogen is always the third substituent; hence, unless the fourth and last substituent is also hydrogen (as in glycine), the α-AA is chiral. These α-AAs are the most important building blocks of living organisms given their ability to combine to form proteins. It is not yet well understood why the L-enantiomers are predominantly found in proteins, which are built from no more than 19 chiral L-α-AAs and achiral glycine [2]. However, as early as 1894, Emil Fischer discovered enzymes, which he called Invertin and Emulsin, able to change the chirality of sugars [3]. Carrying on with amino acids, he identified their two possible D- and L-forms [4]. For years, D-AAs were believed to be present only in unicellular living forms such as viruses or bacteria, with a variety of D-AAs found in peptides and antibiotics such as penicillin, the first antibiotic [5]. However, in 1981, D-Ala was unambiguously established in the sequence of dermorphin {129} (the bracketed number, in bold and italic, refers to the number assigned to the compound in the first column of the tables), a peptide extracted from the skin of the tree frog Phyllomedusa sauvagii [6]. Since then, numerous occurrences of D-AAs have been found at different positions of bioactive peptides or proteins and in natural products [7]. Smaller amounts of free D-AAs are found in essentially all biological systems [8], sometimes playing vital roles and frequently being disease biomarkers [9].

This review focuses on D-AAs in natural products. What is their origin, how were they identified, and what is their function? D-aspartic acid was found in several tissues (teeth, bones, skin, lung, lens) of ageing living bodies, and D-serine was found, in addition to D-aspartic acid, in the β-amyloid of Alzheimer patients [9,10]. A wide variety of D-AAs can be found in the proteins of biological systems, animal or vegetal, including mammals and humans, as recently reviewed [8,11]. These particular peptides or altered proteins will not be considered in this study. D-amino acids are also key building blocks in the biosynthesis of polyketide-nonribosomal hybrid peptides, which have attracted great interest in recent years [12-14]. To maintain a focused and manageable size for this review, these and other hybrid structures were not included. Rather, the large number of natural peptide compounds containing at least one D-AA is reviewed. These compounds were arbitrarily sorted by their source: prokaryote bacteria and algae, eukaryote fungi, and multicellular invertebrate and vertebrate animals. Since the universal standard genetic code encodes only L-AAs, the presence of D-AAs in peptides and proteins is most frequently explained as the result of post-translational modifications or non-ribosomal biosynthesis. Non-proteinogenic AAs of the D- or L-configuration must also be considered in this context.

2 Defining D-amino acids

D-AAs are the opposite (mirror image) enantiomeric forms of the 19 natural chiral proteinogenic α-AAs found in natural products. However, there are numerous non-proteinogenic amino acids that also have a stereogenic center. Table 1 lists a selection of unusual AAs encountered when searching for D-AAs in natural products. Several rare methylated or hydroxylated variants of these amino acids are not included in this table. These non-proteinogenic AAs will be listed in the subsequent "natural compound" tables since they were commonly found in peptides that also contain proteinogenic D-AAs.

3 Identifying D-amino acids

Because enantiomers have identical physicochemical properties in isotropic environments, except for their chiroptical properties, the presence of D-AAs is not easy to determine and can often be overlooked. Fischer was the pioneer in this field: especially interested in the chirality of molecules, he established the D and L nomenclature and tested his isolated compounds by synthesizing them and using optical rotation to detect their chirality [3,4]. Spectroscopic, chemical, and especially separation methods are used to characterize D-AAs [9].

3.1 Spectroscopic methods

The spectroscopic methods using light for D-AA identification are optical rotation, called polarimetry, and circular dichroism spectroscopies. Raman optical activity is a new technique under development [15]. NMR spectroscopy uses strong magnetic fields and radio-frequency pulses. 1H and 13C NMR spectra give structural information on the atom organization of the compound studied. Nuclear Overhauser Effect Spectroscopy (NOESY) is a powerful 2D NMR measurement giving information on the three-dimensional structure of biomolecules. Heteronuclear Single Quantum Coherence (HSQC) NMR gives correlations between directly bonded carbons and protons. Differences in NOESY or HSQC NMR assignments are seen between epimeric peptides differing in the chirality of amino acids. However, NMR methods are unable to determine the chirality of free AAs. Mass spectrometry analyses in vacuum detect charged compounds or fragments accelerated by electric and magnetic fields. Fast-atom bombardment mass spectrometry (FAB-MS) uses a beam of neutral atomic gas (argon or xenon) bearing a high kinetic energy for the soft ionization of relatively large nonvolatile molecules with molecular weights up to 5000-6000 Da and the creation of charged fragments. Modern MS equipment can reach an m/z precision as low as 0.0001 Da, allowing differentiation between a CH2 methylene group, m/z = 14.0156 Da, and a nitrogen atom, m/z = 14.0031 Da. This was not possible with FAB-MS instruments, but L- and D-enantiomers still cannot be differentiated by MS as they have the same exact mass.
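The mass arithmetic quoted above can be checked in a few lines of Python. This is only a minimal sketch: the monoisotopic masses are standard values for 12C, 1H, and 14N, and the helper name `monoisotopic_mass` is a hypothetical one introduced for illustration.

```python
# Monoisotopic masses in Da (standard values; 12C is exactly 12 by definition).
MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074}

def monoisotopic_mass(formula):
    """Sum atomic masses for a formula given as {element: atom count}."""
    return sum(MASS[element] * count for element, count in formula.items())

ch2 = monoisotopic_mass({"C": 1, "H": 2})  # methylene group
n = monoisotopic_mass({"N": 1})            # nitrogen atom
delta = ch2 - n                             # gap the instrument must resolve

print(f"CH2 = {ch2:.5f} Da, N = {n:.5f} Da, difference = {delta:.5f} Da")

# Enantiomers share one elemental formula, so this calculation returns the
# same exact mass for the L- and D-forms: no mass precision separates them.
```

The difference of roughly 0.0126 Da is about a hundred times larger than the 0.0001 Da precision mentioned in the text, which is why the two fragments are readily distinguished.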

Table 1 Non-proteinogenic amino acids found in natural peptides, sorted by increasing carbon number, with the L- or S-configuration shown


The oldest and still very commonly used spectroscopic methods are polarimetry and circular dichroism [9]. Polarimetry measures the degree of rotation of plane-polarized monochromatic light passing through a sample. Circular dichroism measures the absorption of left- and right-circularly polarized light of different wavelengths by a sample and displays the difference, which is non-zero for a chiral compound [9]. Polarimetry, being linked to the solute refractive index, is not very sensitive, while circular dichroism can be somewhat more sensitive for solutes containing chromophores.

It was found that the optical properties of chiral molecules could be enhanced by orders of magnitude when they are adsorbed onto specific surfaces. Surface-enhanced polarimetry, circular dichroism and Raman optical activity methods using plasmonic nanostructures have potential for detecting smaller amounts of chiral molecules, but are still under development [9].

3.2 Chemical methods

Chemical methods include Edman degradation of the peptide chain, Marfey analysis, and chemical synthesis of the identified compounds for a direct comparison of their biological effects. The classic Edman degradation process involves reacting the N-terminal amino acid with phenylisothiocyanate, cleaving it with trifluoroacetic acid, and converting the resulting thiazolinone in aqueous acid to a stable phenylthiohydantoin (PTH) amino acid, which is identified. The process is repeated for the next amino acid, and the PTH derivatives, which have a strong UV absorption, are analyzed by HPLC. Edman degradation gives the amino acid sequence of a protein or peptide but cannot distinguish between D- and L-forms unless a chiral column is used for the HPLC analysis [16].

The Marfey method is more specifically designed to identify amino acid chirality. Its first step is to completely hydrolyze the peptide in 6 N HCl. The second step consists of reacting the hydrolysate with Nα-(2,4-dinitro-5-fluorophenyl)-L-alaninamide (L-FDAA) or with its enantiomer D-FDAA to form UV-absorbing diastereoisomers that can be separated on a classical C18 column. The peak positions are then compared with those of known FDAA derivatives of amino acid standards [17]. However, the strong acid hydrolysis step can partially racemize each chiral AA and convert asparagine and glutamine to aspartic acid and glutamic acid, respectively, among other undesired effects.

Synthesizing amino acids by classical organic chemistry produces racemic mixtures; however, pure D-AAs can be obtained by several different means. Feeding living organisms with a racemic amino acid mixture will leave a residue of D-AAs since only the L-forms are consumed by the organisms. Once the amino acid sequence of a natural product is established, the sequence is reproduced using L-AAs and the properties of the synthesized product are compared with those of the natural one. The D-Ala in dermorphin {129} was identified by this method. Dermorphin has an analgesic potency two to three orders of magnitude higher than that of morphine. The synthetic all-L dermorphin lacked any analgesic activity [6].

3.3 Separation methods

Separation methods are mainly chromatographic methods that separate compounds owing to their different affinities toward a stationary phase when they are carried by a mobile phase. Preparative liquid chromatography on low-pressure columns is usually used first to isolate and purify the natural compounds. Thin layer chromatography (TLC), gas chromatography (GC), and high performance liquid chromatography (HPLC) were used to characterize the natural products or their amino acid constituents. The oldest approach to separating enantiomers is to derivatize them with an enantiomerically pure reagent prior to the chromatographic analysis. This is the principle of the Marfey method [17]. Classical HPLC, TLC, or GC (for volatile derivatized compounds) are able to separate the diastereoisomers obtained [9,18]. In modern chiral studies, the derivatization approach has been supplanted by the use of chiral stationary phases (CSPs), which directly separate native molecules, hence avoiding several sample preparation steps.

Since enantiomers have identical properties in isotropic media, a chiral selector is needed to introduce some anisotropy and to allow for enantiomer differentiation. The chiral selector forms transient diastereoisomeric complexes with the two enantiomers. Once separated by the chromatographic process, the two enantiomers can be recovered independently. In chromatography, the chiral selector can be added to the mobile phase or attached to the stationary phase. Since chiral mobile phases need a continuous supply of expensive and pure chiral selector that must not disturb the detection method, CSPs are by far the most widely used approach in chromatography. Specific CSPs can now routinely separate amino acid enantiomers, which can be detected by MS or a variety of other detectors [19]. CSP columns are more expensive than classical columns. However, since the chiral selector is attached to the stationary phase, CSP-containing columns, when used properly, can serve for hundreds or even thousands of enantiomer separations, making the cost per separation insignificant [9]. The functionalized carbohydrate CSPs based on bonded derivatized cellulose or amylose cannot separate underivatized AAs or small peptides [19]. The macrocyclic glycopeptide-based CSPs make chiral columns that are especially efficient in separating both native and N-blocked amino acids [20]. Recently introduced superficially porous particles bonded with chiral selectors allow enantiomer separations in a very short time using minute amounts of mobile phase [8,9,18,19]. Combining achiral and chiral columns in 2D-HPLC greatly extends the capacity of the chromatographic methods, allowing the detection of trace amounts of chiral compounds [9,18].

Peptide epimers that differ in the chirality of a single amino acid within the peptide chain are not enantiomers. They can be separated by achiral columns, as demonstrated by the recent analysis of β-amyloids implicated in Alzheimer's disease [20]. However, CSPs [21], and especially macrocyclic glycopeptide bonded CSPs, are particularly efficient in separating a wide variety of peptide epimers with minimum sample preparation [22,23].

Capillary electrophoresis (CE) is a "micro-separation" method. However, CE is not a chromatographic method: it separates compounds by their charge-to-size ratios using an electric field created by applying a high voltage across a capillary tube filled with an appropriate electrolyte. The CE technique requires a chiral additive dissolved in the running buffer in order to separate enantiomers, and especially D- and L-AAs [9,19]. CE provides high separation efficiency in relatively short electromigration times. Its drawbacks are mediocre reproducibility and sensitivity, and the absence of any preparative capability.

4 Origin/biosynthesis of D-amino acids in natural products

Feeding cultures of Penicillium chrysogenum with 14C-labeled D- or L-valine, Arnstein and Margreiter obtained penicillin {28} containing 14C D-valine only when the mycelium was fed with L-valine [24]. They demonstrated that only L-AAs were processed by the mycelium and that the antibiotic synthesis necessarily involved racemase or epimerase enzymes. The standard genetic code contains six different codons of three bases for L-Arg, L-Leu, and L-Ser; four codons for L-Ala, Gly, L-Pro, L-Thr, and L-Val; three codons, all starting with AU (AUA, AUC, and AUU), for L-Ile; two codons for L-Asn, L-Asp, L-Cys, L-Gln, L-Glu, L-His, L-Lys, L-Phe, and L-Tyr; and a single codon each for L-Met (AUG) and L-Trp (UGG). This genetic code is almost universal for all known living organisms, animal or vegetal. However, so far, there is no identified codon for D-AAs. Hence, the synthesis of D-containing peptides or proteins cannot come directly from the translation of nucleotide sequences in ribosomal peptide synthesis. Non-ribosomal peptide synthesis (NRPS) is the main biosynthetic approach that produces the unique structural features observed in natural products containing D-AAs [25].
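The codon bookkeeping in this paragraph can be verified with a short tally. The sketch below simply groups the amino acids by the degeneracy stated in the text; the only added facts are that three bases from a four-letter alphabet give 64 possible triplets and that the remainder are stop codons.

```python
# Amino acids grouped by their number of codons in the standard genetic code.
CODONS_PER_AA = {
    6: ["Arg", "Leu", "Ser"],
    4: ["Ala", "Gly", "Pro", "Thr", "Val"],
    3: ["Ile"],
    2: ["Asn", "Asp", "Cys", "Gln", "Glu", "His", "Lys", "Phe", "Tyr"],
    1: ["Met", "Trp"],
}

amino_acids = sum(len(names) for names in CODONS_PER_AA.values())
sense_codons = sum(n * len(names) for n, names in CODONS_PER_AA.items())
stop_codons = 4 ** 3 - sense_codons  # triplets not assigned to an amino acid

print(amino_acids, sense_codons, stop_codons)
```

Every one of the 64 triplets is therefore already assigned, either to one of the 20 proteinogenic amino acids or to a stop signal, which leaves no codon available for a D-AA.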

Nonribosomally assembled peptides can be altered by subsequent chemical peptide synthesis or enzyme catalysis in a connected ballet between multi-domain NRPS proteins and polyketide synthases. NRPS enzymes are the largest known enzymes, with molecular weights exceeding 2.3 MDa or more than 21,000 residues [26]. The NRPS structure was found to be made of successive modules, each containing a condensation domain, C, an adenylation domain, A, and a thiolation domain, T. As an example, the production of valinomycin {43} by the bacterium Streptomyces tsusimaensis was fully described [27]. Four NRPS modules were identified that successively incorporate D-α-hydroxyisovaleric acid (Hiv), L-valine, L-lactic acid, and another L-valine. An epimerase domain, E, and an iterative terminal domain, TE, were also identified (Fig. 1). The E domain converts the first L-valine into D-valine, and the TE domain cleaves the tetrapeptide and, after coupling three tetrapeptides, catalyzes a head-to-tail cyclization to give valinomycin {43}. Advanced modern bioinformatics deciphered the Streptomyces tsusimaensis gene responsible for the NRPS production of valinomycin {43}. This gene of 39,266 base pairs (bp) contains eighteen open reading frames (ORFs) producing different proteins. From these 18 ORFs, the valinomycin (vlm) cluster was identified as ORF16 and ORF17. ORF16, called vlm1, spans 10,286 bp between bp 19,526 and bp 29,812, and ORF17, called vlm2, spans 7967 bp between bp 29,835 and bp 37,802 [27]. Both ORFs contain two modules, making up the four modules responsible for valinomycin {43} production (Fig. 1). This example shows that racemase, epimerase or isomerase enzymes encoded in the NRPS processes are responsible for the occurrence of D-AAs in natural products, which are obtained from their L-counterparts.
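The modular logic of this assembly line can be sketched as a small data model. This is a deliberately simplified illustration, not a description of the enzyme chemistry: the `Module` class and both functions are hypothetical names, the individual C, A, and T domains are not modeled, and the only behavior encoded is what the text states (four modules, one E domain acting on the first L-valine, and a TE step that couples three tetrapeptide units before cyclization).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    substrate: str           # building block loaded by this module
    epimerize: bool = False  # True when the module carries an E domain

# The four modules in the order given in the text.
MODULES = [
    Module("D-Hiv"),
    Module("L-Val", epimerize=True),  # E domain: L-Val -> D-Val
    Module("L-Lac"),
    Module("L-Val"),
]

def run_assembly_line(modules):
    """Return the tetrapeptide unit built by one pass over the modules."""
    unit = []
    for module in modules:
        residue = module.substrate
        if module.epimerize and residue.startswith("L-"):
            residue = "D-" + residue[2:]  # invert the configuration label
        unit.append(residue)
    return unit

def valinomycin_ring(modules, repeats=3):
    """TE step: couple `repeats` units; the list is read as a closed ring."""
    return run_assembly_line(modules) * repeats

ring = valinomycin_ring(MODULES)
print(len(ring), "units:", "-".join(ring))
```

In this model the single E domain accounts for all three D-valine units of the final ring, because the same four modules are used iteratively.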

Fig. 1 Simplified scheme of the valinomycin {43} biosynthesis by the NRPS subprotein of the bacterium Streptomyces tsusimaensis, dissected into four modules containing a total of fourteen domains (colored circles): C condensation, A adenylation, T thiolation, E epimerization, and TE C-terminal iterative domain. D-Hiv D-α-hydroxyisovaleric acid, L-Lac L-lactic acid. Adapted from [27]

The NRPS route is not the only natural process able to produce non-proteinogenic and D-AA containing compounds. In the ribosomally synthesized and post-translationally modified peptide (RiPP) synthesis route, organisms synthesize compounds classically: normal codons for L-AAs are present in the mRNA at the position where the D-residue is found in the studied peptide. The L-AA is processed by a post-translational modification involving peptidyl-aminoacyl L-to-D isomerization [8,10]. Several mechanisms are possible. Simple epimerases just change a particular L-amino acid into its D-form using a deprotonation-protonation mechanism. The hydrogen on the stereogenic carbon of a particular AA is removed to form a flat carbanion intermediate. The proton is then reintroduced on the opposite side, changing the chirality of the AA [28]. Hydroxylases or methylases are other enzymes that post-translationally modify L-AAs, creating non-proteinogenic AAs by converting the initial L-AA form into a D-form with hydroxyl or methyl groups added, respectively [29]. Also, specific enzymes are able to perform the post-translational conversion of an L-AA into a different D-AA. For example, a lantibiotic synthetase found in Lactococcus lactis is able to change L-serine into D-alanine, removing the hydroxyl group and changing the chirality, in the production of the powerful lantibiotic lacticin 3147 {21} [30]. Replacing D-alanine by L-alanine in lacticin 3147 {21} reduced its bioactivity by 94%, demonstrating that the type of AA and its chirality are both critical [30].

5 Occurrence of D-amino acids in natural compounds of various origins

5.1 Prokaryotes

Prokaryotes are single-cell organisms lacking a nucleus. The domain of bacteria is of particular interest. Bacteria are also the first and most prevalent source of D-AA containing natural products. D-AAs were found in antibiotics, biosurfactants, toxins and siderophores produced by bacteria. All of these compounds facilitate bacterial life and survival. Antibiotics and toxins can limit or help eliminate competing bacteria or predator organisms. Biosurfactants facilitate bacterial spread in the medium or host. Iron is a critical element for bacterial development. Siderophores are very powerful iron-complexing compounds that selectively capture iron, releasing it specifically to the siderophore-producing strain. Table 2 lists D-AA containing compounds produced by prokaryotes, mainly bacteria [31-97]. The antibiotics are numbered and listed alphabetically, with the bracketed number also used within this text after the compound name to facilitate reading. Compound names and numbers in the tables are followed by the name of the associated prokaryote, the structural formula, and the molecular weight. The number of amino acids is given along with the number and proportion of D-AAs in each peptide. The types of D-AAs and non-proteinogenic AAs are also given in Table 2. The non-proteinogenic amino acids are listed using the Table 1 codes, adding the D-prefix if the D-form was found (Table 2).

The syringopeptin antibiotics {36}, {37} and {38} from the bacterium Pseudomonas syringae are among the larger peptides listed, containing up to 25 amino acids with up to 16 in the D-form [66-68]. Jessenipeptin {20} from Pseudomonas sp. QS1027 is another large antibiotic with 19 AAs, 13 of which are the D-antipodes (Fig. 2) [50]. The first and best-known antibiotic, penicillin {28} from Penicillium chrysogenum, and the siderophore chrysobactin {47} from Erwinia chrysanthemi are made up of only two AAs, one being in its D-form: D-valine [58] and D-lysine [76], respectively (Fig. 2). There is no direct or obvious relationship between the total number of amino acids in a compound and the number that are of the D-configuration, although larger peptides could accommodate more D-AAs (Table 2). The phytotoxin fuscopeptin A {60} from Pseudomonas fuscovaginae consists of 19 AAs with 74%, or 14, in the D-form (Fig. 2) [89]. The small cytotoxic burkholdac A {57} from Burkholderia thailandensis has a similar D-proportion with only four amino acids, three of which have the D-configuration [86]. On the other hand, the antibiotics daptomycin {11} from Streptomyces roseosporus and mattacin {24} from Paenibacillus kobensis contain 14 and 10 total AAs, respectively, but only three (21%) [36] and one (10%) [48] are in the D-form.

The small staphylopine {55}, produced by methicillin-resistant Staphylococcus aureus (MRSA), is a metallophore that is extremely efficient in complexing zinc. This property explains the high pathogenicity of the bacterium, since the host defense, called nutritional immunity, consists of restricting the availability of the zinc that is critical to bacterial development [84]. Staphylopine {55} contains a D-histidine unit that acts as the metal chelating group.

Most of the listed antibiotics were synthesized by bacteria using an NRPS process. Bottromycins {7} are powerful heptapeptides produced by Streptomyces bottropensis that were extensively studied. It was found that they are produced following a RiPP pathway with a ribosomally produced core peptide that is post-translationally modified by tailoring enzymes [36]. Lacticin 3147A2 {21} is a larger antibiotic with 29 amino acids and only 2 D-AAs. It is synthesized by Lactococcus lactis, also via a RiPP process [44]. Lacticin 3147A2 {21} is a lantibiotic compound, meaning that it contains the pseudo-amino acid lanthionine, which is enzymatically created by connecting two alanines, or aminobutyric acid and alanine, via a sulfur atom. Lacticin 3147A2 {21} works synergistically with the other RiPP-produced component, lacticin 3147A1, inducing pore formation in bacterial walls [29,30,51]. Lantibiotics are much more powerful than conventional antibiotics and offer hope in overcoming antibiotic-resistant Staphylococcus aureus and other lethal bacteria [51].

5.2 Photosynthetic microorganisms (prokaryote cyanobacteria and eukaryote algae)

Prokaryotic cyanobacteria and eukaryotic algae contain pigments that allow them to perform oxygen-producing photosynthesis, which associates them with the plant kingdom. Table 3 lists a selection of D-amino acid containing compounds produced by cyanobacteria or algae [98-107]. These relatively small compounds were sought for their biological activity, as indicated in this table. Amphibactin {70} from the proteobacterium Vibrio sp. R-10 is an amphiphilic siderophore that collects the rare iron ions present in the marine environment [98]. All the other listed marine natural products that contain D-AAs are cytotoxins (Table 3). Microcystin {78}, produced by the cyanobacterium Nostoc sp. 152, is a particularly virulent hepatotoxin [104].

5.3 Fungi

Fungi are eukaryotic multi-cellular organisms that are unable to directly synthesize essential nutrients. They are heterotrophic living entities that depend on other organisms for their sustenance. Fungi-produced peptides and depsipeptides that have been found to contain D-AAs are listed in Table 4 [108-119]. They have various biological effects, most helping the fungus to survive. The cytotoxic fungal compounds were found to be pharmacologically interesting, having anticancer, antimalarial, or antibiotic properties (Table 4). Malformins {90-92}, plant toxins produced by the fungus Aspergillus niger, were extensively studied and present numerous variations [115-118]. By studying the activity of artificially synthesized malformin compounds, it was demonstrated that the D-Cys and D-Leu amino acids found in the structure are critical for biological activity (Fig. 3) [118]. D-His is rarely encountered in natural products. However, the ergot fungus Verticillium kibiense produces a polypeptide associating five L-Arg-D-His dipeptides [119]. Together with staphylopine {55} (Table 2), this decapeptide is one of the very few natural compounds reported to date in the literature that contain the rare D-histidine moiety.

Fig. 2 Structure of D-amino acid containing natural compounds produced by bacteria. The bracketed numbers refer to the Table 2 number codes along with additional information

5.4 Multicellular organisms

Table 5 lists the D-AA containing natural compounds found in both invertebrate and vertebrate animals [101,102,120-150].

5.4.1. Invertebrates: Most of the invertebrates that were found to produce D-AA containing compounds were marine mollusks, sponges and snails. The peptide and, most often, depsipeptide compounds found in invertebrates were relatively small, with fewer than 15 amino acids and molecular weights lower than 1800 Da, and were produced by NRPS processes. Kahalalides, depsipeptides from the herbivorous Elysia marine mollusk family and their algal diet Bryopsis pennata, were extensively studied and presented wide structural variations and promising pharmacological properties. Kahalalide D {98} and V {104} are small tripeptides, each containing a D-Trp unit (Table 5). Kahalalide R {103} is a tridecapeptide containing five D-AAs. Kahalalides are cytotoxic and active against cancer cells or the bacteria causing AIDS-related opportunistic infections [101,102,120,121]. All reported kahalalides were produced by NRPS processes [101].

An interesting exception in the making of invertebrate natural compounds is the largest compound in Table 5: polytheonamide A {125}, with 48 amino acids of which 18 are in the D-form (37.5%) and a total mass of 5029 Da [143]. Polytheonamide A {125} is produced by the sponge Theonella swinhoei. It was demonstrated that polytheonamide A {125} is a RiPP-produced compound: it is initially synthesized by the normal ribosomal pathway with all proteinogenic L-AAs and then modified through numerous epimerizations. It contains 18 different D-AAs (Table 5). Also, specific N-methyltransferase reactions introduce several non-proteinogenic D-amino acids [29,151]. A single epimerase, named PoyD, generates all D-AA residues, acting by deprotonation-protonation of the α-hydrogen of the L-AA [29]. This post-translationally modified compound is cytotoxic, creating membrane channels that act on cation circulation [143,151].

5.4.2. Vertebrates: Since the discovery of D-alanine in dermorphin {129} produced by the frog Phyllomedusa bicolor [6] (Fig. 4), a few more natural compounds containing D-AAs were reported in vertebrates, as listed in Table 5. Defensin DLP-2 {132} was found in the venom of the mammal Ornithorhynchus anatinus, the platypus of Australia. Out of the 42 AAs found in this defensive venomous natural product, only one has the D-configuration (Table 5 and Fig. 4). It is interesting to note that the synthetic all-L version of the venom peptide has the same biological activity as the natural D-methionine containing compound. However, the natural and synthetic isomeric peptides showed significant differences in their reversed-phase liquid chromatography retention times [152]. An isomerase, a glycoprotein of 52 kDa, was found in frog skin secretion [151]. It could convert the L-form of the second amino acid residue into its D-form, suggesting a RiPP synthesis of the D-AA containing compound {129} found in the frog [152].

Fig. 3 Structure of Malformin A2 {90}, B {91} and C {92} produced by the fungus Aspergillus niger, showing the constant position of the three D-amino acids. The bracketed numbers refer to the Table 4 number codes along with additional information

6 Outlook/overview of D-amino acid containing natural products

The selection of 132 D-AA containing natural products presented in this work consists of 69 compounds found in prokaryotic bacteria (Table 2), 13 compounds found in marine prokaryotic cyanobacteria and eukaryotic algae (Table 3), 12 compounds found in fungi (Table 4), and 38 compounds found in invertebrate and vertebrate animals (Table 5). These 132 compounds contain a cumulative total of 1166 amino acids, of which 442 have the D-configuration (37.9%).
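These totals are easy to cross-check. The short sketch below only re-adds the figures quoted in the paragraph above; the variable names are illustrative.

```python
# Number of compounds per table, as quoted in the text.
compounds_per_table = {"Table 2": 69, "Table 3": 13, "Table 4": 12, "Table 5": 38}
total_compounds = sum(compounds_per_table.values())

total_aa = 1166  # all amino acids in the compound set
d_aa = 442       # amino acids with the D-configuration
d_fraction = d_aa / total_aa

print(f"{total_compounds} compounds, D-AA share = {100 * d_fraction:.1f}%")
```

The four tables do add up to the 132 compounds of the selection, and the D-AA share rounds to the 37.9% quoted.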

Figure 5 presents the occurrence of D-AAs found in the analyzed set of 132 compounds. The non-proteinogenic amino acids are shown with light yellow bars. 329 proteinogenic D-AAs (green bars), covering the whole set of the 19 amino acids having a stereogenic carbon, were counted. The two most encountered D-proteinogenic amino acids were D-Ala and D-Val, closely followed by D-Leu and D-Ser. D-allo-threonine comes next, being the most encountered D-non-proteinogenic amino acid of the 113 found (Tables 2, 3, 4, 5). It should be pointed out that threonine has two asymmetric centers, hence four stereoisomeric forms. L- and D-threonine have the (S-R) and (R-S) configurations, respectively, and L- and D-allo-threonine are the (S-S) and (R-R) enantiomers. A similar stereochemistry is found with isoleucine. All the proteinogenic L-AAs are of the S-configuration except for cysteine (and selenocysteine), in which the heavier sulfur (or selenium) atom alters the priority sequence, making it of the R-configuration according to the Cahn-Ingold-Prelog rules [1,2].
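The relationships among the four threonine forms follow mechanically from their descriptors, as the following sketch illustrates. The descriptor pairs are those given in the text (α-carbon first, β-carbon second); the function names are hypothetical, and the rule applied is the standard one: inverting every stereocenter gives the enantiomer, while inverting only some gives a diastereomer.

```python
from itertools import product

# CIP descriptors (C-alpha, C-beta) as given in the text.
THREONINES = {
    "L-Thr": ("S", "R"),
    "D-Thr": ("R", "S"),
    "L-allo-Thr": ("S", "S"),
    "D-allo-Thr": ("R", "R"),
}

def mirror(config):
    """Invert every stereocenter, which yields the enantiomer."""
    flip = {"R": "S", "S": "R"}
    return tuple(flip[center] for center in config)

def relationship(a, b):
    """Classify two named forms as identical, enantiomers, or diastereomers."""
    config_a, config_b = THREONINES[a], THREONINES[b]
    if config_a == config_b:
        return "identical"
    if mirror(config_a) == config_b:
        return "enantiomers"
    return "diastereomers"

# Two stereocenters give 2**2 = 4 combinations, all of them listed above.
all_configs = set(product("RS", repeat=2))
covers_all = all_configs == set(THREONINES.values())

for pair in [("L-Thr", "D-Thr"), ("L-Thr", "L-allo-Thr"), ("L-Thr", "D-allo-Thr")]:
    print(pair, "->", relationship(*pair))
```

The same reasoning explains why peptide epimers, which differ at a single stereocenter among many, are diastereomers rather than enantiomers.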

Sixteen non-proteinogenic D-AAs were not shown in Fig. 2 because each was encountered in only one compound. These include D-citrulline, found in the siderophore azobactin {46} (Table 2), and D-kynurenine, found in discodermin E {113} (Table 5). Likely because there is a large variety of biologically modified AAs (Table 1), the number of encountered non-proteinogenic D-AAs (light yellow bars in Fig. 5) is significantly lower than that of proteinogenic D-AAs (green bars).

Including all D-AAs found, proteinogenic and non-proteinogenic, larger peptides with more AAs tend to contain more D-AAs. Figure 6 plots the number of D-AAs found in the natural peptides presented in Tables 2, 3, 4, 5 versus the total number of AAs in each compound. Of the natural compounds listed in Tables 2, 3, 4, 5, 70% are small peptides containing at most 10 AAs. The linear regression obtained for the relation [n D-AAs] versus [total AAs] is: number of D-AAs = 0.42 × total number of AAs (Fig. 6, green dotted line). The 0.42 slope means that, on average, in D-AA containing natural products, two D-AAs are encountered for every five AAs (Fig. 6). This linear trend has a loose 0.73 regression coefficient, with the 30% larger compounds having more weight than the 70% smaller ones. Polytheonamide {125} (Table 5), an outlier with 48 AAs, is predicted to contain 19 D-AAs; it contains 18, a close value. On the other hand, cyclosporine A {86}, an undecapeptide, should contain two or even three D-AAs, but only one was found (Table 4). Considering polytheonamide A {125} as an outlier and excluding it, a quadratic regression provides a slightly better fit of the Fig. 6 points (purple dashed line and regression equation in Fig. 6). The regression coefficient is similar. The quadratic equation predicts that a 20 AA peptide should contain 11 D-AAs, whereas the linear fit predicts only 8. Jessineptin {20} (Table 2) contains 19 AAs, of which 13 are D-AAs. The quadratic fit is better than the linear one for large peptides. Neither fit predicts well the number of D-AAs in small peptides. However, a trend is perceptible: peptides with more than 10 AAs will contain several D-AAs.
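As a rough numerical check of the linear relation quoted above, the following Python sketch applies the 0.42 slope to the three compounds discussed in the text. It assumes a fit through the origin with no other terms, which is how the equation is printed; the function and variable names are illustrative and do not come from the original article. The outputs may therefore differ from the predictions quoted in the text, which come from the authors' actual fit.

```python
# Slope quoted in the text for the linear fit of Fig. 6.
# Assumption: the fit passes through the origin, as the printed equation suggests.
SLOPE = 0.42


def predicted_d_count(total_aa: int, slope: float = SLOPE) -> float:
    """Predicted number of D-AAs for a peptide with `total_aa` residues."""
    return slope * total_aa


# (compound, total AAs, observed D-AAs), values as stated in the text.
examples = [
    ("polytheonamide {125}", 48, 18),
    ("jessineptin {20}", 19, 13),
    ("cyclosporine A {86}", 11, 1),
]


def residuals(rows, slope: float = SLOPE):
    """Return (name, predicted, observed - predicted) for each compound."""
    out = []
    for name, total_aa, observed in rows:
        pred = predicted_d_count(total_aa, slope)
        out.append((name, pred, observed - pred))
    return out


for name, pred, res in residuals(examples):
    print(f"{name}: predicted {pred:.1f} D-AAs, residual {res:+.1f}")
```

A positive residual means the compound carries more D-AAs than the linear relation predicts, as for jessineptin {20}; a negative residual means fewer, as for cyclosporine A {86}.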

Fig. 4 Top: the 7 amino acid sequence of dermorphin {129}, the first natural compound containing a D-amino acid found in vertebrates. Bottom: the 42 amino acid sequence of the platypus defensin {132}, all L except D-methionine in position 2. The calculated ribbon quaternary structure of the large peptide shows the disulfide connectivities in yellow and secondary-structural element attractions in blue [150]. The bracketed numbers refer to the Table 5 number codes along with additional information

7 Conclusions

The presence of unusual D-amino acids in natural peptide products was first identified by natural product chemists and biochemists. For years, they were the only researchers discovering and identifying an increasing number of D-AA containing natural products. The protocol was to purify a new natural peptide product, analyze its AA sequence to obtain its formula, and then synthesize the all L-AA peptide to test its biological activity. This last step was often the one that revealed AA chirality: when the synthesized all L-AA compound was inactive (e.g., dermorphin {129}), the presence of D-AAs was implied.

Fig. 5 Occurrence of the D-amino acids found in a set of 128 peptidic compounds. Total number of D-amino acids = 442. Green bars: proteinogenic amino acids; orange bars: non-proteinogenic amino acids. See Table 1 for AA codes and Tables 2, 3, 4, 5 for full data

Fig. 6 Comparing the number of D-amino acids found in a natural product with the total number of AAs in the peptide. Several points correspond to more than one compound. Linear fit in green, quadratic fit excluding polytheonamide A {125} in purple. See text and Tables 2, 3, 4, 5 for full data

The occurrence of D-AAs in natural products having significant biological activities was demonstrated in a wide variety of cases involving monocellular organisms, bacteria, fungi or algae. For a long time, D-AAs were not sought in vertebrates, which were unquestionably regarded as made exclusively of L-AAs. That may explain why D-AA occurrence was not reported in organized living forms until recently. The all L-AA dogma was overturned when D-AAs were found in compounds secreted by invertebrates, vertebrate amphibians and even one particular mammal: the Australian platypus. Since D-AAs are detected in more and more natural and biological compounds as the technologies improve, an increasing part of the scientific community has become aware of the need to know the chiral status of AAs in natural products as well as in proteomic studies and, obviously, in peptide-based pharmaceutical compounds. If D-AAs are sought, they may be found in many more compounds than are now known [11]. A powerful method to detect D-AAs in peptides was recently proposed [153].

The function of the known D-containing natural compounds generated by living organisms is to favor the survival and development of the producing entity, either by impairing the growth of competing organisms (antibiotics, cytotoxics, siderophores) or by facilitating the entity's expansion (surfactants). These natural products were sought for their bioactivity, which could have important applications in medicine, pharmacy and agriculture. It was also demonstrated that free D-AAs were occasionally used by complex eco-systems [152].

Since the standard genetic code encodes only the 19 proteinogenic L-AAs, natural compounds containing D-AAs are mostly produced by NRPS routes. To date, there is no evidence that ribosomes can directly incorporate D-AAs in a peptide chain [152]. However, RiPP post-translational modifications of ribosomal all L-AA peptides into D-containing epimeric forms are known and may not be as rare as once thought. With the massive progress made in bioinformatic scanning of the genetic code for a vastly increasing number of living animals and plants, the biosynthetic gene cluster code needed to produce a given peptide or even protein can be calculated and searched within databases in a short time [150]. The antiSMASH software, which searches for antibiotic and secondary metabolite biosynthetic gene clusters, is freely available at https://antismash.secondarymetabolites.org/ [154].

In most cases, the biological activity of the D-containing natural products is completely different from that of the analogous all-L stereoisomer. This greatly helps in locating the D-AA(s) in the peptide chain using organic synthesis followed by biological tests. However, there are a few cases where little or no difference in biological activity was observed between the D- and L-containing analogous natural products. Since in these cases the D-forms may go unnoticed, they may be more common than has been reported [10]. The interesting questions that arise for these rare cases are: why does Nature incorporate such D-amino acids, and does resistance to protease degradation play a role in such cases [155]?

Author Contribution

Daniel W. Armstrong: Conceptualization; Formal analysis; Methodology; Project administration; Visualization; Writing - original draft; Writing - review & editing. Alain Berthod: Investigation; Formal analysis; Writing - original draft; Writing - review & editing. All authors read and approved the final manuscript.

Funding

Robert A. Welch Foundation (Y-0026)

Data availability

Since this is a review article, no new experimental data were generated by the authors.

Declarations

Competing interests

The authors declare that they do not have any competing interests.

Author details

1 Department of Chemistry and Biochemistry, University of Texas at Arlington, Arlington, TX 76019, USA. 2 Institut des Sciences Analytiques, CNRS, University of Lyon 1, 69100 Villeurbanne, France.

Received: 22 August 2023

Accepted: 19 October 2023