Evolution of Central Metabolic Pathways: The Playground of Non-Orthologous Gene Displacement

Eugene V Koonin; Michael Y Galperin

Chapter 7. Evolution of Central Metabolic Pathways: The Playground of Non-Orthologous Gene Displacement

One of the central goals of functional genomics is the complete reconstruction of the metabolic pathways of the organisms, for which genome sequences have been obtained. As discussed in Chapter 1, there is no chance that all necessary biochemical experiments are ever done in any substantial number of organisms. Therefore, reconstructions made through comparative genomics, combined with the knowledge derived from experiments on model systems, are the only realistic path to a satisfactory understanding of the biochemical diversity of life and to the characterization of poorly studied and hard-to-grow organisms (including extremely important ones, e.g. the syphilis spirochete T. pallidum [243,887]).

In the pre-genomic era, metabolic reconstruction might have seemed to be a relatively easy task, given the overall similarity of the key metabolic enzymes in several model organisms, such as E. coli, B. subtilis, yeast, plants, and animals. Although cases of non-orthologous (unrelated or distantly related) enzymes catalyzing the same reaction, such as the two distinct forms of fructose-1,6-bisphosphate aldolases, phosphoglycerate mutases, and superoxide dismutases, have been known for a long time, these cases were generally perceived as rare and, more or less, inconsequential [187,258,271,549]. The availability of complete genomes is gradually changing this perception, making us realize just how common these cases of analogous (as opposed to homologous) enzymes are in nature (see 2.2.5). The phenomenon of non-orthologous gene displacement turned out to be a major complication (but also a major source of unexpected findings) for the analysis of metabolic pathways, making it particularly hard to automate. Indeed, whenever an ortholog of a given metabolic enzyme from the model organisms is not detected in the organism of interest (the initial step of metabolic reconstruction, the identification of orthologs of known enzymes, can be automated almost completely), the process turns into “detective work”. The researcher needs to identify a set of gene products that, on the basis of their predicted biochemical activities, potentially could catalyze the reaction in question. Often, there is more than one such candidate, and the choice between these might not be possible without direct experiments. Furthermore, there is always a chance that, however plausible, all candidates detected in such searches are false, whereas the true culprit is a complete unknown. This makes metabolic reconstruction in the era of comparative genomics a less precise but much more exciting undertaking.

In this chapter, we show how a COG-based reconstruction of bacterial and archaeal metabolism helps to organize the existing data on microbial biochemistry, illuminates the remaining questions, suggests candidates for some of the “missing” enzymatic activities, and predicts the existence of novel enzymes that remain to be discovered. For each metabolic reaction, we list the COGs that are known to catalyze it or can be reasonably predicted to do so. We then compare the phyletic patterns of the corresponding COGs to see whether the current set of COGs is sufficient to suggest candidate proteins for the given reaction in each organism with a sequenced genome, or whether unexplained gaps remain in the metabolic pathways.
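The comparison of phyletic patterns described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used to build or query the COG database: the reaction names, COG labels, genome labels, and the function name find_gaps are all hypothetical placeholders.

```python
# Toy data: each reaction maps to the COGs known or predicted to catalyze it.
# All labels below are hypothetical placeholders, not real COG assignments.
REACTION_COGS = {
    "reaction_1": ["COG_A", "COG_B"],
    "reaction_2": ["COG_C"],
}

# Phyletic pattern of each COG: the set of genomes that encode a member.
PHYLETIC_PATTERNS = {
    "COG_A": {"genome_1", "genome_2"},
    "COG_B": {"genome_3"},
    "COG_C": {"genome_1", "genome_3", "genome_4"},
}

ALL_GENOMES = {"genome_1", "genome_2", "genome_3", "genome_4"}


def find_gaps(reaction_cogs, patterns, genomes):
    """Return {reaction: sorted genomes in which no candidate COG is present}."""
    gaps = {}
    for reaction, cogs in reaction_cogs.items():
        covered = set()
        for cog in cogs:
            # Union of the phyletic patterns of all alternative COGs.
            covered |= patterns.get(cog, set())
        missing = sorted(genomes - covered)
        if missing:
            gaps[reaction] = missing
    return gaps


if __name__ == "__main__":
    gaps = find_gaps(REACTION_COGS, PHYLETIC_PATTERNS, ALL_GENOMES)
    for reaction, missing in gaps.items():
        print(reaction, "has no candidate in:", ", ".join(missing))
```

A genome listed as a gap is a starting point for the "detective work" discussed earlier, not evidence that the reaction is absent: the organism may bypass the step or use an enzyme that has not yet been assigned to any of the listed COGs.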

7.1. Carbohydrate Metabolism

7.1.1. Glycolysis

We have already used the COG approach to demonstrate the complementarity of the phyletic patterns of the three forms of phosphoglycerate mutase (see 2.2.6). Figure 7.1 shows the COGs that are known or predicted to include glycolytic enzymes and shows their phyletic patterns. This superposition of COGs and metabolic pathways provides a convenient framework for a detailed analysis of the phylogenetic distribution of each of the glycolytic enzymes and the general principles of evolution of carbohydrate metabolism. This figure shows, for example, that R. prowazekii, an obligate intracellular parasite and a relative of the mitochondria [30], does not encode a single glycolytic enzyme. In contrast, all other organisms with completely sequenced genomes encode enzymes of the lower (tri-carbon) part of the pathway. This supports the notion that glycolysis is the central pathway of carbohydrate metabolism and makes comparative analysis of variants of this pathway all the more interesting.

Glucokinase (EC 2.7.1.2)

Fermentation of glucose starts with its phosphorylation, which is catalyzed by glucokinase. Although many bacteria bypass the glucokinase step by phosphorylating glucose concomitantly with its uptake by the PEP-dependent phosphotransferase system, some of them, including E. coli, encode a glucokinase (COG0837) that shares little sequence similarity with yeast and human enzymes. There is also another bacterial form, found in S. coelicolor, Bacillus megaterium, and other bacteria [32,795].

Recently, P. furiosus has been reported to encode an ADP-dependent glucokinase [435]. This enzyme has no detectable sequence similarity to any other glucokinase but shows significant structural similarity to enzymes of the ribokinase family [383]. In retrospect, several conserved motifs were detected in this new glucokinase and the ribokinase family proteins, which indicates a homologous relationship. Thus, a clear-cut case of non-orthologous gene displacement is observed: a ribokinase family enzyme has been recruited to replace the typical glucokinase. So far, the ADP-dependent glucokinase has been found only in M. jannaschii and in pyrococci. The existence of at least three distinct forms of glucokinase is remarkable, especially given that this is apparently not an essential component of glycolysis. Moving down the glycolytic pathway, we find similar examples of non-orthologous gene displacement for several other, essential enzymes.

Glucose-6-phosphate isomerase (EC 5.3.1.9)

Bacteria and eukaryotes encode several distinct but homologous forms of glucose-6-phosphate isomerase (phosphoglucose isomerase) [624]. The classical (E. coli) form of the enzyme is found in Gram-negative bacteria and in the cytoplasm of the eukaryotic cell. A divergent version of this enzyme is found in Gram-positive bacteria including B. subtilis, in T. maritima, and some archaea, such as M. jannaschii and Halobacterium sp. [466,761]. The most divergent members of this family of glucose-6-phosphate isomerases were detected in A. aeolicus and another subset of archaea, including M. thermoautotrophicum, A. pernix, and Thermoplasma spp. No enzyme of this family seems to be encoded in the genomes of A. fulgidus or pyrococci. Instead, P. furiosus has been shown to encode a novel glucose-6-phosphate isomerase, which has highly conserved orthologs in P. horikoshii and in A. fulgidus, but so far not in any other organism [331]. Thus, two non-orthologous (in fact, apparently unrelated) versions of this enzyme together account for the phosphoglucose isomerase activity in all known microbial genomes, with the exception of R. prowazekii and U. urealyticum. As indicated above, the former does not encode any glycolytic enzymes, whereas the latter apparently obtains fructose-6-phosphate by importing fructose concomitantly with its phosphorylation through the fructose-specific phosphotransferase system, thus bypassing the phosphoglucose isomerase stage.

Phosphofructokinase (EC 2.7.1.11)

The next glycolytic enzyme, phosphofructokinase, offers an even more interesting example of non-orthologous gene displacement. It is also an example of an enzyme where several “missing” enzyme forms have been discovered just in the past year.

The most common version of this enzyme, PfkA, is an ATP-dependent kinase of unique structure found in bacteria and many eukaryotes. Plants have a homologous enzyme, which, however, uses pyrophosphate as the phosphate donor. Altogether, homologs of PfkA are found in nearly all bacteria and eukaryotes but are conspicuously missing in H. pylori and in all archaeal genomes sequenced so far. In addition, E. coli encodes a second phosphofructokinase, PfkB, which is unrelated to PfkA and instead belongs to the ribokinase family of carbohydrate kinases.

A unique ADP-dependent phosphofructokinase has been described in P. furiosus [853]. However, this enzyme appears to have a limited phyletic distribution: so far, it was found only in M. jannaschii and in pyrococci. This left the phosphofructokinase activity in other archaea unaccounted for and suggested that additional forms of this enzyme might exist. Very recently, a new ATP-dependent phosphofructokinase, which is a member of the ribokinase family but is not specifically related to PfkB, has been identified in A. pernix [712]. Close homologs of this protein (APE0012) were found in Halobacterium sp., A. fulgidus, M. thermoautotrophicum, and several other archaea. Therefore, it seems likely that these ribokinase family enzymes function as phosphofructokinases in all these archaea. Finally, Thermoplasma does not encode orthologs of any of the four forms of phosphofructokinase described above. This leaves two possibilities: either thermoplasmas lack phosphofructokinase altogether (along with fructose-1,6-bisphosphate aldolase; see below), or they might have yet another, fifth variant of this enzyme.

Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13)

For more than 50 years now, it has been known that fructose-1,6-bisphosphate aldolase exists in two distinct forms, a metal-independent one (class I) in multicellular eukaryotes and a metal-dependent one (class II) in bacteria and yeast [187,549,881]. Certain organisms, such as Euglena, seem to have enzymes of both classes. Although these two enzyme forms have similar structures, they do not share any detectable sequence similarity [257].

Sequence analysis of archaeal genomes and those of chlamydia showed that they encode neither a typical class I enzyme, nor a typical class II enzyme. Instead, chlamydia and all archaea, with the exception of thermoplasmas, encode orthologs of the recently described class I aldolase DhnA (FbaB) of E. coli, which is only distantly related to the regular class I enzymes and may be considered a third class of aldolases. Recently, fructose-1,6-bisphosphate aldolase activity was demonstrated in the P. furiosus homolog of DhnA; this enzyme has been referred to as a class IA aldolase [770]. The phyletic patterns of the bacterial-type class II aldolase (COG0191) and the DhnA-type aldolase (COG1830) are almost complementary, except that both types of aldolases are present in E. coli and A. aeolicus, and none of them is detectable in X. fastidiosa (Figure 7.1). X. fastidiosa, a plant pathogen, encodes a eukaryotic class I aldolase, which is specifically similar to the plant class I aldolase and probably has been acquired from the plant host via HGT. However, typical eukaryotic (class I) fructose-1,6-bisphosphate aldolase is also encoded in several other bacteria, in which cases the underlying evolutionary scenario is less clear.
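The notion of "almost complementary" phyletic patterns can be made concrete with a small sketch. The code below is an illustrative assumption, not the authors' method: the function name compare_patterns is hypothetical, the genome list is a tiny subset chosen to show each category, and genome_X is a placeholder for a genome that encodes only the class II enzyme. The placement of E. coli, A. aeolicus, X. fastidiosa, and P. furiosus follows the statements in the text.

```python
def compare_patterns(pattern_a, pattern_b, genomes):
    """Partition genomes by which of two alternative enzyme forms they encode."""
    a, b = set(pattern_a), set(pattern_b)
    return {
        "only_a": sorted(a - b),                  # form A only
        "only_b": sorted(b - a),                  # form B only
        "both": sorted(a & b),                    # exceptions: both forms
        "neither": sorted(set(genomes) - a - b),  # exceptions: no form found
    }


# Small illustrative subset; genome_X is a hypothetical placeholder.
GENOMES = ["E. coli", "A. aeolicus", "X. fastidiosa", "P. furiosus", "genome_X"]
CLASS_II = {"E. coli", "A. aeolicus", "genome_X"}      # COG0191-type pattern
DHNA_TYPE = {"E. coli", "A. aeolicus", "P. furiosus"}  # COG1830-type pattern

if __name__ == "__main__":
    for category, members in compare_patterns(CLASS_II, DHNA_TYPE, GENOMES).items():
        print(category, members)
```

Two patterns are perfectly complementary when the "both" and "neither" lists are empty. Genomes in the "neither" list are where a third, non-orthologous form should be sought, as with the plant-like class I aldolase of X. fastidiosa.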

Figure 7.1

Distribution of glycolysis (Embden-Meyerhof-Parnas pathway) enzymes in organisms with completely sequenced genomes. Each rounded rectangle shows a glycolytic enzyme, denoted by its gene name and the COG number. Alternative enzymes catalyzing the same ...

Although most genomes encode only one type of fructose-1,6-bisphosphate aldolase, different forms of this enzyme do coexist in several organisms. In particular, the relatively large genome of the plant symbiont M. loti encodes fructose-1,6-bisphosphate aldolases of all three classes.

The nature of the aldolase, if any, in thermoplasmas remains unclear. The apparent absence in these archaea of both phosphofructokinase and fructose-1,6-bisphosphate aldolase might indicate that these organisms split hexoses into trioses exclusively via the Entner-Doudoroff pathway (see below). Indeed, thermoplasmas encode close homologs of the recently described fructose-6-phosphate aldolase [758].

Finally, given that chlamydiae are important human pathogens and that the unusual class IA fructose-1,6-bisphosphate aldolase is the only aldolase encoded in their genomes, this presumably essential enzyme might be a promising target for anti-chlamydial drug therapy (see 7.6) [257,266].

Triose phosphate isomerase (EC 5.3.1.1)

Triose phosphate isomerase is conserved in all organisms, with the exception of Rickettsia. Bacterial-eukaryotic and archaeal isomerases form two clearly separated clusters [239]. This gave rise to the notion that eukaryotic triose phosphate isomerases originated from the promitochondrial endosymbiont whose genes have been transferred into the nucleus of the eukaryotic host [432].

Glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12)

Like triosephosphate isomerases, archaeal glyceraldehyde-3-phosphate dehydrogenases are homologous to those from bacteria and eukaryotes but form a well-defined cluster, suggesting the mitochondrial origin of this enzyme in eukaryotes. In pyrococci and, probably, in several other archaea, the main glycolytic flow goes through a different enzyme, glyceraldehyde-3-phosphate:ferredoxin oxidoreductase, whereas glyceraldehyde-3-phosphate dehydrogenase appears to be confined to gluconeogenesis [584,867].

In U. urealyticum, the typical NADH-dependent glyceraldehyde-3-phosphate dehydrogenase is missing, and this reaction is apparently catalyzed by a non-phosphorylating, NADP-dependent enzyme, similar to the well-characterized enzymes from plants and Streptococcus mutans [326,544]. These enzymes belong to a large superfamily of NADP-dependent aldehyde dehydrogenases and are unrelated to the phosphorylating glyceraldehyde-3-phosphate dehydrogenase [568]. Remarkably, an archaeal member of the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase family uses NAD instead of NADP [124].

Phosphoglycerate kinase (EC 2.7.2.3)

Like triose phosphate isomerases and glyceraldehyde-3-phosphate dehydrogenases, phosphoglycerate kinase is conserved in all organisms that have glycolysis, and the sequences from bacteria and eukaryotes are closer to each other than they are to their archaeal counterparts, suggesting the mitochondrial origin of the eukaryotic enzyme.

Phosphoglycerate mutase (EC 5.4.2.1)

The diversity of phosphoglycerate mutases was discussed earlier (see 2.2.6). We would only like to reiterate that there are two unrelated forms of this enzyme, 2,3-bisphosphoglycerate-dependent (animal-type) and 2,3-bisphosphoglycerate-independent (plant-type), either one of which (or both) can be found in various bacteria [138]. Although E. coli pgm mutants devoid of its principal (cofactor-dependent) form of phosphoglycerate mutase clearly exhibit a mutant phenotype, a recent study of the second (cofactor-independent) form of this enzyme showed that it accounts for as much as 10% of the total phosphoglycerate mutase activity in E. coli [244].

Remarkably, neither form of phosphoglycerate mutase is encoded in any archaeal genome available to date, with the sole exception of Halobacterium spp., which has a typical cofactor-independent enzyme, similar to the one in B. subtilis. Sequence analysis of archaeal genomes showed that they encode enzymes of the alkaline phosphatase superfamily that are distantly related to the cofactor-independent phosphoglycerate mutase and contain all the principal active-site residues [258]. These enzymes were predicted to have a phosphoglycerate mutase activity [258]. This prediction was supported by the structural analysis of the cofactor-independent phosphoglycerate mutase [261,394] and has been recently confirmed by direct experimental data [308,866]. Thus, like phosphofructokinase and fructose-1,6-bisphosphate aldolase, phosphoglycerate mutase is found in three different (unrelated or distantly related) variants.

Enolase (EC 4.2.1.11)

Enolases encoded in bacterial, archaeal, and eukaryotic genomes are highly conserved; phylogenetic trees for enolases show a “star topology”, which precludes any definitive conclusions on the evolutionary scenario for this enzyme. Pyrococci and M. jannaschii encode additional, divergent paralogs of enolase whose function(s) remains unknown.

Pyruvate kinase (EC 2.7.1.40)

Pyruvate kinase, the terminal glycolytic enzyme, is not encoded in some bacterial (A. aeolicus, T. pallidum) and archaeal (A. fulgidus, M. thermoautotrophicum) genomes. In these organisms, the pyruvate kinase function is probably taken over by phosphoenolpyruvate synthase, which is capable of catalyzing pyruvate formation by reversing its typical reaction.

Pyruvate kinase, like phosphofructokinase (see above), is also missing in H. pylori. Although a ribokinase-like phosphofructokinase and phosphoenolpyruvate synthase could be considered as possible bypasses for these enzymes, it seems more likely that glycolysis is not functional in H. pylori. In contrast, this bacterium encodes the complete set of enzymes involved in gluconeogenesis (Figure 7.2). Such organization of metabolism seems to make perfect sense for H. pylori, given the challenge of maintaining near-neutral intracellular pH in the highly acidic gastric environment. Sugar fermentation, resulting in intracellular production of acid, would place an additional burden on the pH maintenance mechanism, whereas gluconeogenesis converts organic acids into sugars and thus removes H+ from the cytoplasm. For the purposes of energy production, H. pylori apparently depends on fermentation of amino acids and oligopeptides that are produced by gastric proteolysis and are transported into the bacterial cells by ABC-type transporters. Amino acid fermentation results in alkalinization of the cytoplasm and could relieve part of the burden of pH maintenance in H. pylori. This simple example shows that, even when seemingly plausible candidates for missing steps in a pathway can be suggested, this should be done with caution, and the resulting predicted pathways should be assessed against the biological background of the respective organism.

After a string of recent publications [331,383,770], it appears that most glycolytic enzymes have now been accounted for. While there are no clear candidates for phosphofructokinase and fructose-1,6-bisphosphate aldolase in Thermoplasma spp., the chances of discovering new enzyme variants in this pathway appear very slim.

7.1.2. Gluconeogenesis

With the exception of reactions catalyzed by phosphofructokinase and pyruvate kinase, glycolytic reactions are reversible and function also in gluconeogenesis (Figure 7.2). The reversal of the latter reaction, i.e. conversion of pyruvate into phosphoenolpyruvate, can be catalyzed by two closely related enzymes, phosphoenolpyruvate synthase and pyruvate, phosphate dikinase. The only other reaction that is specific for gluconeogenesis is the dephosphorylation of fructose-1,6-bisphosphate.

Figure 7.2

Distribution of gluconeogenesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 7.1.

Phosphoenolpyruvate synthase (EC 2.7.9.2)

Phosphoenolpyruvate synthase (pyruvate, water dikinase, EC 2.7.9.2) and pyruvate, phosphate dikinase (EC 2.7.9.1) catalyze two similar reactions of phosphoenolpyruvate biosynthesis

ATP + pyruvate + H2O = AMP + phosphoenolpyruvate + phosphate (EC 2.7.9.2)
ATP + pyruvate + phosphate = AMP + phosphoenolpyruvate + pyrophosphate (EC 2.7.9.1)

and have highly similar sequences. These enzymes are widely present in bacteria, archaea, protists, and plants but are missing in animals, where PEP is synthesized from oxaloacetate in a PEP carboxykinase-catalyzed reaction.

Phosphoenolpyruvate carboxykinase (EC 4.1.1.32 and EC 4.1.1.49)

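Phosphoenolpyruvate carboxykinase exists in two unrelated forms, which catalyze ATP-dependent (EC 4.1.1.49) or GTP-dependent (EC 4.1.1.32) decarboxylation of oxaloacetate:

ATP + oxaloacetate = ADP + phosphoenolpyruvate + CO2 (EC 4.1.1.49)
GTP + oxaloacetate = GDP + phosphoenolpyruvate + CO2 (EC 4.1.1.32)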

These forms show remarkably complex phyletic distributions. The GTP-dependent form is found in animals and in a limited number of bacteria, such as Chlamydia spp., Mycobacterium spp., T. pallidum, and the green sulfur bacterium Chlorobium limicola. Among archaea, it is encoded only in the genomes of pyrococci, thermoplasmas, and Sulfolobus. In contrast, the ATP-dependent form of phosphoenolpyruvate carboxykinase is found in plants, yeast, and many bacteria. The only complete archaeal genome that has been found to encode the ATP-dependent form is that of A. pernix (Figure 7.2).

Since the typical bacterial ATP-dependent phosphoenolpyruvate carboxykinase appears to be unrelated to the GTP-dependent form found in humans, this key enzyme of central metabolism might be an interesting drug target for such pathogenic bacteria as H. influenzae and C. jejuni (see 7.6).

There have also been reports of a third, pyrophosphate-dependent, form of phosphoenolpyruvate carboxykinase [819], but they remain unconfirmed and no sequence so far has been identified with this form. Absent this third form, phosphoenolpyruvate carboxykinase appears to be missing in a large number of microorganisms, leaving room for discovery of a new enzyme.

Fructose-1,6-bisphosphatase (EC 3.1.3.11)

The best-studied form of fructose-1,6-bisphosphatase, found in E. coli, yeast, and human (COG0158), has a limited phyletic distribution: it is not encoded in the genomes of chlamydia, spirochetes, Gram-positive bacteria, A. aeolicus, or T. maritima. Among archaea, it is present only in Halobacterium sp. A second form of this enzyme (COG1494), originally described in cyanobacteria, has been reported to function both as a fructose-1,6-bisphosphatase and as a sedoheptulose-1,7-bisphosphatase [823,824].

This form also has a limited phyletic distribution, being found in a relatively small number of bacteria (Figure 7.2). Although a member of this second family is encoded in B. subtilis, this organism also has a distinct form of fructose-1,6-bisphosphatase that is unrelated to the first two and is found only in several other low-GC, Gram-positive bacteria [251]. Finally, archaea encode yet another, fourth form of this enzyme that belongs to the inositol monophosphatase family and only recently has been shown to possess fructose-1,6-bisphosphatase activity [399,801]. Like B. subtilis, several bacterial genomes encode members of more than one protein family, which include known or potential fructose-1,6-bisphosphatases; this makes it hard to predict which of them actually has this function in gluconeogenesis. In contrast, there is no clear candidate for this function in A. aeolicus, T. maritima, X. fastidiosa, Chlamydia spp., mycoplasmas, spirochetes, and thermoplasmas. While the first three of these organisms and B. burgdorferi encode enzymes of the inositol monophosphatase family, they are not closely related to the archaeal fructose-1,6-bisphosphatase (typified by the MJ0109 protein from M. jannaschii) and might represent an independent case of enzyme recruitment. Proteins that function as fructose-1,6-bisphosphatase in Chlamydia spp., Thermoplasma spp., mycoplasmas, and T. pallidum, if any, remain to be identified.

7.1.3. Entner-Doudoroff pathway and pentose phosphate shunt

Alternative pathways for converting hexoses into trioses, the pentose phosphate shunt and the Entner-Doudoroff pathway, are found in many organisms but cannot be considered universal. Both of these pathways start from the NADP-dependent oxidation of glucose-6-phosphate into phosphogluconolactone and proceed through 6-phosphogluconate (Figure 7.3). Instead of the standard Entner-Doudoroff pathway, some archaea encode the so-called non-phosphorylating variant of this pathway, which starts from glucose and includes unphosphorylated intermediates.

Figure 7.3

Distribution of enzymes of the pentose phosphate and Entner-Doudoroff pathways in organisms with completely sequenced genomes. Details are as in Figure 7.1.

Glucose-6-phosphate 1-dehydrogenase (EC 1.1.1.49)

Glucose 6-phosphate dehydrogenase (Zwischenferment) primarily uses NADP as the electron acceptor, although there have been reports of NAD-dependent forms. This enzyme is found in many bacteria and eukaryotes but is not encoded in any of the archaeal genomes sequenced to date. In addition, it is missing in several bacteria, such as M. leprae, B. halodurans, S. pyogenes, C. jejuni, and mycoplasmas (Figure 7.3).

6-Phosphogluconolactonase (EC 3.1.1.31)

Although this enzymatic activity was characterized many years ago, the gene for the lactonase remained unidentified until very recently, which was due, in part, to the inherent instability of its substrate and, in part, to the fact that this activity resides in a protein that is closely related to glucosamine-6-phosphate isomerase/deaminase and might even combine both activities [154,328]. In humans, the lactonase is fused to the glucose-6-phosphate dehydrogenase, forming the C-terminal domain of a bifunctional enzyme. Interestingly, in Plasmodium falciparum, the fusion partners switch places, with the lactonase located at the N-terminus [551]. The lactonase is found largely in the same set of species as glucose-6-phosphate dehydrogenase, although it appears to be missing, additionally, in A. aeolicus and D. radiodurans.

7.1.3.1. Pentose phosphate shunt

6-Phosphogluconate dehydrogenase (decarboxylating, EC 1.1.1.44)

6-Phosphogluconate dehydrogenase, the product of the gnd gene in E. coli, is the upstream enzyme specific for the pentose phosphate pathway. Of those organisms that encode phosphogluconate dehydrogenase (COG0362), several (M. loti, B. subtilis, L. lactis) also encode its close paralog (COG1023), whose function remains unknown but which is likely to have the same activity. Phosphogluconate dehydrogenase has an even more narrow phylogenetic distribution than phosphogluconolactonase, being additionally absent from S. pyogenes, X. fastidiosa, H. pylori, and C. jejuni (Figure 7.3).

Pentose-5-phosphate-3-epimerase (EC 5.1.3.1)

The next reaction of the pentose phosphate pathway, isomerization of ribulose 5-phosphate into xylulose 5-phosphate, is catalyzed by phosphoribulose epimerase. In addition to the pentose phosphate pathway, this enzyme also participates in the interconversions of pentose phosphates in the Calvin cycle, which accounts for its wider phyletic distribution than seen for phosphogluconate dehydrogenase.

Ribose 5-phosphate isomerase (EC 5.3.1.6)

Ribose-5-phosphate isomerase, which catalyzes interconversion of ribulose 5-phosphate and ribose 5-phosphate, is found in two apparently unrelated forms, both of which, RpiA and RpiB, have been characterized in E. coli [793]. RpiA is found in many bacteria, archaea, and eukaryotes. In contrast, RpiB is limited to certain bacterial species and is the sole form of ribose-5-phosphate isomerase in B. subtilis, M. tuberculosis, H. pylori, and several other bacteria. The phyletic patterns of the two forms of the enzyme are largely complementary.

Like phosphoribulose epimerase, phosphoribose isomerase participates in the Calvin cycle, which might explain its universal distribution.

Transketolase (EC 2.2.1.1)

In eukaryotes and bacteria, transketolase is a single protein of 610–630 amino acid residues [821]. In archaea, however, this enzyme is either missing altogether (e.g. A. fulgidus, M. thermoautotrophicum) or is encoded by two separate genes that may not even be adjacent (in M. jannaschii). The hyperthermophilic bacterium T. maritima has both types of genes, one full-length and one split gene, the latter probably acquired from archaea via HGT. Transketolase shows high sequence similarity to deoxyxylulose-5-phosphate synthase and other thiamine pyrophosphate-dependent enzymes, which might point to a broad substrate specificity of this enzyme, particularly in thermophiles.

Transaldolase (EC 2.2.1.2)

Transaldolase is a protein of 310–330 amino acid residues, which is present in eukaryotes and many bacteria and catalyzes the transfer of the tri-carbon unit of sedoheptulose-7-phosphate to glyceraldehyde-3-phosphate, producing fructose-6-phosphate and erythrose-4-phosphate [821]. Archaea and some bacteria encode a closely related but shorter protein, about 210–230 aa long, which has recently been demonstrated to function not as transaldolase, but as fructose-6-phosphate aldolase, which splits fructose-6-phosphate into glyceraldehyde-3-phosphate and dihydroxyacetone [758]. While E. coli encodes two paralogous transaldolases (talA, talB) and two paralogs of the smaller related enzyme (talC, mipB), many other prokaryotes, including B. subtilis, M. jannaschii, and Thermoplasma spp., encode only the latter protein. Although the exact substrate specificity of these enzymes is not known, enzymes from B. subtilis and T. maritima have been reported to have transaldolase activity [758]. Thus, different MipB orthologs could have different (primary) activities, which makes complete reconstruction of the pentose phosphate pathway in organisms having these enzymes unrealistic at this time. Clearly, however, the phyletic patterns of the enzymes of this pathway differ significantly, which suggests the existence of still uncharacterized enzyme forms.

7.1.3.2. The Entner-Doudoroff pathway

Conversion of 6-phosphogluconate into two tri-carbon molecules, 3-phosphoglyceraldehyde and pyruvate, via the Entner-Doudoroff pathway includes only two steps, which are catalyzed by 6-phosphogluconate dehydratase and 2-keto-3-deoxy-6-phosphogluconate aldolase (the products of the E. coli genes edd and eda, respectively) (Figure 7.3).

Phosphogluconate dehydratase (EC 4.2.1.12)

Phosphogluconate dehydratase is a close paralog of dihydroxyacid dehydratase, an enzyme of isoleucine/valine biosynthesis, which is encoded in almost every genome. As a result, it is not easy to decide which organisms encode phosphogluconate dehydratase. In E. coli and several other proteobacteria, edd and eda genes form operons. In other organisms, such as P. aeruginosa, even though both these genes are present, they are not adjacent, which complicates the identification of phosphogluconate dehydratase.

2-Keto-3-deoxy-6-phosphogluconate aldolase (EC 4.1.2.14)

KDPG aldolase has a much narrower phyletic distribution than phosphogluconate/dihydroxyacid dehydratase (Figure 7.3). Assuming that a functional Entner-Doudoroff pathway requires the presence of each of these enzymes, as well as glucose-6-phosphate dehydrogenase and phosphogluconolactonase, the available genomic data suggest that the pathway is limited to certain proteobacteria, T. maritima, and some Gram-positive bacteria of the Bacillus/Clostridium group.

7.1.3.3. Non-phosphorylated variants of the Entner-Doudoroff pathway

While the standard Entner-Doudoroff pathway starts from glucose-6-phosphate and proceeds through phosphorylated sugar intermediates, a variety of bacteria and archaea possess so-called “non-phosphorylated” variants of this pathway, which all start from glucose and delay phosphorylation until later stages. The simplest version of such a modified pathway includes glucose oxidation into gluconate, followed by its phosphorylation into 6-phosphogluconate. The resulting 6-phosphogluconate rejoins the standard Entner-Doudoroff pathway. Another variant of the modified pathway includes an additional non-phosphorylated step, dehydration of gluconate into 2-keto-3-deoxygluconate, followed by its phosphorylation. In yet another variant of this pathway, phosphorylation is delayed even further, until after splitting of 2-keto-3-deoxygluconate into two tri-carbon molecules, pyruvate and glyceraldehyde. The latter compound is then phosphorylated into 3-phosphoglyceraldehyde. Finally, phosphorylation can be delayed one step further, with glyceraldehyde first oxidized into glycerate and then phosphorylated into 2-phosphoglycerate.
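The four variants described above differ only in where the phosphorylation step falls, which a short sketch can make explicit. The representation below is an illustrative assumption rather than an established data model: the variant labels and function names are hypothetical, glucose oxidation into gluconate is treated as a single conversion as in the text, and only the glyceraldehyde branch is followed after the aldolase split (pyruvate, the co-product, is omitted).

```python
# Each variant is the ordered list of intermediates from glucose up to and
# including the first phosphorylated compound. Labels are hypothetical.
VARIANTS = {
    "variant_1": ["glucose", "gluconate", "6-phosphogluconate"],
    "variant_2": ["glucose", "gluconate", "2-keto-3-deoxygluconate",
                  "2-keto-3-deoxy-6-phosphogluconate"],
    "variant_3": ["glucose", "gluconate", "2-keto-3-deoxygluconate",
                  "glyceraldehyde", "3-phosphoglyceraldehyde"],
    "variant_4": ["glucose", "gluconate", "2-keto-3-deoxygluconate",
                  "glyceraldehyde", "glycerate", "2-phosphoglycerate"],
}


def first_phosphorylated(route):
    """Index of the first intermediate whose name marks it as phosphorylated."""
    for index, name in enumerate(route):
        if "phospho" in name:
            return index
    raise ValueError("route contains no phosphorylated intermediate")


def conversions_before_phosphorylation(route):
    """Number of conversions that precede the phosphorylation reaction."""
    # The phosphorylation itself converts route[i - 1] into route[i],
    # so i - 1 conversions happen before it.
    return first_phosphorylated(route) - 1


if __name__ == "__main__":
    for label, route in VARIANTS.items():
        print(label, conversions_before_phosphorylation(route), "->", route[-1])
```

Listing the routes this way also shows which enzymatic activities each variant requires, for example a gluconate kinase in the first variant but a gluconate dehydratase in the other three.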

Glucose 1-dehydrogenase (EC 1.1.1.47, 1.1.99.10)

Glucose dehydrogenase, which catalyzes glucose oxidation into glucono-1,5-lactone, is known in several variants, which use different electron acceptors. Two non-orthologous NAD-dependent variants of this enzyme (EC 1.1.1.47), typified by enzymes from T. acidophilum [397] and Bacillus megaterium [929], belong, respectively, to the Zn-containing dehydrogenase family and to the short-chain reductases/dehydrogenases family. One more variant of glucose dehydrogenase, which is present in E. coli and several other bacteria, uses pyrroloquinoline quinone as the electron acceptor [637]. Finally, the enzyme from Drosophila (DHGL_DROME) is a flavoprotein that can use a variety of electron acceptors [141].

Gluconolactonase (EC 3.1.1.17)

Only a single variant of gluconolactonase has been characterized so far [413]. It has a patchy and relatively narrow phyletic distribution (COG3386), suggesting that alternative versions of this enzyme might exist.

Gluconate kinase (EC 2.7.1.12)

Gluconate kinase is found in two distinct versions, one unique and the other belonging to a large family of sugar kinases. This second form of gluconate kinase has probably evolved from a glycerol kinase or a xylulose kinase via enzyme recruitment (2.2.5). Gluconate kinases of the first type are found in yeast, D. radiodurans, E. coli, and several other proteobacteria, whereas the second form is apparently limited to B. subtilis and a handful of other Gram-positive bacteria.

Gluconate dehydratase (EC 4.2.1.39)

Although gluconate dehydratase activity was described in bacteria long ago [299] and can be easily detected in archaea [398], the gene(s) for this enzyme have not been identified. E. coli, some other bacteria, and Thermoplasma spp. encode an enzyme with similar activity, D-mannonate dehydratase (EC 4.2.1.8, the product of uxuA gene), which converts mannonate into 2-keto-3-deoxygluconate. It is not known whether this enzyme can use gluconate as a substrate. In any case, its narrow phyletic distribution suggests that, even if UxuA functions as gluconate dehydratase in E. coli, M. loti, B. subtilis, and Thermoplasma spp., there should exist a different form of this enzyme, which would participate in the non-phosphorylated Entner-Doudoroff pathway in other archaea.

2-Keto-3-deoxygluconate aldolase

Although splitting of 2-keto-3-deoxygluconate into pyruvate and glyceraldehyde was described long ago [16], the first gene for 2-keto-3-deoxygluconate aldolase has been identified only recently in the hyperthermophilic crenarchaeon S. solfataricus. This enzyme is closely related to N-acetyl-neuraminate lyase and belongs to the same superfamily of Schiff base-dependent aldolases [126]. Enzymes of this family (COG0329) are present in all archaeal genomes sequenced so far, as well as in most bacteria. Although the exact substrate specificity of each particular member of this family is not yet clear, Thermoplasma spp. and P. abyssi encode proteins that are highly similar to the enzyme from Sulfolobus and can be confidently predicted to catalyze this reaction.

7.1.4. The TCA cycle

The tricarboxylic acid cycle (Krebs cycle) is the central metabolic pathway that links together carbohydrate, amino acid, and fatty acid degradation and supplies precursors for various biosynthetic pathways. Remarkably, the complete TCA cycle, which has been studied in much detail in animal and yeast mitochondria, E. coli, and B. subtilis, is only found in a handful of microorganisms (Figure 7.4). Most organisms with completely sequenced genomes encode only a certain subset of TCA cycle enzymes and, instead of performing the entire cycle, utilize only fragments of it. Another remarkable feature is the diversity of this pathway: cases of non-orthologous gene displacement are detectable for at least five of the eight TCA cycle enzymes. A detailed analysis of the phyletic distribution and evolution of the TCA cycle enzymes has been recently published by Huynen and coworkers [370]. Most of their conclusions remain valid, although the sequences of the genomes of two aerobic archaea, the crenarchaeon A. pernix and the euryarchaeon Halobacterium sp., have substantially changed the notions of what can and cannot be found in archaeal genomes. In an impressive confirmation of early biochemical results on halobacterial metabolism [8], both of these organisms have been found to encode the complete set of TCA cycle enzymes, as has the microaerophile Thermoplasma spp. A reconstruction of the TCA cycle reactions occurring in each organism can be a very interesting project, which we encourage readers to undertake on their own (see Problems). We concentrate here exclusively on the cases of non-orthologous gene displacement.

Citrate synthase (EC 4.1.3.7)

Citrate synthase is a highly conserved enzyme, which is encoded in most bacterial, archaeal, and eukaryotic genomes (Figure 7.3). It serves as the principal port of entry of acetyl-CoA into the TCA cycle and, in eukaryotes, is exclusively located in the mitochondria. A very similar reaction is catalyzed by ATP:citrate lyase (EC 4.1.3.8), which contains a citrate synthase-like domain at its C-terminus. However, ATP:citrate lyase so far has been found exclusively in eukaryotes, where it localizes in the cytoplasm and preferentially catalyzes the reverse reaction, citrate cleavage.

Citrate synthase is missing in spirochetes and mycoplasmas, which do not encode any enzymes of the TCA cycle. It is also missing in pyrococci, M. jannaschii, S. pyogenes, and H. influenzae, which encode unlinked branches of the TCA cycle (Figure 7.4). It has been suggested that the TCA cycle has evolved from two separate reductive branches [711], which were subsequently linked by (i) citrate synthase and (ii) either an α-ketoglutarate dehydrogenase or an α-ketoglutarate:ferredoxin oxidoreductase [445,538]. In any case, due to the absence of known displacements, citrate synthase seems to be a good indicator of the presence of a (nearly) complete TCA cycle in a given organism.

Figure 7.4

Distribution of the TCA cycle enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

Aconitase (EC 4.2.1.3)

There are two distantly related, paralogous aconitases, referred to as aconitase A and aconitase B, both of which are present in E. coli and many other proteobacteria (Figure 7.4). Aconitase A has a much wider phyletic distribution and is the form of the enzyme present in α-proteobacteria M. loti, C. crescentus, and R. prowazekii. Accordingly, this is also the form of aconitase found in the mitochondria. Although aconitase B has a much more narrow phyletic distribution, it is the only form of the enzyme encoded in Synechocystis sp., P. multocida, H. pylori, and C. jejuni.

Aconitase is closely related to 3-isopropylmalate dehydratase, an enzyme of leucine biosynthesis (see 7.4.4), which sometimes makes the annotation of these enzymes in sequenced genomes not entirely straightforward. However, leu genes are usually found in a conserved operon, which helps make the correct assignment.

Isocitrate dehydrogenase (EC 1.1.1.42)

Like aconitase, isocitrate dehydrogenase is also found in two forms, which, however, appear to be unrelated. The mitochondrial form of this enzyme, also found in E. coli and many other bacteria and archaea, is closely related to isopropylmalate dehydrogenase, an enzyme of leucine biosynthesis (see 7.4.4), and it is believed that it could have evolved via a duplication of the leuB gene [579]. Again, genome annotation here has to rely on the genetic context, i.e. on the presence or absence of adjacent leu genes. In any case, the product of such a gene is very likely to have both isocitrate dehydrogenase and isopropylmalate dehydrogenase activity. This form is active as a homodimer, which distinguishes it from the second form, referred to as monomeric isocitrate dehydrogenase. This second form was originally found in Vibrio sp. [814] and was subsequently discovered in many other bacteria [803]. It is the only form of the enzyme encoded in the genomes of M. leprae, V. cholerae, and C. jejuni.
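The reliance on genetic context described here can be expressed as a neighborhood check on the gene order. The Python sketch below is a hypothetical illustration: the gene orders are invented toy lists, the window size of two genes is an arbitrary assumption, and real annotation would also weigh strand, intergenic distances, and sequence similarity.

```python
# Leucine biosynthesis genes whose proximity is taken as context evidence.
LEU_GENES = {"leuA", "leuB", "leuC", "leuD"}

def leu_neighbors(gene_order, index, window=2):
    """List leu genes found within `window` positions of the gene at `index`."""
    lo = max(0, index - window)
    hi = min(len(gene_order), index + window + 1)
    return [
        gene
        for i, gene in enumerate(gene_order[lo:hi], start=lo)
        if i != index and gene in LEU_GENES
    ]

# Toy gene orders (invented for illustration); "query" marks the ambiguous gene.
in_leu_operon = ["abc1", "leuA", "query", "leuC", "leuD", "xyz1"]
elsewhere = ["abc1", "abc2", "query", "xyz1"]

print(leu_neighbors(in_leu_operon, 2))  # leu genes flank the query
print(leu_neighbors(elsewhere, 2))      # no leu genes nearby
```

A query gene flanked by leu genes would be provisionally assigned to leucine biosynthesis, whereas one without such neighbors remains a candidate for the TCA cycle enzyme.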

2-Ketoglutarate dehydrogenase (EC 1.2.4.2)

In mitochondria and many aerobic bacteria and archaea, decarboxylation of α-ketoglutarate into the succinyl moiety of succinyl-CoA is catalyzed by the thiamine pyrophosphate and lipoate-dependent α-ketoglutarate dehydrogenase complex. In contrast, many anaerobic bacteria and archaea utilize α-ketoglutarate:ferredoxin oxidoreductase, an unrelated enzyme [445,538].

Succinyl-CoA synthetase (EC 6.2.1.4, 6.2.1.5)

Succinyl-CoA synthetases are divided into paralogous, highly similar GTP-dependent and ATP-dependent forms. Succinyl-CoA synthetase is a member of a large family of acyl-CoA synthetases (NDP-forming), which also includes acetyl-CoA synthetase found in many archaea and lower eukaryotes. For ATP binding, these enzymes employ the ATP-grasp domain (Table 3.2). Variants of this enzyme with shifted substrate specificities are found in most phylogenetic lineages.

Succinate dehydrogenase/fumarate reductase (EC 1.3.99.1)

Mitochondrial succinate dehydrogenase, which couples the oxidation of succinate to fumarate with the reduction of ubiquinone to ubiquinol, consists of four subunits carrying three iron-sulfur centers, a covalently bound flavin and two b-type hemes (the history of the discovery of these complexes is vividly described in [773]). The fumarate reductase (quinol:fumarate reductase) complex also contains iron-sulfur centers and a covalently bound flavin but usually consists of only two or three subunits. Succinate dehydrogenase is part of the aerobic respiratory chain, whereas fumarate reductase is involved in anaerobic respiration, with fumarate functioning as the terminal electron acceptor. Accordingly, one or both of these enzymes is found in all organisms, with the exception of pyrococci, spirochetes, and mycoplasmas (Figure 7.4).

Fumarate hydratase (fumarase, EC 4.2.1.2)

Like several other TCA cycle enzymes, fumarase is represented by two unrelated forms. The mitochondrial form of this enzyme (class II) is also encoded in many bacteria and in aerobic archaea, A. pernix and Halobacterium sp. The second form of fumarase (class I) consists of two subunits that are fused in most bacterial genomes but are encoded by separate genes in archaea, T. maritima and A. aeolicus. The two forms of fumarase have largely complementary phyletic patterns:

The only archaeal genome that appears not to encode a fumarase is T. acidophilum, whose fumarase homolog Ta0258 is much more closely related to aspartate ammonia-lyase (COG1027) than to a typical fumarase (COG0114). The actual activity of this Thermoplasma enzyme has not been determined.

Malate dehydrogenase (EC 1.1.1.37)

Malate dehydrogenase is also found in two forms, with the mitochondrial form showing a much wider phyletic distribution. The second form of malate dehydrogenase was originally described in archaea [8,78,320] and is often referred to as the “archaeal” form of the enzyme. However, it is also encoded in certain bacterial genomes, including three paralogous genes in E. coli (ybiC, yiaK, and ylbC) and M. loti. It is the only form of malate dehydrogenase in pyrococci and in P. aeruginosa. Remarkably, M. thermoautotrophicus, M. jannaschii, B. subtilis, H. influenzae, and P. multocida encode both forms of malate dehydrogenase ([842], Figure 7.3). Why these organisms, with their relatively small genomes, need two paralogous forms of this enzyme remains unclear. U. urealyticum and T. pallidum do not encode either of the two forms of malate dehydrogenase, in contrast to their respective relatives M. genitalium and B. burgdorferi. Therefore, the possibility remains that there exists yet another, third form of this enzyme.

7.2. Pyrimidine Biosynthesis

In contrast to the pathways of carbohydrate metabolism discussed above, enzymes of the pyrimidine biosynthesis pathway show a fairly consistent phyletic pattern, although cases of non-orthologous gene displacement can be found here, too (see Figure 2.7). The whole pathway, with the exception of the last three steps, is missing in the obligate parasitic bacteria with small genomes: rickettsiae, chlamydiae, spirochetes, and mycoplasmas, whereas bacteria and archaea with larger genomes encode all or almost all enzymes of pyrimidine biosynthesis.

Carbamoyl phosphate synthase (EC 6.3.5.5)

In bacteria and archaea, carbamoyl phosphate synthase consists of two subunits, which in eukaryotes are fused into a single multifunctional CAD protein that additionally contains dihydroorotase and aspartate carbamoyltransferase domains. The small subunit, encoded by the carA gene, is a typical glutamine amidotransferase of the Triad family [936]. The large subunit consists of two ATP-grasp domains (see 3.3.3) fused in the same polypeptide chain [391,800,841]. In M. jannaschii and M. thermoautotrophicus, the large subunit is split into two proteins, which are encoded by different, albeit adjacent, genes. In addition to the obligate parasites mentioned above, carbamoyl phosphate synthase is missing in P. horikoshii, P. abyssi, Thermoplasma spp., and H. influenzae (Figure 2.7). It is present, however, in Pyrococcus furiosus, suggesting a relatively recent loss of this enzyme in the other two pyrococci. In P. abyssi, carbamoyl phosphate biosynthesis is carried out by an unrelated form of the enzyme, which is closely related to carbamate kinase [683,684]. This second form is also responsible for the carbamoyl phosphate synthase activity in P. furiosus [201,692] and might account for this activity in Thermoplasma spp. Although both subunits of carbamoyl phosphate synthase belong to large protein superfamilies and are similar to many proteins with different substrate specificities, the sheer size of the large subunit, which typically contains more than 1,050 amino acid residues, allows an easy identification of this enzyme in genome analyses. However, caution is due with respect to the annotation of any shorter proteins that give statistically significant hits to the large subunit of carbamoyl phosphate synthase: these are likely to be other ATP-grasp superfamily enzymes (see Table 3.2).
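The size criterion mentioned above lends itself to a simple triage of database hits. The Python sketch below is a minimal illustration under stated assumptions: the protein identifiers and lengths are invented, the 1,050-residue threshold is taken from the text, and the function name is hypothetical.

```python
MIN_LENGTH = 1050  # residues; typical lower bound quoted in the text

def triage_hits(hits, min_length=MIN_LENGTH):
    """Split {protein_id: length} into likely large subunits and hits to review."""
    likely, review = [], []
    for protein_id, length in sorted(hits.items()):
        # Long hits are accepted; shorter ones are set aside, not discarded.
        (likely if length > min_length else review).append(protein_id)
    return likely, review

# Invented hits with statistically significant similarity to the large subunit.
toy_hits = {"prot_1": 1073, "prot_2": 412, "prot_3": 1120}

likely, review = triage_hits(toy_hits)
print("likely large subunit:", likely)
print("needs manual review:", review)
```

Short hits are routed to manual review rather than rejected outright because, as noted for M. jannaschii and M. thermoautotrophicus, a genuine large subunit can be split into two shorter proteins encoded by adjacent genes.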

Aspartate carbamoyltransferase (EC 2.1.3.2)

Aspartate carbamoyltransferase, the second enzyme of pyrimidine biosynthesis, has a wide distribution, with a phyletic pattern that is similar to that of carbamoyl phosphate synthase but additionally includes pyrococci and Thermoplasma spp. (Figure 2.7). This enzyme, however, is lacking in H. influenzae and in its close relative P. multocida. In eukaryotes, aspartate carbamoyltransferase comprises the C-terminal domain of the multifunctional CAD protein [771].

Dihydroorotase (EC 3.5.2.3)

The well-characterized form of dihydroorotase (COG0418), encoded by the E. coli pyrC gene [65] and by the URA4 gene in yeast [325], has a very limited phyletic distribution (Figure 2.7). In contrast, the second form of this enzyme (COG0044) is almost universal, being present in many bacteria, archaea, and eukaryotes [686]. In eukaryotes, this enzyme forms the middle portion of the multifunctional CAD protein [771,943]. In yeast, however, this domain is apparently inactive [794], most likely because of the presence of the alternative form of dihydroorotase. Similar to aspartate carbamoyltransferase, neither form of dihydroorotase is encoded in H. influenzae or P. multocida. Notably, the union of the phyletic patterns for the two forms of dihydroorotase is identical to the phyletic pattern of aspartate carbamoyltransferase:
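Comparing the union of two phyletic patterns with a third pattern is a set operation. The Python sketch below shows one way such a comparison could be made; the genome labels are hypothetical placeholders and do not reproduce the actual COG0418, COG0044, or aspartate carbamoyltransferase patterns.

```python
def compare_union(form_patterns, reference):
    """Compare the union of several phyletic patterns with a reference pattern.

    Returns (identical, extra, missing), where `extra` lists genomes found only
    in the union and `missing` lists genomes found only in the reference.
    """
    combined = set().union(*form_patterns)
    reference = set(reference)
    return (
        combined == reference,
        sorted(combined - reference),
        sorted(reference - combined),
    )

# Placeholder patterns: each set holds the genomes that encode the enzyme.
form_1 = {"g1", "g2"}
form_2 = {"g2", "g3", "g4"}
next_enzyme = {"g1", "g2", "g3", "g4"}

print(compare_union([form_1, form_2], next_enzyme))
```

When the union falls short of the reference, the genomes returned in `missing` are the natural places to look for a further, still unidentified form of the enzyme.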

Dihydroorotate dehydrogenase (EC 1.3.3.1)

Dihydroorotate dehydrogenase displays the same phyletic pattern as dihydroorotase and aspartate carbamoyltransferase, with the addition of H. influenzae and P. multocida. Both these bacteria encode the enzymes for all downstream steps of pyrimidine biosynthesis.

Orotate phosphoribosyltransferase (EC 2.4.2.10)

The phyletic pattern of orotate phosphoribosyltransferase differs from that of dihydroorotate dehydrogenase in only one respect, the presence of a pyrE-related gene in C. pneumoniae. The function of the product of this gene in C. pneumoniae is unknown, but given the absence in this organism of the enzymes for the upstream and the downstream steps of the pathway, it is unlikely to function as orotate phosphoribosyltransferase. Rather, this enzyme might be recruited to catalyze a different phosphoribosyltransferase reaction. In eukaryotes, orotate phosphoribosyltransferase is fused to the next enzyme of the pathway, OMP decarboxylase, forming a two-domain UMP synthase. As a result, orotate phosphoribosyltransferase and OMP decarboxylase are occasionally misannotated as UMP synthases and vice versa [264].

Orotidine-5′-monophosphate decarboxylase (EC 4.1.1.23)

Although the phyletic pattern of OMP decarboxylase is identical to that of dihydroorotate dehydrogenase, a closer look at COG0284 shows that it consists of three distantly related families. Two of these include well-characterized enzymes from E. coli and other bacteria [856] and from yeast and other eukaryotes [234,570]. The third family includes OMP decarboxylases from archaea and a small number of bacteria, such as M. tuberculosis, M. leprae, and Myxococcus xanthus [12,439]. Mycobacterial OMP decarboxylases seem to be sufficiently distinct from those of eukaryotes and other bacteria to consider them promising targets for antituberculosis drugs [266].

Uridylate kinase (EC 2.7.4.-, 2.7.4.14)

There seem to be two distinct forms of uridylate kinase: one specific for UMP and found in bacteria and archaea (COG0528) and another one that phosphorylates both UMP and CMP and is found in eukaryotes [628,739]. The eukaryotic form of the enzyme is closely related to bacterial adenylate kinase and could have been recruited from an ancestral prokaryotic adenylate kinase. The prokaryotic form of uridylate kinase is encoded in all bacterial and archaeal genomes sequenced to date, including the ‘minimal’ (see 2.2.5) genomes of mycoplasmas and Buchnera.

Nucleoside diphosphate kinase (EC 2.7.4.6)

Nucleoside diphosphate kinase (COG0105) is highly conserved in most bacteria, archaea, and eukaryotes. Surprisingly, however, this enzyme is not encoded in T. maritima, L. lactis, S. pyogenes, and mycoplasmas. One could imagine that these organisms employ a different nucleoside diphosphate kinase that might have been recruited, just like the eukaryotic uridylate kinase, from the adenylate kinase family (COG0563).

This, however, would not solve the problem for T. maritima and mycoplasmas, which encode only a single enzyme of that family. It therefore seems likely that nucleoside diphosphate kinase in these organisms has been recruited from yet another kinase family. Indeed, a phyletic pattern search for a protein that would be encoded in those four genomes, but not in other organisms with relatively small genomes, such as chlamydiae, spirochetes, or H. pylori, easily finds an uncharacterized (predicted) kinase related to dihydroxyacetone kinase (COG1461), which appears to be a good candidate for the role of nucleoside diphosphate kinase in these organisms:
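A phyletic pattern search of this kind is, in essence, a query for protein families that are present in every genome of one list and absent from every genome of another. The Python sketch below illustrates the logic on an invented table; the family identifiers and genome labels are placeholders, and the function is a hypothetical stand-in, not the interface of any actual COG search tool.

```python
def pattern_search(table, present_in, absent_from):
    """Return families found in all `present_in` genomes and in no `absent_from` genome.

    `table` maps a family identifier to the set of genomes that encode it.
    Genomes listed in neither argument are ignored.
    """
    present_in, absent_from = set(present_in), set(absent_from)
    return sorted(
        family
        for family, genomes in table.items()
        if present_in <= set(genomes) and not (absent_from & set(genomes))
    )

# Invented phyletic patterns (placeholders, not real COG data).
toy_table = {
    "fam1": {"A", "B", "C", "D", "E"},  # passes: E is not constrained
    "fam2": {"A", "B", "C", "D"},       # passes
    "fam3": {"A", "B", "X"},            # fails: lacks C and D, present in X
    "fam4": {"A", "B", "C", "D", "X"},  # fails: present in excluded genome X
}

print(pattern_search(toy_table, present_in="ABCD", absent_from="XY"))
```

Such a query only produces candidates; families that pass the filter still have to be judged by their predicted biochemical activity, as done here for the dihydroxyacetone kinase-related protein.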

CTP synthase (UTP-ammonia ligase, EC 6.3.4.2)

CTP synthase is a two-domain protein, which consists of an N-terminal nucleotide-binding synthetase domain and a C-terminal glutamine amidotransferase domain. This enzyme is extremely highly conserved in bacteria, archaea, and eukaryotes. It is missing only in the genomes of M. genitalium and M. pneumoniae, which apparently make CTP from CDP or CMP in a salvage pathway, rather than from UTP.

General notes on pyrimidine biosynthesis evolution

Comparison of the phyletic patterns for the enzymes of pyrimidine biosynthesis reveals two important evolutionary trends. First, there appears to be a tendency toward decreasing the genome size by losing genes that have ceased to be essential. Indeed, ample evidence indicates that mycoplasmas evolved from a Gram-positive ancestor by way of massive gene loss associated with their adaptation to parasitism. While bacilli, lactococci, and many other Gram-positive bacteria carry the full set of genes of pyrimidine biosynthesis, most of the pyr genes have been lost in the mycoplasmal lineage. Similarly, many pyr genes apparently have been lost in other parasitic bacteria with small genomes, such as spirochetes, rickettsiae, and chlamydiae (Figure 2.7).

The trend toward gene loss is much more pronounced for the initial steps of the pyrimidine biosynthesis pathway than it is for the distal steps. Thus, genes for the first three steps of pyrimidine biosynthesis from bicarbonate and ammonia to dihydroorotate (carA, carB, pyrB, and pyrC) are missing in H. influenzae, but the genes for all the subsequent steps of pyrimidine biosynthesis, from dihydroorotate to CTP, are present (Figure 2.7). This means that, although H. influenzae is incapable of de novo pyrimidine biosynthesis, it still can synthesize UTP and CTP from dihydroorotate, orotate, or OMP. Spirochetes, chlamydiae, rickettsiae, and mycoplasmas show an even deeper loss of pyrimidine biosynthesis genes but nevertheless retain genes for the last three steps of the pathway, the conversion of UMP into CTP. Thus, while the parasite depends on the host for the supply of essential nutrients, this strategy allows it to preserve at least some metabolic plasticity. In particular, every organism seems to encode enzymes to synthesize its own nucleoside triphosphates (NTPs). For thermodynamic reasons, bacteria cannot import NTPs directly, although intracellular bacterial parasites do encode ATP/ADP translocases, which are capable of exchanging ADP generated by the parasite for cytoplasmic ATP [899,910].
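The asymmetry between upstream and downstream losses can be made explicit by asking, for a given genome, from which step onward the pathway is uninterrupted. The Python sketch below assumes a strictly linear pathway with the enzymes ordered as in this section; the two gene sets are toy constructions that mimic the situations described in the text and are not taken from Figure 2.7.

```python
# Enzymes of pyrimidine biosynthesis in the order discussed in this section.
PATHWAY = [
    "carbamoyl phosphate synthase",
    "aspartate carbamoyltransferase",
    "dihydroorotase",
    "dihydroorotate dehydrogenase",
    "orotate phosphoribosyltransferase",
    "OMP decarboxylase",
    "uridylate kinase",
    "nucleoside diphosphate kinase",
    "CTP synthase",
]

def entry_point(steps, present):
    """Index of the earliest step from which all later steps are also present.

    Returns len(steps) if even the final step is absent.
    """
    entry = len(steps)
    for i in range(len(steps) - 1, -1, -1):  # walk back from the last step
        if steps[i] not in present:
            break
        entry = i
    return entry

# Toy gene sets: one lacking the first three steps, one retaining only the last three.
lacking_first_three = set(PATHWAY[3:])
last_three_only = set(PATHWAY[6:])

for label, genes in [("lacking first three", lacking_first_three),
                     ("last three only", last_three_only)]:
    i = entry_point(PATHWAY, genes)
    print(label, "->", PATHWAY[i] if i < len(PATHWAY) else "no downstream steps")
```

The substrate of the enzyme at the entry point is the earliest intermediate the organism could use if it were supplied from outside, which is the sense in which a reduced pathway retains some metabolic plasticity.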

7.3. Purine Biosynthesis

Like pyrimidine biosynthesis enzymes, enzymes of the purine biosynthesis pathway follow a consistent phyletic pattern, albeit with some inevitable complications (Figure 7.5). With only a few exceptions, enzymes that catalyze the common reactions of the pathway, which leads to the formation of inosine-5′-monophosphate, are missing in parasitic bacteria with small genomes, namely mycoplasmas, rickettsiae, chlamydiae, spirochetes, Buchnera sp., and H. pylori, and, interestingly, in the aerobic crenarchaeon A. pernix. Other bacteria encode the complete set of purine biosynthesis enzymes, whereas the distribution of these enzymes in archaeal genomes is more complex and has to be discussed separately for each enzyme.

Figure 7.5

Distribution of purine biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

Phosphoribosylpyrophosphate synthetase (EC 2.7.6.1)

PRPP synthetase (ribose-phosphate diphosphokinase) is an enzyme that is shared by purine biosynthesis and histidine biosynthesis pathways. This enzyme is found in most completely sequenced genomes, including those of mycoplasmas, spirochetes, and Buchnera, which do not encode most purine biosynthesis enzymes (Figure 7.5).

Amidophosphoribosyltransferase (EC 2.4.2.14)

Glutamine phosphoribosylpyrophosphate amidotransferase (PurF) belongs to the N-terminal nucleophile (Ntn) family of glutamine amidotransferases [936]. This enzyme is encoded in every sequenced bacterial genome, with the exception of some obligate parasites, such as rickettsiae, chlamydiae, spirochetes, mycoplasmas, and H. pylori, and in every archaeal genome except for A. pernix. The same phyletic pattern is seen for the majority of purine biosynthesis enzymes.

Phosphoribosylamine-glycine ligase (EC 6.3.4.13)

Phosphoribosylglycinamide synthetase PurD, an ATP-grasp superfamily (Table 3.2) enzyme, has the same phyletic pattern as amidophosphoribosyltransferase and many other enzymes of this pathway.

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2)

5′-Phosphoribosyl-N-formylglycinamide synthase (GAR transformylase) exists in two different forms, folate-dependent (PurN) and formate-dependent (PurT), which are unrelated to each other and catalyze entirely different reactions. The folate-dependent form functions as a transferase, catalyzing transfer of the formyl group from formyltetrahydrofolate to phosphoribosylglycinamide. This enzyme is found in many bacteria and eukaryotes but only in a few archaea, such as Halobacterium sp. and Thermoplasma spp. The formate-dependent form of the enzyme belongs to the ATP-grasp superfamily (see Table 3.2) and catalyzes an ATP-dependent ligation of phosphoribosylglycinamide with formic acid. This is the only form of GAR transformylase in methanogens and pyrococci. Surprisingly, neither form of the enzyme is encoded in the A. fulgidus genome. With the exception of A. fulgidus, the combined phyletic pattern of the two forms of GAR transformylase coincides with the patterns for amidophosphoribosyltransferase and phosphoribosylamine-glycine ligase:

Phosphoribosylformylglycinamidine synthase (EC 6.3.5.3)

Like many other amidotransferases, phosphoribosylformylglycinamidine (FGAM) synthase PurL consists of two subunits, a glutamine amidotransferase of the Triad family [936] and a synthetase. The phyletic pattern of both FGAM synthase subunits is the same as that of PurF and PurD. In E. coli and many other γ-proteobacteria, as well as in yeast and other eukaryotes, these two subunits are fused in one polypeptide chain, whereas in most other bacteria and in archaea they are encoded by separate genes. In this latter case, FGAM synthase apparently requires an additional 80-aa subunit, referred to as PurS [747].

Phosphoribosylaminoimidazole synthetase (EC 6.3.3.1)

Phosphoribosylformylglycinamidine cycloligase (AIR synthetase) PurM has the same phyletic pattern as PurF, PurD, and PurL.

Phosphoribosylaminoimidazole carboxylase (EC 4.1.1.21)

Phosphoribosylaminoimidazole (AIR) carboxylase (NCAIR synthetase) PurK is, like PurD, an ATP-grasp superfamily enzyme (Table 3.2), which catalyzes ATP-dependent carboxylation of AIR. Unlike other enzymes of purine biosynthesis, PurK is not encoded in the genomes of A. fulgidus, C. jejuni, methanogens, and pyrococci (Figure 7.5), so that the mechanism of AIR carboxylation in these organisms remains unknown. This reaction can occur spontaneously at elevated temperatures in a CO₂-rich atmosphere, which could explain the absence of this enzyme in hyperthermophilic archaea. This explanation does not seem to work, however, for C. jejuni, suggesting the existence of a still unidentified alternative version of PurK (see [270,567] for discussion).

Phosphoribosylcarboxyaminoimidazole mutase

Phosphoribosylcarboxyaminoimidazole (NCAIR) mutase, previously thought to be a subunit of NCAIR synthetase but recently identified as an individual enzyme [567,583], has the typical phyletic pattern of purine biosynthesis enzymes, identical to the phyletic patterns of PurF, PurD, and PurL.

Phosphoribosylaminoimidazolesuccinocarboxamide synthase (EC 6.3.2.6)

Phosphoribosylaminoimidazolesuccinocarboxamide (SAICAR) synthase (PurC) contains a distinct version of the ATP-grasp domain. In addition to the standard set of organisms that are capable of purine biosynthesis, SAICAR synthase is encoded in the genome of R. prowazekii. It is hard to imagine what might be the function of this enzyme in an intracellular parasite, which lacks all other enzymes of purine biosynthesis. The sequence of R. prowazekii PurC is closely related to the enzymes from other α-proteobacteria but has at least three substitutions of amino acid residues that are otherwise conserved in SAICAR synthases (E.V.K., unpublished observations). This suggests that rickettsial SAICAR synthase might have lost its enzymatic activity and acquired another, perhaps regulatory function.

Adenylosuccinate lyase (EC 4.3.2.2)

Adenylosuccinate lyase (PurB) has the typical phyletic pattern of purine biosynthesis enzymes, with the addition of H. pylori. This is most likely due to the involvement of PurB in the conversion of IMP into AMP, the reaction that appears to occur in H. pylori.

AICAR transformylase (EC 2.1.2.3)

Phosphoribosylaminoimidazolecarboxamide (AICAR) formyltransferase (PurH) catalyzes the transfer of the formyl group from formyltetrahydrofolate to AICAR. In every organism studied to date, this protein is fused to the IMP cyclohydrolase in a bifunctional enzyme. AICAR transformylase comprises the C-terminal 300-aa portion of the PurH protein, whereas IMP cyclohydrolase comprises the N-terminal 200-aa region [693]. AICAR transformylase is encoded in almost the same set of organisms as all other purine biosynthesis enzymes, with the exception of A. fulgidus, which encodes only the IMP cyclohydrolase portion of PurH, and methanogens and pyrococci that do not encode either of these enzymes.

IMP cyclohydrolase (EC 3.5.4.10)

IMP cyclohydrolase, which catalyzes the last step of purine biosynthesis, is fused to AICAR transformylase in every organism, except for A. fulgidus, which does not have an AICAR transformylase at all, and Halobacterium sp., in which the AICAR transformylase domain is fused to PurN, a different folate-dependent GAR transformylase (see above). As noted above, methanogens and pyrococci do not encode a recognizable IMP cyclohydrolase.

Adenylosuccinate synthase (EC 6.3.4.4)

Conversion of IMP into AMP can occur in one step, which is catalyzed by the eukaryote-specific enzyme AMP deaminase (EC 3.5.4.6), or in two steps, as in most bacteria and archaea. First, IMP is converted into adenylosuccinate by adenylosuccinate synthase PurA. In addition to the entire set of organisms that encode enzymes of IMP biosynthesis, PurA is also encoded in H. pylori. The second step, the conversion of adenylosuccinate into AMP, is catalyzed by adenylosuccinate lyase PurB (see above), which has the same phyletic pattern as PurA.

IMP dehydrogenase (EC 1.1.1.205)

Although the reverse reaction, catalyzed by GMP reductase (EC 1.6.6.8), occurs in one step, conversion of IMP into GMP takes two steps. First, IMP is oxidized into XMP by IMP dehydrogenase GuaB, a close paralog of GMP reductase, which, however, contains an ~120-amino acid insert comprising two CBS domains involved in allosteric regulation of the enzyme activity [79,941]. Because CBS is a “promiscuous” domain, which is found in association with various proteins [26], it has caused numerous errors in automated genome annotation (see 5.2.2) [264]. Thus, at least twelve A. fulgidus proteins have been annotated as IMP dehydrogenases or “IMP dehydrogenase-related” proteins [444], whereas, ironically, the real IMP dehydrogenase appears not to be encoded in the A. fulgidus genome. With the exception of this archaeon, IMP dehydrogenase is present in almost every bacterial and archaeal genome sequenced to date, including A. pernix, C. pneumoniae, and B. burgdorferi, which do not encode any enzymes of IMP biosynthesis and apparently have to import this nucleotide.

GMP synthase (EC 6.3.5.2)

GMP synthase is another amidotransferase that consists of two subunits, a glutamine amidotransferase of the Triad family [936] and a synthetase subunit, which belongs to the PP-loop superfamily of ATP pyrophosphatases [102]. The phyletic pattern of both GMP synthase subunits is the same as that of IMP dehydrogenase, with the addition of A. fulgidus, i.e. this enzyme is also found in A. pernix, C. pneumoniae, and B. burgdorferi, which lack many other purine biosynthesis enzymes. In bacteria, yeast, and other eukaryotes, and in A. pernix, these two subunits are fused together in the same polypeptide, whereas in other archaea, they are encoded by separate genes.

General notes on purine biosynthesis evolution

The phyletic distribution of purine biosynthesis enzymes shows some of the trends noted above for other pathways, i.e. non-orthologous gene displacement and increased loss of enzymes for upstream steps of the pathway as compared to the downstream steps. With the exception of several obligate parasites with very small genomes and Buchnera sp., most bacteria encode the entire set of purine biosynthesis enzymes; there is little doubt that they are all capable of IMP formation. Based on their gene content, the bacteria H. pylori, C. pneumoniae, and B. burgdorferi and the archaeon A. pernix are only capable of converting IMP into GMP; AMP formation in these organisms probably occurs through the activity of adenine phosphoribosyltransferase or some other mechanism. While Halobacterium sp. and Thermoplasma spp. encode all the enzymes of purine biosynthesis, other archaea appear to lack at least two pur genes. Methanogens and pyrococci lack purK and purH genes, and A. fulgidus additionally lacks purN/purT and guaB, making it hard to judge whether the purine biosynthesis pathway is functional in this organism. Purine biosynthesis is much more likely to occur in methanogens and pyrococci, which would then need to harbor alternative versions of AICAR transformylase and IMP cyclohydrolase and, potentially, an alternative version of AIR carboxylase. Thus far, no obvious candidates for these activities have been identified by comparative genome analysis of these organisms. It is amazing that, although purine biosynthesis has been intensely studied for over 50 years, comparative genomics reveals unsuspected gaps in our understanding of this pathway and may eventually lead to the discovery of novel enzymes.

7.4. Amino Acid Biosynthesis

7.4.1. Aromatic amino acids

7.4.1.1. Common steps of the pathway

The biosynthetic pathways for phenylalanine, tyrosine, and tryptophan in bacteria and eukaryotes share common steps leading from phosphoenolpyruvate and erythrose-4-phosphate to chorismate. Enzymes for most of these steps are encoded also in archaeal genomes.

2-Dehydro-3-deoxy-D-arabino-heptonate 7-phosphate synthase (EC 4.1.2.15)

Although 2-dehydro-3-deoxy-D-arabino-heptonate 7-phosphate (DAHP) synthase is found in E. coli in three different versions, AroF, AroG, and AroH, all of these enzymes are close paralogs and represent the so-called microbial form of DAHP synthase. A different form of this enzyme was originally described in potato and Arabidopsis and designated the plant form. Subsequently, this form has been discovered also in bacteria [205,298,433,879]. This form is encoded in many complete genomes and is the only DAHP synthase in M. tuberculosis, M. leprae, H. pylori, and C. jejuni (Figure 7.6). B. subtilis and several other Gram-positive bacteria encode a third form of DAHP synthase, referred to as AroA(G), which is homologous to 3-deoxy-D-manno-octulosonate 8-phosphate synthase of E. coli [96]. Remarkably, this third form is also found in T. maritima and in several archaea, such as P. abyssi, A. pernix, and Thermoplasma spp. Other archaea, such as A. fulgidus, M. jannaschii, and M. thermoautotrophicum, as well as the bacterium A. aeolicus, encode none of these three DAHP synthases and appear to synthesize 3-dehydroquinate via a different mechanism that does not include DAHP as an intermediate (see below).

Figure 7.6

Distribution of tryptophan biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

3-Dehydroquinate synthase (EC 4.6.1.3)

3-Dehydroquinate synthase is found in many bacteria and in some archaea. Remarkably, the phyletic pattern of this enzyme exactly corresponds to the overlap of the phyletic patterns for the three forms of DAHP synthase:

This correlation suggests that a single form of 3-dehydroquinate synthase can account for the conversion of DAHP into 3-dehydroquinate in all the organisms with completely sequenced genomes and probably represents the only form of this enzyme.
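The comparison used here, and repeated in the following subsections, can be expressed with simple set operations: the phyletic patterns of the alternative enzyme forms are combined and then checked against the pattern of the enzyme for the adjacent step. The sketch below is only an illustration of that logic; the genome labels (g1 to g6), the form names, and the function names are hypothetical and do not represent actual COG data.

```python
# Minimal sketch: does the combined pattern of alternative enzyme forms
# match the pattern of the enzyme for the neighbouring pathway step?
# All genome labels and patterns below are invented toy data.

def combined_pattern(*forms):
    """Union of the phyletic patterns of all alternative forms."""
    return frozenset().union(*forms)

def compare_patterns(reference, *forms):
    """Compare a reference enzyme's pattern with the combined pattern of the forms."""
    combined = combined_pattern(*forms)
    return {
        # genomes that have the reference enzyme but none of the forms
        "uncovered": sorted(reference - combined),
        # genomes that have one of the forms but lack the reference enzyme
        "extra": sorted(combined - reference),
        # genomes that encode every one of the forms at once
        "shared": sorted(set.intersection(*map(set, forms))),
    }

# Hypothetical patterns for three forms of one enzyme and for the next enzyme.
form_1 = {"g1", "g2"}
form_2 = {"g3"}
form_3 = {"g4", "g5"}
next_enzyme = {"g1", "g2", "g3", "g4", "g5"}

full_report = compare_patterns(next_enzyme, form_1, form_2, form_3)
partial_report = compare_patterns(next_enzyme, form_1, form_2)
print(full_report)
print(partial_report)
```

An empty "uncovered" list corresponds to the situation described in the text, where the known forms together account for the reaction in every genome that encodes the next enzyme; a non-empty list points to genomes in which yet another form, or an alternative route, should be sought.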

3-Dehydroquinate dehydratase (EC 4.2.1.10)

Two forms of 3-dehydroquinate dehydratase have been characterized and designated class I (encoded by aroD gene) and class II (encoded by aroQ or QUTE genes), respectively. Taken together, these two enzymes completely cover the phyletic diversity of the organisms that encode 3-dehydroquinate synthase:

Notably, dehydroquinate dehydratase (as well as most of the other enzymes of tryptophan biosynthesis) is found in several genomes that do not encode dehydroquinate synthase, indicating the existence of an alternative, still uncharacterized pathway of dehydroquinate formation in Halobacterium sp., A. fulgidus, M. jannaschii, M. thermoautotrophicum, and A. aeolicus.

Shikimate 5-dehydrogenase (EC 1.1.1.25)

The last enzyme in the shikimate-producing part of the pathway, shikimate 5-dehydrogenase, is encoded in most of the completely sequenced bacterial (with the exception of rickettsiae, spirochetes, and mycoplasmas) and archaeal (with the exception of P. horikoshii) genomes. The phyletic pattern for shikimate 5-dehydrogenase coincides with the combined pattern of the two forms of 3-dehydroquinate dehydratase:

Shikimate kinase (EC 2.7.1.71)

The typical form of shikimate kinase, found in bacteria and eukaryotes, is not encoded in any archaeal genome sequenced so far. Recently, a shikimate kinase of the GHMP superfamily has been identified and experimentally studied in M. jannaschii [171]. This enzyme is encoded in each archaeal genome, except for P. horikoshii. Together, these two forms of shikimate kinase have the same phyletic pattern as the combination of the two forms of 3-dehydroquinate dehydratase or shikimate dehydrogenase:

5-Enolpyruvylshikimate 3-phosphate synthase (EC 2.5.1.19)

Like shikimate dehydrogenase, 5-enolpyruvylshikimate 3-phosphate synthase (AroA) is found in only one form with the same phyletic pattern as the preceding enzyme.

Chorismate synthase (EC 4.6.1.4)

The only known chorismate synthase (AroC) has the same phyletic pattern as shikimate dehydrogenase and 5-enolpyruvylshikimate 3-phosphate synthase.

7.4.1.2. Tryptophan biosynthesis

After chorismate, the tryptophan biosynthetic pathway deviates from the pathways leading to phenylalanine and tyrosine. In the tryptophan branch, all the remaining enzymes have very similar phyletic patterns.

Anthranilate synthase (EC 4.1.3.27)

Anthranilate synthase and the closely related para-aminobenzoate synthase consist of two components, the synthetase subunit and the glutamine amidotransferase subunit, which, in most organisms, are encoded by separate genes, trpE (or pabB) and trpG (or pabA), respectively. In E. coli, the trpG gene for the glutamine amidotransferase subunit is fused to the trpD gene that encodes anthranilate phosphoribosyltransferase, the enzyme catalyzing the next step of the pathway. This sometimes leads to confusion in nomenclature, with the trpG gene being referred to as trpD or as trpD_1. The phyletic pattern for anthranilate synthase is the same as the patterns described above for shikimate dehydrogenase, 5-enolpyruvylshikimate 3-phosphate synthase, and chorismate synthase, with the exception that anthranilate synthase is missing in chlamydiae.

Anthranilate phosphoribosyltransferase (EC 2.4.2.18)

There is only one form of anthranilate phosphoribosyltransferase, and it shows almost the same phyletic pattern as anthranilate synthase. The only deviation in the existing set of genomes is the absence of the trpD gene (as well as genes for the remaining steps of tryptophan biosynthesis) in S. pyogenes. This probably means that the genes annotated as trpG and trpE in S. pyogenes actually encode para-aminobenzoate synthase and have no role in tryptophan biosynthesis.

N-(5′-Phosphoribosyl)anthranilate isomerase (EC 5.3.1.24)

Phosphoribosylanthranilate isomerase is also represented by only one form with essentially the same phyletic pattern as anthranilate phosphoribosyltransferase. Here again, the nomenclature is somewhat complicated because of a gene fusion in E. coli. The phosphoribosylanthranilate isomerase gene that, in most species, is referred to as trpF, in E. coli is fused to the trpC gene that encodes indole-3-glycerol phosphate synthase, the enzyme for the next step of the pathway. Therefore, in E. coli, the trpF gene is sometimes also referred to as trpC, which can lead to confusion. The phyletic pattern of phosphoribosylanthranilate isomerase is essentially the same as that of other enzymes of tryptophan biosynthesis, with the most notable difference being the apparent absence of trpF in M. tuberculosis and M. leprae. Another peculiarity is the unusual distribution of phosphoribosylanthranilate isomerase in different chlamydial species: while C. trachomatis and C. muridarum both have the trpF gene, C. pneumoniae does not. This probably reflects the ongoing gene loss in the evolution of chlamydiae.

Indole-3-glycerol phosphate synthase (EC 4.1.1.48)

Indole-3-glycerol phosphate synthase exists in a single form with the same phyletic pattern as anthranilate phosphoribosyltransferase.

Tryptophan synthase (EC 4.2.1.20)

Tryptophan synthase consists of two subunits, which are encoded by the trpA and trpB genes. Their phyletic patterns are similar to that of anthranilate phosphoribosyltransferase, with the exception that, as seen above for trpC, the trpA and trpB genes are found in C. trachomatis and C. muridarum but not in C. pneumoniae.

7.4.1.3. Phenylalanine and tyrosine biosynthesis

Chorismate mutase (EC 5.4.99.5)

Chorismate mutase is involved in both phenylalanine and tyrosine biosynthesis. The best-known version of this enzyme is found in E. coli in two paralogous forms fused with prephenate dehydratase in PheA and prephenate dehydrogenase in TyrA. In addition to these two forms, there is (i) a distantly related form of chorismate mutase encoded in yeast, fungi, and in plant cells and (ii) an unrelated monofunctional form found in B. subtilis, Synechocystis sp., and many other Gram-positive bacteria and cyanobacteria.

Although these three forms of chorismate mutase show almost no sequence similarity to each other, structural comparisons indicate that the E. coli and yeast enzymes are related to each other and are unrelated to the form found in B. subtilis and Th. thermophilus. A comparison of the combined phyletic pattern of all these forms of chorismate mutase with that of chorismate synthase shows that, with the exception of P. abyssi and Chlamydia spp., all organisms that produce chorismate are capable of converting it to prephenate.

Prephenate dehydrogenase (EC 1.3.1.12)

Prephenate dehydrogenase TyrA, an enzyme of the tyrosine biosynthesis branch of the pathway, is found in a single form with almost the same phyletic pattern as chorismate mutase (Figure 7.7). The only exception is S. pyogenes, which appears not to encode prephenate dehydrogenase.

Figure 7.7

Distribution of phenylalanine and tyrosine biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

Prephenate dehydratase (EC 4.2.1.51)

Prephenate dehydratase, an enzyme of the phenylalanine biosynthesis branch of the pathway, is also represented by a single form in all known organisms. However, its phyletic pattern shows the absence of this enzyme in S. pyogenes, H. pylori, and A. pernix, which are all capable of producing prephenate, suggesting that these organisms either lack phenylalanine biosynthesis or have an alternative form of prephenate dehydratase.

Aromatic aminotransferase (EC 2.6.1.1, 2.6.1.5, 2.6.1.9, 2.6.1.57)

There are several families of pyridoxal-phosphate-dependent aminotransferases that are capable of producing tyrosine and phenylalanine from, respectively, 4-hydroxyphenylpyruvate and phenylpyruvate. Although the best-studied tyrosine aminotransferase, E. coli TyrB, has a relatively narrow phyletic distribution, homologs of histidinol phosphate aminotransferase and aspartate aminotransferase are encoded in every bacterial and archaeal genome except for spirochetes and mycoplasmas. Thus, once phenylpyruvate and 4-hydroxyphenylpyruvate are synthesized, their transamination into, respectively, phenylalanine and tyrosine can be performed by all organisms whose genome sequences are currently available.

A summary on aromatic amino acid biosynthesis

Because aromatic amino acid biosynthesis shares common steps with biosynthesis of ubiquinone, this pathway displays a stunning variety of alternative enzymes catalyzing the same reaction. This makes analysis of their phyletic patterns rather complicated, but, at the same time, allows one to draw some interesting conclusions. Most bacteria and archaea retain the complete set of genes for tryptophan biosynthesis. The exceptions are the obligate archaeal heterotroph P. horikoshii and some obligate bacterial parasites, such as S. pyogenes, rickettsiae, chlamydiae, spirochetes, and mycoplasmas, which apparently obtain tryptophan, just like many other nutrients, from other microbes and from the host, respectively.

Enzymes of the tyrosine biosynthesis pathway are encoded in almost as many complete genomes, with the conspicuous exception of P. abyssi. One could speculate that, while tryptophan is rapidly degraded at 105°C (the optimal growth temperature of this organism), tyrosine is not, which alleviates the requirement for de novo synthesis. These considerations could also explain the absence of phenylalanine biosynthesis in P. abyssi and A. pernix.

The consistency of the phyletic patterns of the enzymes for the downstream stages of aromatic amino acid biosynthesis underscores the remaining problem with the early stages. Indeed, A. aeolicus and four archaeal species encode 3-dehydroquinate dehydratase and all the downstream enzymes but do not encode either DAHP synthase or 3-dehydroquinate synthase:

It appears that these organisms produce 3-dehydroquinate via a different mechanism, which does not include DAHP as an intermediate. Using the COG phyletic pattern search tool, one could search for orthologous protein sets that are represented in those five genomes but are missing in thermoplasmas, pyrococci, A. pernix, and T. maritima, all of which encode a DAHP synthase and a 3-dehydroquinate synthase. Such a search identified just four COGs, only one of which, COG1465, consisted of uncharacterized proteins. These proteins, orthologs of M. jannaschii MJ1249, can be predicted to function as alternative 3-dehydroquinate synthases (M.Y.G., unpublished). This prediction seems to be further supported by the adjacency of the genes encoding COG1465 members AF0229 and VNG0310C to the aroC gene in the genomes of A. fulgidus and Halobacterium sp., respectively. However, even if this prediction is correct, the exact nature of the precursor for 3-dehydroquinate and the mechanism of its biosynthesis in these organisms need to be elucidated experimentally.
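The logic of such a phyletic pattern search can be sketched in a few lines: a COG is reported if it is represented in every genome of the "required" set and in none of the "excluded" set, while genomes outside both sets are ignored. The code below is a minimal illustration and not the actual COG search tool; the function names are hypothetical, and the table is invented toy data except for the required and excluded genome groups and the COG1465 identifier, which are taken from the text (the membership of COG1465 outside the queried genomes is not shown).

```python
# Minimal sketch of a phyletic pattern search (not the real COG tool).

# The five genomes that lack both DAHP synthase and 3-dehydroquinate synthase.
REQUIRED = {"A.aeolicus", "A.fulgidus", "Halobacterium",
            "M.jannaschii", "M.thermoautotrophicum"}
# Genome groups that encode both enzymes and therefore should lack the candidate.
EXCLUDED = {"Thermoplasma", "Pyrococcus", "A.pernix", "T.maritima"}

def matches(pattern, required, excluded):
    """True if the pattern covers all required genomes and no excluded one."""
    return required <= pattern and not (pattern & excluded)

def pattern_search(cog_table, required, excluded):
    """Return the sorted identifiers of COGs whose pattern fits the query."""
    return sorted(cog for cog, pattern in cog_table.items()
                  if matches(pattern, required, excluded))

# Toy table: only the COG1465 entry reflects the text; the rest are invented.
toy_table = {
    "COG1465": set(REQUIRED),
    "toyA": REQUIRED | {"T.maritima"},            # present in an excluded genome
    "toyB": {"A.fulgidus", "M.jannaschii"},       # absent from required genomes
    "toyC": REQUIRED | {"E.coli"},                # extra genome outside both sets
}

hits = pattern_search(toy_table, REQUIRED, EXCLUDED)
print(hits)
```

In practice, the hits of such a search are only candidates; as in the case described above, they have to be filtered by what is already known about each protein family and, ideally, supported by genome context.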

7.4.2. Arginine biosynthesis

N-Acetylglutamate synthase (EC 2.3.1.1, 2.3.1.35)

The first step in arginine biosynthesis from glutamate is its acetylation, with either acetyl-CoA or acetylornithine utilized as donors of the acetyl group (Figure 7.8). In E. coli and several other organisms, this reaction is catalyzed by the acetyltransferase ArgA, which employs acetyl-CoA as the acetyl donor. In all proteobacteria that encode this enzyme, the argA gene is fused to the gene for N-acetylglutamate kinase, which catalyzes the next step of the pathway. As in other domain fusion cases, confusion occasionally emerges during genome annotation, especially because the N-terminal kinase domain, which consists of ~300 amino acid residues, can make the C-terminal acetyltransferase domain almost invisible in BLAST outputs (see 4.4.4).

Figure 7.8

Distribution of arginine biosynthesis enzymes in organisms with completely sequenced genomes. All details as in Figure 2.7.

A different, unrelated N-acetylglutamate synthase (N-acetylornithine transferase, the argJ gene product) is present in B. subtilis, yeast, and many other organisms. This enzyme couples acetylation of glutamate with deacetylation of N-acetylornithine, which is the fifth step in arginine biosynthesis. This activity allows recycling of the acetyl group in the arginine biosynthesis pathway.

N-Acetylglutamate kinase (EC 2.7.2.8)

Phosphorylation of N-acetylglutamate is catalyzed by the product of the argB gene, a kinase with the carbamate kinase fold. This enzyme is found in a wide variety of organisms, such that its phyletic pattern is even broader than the combined patterns of both enzymes that generate N-acetylglutamate:

However, N-acetylglutamate kinase is not encoded in the genomes of many parasitic bacteria, such as S. pyogenes, H. influenzae, H. pylori, chlamydiae, rickettsiae, spirochetes, and mycoplasmas.

N-Acetyl-gamma-glutamyl phosphate reductase (EC 1.2.1.38)

The enzyme that catalyzes the next step of the pathway, ArgC, has the same phyletic pattern as ArgB. In fungi, argB and argC genes are fused and encode a single bifunctional protein.

N-Acetylornithine aminotransferase (EC 2.6.1.11)

N-Acetylornithine deacetylase (EC 3.5.1.16)

N-Acetylornithinase belongs to a large family of closely related acetyltransferases (deacylases), which is represented by two or more paralogs even in the relatively small genomes of H. influenzae, L. lactis, and S. pyogenes. Although proper assignment of substrate specificity in such a case is difficult, if not impossible, the few organisms that produce N-acetylornithine but lack the ArgJ-type N-acetylglutamate synthase offer an ample choice of candidates for the role of N-acetylornithinase.

Ornithine carbamoyltransferase (EC 2.1.3.3)

Ornithine carbamoyltransferase catalyzes the sixth step of arginine biosynthesis, conversion of ornithine into citrulline. Carbamoyl phosphate, which serves as the second substrate of this reaction, is provided by carbamoyl phosphate synthetase, which was discussed above (see 7.2). Ornithine carbamoyltransferase has a much wider phyletic distribution than other enzymes of arginine biosynthesis. This is probably due to the fact that it also catalyzes the reverse reaction, i.e. phosphorolysis of citrulline with the formation of ornithine and carbamoyl phosphate, which is part of the urea cycle. Accordingly, ornithine carbamoyltransferase is found in humans and other higher eukaryotes, which have the urea cycle but are incapable of arginine biosynthesis.

Argininosuccinate synthase (EC 6.3.4.5)

Like ornithine carbamoyltransferase, argininosuccinate synthase participates in the urea cycle. As a result, its phyletic distribution is also wider than that of the enzymes that catalyze early steps of arginine biosynthesis. This enzyme, too, is found in humans and in bacteria, such as H. influenzae, which have all the urea cycle enzymes but lack several enzymes of arginine biosynthesis.

Argininosuccinate lyase (EC 4.3.2.1)

Argininosuccinate lyase, the last enzyme of arginine biosynthesis, splits argininosuccinate into arginine and fumarate. Like the two preceding enzymes, it also participates in the urea cycle, and its phyletic pattern is nearly identical to that of argininosuccinate synthase.

7.4.3. Histidine biosynthesis

In contrast to the pathways of aromatic amino acid and arginine biosynthesis, histidine biosynthesis exhibits remarkable consistency of the phyletic patterns of all the enzymes involved (Figure 7.9). While the first enzyme of the pathway, phosphoribosylpyrophosphate synthetase (EC 2.7.6.1), also participates in the purine biosynthesis pathway (see above), nearly all the committed enzymes of histidine biosynthesis have the same phyletic pattern, indicating that this pathway is encoded in the great majority of complete prokaryotic genomes sequenced to date. The exceptions are the heterotrophic archaea Thermoplasma spp., Pyrococcus sp., and A. pernix, and parasitic bacteria with small genomes, namely rickettsiae, chlamydiae, spirochetes, and mycoplasmas, as well as S. pyogenes and H. pylori (despite their larger genomes). Remarkably, the aphid symbiont Buchnera sp., which has the second smallest genome available to date, encodes the complete set of histidine biosynthesis enzymes.

Figure 7.9

Distribution of histidine biosynthesis enzymes in organisms with completely sequenced genomes. All details as in Figure 2.7.

There are several deviations from this common pattern. First, phosphoribosyl-ATP pyrophosphatase (EC 3.6.1.31) was not detected in A. fulgidus. Since this organism encodes all the other enzymes of histidine biosynthesis, one should assume that this reaction in A. fulgidus is catalyzed by an unrelated pyrophosphatase. Indeed, the A. fulgidus genome encodes several predicted pyrophosphatases of unknown specificity (COG1694) that could be good candidates for the role of the missing phosphoribosyl-ATP pyrophosphatase.
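The reasoning applied here, namely that a genome encoding all but one step of a pathway probably uses an unrelated enzyme for the missing step, amounts to a search for nearly complete pathways. The following sketch illustrates this under simplifying assumptions: each step is represented by a list of known enzyme families, each genome by a set of families it encodes, and all names (steps, families, genomes, functions) are hypothetical placeholders rather than real annotations.

```python
# Minimal sketch: find genomes that encode a pathway except for a few steps.
# Step, family, and genome names are invented for illustration only.

def missing_steps(pathway, genome_families):
    """Steps for which the genome encodes none of the known enzyme forms."""
    return [step for step, forms in pathway.items()
            if not (set(forms) & genome_families)]

def nearly_complete(pathway, genomes, max_gaps=1):
    """Genomes with at least one but no more than max_gaps missing steps."""
    result = {}
    for name, families in genomes.items():
        gaps = missing_steps(pathway, families)
        if 0 < len(gaps) <= max_gaps:
            result[name] = gaps
    return result

# Each step lists the alternative (possibly unrelated) families known for it.
toy_pathway = {
    "step1": ["famA"],
    "step2": ["famB", "famB_alt"],
    "step3": ["famC"],
}
toy_genomes = {
    "gX": {"famA", "famB_alt", "famC"},  # complete (uses the alternative form)
    "gY": {"famA", "famB"},              # lacks only step3
    "gZ": {"famC"},                      # lacks most of the pathway
}

candidates = nearly_complete(toy_pathway, toy_genomes)
print(candidates)
```

Genomes reported by such a check are the ones in which a search for an unrelated enzyme is most promising, whereas genomes missing most steps are more likely to lack the pathway altogether.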

Another deviation from the common pattern is the existence of at least two unrelated histidinol phosphatases (Figure 7.9), one of which has been experimentally characterized in E. coli and the other in yeast and B. subtilis [15,500]. The latter form of this enzyme (COG1387) belongs to a large superfamily of PHP-type phosphohydrolases [41], which have common sequence motifs but clearly differ in substrate specificity. A closer inspection of COG1387 shows that proteins from yeast, B. subtilis, B. halodurans, L. lactis, D. radiodurans, and T. maritima comprise a tight orthologous set and can be confidently predicted to possess histidinol phosphatase activity. Other members of this COG are more distantly related to the experimentally characterized histidinol phosphatases from yeast and B. subtilis and might have other substrates. In addition, both forms of histidinol phosphatase are missing in Halobacterium sp. Therefore, it appears likely that there is yet another, so far unrecognized, form of histidinol phosphatase in Halobacterium sp., Thermoplasma spp., and other organisms. There are plenty of unassigned predicted hydrolases that could potentially have this activity.

Remarkably, the ortholog of the E. coli histidinol phosphatase (HisB, COG0241) is encoded in H. pylori, which lacks all the other enzymes of histidine biosynthesis. This protein most likely represents a case of enzyme recruitment and functions as a phosphatase that hydrolyzes some other phosphoester.

7.4.4. Biosynthesis of branched-chain amino acids

To those readers who are already tired of numerous instances of non-orthologous gene displacement in metabolic pathways, biosynthesis of leucine, isoleucine, and valine offers a well-deserved reprieve. In these pathways, the only instance of alternative enzymes catalyzing the same reaction is the last step, amination of α-ketomethylvalerate, α-ketoisovalerate, and α-ketoisocaproate. In addition to the branched-chain amino acid aminotransferase IlvE, this reaction can be catalyzed by alternative aminotransferases (Figure 7.10).

Figure 7.10

Distribution of isoleucine/leucine/valine biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.4.5. Proline biosynthesis

The best characterized pathway of proline biosynthesis is a three-step chain of reactions (Figure 7.11) that converts glutamate into proline through consecutive action of glutamate kinase (ProB, EC 2.7.2.11), γ-glutamyl phosphate reductase (ProA, EC 1.2.1.41), and Δ¹-pyrroline-5-carboxylate reductase (ProC, EC 1.5.1.2). This pathway is encoded in yeast, E. coli, B. subtilis, and in many other bacteria, including C. jejuni (but not H. pylori) and T. pallidum (but not B. burgdorferi). This pathway, however, is not detectable in archaea, except for the two species of Methanosarcina, which, in all likelihood, acquired it through HGT. Instead, Halobacterium sp., A. fulgidus, M. thermoautotrophicum, Thermoplasma spp., and A. pernix encode an unusual enzyme, ornithine cyclodeaminase (EC 4.3.1.12), which directly makes proline from ornithine. This enzyme, first discovered in tumor-inducing (Ti) plasmids of A. tumefaciens, was later found in pseudomonads and other bacteria [183,798]. In plants, expression of this interesting enzyme stimulates flowering [850], whereas the mammalian ortholog of this enzyme is expressed in neural tissue, including human retina, and functions as μ-crystallin, a major component of the eye lens in marsupials [438]. Although no such gene was detected in M. jannaschii, this archaeon, too, has been reported to possess ornithine cyclodeaminase activity [309].

Figure 7.11

Distribution of proline biosynthesis enzymes in organisms with completely sequenced genomes.

An interesting aspect of proline metabolism is that its biosynthesis and degradation both proceed through the Δ¹-pyrroline-5-carboxylate intermediate. As a result, the proline biosynthetic pathway is sometimes confused with proline catabolism. Another complication in the analysis of proline metabolism is that, in E. coli and several other bacteria, the genes for proline dehydrogenase (EC 1.5.99.8) and γ-glutamate semialdehyde dehydrogenase (EC 1.5.1.12), the first and second enzymes of proline catabolism, respectively, are fused, forming the bifunctional protein PutA. In the COG database, these two domains of the PutA protein belong to two different COGs, COG0506 and COG1012 (Figure 7.11).

In conclusion, proline metabolism is tightly interlinked with arginine metabolism. Proline biosynthesis from glutamate can be reconstructed in all organisms with completely sequenced genomes with the exception of pyrococci, H. pylori, B. burgdorferi, chlamydiae, and mycoplasmas. The gene encoding ornithine cyclodeaminase in M. jannaschii [309] remains to be identified. It can be expected to be a member of a different enzyme family, unrelated to the known ornithine cyclodeaminases (COG2423).

7.5. Coenzyme Biosynthesis

7.5.1. Thiamine

Biosynthesis of cofactors (coenzymes), particularly thiamine, is a surprisingly poorly studied area of biochemistry. Although the first thi mutations in E. coli were characterized half a century ago, the complete list of thiamine biosynthesis genes was determined only in the 1990s [868], and the functions of their products have been characterized only in the last several years [81,83,927]. The scheme for thiamine biosynthesis in Figure 7.12 was drawn using the E. coli data. One cannot help noticing that every enzyme on this chart has its own distinct phyletic pattern. This indicates the abundance of non-orthologous gene displacement cases among thiamine biosynthesis enzymes and suggests that different organisms might use different compounds as thiamine precursors. The apparent absence of ThiC in thermoplasmas, A. pernix, H. influenzae, and H. pylori, all of which encode ThiD (Figure 7.12), is a strong indication that some intermediate other than AIR is used as a precursor in these organisms. Thus, although all steps of the thiamine biosynthesis pathway have been resolved for E. coli [81], there is still ample opportunity for new discoveries in other organisms.

Figure 7.12

Distribution of thiamine biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.5.2. Riboflavin

The riboflavin biosynthesis pathway is a challenging case, with three of the seven rib genes characterized in E. coli and B. subtilis having no archaeal orthologs (Figure 7.13). The archaeal variant of riboflavin synthase, the last enzyme of the pathway, has been identified and turned out to be unrelated to the bacterial enzyme [207]. In contrast, the archaeal versions of the first two enzymes of the pathway, GTP cyclohydrolase II (RibA) and pyrimidine deaminase (RibD1), remain unknown, so there is an excellent chance of discovering new enzymes of this pathway.

Figure 7.13

Distribution of riboflavin biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.5.3. NAD

Nicotinate mononucleotide adenylyltransferase, the last missing enzyme of the NAD biosynthesis pathway, was characterized only in 2000, thanks in part to the genome context-based methods [82,563,592]. It turned out that E. coli has two distantly related forms of this enzyme, which show specificity, respectively, for mononucleotides of nicotinic acid (NadD) and nicotinamide (NadR_2) [279]. Most other organisms encode either one or the other form (Figure 7.14).

Figure 7.14

Distribution of NAD biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.5.4. Biotin

As is the case with many other pathways, the initial steps of biotin biosynthesis are poorly understood. The phyletic patterns of the four enzymes that catalyze the conversion of pimeloyl-CoA into biotin are relatively consistent (Figure 7.15), but the mechanisms of the formation of pimelate (6-carboxyhexanoate) and pimeloyl-CoA are still largely obscure. B. subtilis, A. aeolicus, and M. jannaschii encode an enzyme that makes pimeloyl-CoA from pimelate and CoA in a reaction that uses the energy of ATP hydrolysis to AMP and pyrophosphate [675]. In contrast, pimeloyl-CoA synthetase from Pseudomonas mendocina belongs to the family of NDP-forming acyl-CoA synthetases [91,738]. Neither of these two enzyme families is represented in Synechocystis sp., H. influenzae, H. pylori, C. jejuni, and several other bacteria, indicating the existence of yet another enzyme for the synthesis of pimeloyl-CoA (or an entirely different pathway for the formation of 7-keto-8-aminopelargonate).

Figure 7.15

Distribution of biotin biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

In spite of the similarity between the phyletic patterns of BioF, BioA, BioD, and BioB, one cannot help noticing that Synechocystis sp. lacks the bioA gene, suggesting that amination of 7-keto-8-aminopelargonate is catalyzed by a different aminotransferase. The absence of bioD and bioB genes in D. radiodurans makes one wonder whether this bacterium can synthesize biotin at all.

The enzyme catalyzing the last reaction in Figure 7.15, ligation of biotin to the biotin carboxyl carrier protein (or domain), has a much broader phyletic distribution than any of the biotin biosynthesis enzymes. This indicates that A. fulgidus, Halobacterium sp., Pyrococcus spp., L. lactis, S. pyogenes, and many other organisms that do not have a known pathway of biotin synthesis still can utilize biotin. Thus, they either have a completely different, unknown biotin synthesis pathway or import biotin from the environment (however, a biotin transport system so far has not been identified).

The paucity of data on the enzymes of biotin biosynthesis and a putative biotin uptake system should encourage active experimentation in this area. There definitely are novel enzymes and transporters yet to be discovered.

7.5.5. Heme

From the comparative-genomic point of view, the heme biosynthesis pathway is characterized by the following trends (Figure 7.16): (i) with the single exception of uroporphyrinogen III synthase (HemD), the enzymes from R. prowazekii and yeast (mitochondria) have identical phyletic patterns; (ii) all archaea, including the aerobes A. pernix and Halobacterium sp., produce siroheme but not protoheme; (iii) non-orthologous displacement is observed in the downstream steps of the pathway, as opposed to the uniformity of all the upstream steps down to uroporphyrinogen III.

Figure 7.16

Distribution of heme biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.5.6. Pyridoxine

We conclude our survey of central metabolic pathways with the pyridoxine biosynthesis pathway, which, despite recent efforts, is still not completely understood. The scheme below is drawn based on the E. coli data [198,485]. In other organisms, the carbon backbone of the pyridoxine ring is formed from 4-hydroxythreonine (or its phosphate) and 1-deoxy-D-xylulose (or its phosphate), with the nitrogen supplied by either glutamate (in the PdxAJ-catalyzed reaction) or glutamine (in the PDX1,PDX2-catalyzed reaction) [211,578,635,825,832]. Since 1-deoxy-D-xylulose phosphate synthetase (Dxs, COG1154) so far has been identified only in bacteria, it is possible that archaea and eukaryotes use a different sugar as a pyridoxine precursor. Obviously, new enzymes of this pathway remain to be discovered.

Figure 7.17

Distribution of pyridoxine biosynthesis enzymes in organisms with completely sequenced genomes. All details are as in Figure 2.7.

7.6. Microbial Enzymes as Potential Drug Targets

One of the major incentives behind the genome sequencing of numerous pathogenic bacteria is the desire to better understand their peculiarities and to develop new approaches for controlling human diseases caused by these organisms. This task has become even more urgent with the rapid evolution of antibiotic resistance in many bacterial pathogens, including multidrug-resistant enterococci, pneumococci, pseudomonads, staphylococci, and tuberculosis bacilli. Unfortunately, finding new antibiotics is an extremely laborious process that includes (i) testing numerous compounds for their activity against model organisms (E. coli, P. aeruginosa, S. aureus) that are easy to maintain in culture; (ii) screening these compounds against mammalian cell cultures to eliminate those toxic to humans; (iii) testing the efficacy and safety of each chosen drug in animal models; and (iv) pre-clinical and clinical testing, which alone takes several years. This process ensures that only highly effective and reasonably safe drugs make it to the market. The majority of drug candidates fail the tests, usually because at low concentrations they turn out to be safe but ineffective, whereas at high doses they are effective but show unfavorable side effects.

In spite of what one might have read in the popular press, genomics cannot accelerate most steps of the drug development process. What it can do, however, is increase the success rate by helping to choose drug candidates that are most likely to be effective (being targeted at essential systems of the bacterial cell) and least likely to be toxic (having no targets in the human cell). Indeed, while not all currently used antibiotics have well-characterized targets, those targets that have been characterized are bacterial proteins that (i) are essential for bacterial cell metabolism and (ii) are not represented (or are represented in a very distinct form) in human cells (see Table 7.1). Microbial genome sequences provide us with complete lists of the proteins encoded in any given pathogen, including all the virulence factors that it could potentially produce. This “parts list” offers a wide selection of potential drug targets.

Comparative analysis of microbial genomes based on the notion of a phyletic pattern, which is discussed throughout this book, allows the identification of gene products that are common to all (or most) pathogenic microorganisms in a chosen group, as well as of those specific for a particular organism. The proteins in the former set are attractive targets for broad-spectrum antibiotics, whereas the unique proteins offer an opportunity to design “magic bullets”, which would specifically target a narrow group of bacteria or even one particular pathogen [266].
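The two kinds of selection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the orthologous group identifiers, the genome names, and the `min_fraction` threshold are hypothetical, and the exclusion of groups found in the host follows the toxicity argument made earlier in this section rather than any particular published procedure.

```python
# Toy phyletic patterns: for each orthologous group (hypothetical IDs),
# the set of genomes in which a member of the group is present.
PATTERNS = {
    "OG_A": {"pathogen1", "pathogen2", "pathogen3"},
    "OG_B": {"pathogen1", "pathogen2", "pathogen3", "human"},
    "OG_C": {"pathogen2"},
    "OG_D": {"pathogen1", "pathogen3"},
}


def broad_spectrum_candidates(patterns, pathogens, host, min_fraction=1.0):
    """Groups present in at least min_fraction of the pathogens and absent from the host."""
    hits = []
    for og, genomes in patterns.items():
        covered = len(genomes & pathogens) / len(pathogens)
        if covered >= min_fraction and host not in genomes:
            hits.append(og)
    return sorted(hits)


def pathogen_specific_candidates(patterns, pathogen, others, host):
    """Groups found in one pathogen but in none of the other genomes considered."""
    return sorted(
        og
        for og, genomes in patterns.items()
        if pathogen in genomes and not (genomes & others) and host not in genomes
    )


PATHOGENS = {"pathogen1", "pathogen2", "pathogen3"}

# Candidates for broad-spectrum drugs: present in every pathogen, absent from the host.
broad = broad_spectrum_candidates(PATTERNS, PATHOGENS, "human")

# Candidates for a "magic bullet" against pathogen2 only.
narrow = pathogen_specific_candidates(
    PATTERNS, "pathogen2", PATHOGENS - {"pathogen2"}, "human"
)
```

Lowering `min_fraction` relaxes the requirement from "all" to "most" pathogens in the group, at the cost of admitting targets that some of the organisms lack.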

Table 7.1

Cellular targets of most commonly used antibiotics.

In addition to the lists of probable essential genes, the search for potential drug targets in microbial genomes relies heavily on the understanding of bacterial metabolism, which is briefly discussed above.

7.6.1. Potential targets for broad-spectrum drugs

The list of probable essential genes that potentially could be used as targets for broad-range antibiotics can be derived using more or less the same approach as employed for the delineation of the “minimal genome” ([452], see 2.2.5). Inclusion of certain genes in this list is, of course, affected by non-orthologous gene displacement and enzyme recruitment.

It should be noted that compiling the list of the likely essential genes for each particular group of bacteria (all bacteria, all Gram-positive bacteria, all mycobacteria, and so on) by computational means is only one of several ways to accomplish this task, although, arguably, it is the easiest and fastest one. Any predictions of essentiality for a given gene still have to be verified experimentally by checking the lethality of knockout mutants [9,520]. As mentioned above (see 3.5.1), lists of essential E. coli genes are available at http://www.genome.wisc.edu/resources/essential.htm and http://www.shigen.nig.ac.jp/ecoli/pec/Analyses.jsp?key=0.

In addition to the genes that encode well-characterized essential proteins, the availability of complete genomes allows one to tap into the pool of uncharacterized genes whose wide distribution in microbial genomes marks them as most likely essential [57,451]. Searches for such genes can be easily performed using the “phyletic patterns search” tool of the COG database. In addition, the COG database contains lists of poorly characterized and uncharacterized protein families, which are listed as functional groups R and S, respectively. A collection of uncharacterized conserved proteins, including those from partially sequenced genomes, is available in the PROSITE database (http://www.expasy.org/cgi-bin/lists?upflist.txt).

The diversity of microbial metabolic pathways described above offers numerous possibilities to look for potential drug targets among the metabolic enzymes. One straightforward approach is to select the pathways that are essential for certain pathogens but are absent in humans. Such pathways include murein biosynthesis, the shikimate pathway of aromatic amino acid biosynthesis (Figure 7.6), and the deoxyxylulose (non-mevalonate) pathway of terpenoid biosynthesis. It is remarkable that certain inhibitors of the latter pathway (fosmidomycin, fluoropyruvate, FR-900098) had been studied as potential antibiotics long before their cellular targets were characterized [400,518].

7.6.2. Potential targets for pathogen-specific drugs

Although current approaches favor “one-shot” antibacterials that can eliminate bacterial infection irrespective of the nature of the pathogen, it is gradually becoming clear that we will soon need a variety of drugs that would be effective against selected groups of organisms or even a single pathogen. There is nothing particularly new in this concept: people have been using anti-tuberculosis and anti-syphilis drugs for almost a century without requiring them to also cure the common cold or gastrointestinal problems.

The novelty stemming from the availability of complete genome sequences is that it has now become possible to analyze the genome of a pathogen in detail, looking for weak spots or unusual enzymes that are likely to be essential for this particular organism. In addition to the traditional drug targets, such as the cell envelope and the systems for DNA replication, transcription, and translation, this brings into consideration as potential drug targets such proteins as host interaction factors, transporters for essential nutrients, enzymes of intermediary metabolism, and many others.

Host interaction factors can be searched for by using the so-called “differential genome display”, first proposed by Peer Bork and his colleagues [366,371]. This approach looks for the genes that are present in the genome of a pathogen but not in the genome of a closely related free-living bacterium. Because genomes of parasitic bacteria typically code for fewer proteins than the genomes of their free-living cousins, genes detected by this approach are likely to be important for pathogenicity. Bork and colleagues applied this approach to the identification of potential pathogenicity factors in H. influenzae and H. pylori through comparison of their genomes against E. coli [366,371].
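The core of this comparison is a set difference, which the following minimal sketch illustrates. It assumes that every pathogen gene has already been assigned to a protein family (or to none) by sequence similarity searches; the gene names, the family labels, and the `differential_genome_display` function are hypothetical and are not taken from the original implementation.

```python
def differential_genome_display(pathogen_genes, reference_families):
    """Return pathogen genes whose family has no member in the reference genome.

    pathogen_genes: dict mapping gene name -> family label (None if unassigned).
    reference_families: set of family labels found in the free-living relative.
    """
    return sorted(
        gene
        for gene, family in pathogen_genes.items()
        if family is None or family not in reference_families
    )


# Hypothetical pathogen genes with precomputed family assignments.
pathogen_genes = {
    "geneP1": "fam_ribosomal",
    "geneP2": "fam_adhesin",
    "geneP3": None,  # no detectable homologs at all
    "geneP4": "fam_glycolysis",
}

# Hypothetical families detected in the free-living relative.
reference_families = {"fam_ribosomal", "fam_glycolysis", "fam_flagellum"}

candidates = differential_genome_display(pathogen_genes, reference_families)
```

Here `geneP2` and `geneP3` are retained as candidate pathogenicity factors; in practice, the outcome depends heavily on the similarity threshold used to decide whether a gene has a counterpart in the reference genome.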

Because many pathogens have reduced biosynthetic capabilities and rely on the host for the supply of certain essential nutrients (see 3.2), the respective membrane transport systems can be valid targets for drug intervention. For example, the still uncharacterized biotin transport system appears to be the only means of biotin acquisition for several pathogens, such as S. pyogenes, R. prowazekii, C. trachomatis, and T. pallidum (see 7.5.4). Actually, one could start probing this hypothetical system right away by using various biotin analogs. As an added benefit, such a study would eventually lead to the identification of the transport system components.

Using surface proteins of bacteria as drug targets has an obvious advantage because drugs interacting with these proteins do not have to cross the cytoplasmic membrane, which largely removes the problem of drug efflux-mediated resistance [523,524]. On the other hand, humans also import biotin; therefore, at this stage, it cannot be ruled out that an inhibitor of biotin uptake might be toxic for humans. This emphasizes the need for identification of the genes coding for the bacterial uptake system: once these are known, we will be in a better position to assess the likelihood of toxic side effects of any drugs targeting this function.

Another approach to searching for pathogen-specific drug targets would rely on the enzymes that are subject to non-orthologous gene displacement and are found in certain pathogens in a form different from that present in humans. The rationale for using enzyme inhibitors as antimicrobial drugs comes from the successful use of sulfamethoxazole and trimethoprim, inhibitors of two different steps of the folate biosynthetic pathway. Indeed, while each of these drugs is only moderately effective against most bacterial pathogens, their combination proved to be effective and reasonably safe. In several instances, detailed analysis of non-orthologous displacement cases has led to suggestions that alternative forms of essential enzymes could be used as drug targets ([266,271], see refs. in Table 7.2).

Table 7.2

Examples of pathogen-specific drug targets.

7.7. Conclusions and Outlook

This chapter shows that central metabolism is the ultimate playground of non-orthologous gene displacement, where the logic of phyletic patterns works best. Metabolic pathways are so amenable to this type of analysis because, if an organism encodes a significant fraction of the enzymes for a particular pathway, it is extremely likely that, in reality, it also has the enzymes for the rest of the steps. Therefore, candidate enzymes for the missing steps may be sought and, at least in some instances, found among uncharacterized orthologous sets (identified through COGs or otherwise) with phyletic patterns that are, at least in part, complementary to those of the known enzymes for the given step. So far, only very few of the computational predictions made by this approach have been tested experimentally, but in those studies that have been conducted, the success rate has been quite high. Conversely, there are enigmatic cases where most of the enzymes of a given pathway are missing in an organism but one or two still stay around (by using this language, we imply loss of a pathway, which is indeed largely the case in parasites and heterotrophs, including ourselves). Most likely, these are cases of exaptation, where an enzyme that is no longer needed in its original metabolic capacity has found another job, thus saving itself from extinction. Elucidation of these exapted functions seems to be an interesting avenue of research.
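The search for candidates with complementary phyletic patterns, mentioned in the preceding paragraph, can be made concrete with a simple score. The sketch below is one possible formulation and not a published method: the `complementarity` function, the genome labels `g1` to `g5`, and the orthologous group names are all hypothetical. The score rewards a candidate for occurring in genomes that have the pathway but lack the known enzyme, and penalizes it for co-occurring with the known enzyme.

```python
def complementarity(known, candidate, pathway_genomes):
    """Score how well a candidate pattern fills the gaps left by a known enzyme.

    known: genomes encoding the known enzyme for the step.
    candidate: genomes encoding the uncharacterized orthologous group.
    pathway_genomes: genomes that encode most of the pathway.
    """
    missing = pathway_genomes - known  # genomes where the step is unaccounted for
    if not missing:
        return 0.0
    filled = len(candidate & missing) / len(missing)
    overlap = len(candidate & known) / len(known) if known else 0.0
    return filled - overlap


# Hypothetical data: five genomes have the pathway, three have the known enzyme.
pathway_genomes = {"g1", "g2", "g3", "g4", "g5"}
known_enzyme = {"g1", "g2", "g3"}

candidates = {
    "OG_X": {"g4", "g5"},                    # perfectly complementary
    "OG_Y": {"g1", "g2", "g3", "g4", "g5"},  # universal, hence uninformative
    "OG_Z": {"g4"},                          # partially complementary
}

ranked = sorted(
    ((og, complementarity(known_enzyme, pattern, pathway_genomes))
     for og, pattern in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

With these toy data, `OG_X` ranks first, `OG_Z` second, and the universal `OG_Y` last. A high score only nominates a candidate; as noted above, the prediction still has to be tested experimentally.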

The finding that metabolic pathways are so prone to non-orthologous gene displacement seems to indirectly convey a message of general biological significance. We know for a fact that enzymes in the same metabolic pathway are connected through reaction intermediates, but, on almost all occasions, precious little is known about the actual macromolecular organization of these enzymes in the cell. Analysis of phyletic patterns shows that many, if not most, metabolic enzymes with different structures but the same reaction chemistry are interchangeable in evolution. This suggests that, most of the time, the chemistry is, after all, the principal aspect of the metabolic functions, whereas the role of co-adaptation of subunits of macromolecular complexes is likely to be limited.

The major contribution of lineage-specific gene loss to the evolution of metabolic pathways is beyond doubt. Horizontal gene transfer is harder to demonstrate but, realistically, it appears certain that this phenomenon also had a substantial role. Indeed, it defies credibility to postulate that LUCA had each one of the alternative forms of metabolic enzymes (and the corresponding reaction intermediates) whose existence became apparent through comparative-genomic studies (as well as those, perhaps numerous, that remain to be discovered). The relative contributions of gene loss and horizontal transfer hopefully will be better understood through the application of the algorithmic methods briefly outlined in Chapter 6.

Identification of potential targets for antibacterial drugs using phyletic patterns, the differential genome display technique and other similar approaches is a natural task for comparative genomics and will likely remain one of its most important practical applications for years to come.

7.8. Further Reading

1. Romano AH, Conway T. Evolution of carbohydrate metabolic pathways. Research in Microbiology. 1996;147:448–455. [PubMed: 9084754]

2. Galperin MY, Walker DR, Koonin EV. Analogous enzymes: independent inventions in enzyme evolution. Genome Research. 1998;8:779–790. [PubMed: 9724324]

3. Dandekar T, Schuster S, Snel B, Huynen M, Bork P. Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochemical Journal. 1999;343:115–124. [PMC free article: PMC1220531] [PubMed: 10493919]

4. Huynen MA, Dandekar T, Bork P. Variation and evolution of the citric-acid cycle: a genomic perspective. Trends in Microbiology. 1999;7:281–291. [PubMed: 10390638]

5. Cordwell SJ. Microbial genomes and “missing” enzymes: redefining biochemical pathways. Archives of Microbiology. 1999;172:269–279. [PubMed: 10550468]

6. Galperin MY, Koonin EV. Comparative genome analysis. In: Baxevanis AD, Ouellette BFF, editors. Bioinformatics: a practical guide to the analysis of genes and proteins. New York: John Wiley & Sons; 2001. pp. 359–392. [PubMed: 11449732]

7. Canback B, Andersson SG, Kurland CG. The global phylogeny of glycolytic enzymes. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:6097–6102. [PMC free article: PMC122908] [PubMed: 11983902]

8. Galperin MY, Koonin EV. Searching for drug targets in microbial genomes. Current Opinion in Biotechnology. 1999;10:571–578. [PubMed: 10600691]

Publication Details

Publisher

Kluwer Academic, Boston

NLM Citation

Koonin EV, Galperin MY. Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. Boston: Kluwer Academic; 2003. Chapter 7, Evolution of Central Metabolic Pathways: The Playground of Non-Orthologous Gene Displacement.

Table 7.1. Cellular targets of most commonly used antibiotics.

Antibiotic group (examples)	Bacterial target	Resistance mechanisms
β-Lactams (penicillins, cephalosporins, carbapenems, monobactams)	Peptidoglycan transpeptidase, other proteins	Hydrolysis by β-lactamase, alterations in penicillin-binding proteins
Glycopeptides (bacitracin, colistin, dactinomycin, teicoplanin, vancomycin, virginiamycin)	Peptidoglycan transpeptidase, transglycosylase	Modification of the UDP-muramyl pentapeptide
Aminoglycosides (amikacin, kanamycin, gentamicin, hygromycin, neomycin, puromycin, streptomycin, tobramycin)	Ribosomal 30S subunit	Acetylation, adenylation, or phosphorylation of the antibiotic by specific modifying enzymes
Tetracyclines (doxycycline, methacycline, minocycline, tetracycline)	Ribosomal 30S subunit	Export by efflux pumps, mutations
Macrolides (azithromycin, dirithromycin, clarithromycin, spiramycin, erythromycin, oleandomycin)	Ribosomal 50S subunit	rRNA methylation; rrn, rplD, and rplV mutations; hydrolysis by esterases; export by efflux pumps
Quinolones (nalidixic acid, ciprofloxacin)	DNA gyrase β-subunit	gyrB mutations
Lincosamides (clindamycin, lincomycin)	23S rRNA	rRNA methylation, rrn mutations, drug adenylation
Chloramphenicol	Peptidyl-transferase center on the ribosomal 50S subunit	Inactivation by acetylation, export by efflux pumps
Sulfonamides (sulfamethoxazole)	Dihydropteroate synthase	folP mutations
Trimethoprim	Dihydrofolate reductase	folA mutations
Nitroimidazoles (metronidazole)	Chromosomal DNA	Nitroreductase mutations, preventing drug activation
Rifampin	RNA polymerase β-subunit	rpoB mutations

Table 7.2. Examples of pathogen-specific drug targets.

Enzymes with limited phyletic distribution	Human pathogens that depend on these enzymes	Ref.
ATP/ADP translocase, bacterial/plant type	R. prowazekii, C. trachomatis, C. pneumoniae	[895]
3-Dehydroquinate dehydratase, class II	C. jejuni, H. influenzae, H. pylori, P. aeruginosa, V. cholerae	[305]
DhnA-type fructose-1,6-bisphosphate aldolase	C. trachomatis, C. pneumoniae	[257]
Lysyl-tRNA synthetase, class I	B. burgdorferi, R. prowazekii, T. pallidum	[375]
Na+-translocating NADH:ubiquinone oxidoreductase	C. trachomatis, C. pneumoniae, Cl. perfringens, T. denticola	[334]
Na+-translocating oxaloacetate decarboxylase	S. pyogenes, T. pallidum	[334]
Orotidine 5′-phosphate decarboxylase	M. leprae, M. tuberculosis	[12]
Pyridoxine biosynthesis enzymes PDX1, PDX2	B. anthracis, H. influenzae, L. monocytogenes, M. leprae, M. tuberculosis, S. pneumoniae	[263,635]
Cofactor-independent phosphoglycerate mutase	C. jejuni, H. pylori, M. genitalium, P. aeruginosa, V. cholerae	[258,261]