Friday, December 14, 2018

Evolution of the CP43 and CP47 antenna proteins of Photosystem II and the link to water oxidation

In our recent paper in Geobiology we made a strong case for the process of water oxidation to oxygen having originated before the duplication leading to D1 and D2.

Article Early Archean origin of Photosystem II

As you may know by now (if you follow my posts or work), the core of Photosystem II is not just made of D1 and D2, but these also have an intimate relationship with the antenna proteins CP43 and CP47. Why is it intimate? Because the CP43 binds the Mn4CaO5 cluster together with D1.

CP43-E354 coordinates two Mn atoms, and CP43-R357 offers a hydrogen bond to one of the Mn-bridging oxygen atoms and is within 4 Å of the calcium in the cluster.

We have seen now that D2 does not bind a cluster but instead a number of phenylalanine residues seem to replace the ligands and block access to Mn and water. What is remarkable is that CP47 also reaches within D2, as if to provide ligands to a long-gone cluster, but instead it inserts a few phenylalanine residues: one of them within less than 4 Å of the redox tyrosine, YD. Have a look at Figure 7H in the paper.

How? Why? What does this mean? Does it mean that in the homodimeric Photosystem II, before the D1/D2 duplication, the water-oxidising cluster was also coordinated by the antenna domain? Like CP43 does today?

When the crystal structure of the homodimeric Type I reaction centre of heliobacteria was released in 2017, I found a Ca2+ bound to the place where the Mn4CaO5 cluster would be, and these Ca2+-binding sites had a number of structural similarities with the water-oxidising cluster that I thought could not possibly be just coincidence. In particular, the putative Ca2+-binding site interacted with the antenna domain in a manner similar to Photosystem II.

I discussed this in an early and hasty version of a manuscript that I should be submitting for publication soon. Have a look:

Working Paper Origin of water oxidation at the divergence of Type I and Ty...

Funnily enough, Prof. Bob Blankenship said in a news article that he didn't believe it. Well, he should have believed it, because I'm right! :D haha

https://www.quantamagazine.org/simple-bacteria-offer-clues-to-the-origins-of-photosynthesis-20171017/

I jest.

Anyways, I have now taken a closer look at the antenna's extrinsic domains. And I found something AMAZING.

Have a look at the attached figure with the structural comparisons.


A, B, and C, are the antenna of heliobacteria, CP43, and CP47 respectively. In four different views. In grey you see the transmembrane helices and in colours the extrinsic domain between the 5th and 6th helices. In panel D you can see a schematic view.

I have split the extrinsic domain of CP43 into three bits: EF2, EF3, and EF1.

EF1 is retained in all Type I reaction centres (except PsaA and PsaB) and in CP43 and CP47.

EF3 binds the manganese cluster in CP43. This EF3 region is also found in CP47, but it is at a different location! A change of place occurred!

There is sequence identity in all of the matching domains once they are compared to each other.

Have a look at the attached alignment comparing only the EF3. Sequence identity is unambiguous.


The green arrows indicate the positions where EF3/EF4 are “inserted” in both subunits.

The two residues at homologous positions in the CP47-EF3 region bind a calcium! Yeah, that is right! They bind a calcium!

CP43-E354 is CP47-E435, and CP43-R357 is CP47-N438 as shown in the figure. The Ca2+ is not found in the CP47 of photosynthetic eukaryotes (I did not see it in the structure of the red algae PSII). Except perhaps for the PSII of Cyanophora paradoxa and relatives: early-branching algae.

In CP47, EF1, which in heliobacteria binds the Ca2+, interacts with CP47-N438 via K332.

The phenylalanine residues that in CP47 insert themselves into D2 are found in the region marked as EF4, which does not exist in CP43.

The level of sequence identity between CP43 and CP47 is about 20%. But this falls to virtually 0% in the extrinsic domain if these are compared in their current order. If you remove EF4, and align the homologous bits together, the sequence identity is back to 20%! Unbelievable.
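The effect of comparing domains in the wrong order can be illustrated with a minimal sketch. The toy sequences, the domain order given to the CP47-like protein, and the function name `percent_identity` are all made up for illustration; they are not the real CP43 or CP47 sequences, and EF4 is left out for simplicity. The sketch only shows that a position-by-position comparison collapses when homologous segments sit at different places, and recovers when they are paired up.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns carrying the same residue.

    Both strings must already be aligned (equal length); gap columns
    ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Hypothetical toy domains (NOT real CP43/CP47 sequences).
cp43_like = {"EF2": "ACDEFGHIKL", "EF3": "MNPQRSTVWY", "EF1": "LKIHGFEDCA"}
cp47_like = {"EF2": "ACDEYGHIKL", "EF3": "MNPQKSTVWY", "EF1": "LKIHGFEDCW"}

# Assumed domain order along each toy protein: EF3 sits at a different place.
order_43 = ["EF2", "EF3", "EF1"]
order_47 = ["EF3", "EF2", "EF1"]

# Comparison in the order found in each protein.
as_found = percent_identity(
    "".join(cp43_like[d] for d in order_43),
    "".join(cp47_like[d] for d in order_47),
)

# Comparison after pairing each domain with its homologue.
rearranged = percent_identity(
    "".join(cp43_like[d] for d in order_43),
    "".join(cp47_like[d] for d in order_43),
)

print(f"as found:   {as_found:.0f}% identity")
print(f"rearranged: {rearranged:.0f}% identity")
```

In the toy case only the domain that kept its position contributes matches in the first comparison, while all three contribute in the second.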

You might think that 20% overall sequence identity is too low, but the level of sequence identity between the alpha and beta subunits of ATP synthase is also 20%. Just to give you context.

You might think that the CP43 and CP47 have evolved very fast… the opposite is true. Currently after D1 and D2, the second slowest evolving reaction centre subunits are the CP43 and CP47, evolving even slower than ATP synthase today (unpublished data).

All in all it means that EF2, EF3, and EF1 were already present at the moment of duplication!

Given that EF4 only exists in CP47, we can then argue that this was not present before duplication, and therefore the phenylalanine residues that today get inserted into D2 and interact with YD could not have been in the homodimer. So the D2 and CP47 phenylalanine patch could not have been the ancestral state, as is of course obvious from everything we discussed in the Geobiology paper and what had been described by Bill Rutherford and Wolfgang Nitschke in the 90s (see references in the paper).

Given that EF3 is found in both CP43 and CP47, and that CP43-E354 is conserved as CP47-E435, and similar for position CP43-357 (CP47-438), and given that they still bind something (manganese/calcium), we can then argue that these residues were also available for metal-binding before duplication.

It is consistent with a homodimer photosystem, with clusters on both sides, and with ligands from the antenna. It also strengthens the notion that the Ca2+-binding site in the homodimeric Type I reaction centre is a real thing, and that the structural divergence of Type I and Type II reaction centres is indeed linked to the evolution of the Mn4CaO5 cluster and water oxidation to oxygen.

What this means you can read here:

Article Photosystem II is a Chimera of Reaction Centers

And here:

Preprint Thinking Twice about the Evolution of Photosynthesis

I think that originally manganese and water oxidation started with the help of a small domain similar to that in heliobacteria. A metal-binding site exposed to the media and soluble ions. Once manganese oxidation and an early version of water oxidation got started, the extrinsic domain in the ancestral protein to CP43 and CP47 then increased in complexity, evolving EF2 and EF3 in a drive to provide proton and water channels, to shield the cluster, and to provide a site of interaction with extrinsic polypeptides.

Then the swap of position of EF3 and the evolution of EF4 in the ancestral CP47 contributed to heterodimerization and the loss of water oxidation in D2.

This happened immediately after the divergence of Type I and Type II reaction centres, LONG before the most recent common ancestor of Cyanobacteria.

Did you know that at the gene level, the N-terminus of the CP43 gene overlaps with the C-terminus of the D2 gene, contributing a few additional amino acids to the latter? This is a trait shared by most cyanobacteria, including the earliest branching, and explains how D2 lost the ligands to the cluster located at the C-terminus.

Beautiful, just beautiful.

Sunday, December 2, 2018

Early Archean origin of Photosystem II: materials for the press office


An integral part of research is outreach and dissemination. I like my papers to be accompanied by a press release, if possible, to make them more visible to the public. Sometimes, what I do is send some materials to the press officer in our faculty and ask whether a press release can be written from them.

Below you find those materials, which I think could help some interested readers digest some of the information in the paper. This is the official press release from the college: https://www.imperial.ac.uk/news/189232/oxygen-could-have-been-available-life/

This is our recent paper: Early Archean origin of Photosystem II

Summary of the paper

The problem
When or how oxygenic photosynthesis originated remains controversial. Understanding how and when oxygenic photosynthesis emerged is fundamental to understanding how life has evolved through the long history of the planet. For example, it is important to understand when oxygen was available to life for the first time. Oxygen permitted the evolution of aerobic respiration, which is the main energetic process that powers most life on Earth and is essential to sustain the complexity of animals and humans. It is also important to understand the probability of complex life evolving in other solar systems. For example, if oxygenic photosynthesis is a very difficult process to evolve, then the probability of complex life emerging on a distant exoplanet may be very low.

The controversy is the result of the difficulty of unequivocally and unambiguously detecting oxygen in the rock record or figuring out when the first oxygen-producers evolved for the first time.

The older the rocks, the rarer they are, and the harder it is to prove conclusively that any fossil microbes found in these ancient rocks used or produced any amount of oxygen.

Today, the oldest known oxygen-producers are called cyanobacteria. These bacteria became the chloroplast of algae and plants, but all cyanobacteria that we know of use a very sophisticated form of oxygenic photosynthesis. So figuring out when cyanobacteria originated does not really tell us when oxygenic photosynthesis appeared for the first time, but only tells us when a very sophisticated form of oxygenic photosynthesis was already possible.

Therefore, it cannot tell us when oxygenic photosynthesis really got started and what ancestral forms of oxygenic photosynthesis looked like.

What we did
To overcome these difficulties, we studied the evolution of Photosystem II, nature’s solar panel that uses the energy of light to break water molecules into their components: protons, electrons, and oxygen. If we can understand when and how Photosystem II evolved the capacity to oxidize water, then we may have a better idea of when and how oxygenic photosynthesis got started, even before there was enough oxygen on the planet to leave a trace in the rock record.

The core of Photosystem II is made of two evolutionarily related proteins, called D1 and D2, which originated from a gene duplication. D1 and D2 are very similar to each other at a structural level but they differ at the sequence level, that is, at the amino acid level. In other words: they look the same but the basic building blocks have changed. Today D1 and D2 share about 30% amino acid sequence identity. That means that of the approximately 350 building blocks that make up D1 and D2, slightly over a hundred are perfectly identical between D1 and D2, but at some point in time they were 100% identical.

Fortunately, the function and structure of Photosystem II has been studied in great detail, so we can tell from what D1 and D2 look like, and from the remaining ~100 identical building blocks, that before the duplication that allowed the evolution of D1 and D2, water oxidation was possible.
Oxygen is a very reactive molecule: that is why it is so important to life, because it can drive many chemical reactions that are essential to life. Oxygen can also react with chlorophyll, leading to the formation of what are called reactive oxygen species. These reactive forms of oxygen are very toxic to life. So all photosynthetic organisms have evolved mechanisms to protect against reactive oxygen species and to prevent oxygen molecules from interacting with chlorophyll. By comparing D1 and D2 we can also tell that before the duplication, the ancestral Photosystem II had already evolved mechanisms to protect against damage caused by oxygen.

What needed to be done next was to find out the span of time between the duplication event (when D1 and D2 were 100% identical) and the ancestor of all cyanobacteria, which inherited a standard sophisticated Photosystem II (when D1 and D2 had only about 30% identical building blocks left).
To do that we need to find out how fast D1 and D2 are changing: that is, the rate of evolution. We can find out using a technique called Bayesian relaxed molecular clock analysis. The method uses the power of statistics and known events in the evolution of photosynthetic organisms from the fossil record to calculate the rates of change.

The results
We found out that D1 and D2 are evolving at a very slow rate. The rate is so slow that it would take about 8 billion years for two identical D1 sequences today to lose all similarity to each other in the future. For example, we know that the ancestor of flowering plants and most algae is more than 1 billion years old, but if I compare D1 in an alga and D1 in the banana tree, they will be about 87% identical. So in more than 1 billion years of evolution, out of approximately 350 building blocks, fewer than 50 have changed in all plants and algae. If you compare the D1 in all flowering plants, which appeared around the time of the dinosaurs, they’ll be over 98% identical: that is fewer than 10 changes in more than 130 million years!
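The building-block counts quoted in this summary follow from simple percentages of a protein of about 350 amino acids. A minimal sketch of that arithmetic is below; the function names are only illustrative, and the length of 350 is the approximate figure used in the text.

```python
LENGTH = 350  # approximate number of amino acids in D1 or D2 (from the text)


def identical_positions(length: int, percent_identity: int) -> float:
    """Number of positions that are identical at a given percent identity."""
    return length * percent_identity / 100


def changed_positions(length: int, percent_identity: int) -> float:
    """Number of positions that differ at a given percent identity."""
    return length * (100 - percent_identity) / 100


# D1 versus D2 today: about 30% identity.
print(identical_positions(LENGTH, 30))  # 105.0, "slightly over a hundred"

# D1 in an alga versus D1 in a banana tree: about 87% identity.
print(changed_positions(LENGTH, 87))    # 45.5, fewer than 50 changes

# D1 across flowering plants: about 98% identity.
print(changed_positions(LENGTH, 98))    # 7.0, fewer than 10 changes
```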

It is not strange at all that Photosystem II evolves so slowly: all complex enzymes that can be traced to the earliest forms of life evolve at similar rates. Because they fulfil important functions, most changes are likely to result in a worse enzyme rather than a better one, so most mutations are naturally wiped out. That is why we can tell that all life on Earth originated from a single origin, because many of the enzymes important for function have evolved at a really slow pace so that even after 4 billion years of evolution, they still look the same and work in similar ways in all groups of life.

We found out that because D1 and D2 are evolving so slowly, the span of time between the duplication and the ancestor of cyanobacteria is likely to be a billion years or more! We cannot tell however with perfect exactitude when the ancestor of cyanobacteria appeared for the first time, but if it existed about 2.5 billion years ago, then the duplication could have easily occurred more than 3.5 billion years ago. The important discovery is that it does not matter when the ancestor of cyanobacteria appeared, because the span of time between the duplication (the dawn of oxygenic photosynthesis) and this ancestor will always be very large.

Another amazing thing we discovered is that even when the span of time is one billion years, the rate of change at the moment of duplication had to be about 40 times greater than the observed rates in the past 2.0 billion years. Forty times the current speed of change is about the limit of what is possible for molecular machines of such level of complexity. In fact, it is already above any measured rate for these kinds of complex, highly conserved, molecular machines. Then, knowing that, we can calculate that if this gap of time were to be smaller, the rate at the duplication would have to be faster, and quickly enough the rates would be so large that they would be outside the realms of biology.

Imagine a car going from Paris to Berlin, a journey of about 1000 km, it would take about 10 hours to drive such distance at about 100 km per hour. If we want to arrive in 5 hours, we would need to drive at about twice the speed, but if we want to arrive in 1 hour, we would need to go at 10 times the speed, at almost the speed of sound. Not possible even for the fastest Formula 1 car. It is the same for the speed of evolution.
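The car analogy is just distance divided by time, and can be written down in a few lines. The sketch below uses the round numbers from the analogy; the function name is illustrative, and the speed of sound (about 1235 km per hour in air at sea level) is added only for comparison. It is not the molecular clock calculation from the paper, only the inverse relationship between the time available and the speed required.

```python
SPEED_OF_SOUND_KMH = 1235.0  # approximate value in air at sea level


def required_speed_kmh(distance_km: float, hours: float) -> float:
    """Average speed needed to cover a fixed distance in a given time."""
    if hours <= 0:
        raise ValueError("travel time must be positive")
    return distance_km / hours


DISTANCE_KM = 1000.0  # rounded Paris to Berlin distance used in the analogy

for hours in (10, 5, 1):
    speed = required_speed_kmh(DISTANCE_KM, hours)
    fraction = speed / SPEED_OF_SOUND_KMH
    print(f"{hours:>2} h -> {speed:6.0f} km/h ({fraction:.0%} of the speed of sound)")
```

Halving the time available doubles the speed required, and the same logic applies to a fixed amount of sequence change squeezed into a shorter span of time.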

This is also important because it tells us in a very straightforward manner that evolutionary scenarios in which oxygenic photosynthesis originated very quickly before the ancestor of cyanobacteria can be ruled out with confidence. Even if we don’t know when exactly cyanobacteria originated.

The bigger picture
The main implication of the paper is that oxygen was available to life long before it started to accumulate in the air at about 2.4 billion years ago. This is in agreement with current geological data that suggest that whiffs of oxygen or localized accumulations of oxygen were possible before 3.0 billion years ago.

There have been debates on whether aerobic respiration evolved before or after cyanobacteria, and therefore before or after oxygenic photosynthesis. This is because the enzymes used for aerobic respiration appear to be much older than cyanobacteria. But how can aerobic respiration have evolved before oxygen was available to life? In the absence of oxygenic photosynthesis it is expected that the amount of oxygen available to life would be virtually negligible. So scientists have had to come up with convoluted scenarios to explain this. Our data help us understand how this is possible, because oxygenic photosynthesis likely got started long before the ancestor of cyanobacteria. Today oxygenic photosynthesis is only found in cyanobacteria, but our data suggest that it is likely that many other forms of microbes that today do not do photosynthesis may have had old ancestors with the capacity to split water using light.

In fact, recent data hint at the possibility that oxygen was important for the development of the genetic code, and reconstructions of the genetic capabilities of the earliest forms of life always retrieve enzymes to protect against reactive forms of oxygen, but the latter are usually dismissed as artefacts or anomalies. Our work can help us understand how this is actually possible, because the older cyanobacteria are found to be, the more likely it is that oxygenic photosynthesis started at the earliest stages in the history of life and soon after the earliest forms of photosynthesis.

What’s next
We are trying now to bring back to life what the ancestral photosystem before the duplication looked like using a method called Ancestral Sequence Reconstruction. This is a well-established method that allows us to predict the basic building blocks of the ancestral enzyme using the known variation across all extant species. We cannot travel back in time to 3.0 billion years ago, but we can make the ancestral enzyme travel from the distant past into our test tube in the lab today.

Because the enzyme is evolving so slowly, its structure has not changed too much since its origin; what has changed is the particular building blocks along the different positions of the preserved structure. That makes it a very suitable system for Ancestral Sequence Reconstruction, or targeted site-directed mutagenesis, although that does not mean it is easy. Nevertheless, we have now modified strains of cyanobacteria expressing some of the ancestral genes and we will soon attempt to validate our predictions experimentally. This is a three-year project funded by the Leverhulme Trust.

Thursday, November 15, 2018

Answer to Dawn Sumner's comments and questions regarding the evolution of oxygenic photosynthesis

Regarding our paper published recently in Geobiology, titled "Early Archean origin of Photosystem II"

I wrote "undescribed assumptions" because usually the papers read really well and describe many of their assumption in ways that are convincing, but results vary significantly. I've identified a couple of things that aren't justified, but I don't know if they are reasonable.
Example: It doesn't make sense to me that molecular evolution rates in chloroplasts should be the same as in free-living cyanobacteria given the significantly different "environmental" contexts, including pigments to absorb damaging radiation. Has anyone looked at this?

You are absolutely right. There are differences in the rates of evolution between chloroplasts and cyanobacteria, and overall plastid proteins evolve at a faster rate than those in cyanobacteria. But that is not true for every protein. For example, proteins involved in information processing (e.g. ribosomal proteins, RNA polymerase) are evolving significantly faster in plastids. On the other hand, proteins of bioenergetics and photosynthesis metabolism, like ATP synthase, the Rubisco large subunit, and the core subunits of the photosystems, are evolving at about similar rates in cyanobacteria and plastids.

It has to do with the different evolutionary pressures. The proteins of bioenergetics are under strong purifying selection (slow rates), but those of information processing have undergone periods of positive selection (accelerations of the rates) because they had to be put under the control of the eukaryotic replication/gene expression/translation systems. I don’t know much about it, but I have now been systematically comparing the rates of evolution between a bunch of these proteins. I am trying to establish what is a reasonable time for the emergence of the most recent common ancestor of Cyanobacteria... but of course, it is not so straightforward.

In our analysis, we used D1, one of the slowest evolving proteins in all life. We found that there is hardly any difference in the overall rates of evolution between D1 in all photosynthetic eukaryotes and in Cyanobacteria. In fact, the G4-D1 that in cyanobacteria is used to do oxygenic photosynthesis with chlorophyll f has experienced faster rates of evolution than those in the chloroplast.

That is why we presented Figure 2 in our paper. To try to show that the rates of evolution of D1 and D2 are quite slow, both in plants and cyanobacteria, and that if it just happens that cyanobacteria are much older than we anticipate, that would imply even slower rates, which then would push the duplication that led to D1 and D2 to even older times.

To give you an idea of how slowly D1 and D2 are evolving... They are evolving slower than the alpha and beta subunits of ATP synthase. Alpha and beta originated from a gene duplication event that occurred before the LUCA. D1 and D2 are under tremendous evolutionary pressure, because they bind so many cofactors that have to be maintained at the right orientations, plus they also interact with a bunch of other subunits, and in addition they have to incorporate protection mechanisms. Therefore, when primary endosymbiosis occurred, this had virtually no effect on the rates of evolution of D1 and D2. Unlike the ribosome, for example.

If they do evolve at different rates on average, almost none of the fossil record calibrations will be effective without a deep dive into these variations.

I agree 100%! That is something I am exploring at the moment. In the case of cyanobacteria/chloroplast trees, calibrations have to be placed on either side of the node you are more interested in. That is why timing the most recent common ancestor of cyanobacteria is so difficult. If we only put calibrations on fast evolving branches, then the dates on the slowest evolving uncalibrated clades will be overestimated. On the other hand, if we place calibrations on slower evolving clades, then the rates in those clades that are fast evolving will be underestimated resulting in older calculated ages.

Therefore, when performing a molecular clock it is important to maximize calibrations and to put them strategically. However, the changes in the rates between clades should not be a big problem. The molecular clock algorithms can cope with differences in the rates orders of magnitude apart, believe me, I have tested this. But the only way the software can infer accurate dates, is with the appropriate use of calibrations.

There is no perfect dataset, and there is no perfect molecular clock, but we tried to do the best we can. We tried to model every possible scenario. The point of the paper is not to find out when cyanobacteria originated, but to find out what is the span of time between the duplication leading to D1 and D2, and standard Photosystem II (inherited by all cyanobacteria). And we find that that span of time is likely to be pretty substantial…

Think about this, the origin of ATP synthase (the duplication leading to alpha and beta subunit) does not depend on the age of any particular group of bacteria. Same for Photosystem II, the origin of Photosystem II does not depend on the age of the most recent common ancestor of cyanobacteria, but it depends on when the duplication that led to D1 and D2 occurred. And that photosystem, before the duplication, even if it didn’t oxidize water, was already a pretty special photosystem unlike any of the known anoxygenic ones.

Example 2: Atm O2 was lower pre-late Ediacaran, so there was less O3 & more UV. Even more pre-GOE. And w/ more Fe2+ in seawater, more free radicals are produced from light. How do environmental conditions such as these affect mutation rates? Different in cyanos vs chloroplasts?
Different for organisms living in different environments? E.g. Nostoc in super high light vs new cyanos found living in subsurface? Phormidium living at light limit w/HS-? How do ecological variations feed into long term mutation accumulation?

From the patterns that I have seen, it appears that overall, chloroplast proteins (eukaryotes in general) are evolving faster than cyanobacteria. But as I was mentioning above, the rates of evolution vary a lot between proteins. What scientists have tried to do is to measure the background rates of evolution in non-coding regions of the genome, and compare them to the coding regions. The change in the ratio of these rates reflects different evolutionary pressures.

There are no systematic studies of the changes of the rates of evolution across geological time. Your questions are super interesting, and it is something that needs to be explored in more detail.

Have a look at the figure below. That is a comparison of the level of sequence divergence between pairs of cyanobacteria (a measurement of phylogenetic distance). What you see is a total of 703 comparisons. And I am plotting that for RpoB (RNA polymerase subunit B) and for the beta subunit of the ATP synthase. For example, if I compare beta of Nostoc punctiforme with that of Chroococcidiopsis thermalis, they’ll be about 10% different. If I compare against Gloeobacter violaceus it would be about 30% different.


The dots in blue are comparisons between heterocystous cyanobacteria, and the orange dots are the comparisons against Gloeobacter, the earliest branching cyano. There is a big scatter but it follows an overall linear trend; the slope of the trend line is 1.06. It means that RpoB and beta are evolving at pretty much the same rate across the core diversity of cyanobacteria.
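As a side note on the 703 pairwise comparisons: if every possible pair in a set of sequences is compared once, the number of comparisons is n(n - 1)/2. The sketch below inverts that formula. The resulting number of cyanobacteria (38) is an inference from the count under that all-against-all assumption, not a figure stated in the text, and the function names are illustrative.

```python
import math


def n_pairs(n_taxa: int) -> int:
    """Number of distinct pairs among n_taxa sequences (all against all)."""
    return math.comb(n_taxa, 2)


def taxa_from_pairs(pairs: int) -> int:
    """Invert n(n - 1)/2 = pairs; raise if no whole number of taxa fits."""
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    if n_pairs(n) != pairs:
        raise ValueError(f"{pairs} is not an all-against-all pair count")
    return n


print(taxa_from_pairs(703))  # 38
print(n_pairs(38))           # 703
```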

The figure also shows that the distance between Gloeobacter and the rest of cyanobacteria is about three times as great as that among heterocystous cyanobacteria. Then if it can be established that the rates of evolution across most cyanobacteria follow approximately uniform patterns, we can be more confident of a time for their most recent common ancestor. We will only need a good fossil to calibrate it all.

Let us assume that we have identified a number of proteins that have evolved at a constant rate across cyanobacteria (say those in the figure). Now, there was a recent paper showing fossil heterocystous cyanobacteria in the Tonian period, did you see it? The lower age is 720 Ma. That would imply that the branch leading to Gloeobacter occurred at about 2.1 Ga. If instead we think that heterocystous cyanobacteria appeared about 1.0 Ga, then that would make the branching of Gloeobacter about 3.0 Ga. Molecular clocks also behave in a similar way, depending of course on the calibration choices.
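The back-of-the-envelope dates above come from a simple proportionality: under a constant rate, age scales with distance. A minimal sketch follows, assuming a strict clock and the roughly threefold distance ratio read off the figure; the function name and the default ratio are illustrative, and this is not a substitute for a calibrated relaxed-clock analysis.

```python
def deep_split_age_ga(calibrated_age_ga: float, distance_ratio: float = 3.0) -> float:
    """Scale a calibrated clade age by a distance ratio (strict clock assumed).

    calibrated_age_ga: assumed age of heterocystous cyanobacteria, in Ga.
    distance_ratio: distance to Gloeobacter relative to the distance among
        heterocystous cyanobacteria (about 3 in the figure discussed above).
    """
    if calibrated_age_ga <= 0 or distance_ratio <= 0:
        raise ValueError("age and ratio must be positive")
    return calibrated_age_ga * distance_ratio


for age in (0.72, 1.0):
    split = deep_split_age_ga(age)
    print(f"heterocystous clade at {age:.2f} Ga -> Gloeobacter split near {split:.2f} Ga")
```

With the 720 Ma calibration the estimate lands a little above 2.1 Ga, and with 1.0 Ga it lands at 3.0 Ga, matching the rough figures in the text.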

Example 3: Gene exchange among closely related organisms, including via viruses. Is it possible that D1 G4 (and assoc genes) evolved in one sp of cyanos, was better, and was transferred to a bunch of others post GOE with those who didn't get the transfer dying out?

What I found out in my study of the evolution of D1 is that G4 is found in all Cyanobacteria, see Figure 1 of our paper. And when you focus on G4 only, it appears to follow a species tree of cyanobacteria. Bear in mind that even D1 G4 has duplicated several times (e.g. low-light vs high-light forms, the one in the far-red light gene cluster). Nevertheless, it seems as if at least G4 had mostly been inherited vertically. That is not to say that horizontal gene transfer has not occurred, it certainly has occurred, but I don’t think to such an extent that it would dominate the topology of the tree.

Because of that, then we also concluded that the atypical D1 forms branched out before the most recent common ancestor of Cyanobacteria, including the so-called microaerobic forms.

I do think that a post-GOE ancestor of cyanobacteria is likely an artefact resulting from an overestimation of the rates of evolution, and I think there are a number of reasons for this. It turns out however that D1 and D2 are very susceptible to that because they are so slowly evolving. That is why we focused on the concept of delta-T instead.

We did not focus on trying to figure out if cyanobacteria occurred after or before the GOE, but on the span of time between the duplication leading to D1 and D2, and standard PSII. We concluded therefore that regardless of the exact timing for the MRCA of cyanobacteria, delta-T will always be very large (about 1.0 billion years). We also found out that if delta-T is made to be smaller, the rates of evolution will increase beyond what is likely for these types of proteins, and quickly enough beyond what is possible for any kind of protein.

So if the MRCA of cyanobacteria is found to be 2.5 Ga old, I think it would be reasonable to assume that the duplication leading to D1 and D2 occurred about 3.5 Ga... see what I mean?

In any case, I think that most of the diversity of oxygenic phototrophs that have ever existed actually predated the MRCA of cyanobacteria. That does not mean that such diversity had to be abundant or globally distributed though.

Or being present only in environments where they can compete with relatively ineffective D1s?
I'm not saying I think these necessarily happened. It just leaves me with the feeling that we are missing something really big and important in our assumptions.

I agree. Think about this:

There are three gene duplication events that are exclusive to oxygenic photosynthesis. D1 and D2, the core of PSII. CP43 and CP47, the core antenna of PSII. And PsaA and PsaB, the core of Photosystem I.

All cyanobacteria today have a form of oxygenic photosynthesis that has remained basically unchanged from Gloeobacter to avocados. In fact, most of the sequence change in the evolution of Photosystem II and Photosystem I that has ever occurred in the history of life happened before the MRCA of cyanobacteria. From the moment those key duplications occurred, countless forms of oxygenic phototrophic bacteria should have appeared, spanning all of those changes that are not accounted for in the known diversity. And given that these enzymes are some of the slowest evolving enzymes we know of, the roots of oxygenic photosynthesis are likely placed deep in time... early Archean deep. We are oblivious to such huge diversity. By the time cyanobacteria enter the scene, when Gloeobacter split from the rest, oxygenic photosynthesis had already reached a pretty sophisticated stage.

So yeah, we are missing so much, in fact, we’re probably missing most of it.

Saturday, October 27, 2018

On the evolution of chlorophyll synthesis, methanogenesis, and nitrogen fixation. Was the ancestor of Bacteria photosynthetic?

Weiss et al. recently attempted to reconstruct the proteome of the Last Universal Common Ancestor (Weiss et al. 2016). One of the aspects that puzzled me the most about that work is that they suggested the LUCA was methanogenic and a nitrogen fixer, but there was absolutely no mention of photosynthesis.
There are three major groups of nitrogenase-like enzymes known. 1) The ones used for the synthesis of Ni-tetrapyrroles in methanogenic Archaea, 2) the ones used in Mg-tetrapyrroles in photosynthetic bacteria, and 3) proper nitrogenases. It is also likely that there are many uncharacterized enzymes belonging to this superfamily of proteins with unknown functions.
Under orthodox views on the evolution of photosynthesis it can be argued that the enzymes used in photosynthesis evolved from nitrogenases or from those used in methanogenesis, but the phylogeny of these enzymes is inconsistent with that: the chlorophyll-synthesis enzymes do not seem to emerge from within nitrogenases or the methanogenesis enzymes. What is actually seen in their phylogeny is a deep divergence between the methanogenesis enzymes and those in photosynthesis, with nitrogenases being closer to the methanogenesis Ni-tetrapyrrole-synthesis enzymes. Therefore, there is a deep split between Bacteria/photosynthesis and Archaea/methanogenesis.
Here is the thing: if nitrogenase and the Ni-tetrapyrrole enzyme share a more recent common ancestor, and these were found in the LUCA, then the LUCA must also have had protochlorophyllide reductase. This may be hard to grasp, but it derives from the phylogenetic relationships of these enzymes. Phylogenetics 101. This is because the branch leading to protochlorophyllide reductase, in this case ChlL, should have diverged before the nitrogenase homolog (NifH) and the Ni-tetrapyrrole one (CbfC) had time to split.
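The branching-order argument can be sketched with a toy tree. This is only an illustration of the logic, assuming the topology described here (ChlL splitting off first, NifH and CbfC as sisters); the nested-tuple representation and the function names are hypothetical and are not taken from the Weiss et al. data.

```python
# Toy topology only (an assumption for illustration): ChlL branches off first,
# NifH and CbfC are sister groups. Leaves are strings, internal nodes are tuples.
TREE = ("ChlL", ("NifH", "CbfC"))


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def mrca_depth(node, a, b, depth=0):
    """Number of splits between the root and the most recent common
    ancestor of leaves a and b. A larger value means a more recent ancestor."""
    if not {a, b} <= leaves(node):
        raise ValueError("both leaves must be present under this node")
    for child in node:
        # Descend only while a single subtree still contains both leaves.
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return mrca_depth(child, a, b, depth + 1)
    return depth


print(mrca_depth(TREE, "NifH", "CbfC"))  # 1: the more recent ancestor
print(mrca_depth(TREE, "ChlL", "NifH"))  # 0: the older split, at the root
```

In this toy tree the NifH/CbfC ancestor sits one split above the root, so if that ancestor was already present in the LUCA, the older split leading to ChlL cannot be younger than the LUCA.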
The other way we can see this is that LUCA had the ancestral enzyme to these, and that their specialization occurred later. But there is no evidence that actually suggests the ancestor of these enzymes was more likely to be involved in methanogenesis or nitrogen fixation than photosynthesis.
What is more, I know that these enzymes have enough sequence identity to meet the threshold of their analysis. And that is why I was puzzled: I expected some of the chlorophyll synthesis enzymes to show up in their analysis, but apparently they didn't...
As I was browsing through all 355 trees from the Weiss et al. work, I found this little gem! See image or the attached newick tree file.

That is indeed protochlorophyllide reductase subunit L splitting away from a NifH-like enzyme in methanogenic Archaea, which is probably a misannotated CbfC. Exactly as it should have been, and so it confirms that my puzzlement was not due to my lack of understanding of the evolution of these proteins.
The phylogeny on the bacterial part of the tree (red) matches perfectly that of ChlL and contains only phototrophs. It is indisputably ChlL and does not represent bacterial nitrogenases.
What does this mean? Well, if the authors' work is informative in any way, it would mean that the split of the CbfC and ChlL is a Bacteria/Archaea split, which would make the ancestor to all bacteria photosynthetic. In any case, that photosynthesis originated in the most recent common ancestor of all bacteria, or soon after that, is supported by the evolution of the photosynthetic reaction centres, as I concluded in my first review on the subject (Cardona 2015).
The reason this is not more widely understood or accepted is simply due to prevalent ideas on the evolution of photosynthesis, which are somewhat outdated and are now based more on speculation and personal opinions than on any real data... Something that I am working really hard to change.
If you want to have a look at the data by Weiss et al., have a look at this: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007518#sec011
The tree in the image is numbered 2020 in the supplements.
References:
Cardona, T. (2015). "A fresh look at the evolution and diversification of photochemical reaction centers." Photosynthesis research 126(1): 111-134. DOI: 10.1007/s11120-014-0065-x.
Weiss, M. C., F. L. Sousa, N. Mrnjavac, S. Neukirchen, M. Roettger, S. Nelson-Sathi and W. F. Martin (2016). "The physiology and habitat of the last universal common ancestor." Nat Microbiol 1(9): 16116. DOI: 10.1038/nmicrobiol.2016.116.

Wednesday, September 19, 2018

Unified view for the evolution of oxygenic photosynthesis

I thought it would be a good idea to create a plot that outlines my perspective on the evolution of photosynthesis. It is based on my research and that of others. Please, keep on reading if you're interested. Make sure to open the attached figure.

So, when did oxygenic photosynthesis originate? Counterintuitively, the answer to this question does not depend so much on when Cyanobacteria originated. Oxygenic photosynthesis and Cyanobacteria are not strictly the same thing. It actually depends on when photosynthesis and the first reaction centres originated.

The y axis is time. On the right, we see a species tree of bacteria centred around the diversity of Cyanobacteria. On the left side, we see a tree of reaction centre proteins.

Let’s focus first on the species tree. It has been recently suggested that the most recent common ancestor (MRCA) of Cyanobacteria capable of oxygenic photosynthesis postdates the Great Oxidation Event (GOE). Shih et al. (2017) suggested that the age of this ancestor is about 2.0 Ga. Similar results had been obtained before in other molecular clock studies but went unnoticed; see for example David and Alm (2011). These results have also been reproduced in newer analyses; see for example (Marin et al. 2017, Betts et al. 2018). There is also a real possibility that the MRCA of Cyanobacteria predated the GOE; see for example (Sanchez-Baracaldo 2015, Sanchez-Baracaldo et al. 2017, Magnabosco et al. 2018). So, let’s not take sides and consider all possibilities.

Cyanobacteria, in the classic sense, are most closely related to the Melainabacteria and Sericytochromatia (Soo et al. 2017). But if we zoom out, the Cyanobacteria/Melainabacteria/Sericytochromatia (CMS) supergroup is thought to be contained within a much larger group that includes Chloroflexi, Actinobacteria, Firmicutes, and the Deinococcus-type. This larger group has been called Terrabacteria by some. I have seen many phylogenomic analyses that put Cyanobacteria and Chloroflexi as each other’s closest relatives. The Cyanobacteria and Chloroflexi split has been estimated to have occurred about 3.0 Ga ago by David and Alm (2011) and about 2.7 Ga ago by Marin et al. (2017). Marin et al. (2017) also timed the MRCA of Terrabacteria at about 2.9 Ga.

Nevertheless, there are phylogenomic trees that put the branch leading to Cyanobacteria very basally in the tree of life of bacteria. See for example: (Hug et al. 2016, Yokono et al. 2018). Also, see the recent tree by Betts et al. (2018). This does not necessarily imply that the MRCA of Cyanobacteria is deeply branching with respect to other bacteria. However, I think overall, the “Terrabacteria” grouping has been reported more often than a basal CMS supergroup for example. Keep this in mind.

Now let’s have a look at the evolution of reaction centres.

Cyanobacteria are characterised by having Photosystem II and Photosystem I. PSII has a heterodimeric core made of the homologous subunits D1 and D2. This core is associated with the homologous core antenna proteins CP43 and CP47. PSI has a heterodimeric core made of the homologous subunits PsaA and PsaB.

The level of sequence identity (distance) between D1 and D2 in ALL cyanobacteria is just under 30%. Between CP43 and CP47 it is just under 20%, and between PsaA and PsaB it is just under 45%. In other words, the phylogenetic distance between D1 and D2, between CP43 and CP47, and between PsaA and PsaB is very large.
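For readers who want to check numbers like these on their own alignments, here is a minimal sketch of a percent identity calculation. It assumes two already-aligned sequences of equal length and skips any column containing a gap; other definitions of identity exist and give slightly different values. The function name and the toy sequences are hypothetical, not real D1 or D2 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the aligned, gap-free columns
    of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (made up for illustration): 7 matches in 9 columns.
seq_1 = "MTAILERR-E"
seq_2 = "MTSILQRRAE"
print(round(percent_identity(seq_1, seq_2), 1))  # 77.8
```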

The thing about Cyanobacteria is that they all inherited “standard” photosystems. That is to say, that the most recent common ancestor of Cyanobacteria already had photosystems with divergent heterodimeric cores. So, the duplication events leading to D1 and D2, CP43 and CP47, and PsaA and PsaB happened before the MRCA of Cyanobacteria (nodes marked red).

I have attempted to gain an understanding of the evolution of reaction centre proteins as a function of time. I have done that by comparing the levels of sequence identity and by applying molecular clocks under a wide range of evolutionary scenarios (Cardona 2016, Cardona 2018, Cardona et al. 2018).

What I have found is that the gene duplication event leading to D1 and D2, marked as D0 in the tree, is likely to have occurred more than 1 billion years before the MRCA of Cyanobacteria!

It sounds crazy, but it is not crazy at all. It is actually rather straightforward. We’re just not used to thinking this way about the evolution of photosynthesis. Don’t panic!

In this particular example, the span of time between the D0 duplication event and the MRCA of Cyanobacteria is called ΔT, see the figure.

The large ΔT is due to two facts of life that are pretty unambiguous. 1) The phylogenetic distance between D1 and D2 is VERY LARGE. 2) The rates of evolution of D1 and D2 are VERY SLOW. Therefore, it takes a very long time to span the distance between the D0 duplication event and the MRCA of Cyanobacteria. This is also true for the CP43/CP47 and the PsaA/PsaB duplications.
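To see why these two facts force a large ΔT, here is a back-of-the-envelope sketch. It assumes a strict molecular clock with a single constant rate on both branches, which is a simplification and not the model used in our papers; the function name and all the numbers are placeholders chosen for illustration only.

```python
def delta_t(pairwise_distance, rate):
    """Time (in Ga) between a duplication and the node where the two
    paralogs are compared, assuming both branches evolve at the same
    constant rate, so that pairwise_distance = 2 * rate * delta_t.

    pairwise_distance: substitutions per site separating the paralogs
    rate: substitutions per site per Ga on each branch
    """
    if pairwise_distance < 0 or rate <= 0:
        raise ValueError("distance must be >= 0 and rate must be > 0")
    return pairwise_distance / (2.0 * rate)


# Placeholder values, NOT measured ones.
DISTANCE = 1.0               # substitutions per site between the two paralogs
SLOW_RATE = 0.25             # substitutions per site per Ga
FAST_RATE = 5 * SLOW_RATE    # a five-fold faster rate, for comparison

print(delta_t(DISTANCE, SLOW_RATE))  # 2.0
print(delta_t(DISTANCE, FAST_RATE))  # 0.4
```

For the same distance, the slower the rate, the longer the time needed to span it. That is all there is to the argument.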

The rates of evolution of D1 and D2 are very slow, but these rates are not unusual in any way. The rates are just like those in any other highly conserved protein of bioenergetics involved in complex functions. Absolutely nothing peculiar about that. Have a look at Table 3 in Cardona et al. (2018): we have studied the rates of evolution of D1 and D2 in great detail and compared them to those of other proteins.

What is key, however, is that the ancestral protein to D1 and D2, D0, likely made a photosystem that was capable of oxidizing water to oxygen or was well on its way towards the origin of water oxidation chemistry. Given the shared conserved traits between D1 and D2 we have a pretty solid idea of what a D0 photosystem was capable of doing… and a photosystem made of D0 was not like other anoxygenic Type II reaction centres. That is for sure.

So the roots of oxygenic photosynthesis go deep. I find this conclusion inescapable.

I also found that the rate of evolution of L and M is about 5 times greater than that of D1 and D2. It appears as if D1 and D2 are actually the slowest evolving reaction centre proteins of all. This means that PSII is the most likely reaction centre to have retained ancestral traits. Counterintuitive as it seems, it is rather evident when you compare the structures of the photosystems… starting from the fact that, like Type I reaction centres, PSII has retained core antenna proteins and the core peripheral chlorophylls of D1 and D2. What is more, the redox tyrosine residues are located at the ancestral entry point of electrons, as is the case in homodimeric Type I reaction centres.

I have tried to time the duplication leading to PsaA and PsaB as well (Cardona 2018), which is widely accepted to have occurred after the origin of oxygenic photosynthesis (Ben-Shem et al. 2004, Hohmann-Marriott and Blankenship 2008, Rutherford et al. 2012). It turns out that PsaA and PsaB are also evolving quite slowly, only slightly faster than D1 and D2, in such a way that the PsaA and PsaB duplication likely occurred long before the MRCA of Cyanobacteria too. It is expected that the duplication leading to CP43 and CP47 occurred simultaneously with the duplication of D1 and D2, as they form part of the same complex. The distance between CP43 and CP47 and their rates of evolution agree with this.

The position of CP43 and CP47 in the tree of reaction centres is not well defined yet. That is why I have put the branch with dashes. That is the position that I think is better supported and has more explanatory power… but other scenarios are possible, all with interesting repercussions. I am currently working on a paper about this.

These three duplications that are unique to oxygenic photosynthesis are more likely to have occurred closer to the origin of reaction centre proteins than closer to the GOE, or after the GOE. Strong arguments can be made that these duplications were driven by the optimisation of water oxidation and the evolution of photoprotective mechanisms to avoid the production of reactive oxygen species. Such arguments can also be applied to the initial divergence of anoxygenic-specific and oxygenic-specific reaction centre proteins (blue nodes, question marks), see for example (Orf et al. 2018). No matter how you look at it, water oxidation to oxygen likely started well before 3.0 Ga (blue wavy line), which is indeed supported by some geochemical evidence (Planavsky et al. 2014, Satkoski et al. 2015, Havig et al. 2017, Wang et al. 2018).

So how old are these duplications and initial divergences? As you can see in the plot, this depends on how old photosynthesis is. The older reaction centres are, the older the origin of water oxidation chemistry. If we assume that Cyanobacteria are much older than the GOE, then that would make the rates of evolution of reaction centre proteins even slower, which would then push the initial duplications specific to oxygenic photosynthesis (red nodes) even closer to the origin of reaction centres. This is a consequence of the two facts mentioned above: long distance and slow rates.

Oxygenic photosynthesis originated in an ancestor of Cyanobacteria… but this ancestor could have been the ancestor of a much greater diversity that could include other Terrabacteria, if that affiliation holds true. Betts et al. (2018) suggested that the MRCA of bacteria is only about 3.4 Ga old. David and Alm (2011) also suggested that the expansion of diversity in bacteria started about 3.4 Ga ago, peaking about 3.2 Ga ago.

I understand that the evidence for photosynthesis at 3.5 Ga (traditionally considered to be anoxygenic) is pretty strong. As far as I understand, the possibility that photosynthesis originated prior to 3.8 Ga cannot be ruled out yet (Rosing 1999, Rosing and Frei 2004, Czaja et al. 2013, Nisbet and Fowler 2014, Butterfield 2015).

Therefore, connect the dots.

If you have questions don’t hesitate to leave a comment.


References
Ben-Shem, A., F. Frolow and N. Nelson (2004). "Evolution of photosystem I - from symmetry through pseudosymmetry to asymmetry." FEBS Lett 564(3): 274-280. DOI: 10.1016/S0014-5793(04)00360-6.

Betts, H. C., M. N. Puttick, J. W. Clark, T. A. Williams, P. C. J. Donoghue and D. Pisani (2018). "Integrated genomic and fossil evidence illuminates life’s early evolution and eukaryote origin." Nature Ecology & Evolution. DOI: 10.1038/s41559-018-0644-x.

Butterfield, N. J. (2015). "Proterozoic photosynthesis - a critical review." Palaeontology 58(6): 953-972. DOI: 10.1111/pala.12211.

Cardona, T. (2016). "Reconstructing the origin of oxygenic photosynthesis: Do assembly and photoactivation recapitulate evolution?" Front Plant Sci 7: 257. DOI: 10.3389/fpls.2016.00257.

Cardona, T. (2018). "Early Archean origin of heterodimeric Photosystem I." Heliyon 4(3): e00548. DOI: 10.1016/j.heliyon.2018.e00548.

Cardona, T., P. Sanchez-Baracaldo, A. W. Rutherford and A. W. D. Larkum (2018). "Early Archean origin of Photosystem II." BioRxiv 109447. DOI: 10.1101/109447.

Czaja, A. D., C. M. Johnson, B. L. Beard, E. E. Roden, W. Q. Li and S. Moorbath (2013). "Biological Fe oxidation controlled deposition of banded iron formation in the ca. 3770 Ma Isua Supracrustal Belt (West Greenland)." Earth and Planetary Science Letters 363: 192-203. DOI: 10.1016/j.epsl.2012.12.025.

David, L. A. and E. J. Alm (2011). "Rapid evolutionary innovation during an Archaean genetic expansion." Nature 469(7328): 93-96. DOI: 10.1038/Nature09649.

Havig, J. R., T. L. Hamilton, A. Bachan and L. R. Kump (2017). "Sulfur and carbon isotopic evidence for metabolic pathway evolution and a four-stepped Earth system progression across the Archean and Paleoproterozoic." Earth-Sci Rev 174: 1-21. DOI: 10.1016/j.earscirev.2017.06.014.

Hohmann-Marriott, M. F. and R. E. Blankenship (2008). Anoxygenic Type-I photosystems and evolution of photosynthetic reaction centers. Photosynthetic Protein Complexes. P. Fromme, Wiley-VCH Verlag GmbH & Co. KGaA: 295-324.

Hug, L. A., B. J. Baker, K. Anantharaman, C. T. Brown, A. J. Probst, C. J. Castelle, C. N. Butterfield, A. W. Hernsdorf, Y. Amano, K. Ise, Y. Suzuki, N. Dudek, D. A. Relman, K. M. Finstad, R. Amundson, B. C. Thomas and J. F. Banfield (2016). "A new view of the tree of life." Nat Microbiol 1: 16048. DOI: 10.1038/nmicrobiol.2016.48.

Magnabosco, C., K. R. Moore, J. M. Wolfe and G. P. Fournier (2018). "Dating phototrophic microbial lineages with reticulate gene histories." Geobiology. DOI: 10.1111/gbi.12273.

Marin, J., F. U. Battistuzzi, A. C. Brown and S. B. Hedges (2017). "The timetree of prokaryotes: New insights into their evolution and speciation." Mol Biol Evol 34: 437-446. DOI: 10.1093/molbev/msw245.

Nisbet, E. G. and C. F. R. Fowler (2014). The early history of life. Treatise on Geochemistry. K. D. M. and W. H. Schlesinger. Amsterdam, Elsevier Science. 10: 1-42.

Orf, G. S., C. Gisriel and K. E. Redding (2018). "Evolution of photosynthetic reaction centers: insights from the structure of the heliobacterial reaction center." Photosynthesis research. DOI: 10.1007/s11120-018-0503-2.

Planavsky, N. J., D. Asael, A. Hofmann, C. T. Reinhard, S. V. Lalonde, A. Knudsen, X. Wang, F. Ossa Ossa, E. Pecoits, A. J. B. Smith, N. J. Beukes, A. Bekker, T. M. Johnson, K. O. Konhauser, T. W. Lyons and O. J. Rouxel (2014). "Evidence for oxygenic photosynthesis half a billion years before the Great Oxidation Event." Nat Geosci 7(4): 283-286. DOI: 10.1038/ngeo2122.

Rosing, M. T. (1999). "C-13-depleted carbon microparticles in > 3700-Ma sea-floor sedimentary rocks from west Greenland." Science 283(5402): 674-676. DOI: 10.1126/science.283.5402.674.

Rosing, M. T. and R. Frei (2004). "U-rich Archaean sea-floor sediments from Greenland - indications of > 3700 Ma oxygenic photosynthesis." Earth and Planetary Science Letters 217(3-4): 237-244. DOI: 10.1016/S0012-821x(03)00609-5.

Rutherford, A. W., A. Osyczka and F. Rappaport (2012). "Back-reactions, short-circuits, leaks and other energy wasteful reactions in biological electron transfer: redox tuning to survive life in O2." FEBS Lett 586(5): 603-616. DOI: 10.1016/j.febslet.2011.12.039.

Sanchez-Baracaldo, P. (2015). "Origin of marine planktonic cyanobacteria." Sci Rep 5: 17418. DOI: 10.1038/srep17418.

Sanchez-Baracaldo, P., J. A. Raven, D. Pisani and A. H. Knoll (2017). "Early photosynthetic eukaryotes inhabited low-salinity habitats." Proc Natl Acad Sci USA. DOI: 10.1073/pnas.1620089114.

Satkoski, A. M., N. J. Beukes, W. Li, B. L. Beard and C. M. Johnson (2015). "A redox-stratified ocean 3.2 billion years ago." Earth and Planetary Science Letters 430: 43-53.

Shih, P. M., J. Hemp, L. M. Ward, N. J. Matzke and W. W. Fischer (2017). "Crown group Oxyphotobacteria postdate the rise of oxygen." Geobiology 15(1): 19-29. DOI: 10.1111/gbi.12200.

Soo, R. M., J. Hemp, D. H. Parks, W. W. Fischer and P. Hugenholtz (2017). "On the origins of oxygenic photosynthesis and aerobic respiration in Cyanobacteria." Science 355(6332): 1436-1440. DOI: 10.1126/science.aal3794.

Wang, X. L., N. J. Planavsky, A. Hofmann, E. E. Saupe, B. P. De Corte, P. Philippot, S. V. LaLonde, N. E. Jemison, H. J. Zou, F. O. Ossa, K. Rybacki, N. Alfimova, M. J. Larson, H. Tsikos, P. W. Fralick, T. M. Johnson, A. C. Knudsen, C. T. Reinhard and K. O. Konhauser (2018). "A Mesoarchean shift in uranium isotope systematics." Geochim Cosmochim Ac 238: 438-452. DOI: 10.1016/j.gca.2018.07.024.

Yokono, M., S. Satoh and A. Tanaka (2018). "Comparative analyses of whole-genome protein sequences from multiple organisms." Sci Rep 8. DOI: 10.1038/s41598-018-25090-8.

Saturday, July 14, 2018

Searching for new Type I reaction centre proteins in metagenomes

Testing my new-found metagenome-searching skills, I decided to look for Type I reaction centre core subunits from Heliobacteria. This is because there are fewer than a handful of PshA sequences from this fascinating group of organisms, and only one complete and published sequenced genome.

Judging from the massive phylogenetic distance between the PshA core subunit of the reaction centre from Heliobacteria and the next closest relative (the PscA from Chlorobi/Acidobacteria), one must assume that a significant biodiversity should have existed spanning this distance, even if one or the other obtained phototrophy via horizontal gene transfer.

I limited my search to about 2000 metagenomes. I narrowed down my selection to those with “microbial dark matter” in the metagenome title. I am not sure, however, whether all of these belong to a single project or come from different, independent labs or projects.

I have always wondered, however, whether in these humongous datasets there are any novel phototrophs still unknown to science.

I used the PshA sequence from Heliobacterium modesticaldum as query.

The BLAST did not retrieve any new sequences from Heliobacteria or Acidobacteria, but did retrieve quite a few sequences from phototrophic Chlorobi and Cyanobacteria, see the attached figures. No sequences outside the known phyla of phototrophs were found, which is kind of sad. I had great expectations.

PscA from phototrophic Chlorobi
255 complete or almost complete sequences were obtained, which I then used to build a Maximum Likelihood tree. I did not have a look at fragmented sequences.

There was one almost complete sequence of a PsaA subunit from a new strain close to Gloeobacter.

It had 82% sequence identity to the PsaA of G. violaceus and G. kilaueensis. In comparison, the PsaA of these last two share 88% sequence identity. As another point of comparison, the level of sequence identity for PsaA between a red alga, C. merolae, and A. thaliana is 82%.

PsaA, the early branches. ML tree. Metagenome sequences in bold.
At this level of sequence divergence, it should be a new genus/species. I name this strain Protogloeobacter cardonensis. Kidding.

The metagenome where this particular sequence was found is the following:

Hot spring sediment bacterial and archeal communities from British Columbia, Canada, to study Microbial Dark Matter (Phase II) - Larsen N4 metaG (Released on 2016-05-27)

There were also quite a few sequences from the early-branching hot spring Synechococcus type. In addition, a PsaA/PsaB pair for another Gloeomargarita strain and a PsaA/PsaB pair of isoforms of the far-red light acclimation response from a form of Fischerella.

If you want the sequences or would like to see the full tree, let me know.

Friday, July 6, 2018

The atypical D1 sequence of Gloeobacter kilaueensis: looking for another one in metagenomes

The evolution of D1 proteins is complicated. It is characterized by many gene duplication events occurring at every taxonomic level. Some of these duplications could potentially predate the most recent common ancestor of all described cyanobacteria.
See our previous work on this:
Some of the earliest duplications, we suggested, gave rise to the atypical D1 forms, of which we have described three: what I have called Group 0, Group 1, and Group 2 D1.
Group 0 is made of a single sequence, found exclusively in the genome of Gloeobacter kilaueensis. G. kilaueensis additionally has 5 standard D1 forms. There may be a D1 fragment encoded in the genome of the early branching Synechococcus sp. PCC 7336, have a look at this:
Group 1 is the super-rogue D1 also known as chlorophyll f synthase (or PsbA4).
Group 2 is the rogue D1: function unknown/unconfirmed.
A recent preprint by Grettenberger et al. described a new type of early branching cyanobacterium, which was named Aurora. The genome of this cyanobacterium was assembled from a metagenome of a microbial mat found in Lake Vanda in Antarctica. It is more than 90% complete. This strain seems to be distantly related to Gloeobacter. As far as I understand, it is not clear, however, whether this strain is an early-branching cyanobacterium sister to Gloeobacter, or whether it predates Gloeobacter, being therefore a sister branch to all described cyanobacteria.
This is the preprint:
Aurora vandensis has a PSII with a subunit composition similar to that of Gloeobacter. Only one D1 was reported in the preprint, and this is a standard form of D1, a Group 4.
Excited by this, I wondered if I could find another Group 0 sequence in the available metagenomes. Another G0, similar to that from G. kilaueensis.
So, I did a BLAST against all JGI environmental metagenomes: a total of 12361. I left out metagenomes categorized as “engineered” or “host-associated”.
To do a BLAST against so many metagenomes directly on the JGI site, it is necessary to split the data into sets of at most 500 metagenomes. That gives 25 sets of metagenomes that needed to be BLASTed.
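The bookkeeping for the split is simple enough to sketch. This is just an illustration of the arithmetic: the metagenome identifiers below are made-up placeholders, and the function is hypothetical and does not talk to the JGI site in any way.

```python
def split_into_sets(items, max_size=500):
    """Split a list into consecutive sets of at most max_size items."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]


# Placeholder identifiers standing in for the 12361 environmental metagenomes.
metagenomes = [f"metagenome_{i}" for i in range(12361)]
sets = split_into_sets(metagenomes, max_size=500)

print(len(sets))      # 25 sets in total
print(len(sets[-1]))  # 361 metagenomes in the last, smaller set
```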
My query sequence was the very atypical G0 sequence from G. kilaueensis.
In the first set I obtained more than 30000 hits, which must include D1, D2, L, and M subunits; both complete and partial sequences. The cut-off E-value was 1e-5.
None of the 25 sets produced a sequence similar to the G0 sequence. Nothing close to it. The closest identity was 54%, usually to other standard forms of D1. No sequence alignment included the C-terminus, which is kind of special in the G0 sequence. Some of the metagenome sets gave a top hit to super-rogue D1 sequences, but the level of sequence identity between G0 and the other atypical forms is also just over 50%. This makes sense if the phylogenetic tree that we published in the paper above is correct, as it would imply that the G0 sequence is as close to the other atypical sequences, as it is to the standard forms of D1.
This is because we suggested, based on the phylogeny of D1, that Group 1 to Group 4 would make a monophyletic group to the exclusion of the G0 sequence. But phylogenetic trees are susceptible to artifacts, so having more G0 sequences could potentially improve the D1 phylogeny.
Each search for each of the metagenome sets produced more than 30k hits: that means that I could have obtained more than 750k hits in these 12361 metagenomes! But not a second G0 sequence?
I have to say that I did not examine every sequence in detail (of course)… waaay too many. So there may have been a partial sequence close to G0 that did not score high due to its very short length. If there was another G. kilaueensis somewhere else I would have expected at least some identical sequences, but nothing at all!
I thought that Gloeobacter was not that uncommon after all:
Would anyone be interested in repeating this search? :)
This is the link to the G0 sequence: https://www.ncbi.nlm.nih.gov/protein/AGY58976.1
Now, with the recent eruption of Kilauea this unique strain of Gloeobacter may have just gone extinct.