I wrote "undescribed assumptions" because the papers usually read really well and
describe many of their assumptions convincingly, yet their results vary
significantly. I've identified a couple of things that aren't justified, but I
don't know whether they are reasonable.
Example 1: It doesn't make sense to me that molecular evolution rates in
chloroplasts should be the same as in free-living cyanobacteria, given their
significantly different "environmental" contexts, including pigments to absorb
damaging radiation. Has anyone looked at this?
You are absolutely right. There
are differences in the rates of evolution between chloroplasts and
cyanobacteria, and overall, plastid proteins evolve faster than those in
cyanobacteria. But that is not true for every protein. For example, proteins
involved in information processing (e.g. ribosomal proteins, RNA polymerase) are
evolving significantly faster in plastids. On the other hand, proteins of
bioenergetics and photosynthetic metabolism, like ATP synthase, the RuBisCO large
subunit, and the core subunits of the photosystems, are evolving at about the
same rates in cyanobacteria and plastids.
It has to do with the different
evolutionary pressures. The proteins of bioenergetics are under strong purifying
selection (slow rates), but those of information processing have undergone
periods of positive selection (accelerated rates) because they had to be brought
under the control of the eukaryotic replication, gene expression, and translation
systems. I don't know much about it, but I have now been systematically comparing
the rates of evolution of a bunch of these proteins. I am trying to establish a
reasonable time for the emergence of the most recent common ancestor of
Cyanobacteria... but of course, it is not so straightforward.
In our analysis, we used D1, one
of the slowest-evolving proteins in all of life. We found that there is hardly
any difference in the overall rates of evolution of D1 between photosynthetic
eukaryotes and Cyanobacteria. In fact, the G4 D1 that cyanobacteria use for
oxygenic photosynthesis with chlorophyll f has experienced faster rates of
evolution than the forms in the chloroplast.
That is why we presented Figure
2 in our paper: to show that the rates of evolution of D1 and D2 are quite slow
in both plants and cyanobacteria, and that if cyanobacteria turn out to be much
older than we anticipate, that would imply even slower rates, which would in
turn push the duplication that led to D1 and D2 to even older times.
To give you an idea of how slowly
D1 and D2 are evolving... they are evolving more slowly than the alpha and beta
subunits of ATP synthase, which originated from a gene duplication event that
occurred before LUCA. D1 and D2 are under tremendous evolutionary pressure
because they bind so many cofactors, which have to be held in the right
orientations; they also interact with a bunch of other subunits, and in addition
they have to incorporate protection mechanisms. Therefore, when primary
endosymbiosis occurred, it had virtually no effect on the rates of evolution of
D1 and D2, unlike the ribosome, for example.
If they do evolve at different rates on average, almost none
of the fossil record calibrations will be effective without a deep dive into
these variations.
I agree 100%! That is something
I am exploring at the moment. In the case of cyanobacteria/chloroplast trees,
calibrations have to be placed on both sides of the node you are most
interested in. That is why timing the most recent common ancestor of
cyanobacteria is so difficult. If we only put calibrations on fast-evolving
branches, then the rates on the slowest-evolving uncalibrated clades will be
overestimated, resulting in younger calculated ages. On the other hand, if we
place calibrations on slower-evolving clades, then the rates in the
fast-evolving clades will be underestimated, resulting in older calculated ages.
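To make the direction of these biases concrete, here is a minimal toy sketch in Python. It assumes a strict clock within each clade and made-up rates and distances (none of these numbers come from our analyses); real relaxed-clock methods model rate variation far more carefully, so this only shows the basic arithmetic of borrowing a rate from the wrong clade.

```python
"""Toy strict-clock illustration of calibration placement bias (hypothetical numbers)."""


def estimate_rate(distance: float, calibrated_age_ga: float) -> float:
    # Rate = root-to-tip distance (substitutions/site) / calibrated age (Ga).
    return distance / calibrated_age_ga


def estimate_age(distance: float, rate: float) -> float:
    # Age implied by a distance under a strict clock with the given rate.
    return distance / rate


# Hypothetical truth: both clades are 1.0 Ga old, but one evolves 3x faster.
TRUE_AGE_GA = 1.0
FAST_RATE, SLOW_RATE = 0.3, 0.1  # substitutions/site per Ga

fast_distance = FAST_RATE * TRUE_AGE_GA
slow_distance = SLOW_RATE * TRUE_AGE_GA

# Calibrate on the fast clade, then date the slow clade with that rate.
slow_age = estimate_age(slow_distance, estimate_rate(fast_distance, TRUE_AGE_GA))
# Calibrate on the slow clade, then date the fast clade with that rate.
fast_age = estimate_age(fast_distance, estimate_rate(slow_distance, TRUE_AGE_GA))

print(f"slow clade, fast calibration: {slow_age:.2f} Ga (true {TRUE_AGE_GA} Ga)")
print(f"fast clade, slow calibration: {fast_age:.2f} Ga (true {TRUE_AGE_GA} Ga)")
```

In this toy case the slow clade comes out at about a third of its true age and the fast clade at about three times its true age, which is why having calibrations on both sides of the node of interest helps.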
Therefore, when running a molecular clock analysis, it is important to maximize
the number of calibrations and to place them strategically. However, the changes
in rates between clades should not be a big problem. Molecular clock algorithms
can cope with rate differences of orders of magnitude (believe me, I have tested
this). But the only way the software can infer accurate dates is with the
appropriate use of calibrations.
There is no perfect dataset, and
there is no perfect molecular clock, but we tried to do the best we could. We
tried to model every possible scenario. The point of the paper is not to find
out when cyanobacteria originated, but to find out the span of time between the
duplication leading to D1 and D2 and standard Photosystem II (inherited by all
cyanobacteria). And we find that that span of time is likely to be pretty
substantial…
Think about this: the origin of
ATP synthase (the duplication leading to the alpha and beta subunits) does not
depend on the age of any particular group of bacteria. The same goes for
Photosystem II: its origin does not depend on the age of the most recent common
ancestor of cyanobacteria, but on when the duplication that led to D1 and D2
occurred. And that photosystem, before the duplication, even if it didn't
oxidize water, was already a pretty special photosystem, unlike any of the
known anoxygenic ones.
Example 2: Atmospheric O2 was lower before the late Ediacaran, so there was less
O3 and more UV, and even more so before the GOE. And with more Fe2+ in seawater,
more free radicals are produced by light. How do environmental conditions such
as these affect mutation rates? Are they different in cyanos vs. chloroplasts?
Different for organisms living in different environments? E.g. Nostoc in super
high light vs. the new cyanos found living in the subsurface? Phormidium living
at the light limit with HS-? How do ecological variations feed into long-term
mutation accumulation?
From the patterns that I have
seen, it appears that overall, chloroplast proteins (and eukaryotic proteins in
general) are evolving faster than their cyanobacterial counterparts. But as I
mentioned above, the rates of evolution vary between proteins. What scientists
have tried to do is to measure the background rates of evolution in non-coding
regions of the genome and compare them to those in coding regions. Changes in
the ratio of these rates reflect different evolutionary pressures.
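A minimal sketch of that comparison in Python, assuming you already have a substitution rate for a coding region and a neutral background rate in the same units (the values below are hypothetical). This is essentially the logic behind dN/dS-style ratios: well below 1 points to purifying selection, around 1 to neutral evolution, and above 1 to positive selection. The 5% tolerance is an arbitrary choice for illustration.

```python
def selection_ratio(coding_rate: float, background_rate: float) -> float:
    """Ratio of a coding-region rate to a neutral background rate."""
    if background_rate <= 0:
        raise ValueError("background rate must be positive")
    return coding_rate / background_rate


def interpret(ratio: float, tolerance: float = 0.05) -> str:
    """Rough reading of the ratio; the tolerance is an arbitrary illustrative choice."""
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical rates in the same units (e.g. substitutions/site per unit time).
examples = {
    "hypothetical protein A": (0.02, 1.0),
    "hypothetical protein B": (1.4, 1.0),
}
for name, (coding, background) in examples.items():
    ratio = selection_ratio(coding, background)
    print(f"{name}: ratio = {ratio:.2f} -> {interpret(ratio)}")
```

In practice, episodic positive selection like the one described for information-processing proteins is usually detected on particular branches or sites, because whole-gene averages tend to be dominated by purifying selection.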
There are no systematic studies
of how the rates of evolution have changed across geological time. Your
questions are super interesting, and this is something that needs to be
explored in more detail.
Have a look at the figure below.
It is a comparison of the level of sequence divergence between pairs of
cyanobacteria (a measure of phylogenetic distance), 703 comparisons in total,
plotted for RpoB (RNA polymerase subunit B) and for the beta subunit of ATP
synthase. For example, if I compare the sequence of beta from Nostoc
punctiforme with that of Chroococcidiopsis thermalis, they are about 10%
different. If I compare it against Gloeobacter violaceus, it is about 30%
different.
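The "percent different" here is essentially an uncorrected p-distance between two aligned sequences. A minimal Python sketch, assuming the sequences are already aligned and gaps are written as "-" (the toy sequences are made up and are not real ATP synthase or RpoB sequences):

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Alignment columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Made-up aligned fragments differing at 2 of 20 sites.
ref = "ACDEFGHIKLMNPQRSTVWY"
other = "ACDEYGHIKLMNPQRATVWY"
print(f"{100 * p_distance(ref, other):.1f}% different")
```

Uncorrected p-distances saturate at large divergences because multiple substitutions at the same site are not counted, so deep comparisons such as those against Gloeobacter are usually made with model-corrected distances.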
The blue dots are comparisons
among heterocystous cyanobacteria, and the orange dots are all comparisons
against Gloeobacter, the earliest-branching cyanobacterium. There is a big
scatter, but it follows an overall linear trend; the slope of the trend line is
1.06. This means that RpoB and beta are evolving at pretty much the same rate
across the core diversity of cyanobacteria.
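A trend line like this is typically an ordinary least-squares fit. A minimal Python sketch, assuming paired distances for the same species pairs in two proteins; the data points are placeholders, not the 703 comparisons in the figure. This version fits an intercept; forcing the line through the origin is another reasonable choice.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder pairwise p-distances for the same species pairs (not real data):
# x = ATP synthase beta, y = RpoB.
beta = [0.05, 0.08, 0.10, 0.28, 0.30, 0.33]
rpob = [0.06, 0.08, 0.11, 0.29, 0.32, 0.34]

slope, intercept = fit_line(beta, rpob)
print(f"slope = {slope:.2f}, intercept = {intercept:.3f}")
```

A slope near 1 means the two proteins accumulate differences in step with each other across the compared pairs, which is what makes them useful for the kind of extrapolation discussed next.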
The figure also shows that the
distance between Gloeobacter and the rest of cyanobacteria is about three times
as great as that among heterocystous cyanobacteria. So if it can be established
that the rates of evolution across most cyanobacteria follow approximately
uniform patterns, we can be more confident about a time for their most recent
common ancestor. We will only need a good fossil to calibrate it all.
Let us assume that we have
identified a number of proteins that have evolved at a constant rate across
cyanobacteria (say, those in the figure). Now, there was a recent paper
reporting fossil heterocystous cyanobacteria from the Tonian period; did you
see it? The minimum age is 720 Ma. That would imply that the branch leading to
Gloeobacter split off at about 2.1 Ga. If instead we think that heterocystous
cyanobacteria appeared about 1.0 Ga, then that would put the branching of
Gloeobacter at about 3.0 Ga. Molecular clocks behave in a similar way,
depending, of course, on the calibration choices.
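The arithmetic behind those two numbers is a simple strict-clock extrapolation. A minimal Python sketch, assuming distances grow linearly with time across cyanobacteria and taking the "about three times" ratio from the figure as exactly 3.0 (both are simplifications):

```python
def extrapolate_age(calibrated_age_ma: float, distance_ratio: float) -> float:
    """Strict-clock extrapolation: deeper node age = calibrated age * distance ratio."""
    return calibrated_age_ma * distance_ratio


# Approximate ratio of the Gloeobacter-vs-rest distance to the distance among
# heterocystous cyanobacteria (assumed to be exactly 3.0 here).
DISTANCE_RATIO = 3.0

scenarios = {
    "Tonian fossil minimum (720 Ma)": 720.0,
    "heterocysts at ~1.0 Ga": 1000.0,
}
for label, heterocyst_age_ma in scenarios.items():
    age_ma = extrapolate_age(heterocyst_age_ma, DISTANCE_RATIO)
    print(f"{label}: Gloeobacter split at ~{age_ma / 1000:.2f} Ga")
```

With the ratio fixed at 3.0 this gives roughly 2.2 and 3.0 Ga; every 0.1 change in the ratio shifts the Tonian-calibrated estimate by about 72 Myr, so the exact ratio matters.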
Example 3: Gene exchange among closely related organisms, including via
viruses. Is it possible that D1 G4 (and associated genes) evolved in one species
of cyanos, was better, and was transferred to a bunch of others after the GOE,
with those that didn't get the transfer dying out?
What I found in my study of the
evolution of D1 is that G4 is found in all Cyanobacteria (see Figure 1 of our
paper). And when you focus on G4 only, it appears to follow the species tree of
cyanobacteria; bear in mind that even G4 D1 has duplicated several times (e.g.
the low-light vs. high-light forms, and the one in the far-red light gene
cluster). Nevertheless, it seems as if at least G4 has mostly been inherited
vertically. That is not to say that horizontal gene transfer has not occurred;
it certainly has, but I don't think to such an extent that it would dominate
the topology of the tree.
Because of that, we also
concluded that the atypical D1 forms, including the so-called microaerobic
forms, branched out before the most recent common ancestor of Cyanobacteria.
I do think that a post-GOE
ancestor of cyanobacteria is likely an artefact resulting from an overestimation
of the rates of evolution, and I think there are a number of reasons for this.
It turns out, however, that D1 and D2 are very susceptible to this bias because
they evolve so slowly. That is why we focused on the concept of delta-T instead.
We did not focus on trying to
figure out whether cyanobacteria originated before or after the GOE, but on the
span of time between the duplication leading to D1 and D2 and standard PSII. We
therefore concluded that regardless of the exact timing of the MRCA of
cyanobacteria, delta-T will always be very large (1.0 billion years). We also
found that if delta-T is forced to be smaller, the rates of evolution increase
beyond what is likely for this type of protein, and quickly beyond what is
possible for any kind of protein.
So if the MRCA of cyanobacteria is found to be 2.5 Ga old, I think it would be reasonable to assume that the duplication leading to D1 and D2 occurred about 3.5 Ga... see what I mean?
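The relationship is simple arithmetic. A minimal Python sketch, assuming the amount of sequence change accumulated between the D1/D2 duplication and the MRCA of cyanobacteria is fixed (the DIVERGENCE value below is an arbitrary placeholder in arbitrary units), so that the rate needed to produce it scales as 1/delta-T:

```python
def duplication_age(mrca_age_ga: float, delta_t_ga: float) -> float:
    """Age of the D1/D2 duplication given the MRCA age and delta-T (both in Ga)."""
    return mrca_age_ga + delta_t_ga


def required_rate(divergence: float, delta_t_ga: float) -> float:
    """Rate needed to accumulate a fixed divergence within delta-T."""
    if delta_t_ga <= 0:
        raise ValueError("delta-T must be positive")
    return divergence / delta_t_ga


DIVERGENCE = 1.0  # hypothetical placeholder, arbitrary units

print(f"duplication at ~{duplication_age(2.5, 1.0):.1f} Ga")
baseline = required_rate(DIVERGENCE, 1.0)
for delta_t in (1.0, 0.5, 0.25, 0.1):
    fold = required_rate(DIVERGENCE, delta_t) / baseline
    print(f"delta-T = {delta_t:.2f} Ga -> rate x{fold:.0f} relative to delta-T = 1.0 Ga")
```

Halving delta-T doubles the required rate, and shrinking it tenfold requires a tenfold faster rate, which for proteins as slow as D1 and D2 quickly becomes implausible.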
In any case, I think that most
of the diversity of oxygenic phototrophs that have ever existed actually
predated the MRCA of cyanobacteria. That does not mean that such diversity had
to be abundant or globally distributed though.
Or were they present only in environments where they could compete
with relatively ineffective D1s?
I'm not saying I think these necessarily happened. It just
leaves me with the feeling that we are missing something really big and
important in our assumptions.
I agree. Think about this.
There are three gene duplication events that are exclusive to oxygenic
photosynthesis: D1 and D2, the core of PSII; CP43 and CP47, the core antenna of
PSII; and PsaA and PsaB, the core of Photosystem I.
All cyanobacteria today have a
form of oxygenic photosynthesis that has remained basically unchanged from
Gloeobacter to avocados. In fact, most of the sequence change in the evolution
of Photosystem II and Photosystem I that has ever occurred in the history of
life happened before the MRCA of cyanobacteria. From the moment those key
duplications occurred, countless forms of oxygenic phototrophic bacteria should
have appeared, spanning all of those changes that are not accounted for in the
known diversity. And given that these enzymes are some of the slowest-evolving
enzymes we know of, the roots of oxygenic photosynthesis likely lie deep in
time... early Archean deep. We are oblivious to such huge diversity. By the time
cyanobacteria enter the scene, when Gloeobacter split from the rest, oxygenic
photosynthesis had already reached a pretty sophisticated stage.
So yeah, we are missing so much,
in fact, we’re probably missing most of it.