Saturday, July 14, 2018

Searching for new Type I reaction centre proteins in metagenomes

Testing my new-found metagenome-searching skills, I decided to look for Type I reaction centre core subunits from Heliobacteria. This is because there are fewer than a handful of PshA sequences from this fascinating group of organisms, and only one complete, published genome sequence.

Judging from the massive phylogenetic distance between the PshA core subunit of the heliobacterial reaction centre and its closest relative (PscA from Chlorobi/Acidobacteria), one must assume that significant biodiversity once spanned this distance, even if one group or the other obtained phototrophy via horizontal gene transfer.

I limited my search to about 2000 metagenomes, narrowing my selection to those with “microbial dark matter” in the title. I am not sure, however, whether all of these belong to a single project or come from different, independent labs or projects.
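For anyone wanting to repeat this kind of selection on an exported list of metagenome titles, here is a minimal sketch of a case-insensitive title filter. The function name and the second title are hypothetical, made up for illustration; only the first title comes from this post, and this is not necessarily how the selection was actually done.

```python
def filter_by_title(titles, phrase="microbial dark matter"):
    """Return the titles that contain `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [title for title in titles if needle in title.casefold()]


# Example input: the first title is quoted from this post,
# the second is a made-up placeholder that should be filtered out.
titles = [
    "Hot spring sediment bacterial and archeal communities from British "
    "Columbia, Canada, to study Microbial Dark Matter (Phase II) - Larsen N4 metaG",
    "Example marine surface water metagenome",
]

selected = filter_by_title(titles)
for title in selected:
    print(title)
```

Matching on the title alone is a blunt instrument: it says nothing about whether the selected datasets share a project or a lab.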

I have always wondered, however, whether these humongous datasets contain any novel phototrophs still unknown to science.

I used the PshA sequence from Heliobacterium modesticaldum as query.

The BLAST search did not retrieve any new sequences from Heliobacteria or Acidobacteria, but it did retrieve quite a few from phototrophic Chlorobi and Cyanobacteria; see the attached figures. No sequences outside the known phyla of phototrophs were found, which is kind of sad. I had great expectations.

PscA from phototrophic Chlorobi
255 complete or almost complete sequences were obtained, which I then used to build a Maximum Likelihood tree. I did not have a look at fragmented sequences.
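Separating complete or almost complete hits from fragments can be done with a simple length cutoff. The sketch below assumes the hits are in FASTA format and keeps those at least 90% as long as the query. The 90% cutoff, the function names, and the toy sequences are all assumptions for illustration; the post does not say which criterion was used.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence id -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def near_complete(records, query_length, min_fraction=0.9):
    """Keep sequences at least `min_fraction` as long as the query.

    `min_fraction` is an assumed cutoff, not a value from the post.
    """
    cutoff = min_fraction * query_length
    return {seq_id: seq for seq_id, seq in records.items() if len(seq) >= cutoff}


# Toy sequences, far shorter than any real reaction centre subunit.
example = """>hit_full
MKTAYIAKQR
>hit_fragment
MKTA
"""

hits = parse_fasta(example)
kept = near_complete(hits, query_length=10)
print(sorted(kept))
```

A length cutoff will not catch sequences that are long but misassembled, so the alignment is still worth inspecting by eye before building a tree.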

There was one almost complete sequence of a PsaA subunit from a new strain close to Gloeobacter.

It had 82% sequence identity to the PsaA of G. violaceus and G. kilaueensis. In comparison, the PsaA of these last two share 88% sequence identity. As another point of comparison, the level of sequence identity for PsaA between a red alga, C. merolae, and A. thaliana is 82%.
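Percent identity figures like these depend on how they are counted. One common convention, sketched below, takes two sequences that have already been aligned and divides the number of identical positions by the number of columns where neither sequence has a gap. This is an assumed convention, not necessarily the one behind the numbers above, and the sequences are toy examples.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy pre-aligned sequences, not real PsaA data.
seq_a = "MKT-AYIAKQ"
seq_b = "MKTLAYLAKQ"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when there are many gaps.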

PsaA, the early branches. ML tree. Metagenome sequences are in bold.
At this level of sequence divergence, it should be a new genus/species. I name this strain Protogloeobacter cardonensis. Kidding.

The metagenome where this particular sequence was found is the following:

Hot spring sediment bacterial and archeal communities from British Columbia, Canada, to study Microbial Dark Matter (Phase II) - Larsen N4 metaG (Released on 2016-05-27)

There were also quite a few sequences from the early-branching hot spring Synechococcus type. In addition, there was a PsaA/PsaB pair from another Gloeomargarita strain, and a PsaA/PsaB pair of far-red light acclimation response isoforms from a form of Fischerella.

If you want the sequences or would like to see the full tree, let me know.
