Wednesday, April 22, 2015

Contamination of genome projects with DNA from other organisms

I was blasting a protein named PsbO, also known as the 'manganese stabilizing protein' of Photosystem II. This is a protein found in cyanobacteria, algae, and plants, and it is important in photosynthesis. I was doing a phylogenetic tree and noted that one of the proteins originated from the recently sequenced non-photosynthetic bacterium Paenibacillus sp. IHB B 3415. A BLAST showed that the PsbO in this strain is identical to that in Camellia sinensis, the tea plant.

The chance of horizontal gene transfer from the chloroplast of the tea plant to Paenibacillus is, I would say, pretty close to 0%. So I imagine this is some form of contamination.

It is interesting that some of the investigators involved in the genome project are from Hill Area Tea Science Division, CSIR-Institute of Himalayan Bioresource Technology in Palampur, India.

I'm a little bit concerned. How often is contamination present in genome projects? When eukaryotic DNA contaminates a bacterial genome, I guess it is not such a big deal because it can be easily spotted... but contamination from another strain of bacteria might look like horizontal gene transfer, and it may not be that simple to tell the two apart using just bioinformatics.
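As a rough illustration of why the eukaryote-into-bacterium case is the easy one, here is a minimal Python sketch of a cross-kingdom screen. It is not what GenBank or the submitters ran. The input format (tuples of contig, kingdom of the hit, percent identity, bit score), the contig names and the 95% identity cutoff are all assumptions made up for the example.

```python
def flag_suspect_contigs(hits, expected_kingdom="Bacteria", min_identity=95.0):
    """Return contigs whose best-scoring hit lies outside the expected kingdom.

    `hits` is a hypothetical, pre-parsed list of
    (contig, kingdom_of_hit, percent_identity, bit_score) tuples.
    """
    best = {}
    for contig, kingdom, identity, bitscore in hits:
        # Keep only the highest-scoring hit for each contig.
        if contig not in best or bitscore > best[contig][2]:
            best[contig] = (kingdom, identity, bitscore)
    return sorted(
        contig
        for contig, (kingdom, identity, _) in best.items()
        if kingdom != expected_kingdom and identity >= min_identity
    )


# Made-up hits for three contigs of a supposedly bacterial assembly.
hits = [
    ("contig_001", "Bacteria", 99.1, 850.0),
    ("contig_001", "Eukaryota", 41.0, 60.0),
    ("contig_002", "Eukaryota", 100.0, 512.0),  # identical to a plant sequence
    ("contig_002", "Bacteria", 38.5, 55.0),
    ("contig_003", "Eukaryota", 45.0, 70.0),    # weak hit, not flagged
]

print(flag_suspect_contigs(hits))  # ['contig_002']
```

A check like this only catches the cross-kingdom case. Contamination from another bacterium would pass straight through it, which is exactly the worry.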

Update (April 20, 2015)
I contacted GenBank to report the issue. They investigated, and this is what they told me:

"The submitter concurs with your assessment, so we have removed the contaminated contig JUEI01000195 from the public record." 

The PsbO protein of Photosystem II

Monday, April 13, 2015

The unusual D1 protein of Microcystis aeruginosa TAIHU98

My coworkers and I recently published an article studying the phylogeny of all D1 subunits found in cyanobacteria. That study gave us some insight into the evolution of Photosystem II and water oxidation.

In that dataset I noticed a quite aberrant D1 that is present only in the genome of Microcystis aeruginosa TAIHU98. The genome of this strain was published in 2013.

D1 proteins are characterized by 5 transmembrane helices (A to E). Between helices C and D there is a parallel alpha helix, denominated CD (see below).

D1 protein from Thermosynechococcus vulcanus, PDB ID: 3WU2.

This anomalous D1 from M. aeruginosa TAIHU98 is predicted to have only 4 transmembrane helices. There is also what appears to be a 54-amino-acid 'sequence swap': the original sequence has been replaced by a new sequence with no homology to any other known protein (as determined by BLASTing the unique 54 amino acid sequence).

Sequence alignment of a normal D1 and the unusual D1 in Microcystis aeruginosa TAIHU98. In blue I have highlighted the usual transmembrane helices, in red the unique sequence in Microcystis, and in purple Y161, H191 and the ligands to the manganese cluster.

This unique sequence cut the second transmembrane helix (B) in half, and the third transmembrane helix (C) disappeared completely. Instead, the TMHMM tool and the ΔG prediction server both predict a brand new transmembrane helix in this protein, made from a bit of the new sequence insertion and from the CD parallel helix. This then becomes the second helix.

Transmembrane helix prediction using TMHMM 2.0 for the weird D1 in M. aeruginosa TAIHU98

Transmembrane helix prediction using TMHMM 2.0 for a normal D1 (PsbA1 from T. elongatus BP-1)

If this D1 is inserted into the membrane then the last two transmembrane helices should be inverted in comparison with normal D1, due to the absence of helix B.
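For readers who want a feel for what "predicting transmembrane helices" means, here is a much cruder stand-in for the tools above: a Kyte-Doolittle sliding-window hydropathy scan in Python. This is not how TMHMM (a hidden Markov model) or the ΔG predictor work. The window of 19 residues and the cutoff of 1.6 are the commonly quoted defaults for this kind of scan, and the toy sequence is invented, not a real D1.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) ranges, 1-based, covered by windows
    whose mean hydropathy is at or above the threshold."""
    segments = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                # Overlaps or touches the previous segment: extend it.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Invented toy protein: two poly-Leu stretches separated by poly-Lys.
toy = "K" * 15 + "L" * 20 + "K" * 30 + "L" * 20 + "K" * 15

print(hydrophobic_segments(toy))  # [(11, 40), (61, 90)]
```

The reported ranges are wider than the actual hydrophobic stretches because every window that passes the cutoff is counted in full. The number of segments is the part to compare, in the same way that the normal D1 gives five helices and the odd one gives four.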

The 54 amino acid sequence swap also eliminated the redox tyrosine Y161 and the high-affinity manganese binding site, D170. All other ligands to the manganese cluster have remained unchanged.

The 'sequence swap' is actually caused by three nucleotide insertions into the psbA gene: the first one causes a frameshift and the third one restores the original reading frame (see the image below).

Three insertions into the psbA gene of this Microcystis strain (Query) caused the 'sequence swap' in the D1 protein, in comparison with the D1 from T. elongatus. The insertions are highlighted in purple.

Taking this into consideration it is unlikely that this sequence is incorporated into Photosystem II, but who knows really.

A curious thing is that in this 54 amino acid sequence there are 7 cysteines. Is this a sign that this sequence has a new function as a Fe-S protein?

In my published phylogeny of D1, this sequence clustered with other D1 from Microcystis and does not have an ancient origin, suggesting that these radical alterations occurred in this particular strain of Microcystis only.

This implies that the gain of new protein functions and the drastic redesign of proteins could evolve really fast.