I was BLASTing a protein named PsbO, also known as the 'manganese-stabilizing protein' of Photosystem II. This protein is found in cyanobacteria, algae, and plants, and it is important in photosynthesis. I was building a phylogenetic tree and noticed that one of the sequences came from the recently sequenced non-photosynthetic bacterium Paenibacillus sp. IHB B 3415. A BLAST search showed that the PsbO in this strain is identical to that in Camellia sinensis, the tea plant.
The chance of horizontal gene transfer from the chloroplast of the tea plant to Paenibacillus is, I would say, pretty close to 0%. So I imagine this is some form of contamination.
It is interesting that some of the investigators involved in the genome project are from the Hill Area Tea Science Division, CSIR-Institute of Himalayan Bioresource Technology in Palampur, India.
I'm a little bit concerned. How likely is contamination in genome projects? Contamination of a bacterial genome with eukaryotic DNA is, I guess, not such a big deal because it can be easily spotted... but contamination from another strain of bacteria might look like horizontal gene transfer, and it may not be that simple to tell the two apart using bioinformatics alone.
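To make the "easily spotted" case concrete, here is a minimal sketch in Python of how one might screen contigs for hits outside the expected kingdom. It is not what I or GenBank actually ran. It assumes you already have tab-separated BLAST hits with five columns (contig, subject, percent identity, bit score, subject superkingdom); that layout, the function name `flag_foreign_contigs`, and the toy rows below are all hypothetical and only for illustration.

```python
import csv
import io


def flag_foreign_contigs(tabular_text, expected_kingdom="Bacteria"):
    """Return {contig: (kingdom, subject)} for contigs whose best hit is foreign.

    Assumed (hypothetical) column layout per line, tab-separated:
    contig, subject, percent identity, bit score, subject superkingdom.
    """
    best = {}  # contig -> (bit score, kingdom, subject) of the top-scoring hit
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        contig, subject, _pident, bitscore, kingdom = row[:5]
        score = float(bitscore)
        if contig not in best or score > best[contig][0]:
            best[contig] = (score, kingdom, subject)
    # Keep only contigs whose best hit lies outside the expected kingdom.
    return {
        contig: (kingdom, subject)
        for contig, (_score, kingdom, subject) in best.items()
        if kingdom != expected_kingdom
    }


# Toy, made-up hits: contig_B matches a eukaryote best and gets flagged.
toy_hits = (
    "contig_A\tsubj1\t99.0\t500\tBacteria\n"
    "contig_B\tsubj2\t100.0\t900\tEukaryota\n"
    "contig_B\tsubj3\t80.0\t200\tBacteria\n"
)
print(flag_foreign_contigs(toy_hits))
```

A filter like this only catches the cross-kingdom case. If the contaminant is another bacterium, every hit still says "Bacteria", which is exactly why that situation is harder to separate from horizontal gene transfer.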
Update (April 20, 2015)
I contacted GenBank to report the issue. They investigated, and this is what they told me:
"The submitter concurs with your assessment, so we have removed the contaminated contig JUEI01000195 from the public record."