genouest.org/). SOR genes were detected in the three kingdoms of life, and only on chromosomal replicons. Although no N-terminal signal sequences were previously described for bacterial SOR [43], we predicted seven SOR to be potentially TAT-secreted (twin-arginine translocation) in some bacteria, for example in Desulfovibrio salexigens DSM 2638, Desulfuromonas acetoxidans DSM 684 and Geobacter uraniireducens Rf4. Our analysis confirms
the observations by Pinto et al. in 2010 that (1) the distribution of SOR classes does not correlate with organism phylogeny and that (2) sor genes occur in very diverse genetic environments. Indeed, although some sor genes are clustered with genes encoding electron donors
(such as rubredoxin in D. vulgaris) or inter-related oxidative stress-responsive genes, most are close to functionally unrelated genes. This is consistent with sor genes being acquired, or lost, through lateral gene transfer [41].

Construction and content

Collection of SOR

To collect SOR sequences, we extensively searched the PubMed database and identified all relevant literature concerning any protein with “superoxide reductase” activity; this search resulted in a small dataset (13 SOR published in 12 organisms, see Table 1). We therefore enriched the database using manually curated sequences described as desulfoferrodoxin (160 proteins), superoxide reductase (50 proteins) or neelaredoxin (9 proteins) in EntrezGene and/or GenBank entries. As “centre II” is the active site for SOR activity, we also included all proteins with a domain of this type as described in InterPro
(IPR002742, IPR004793, IPR004462, IPR012002), Pfam (PF01880, PF06397), Supfam (SSF49367), TIGRfam (TIGR00332, TIGR00320, TIGR00319), NCBI conserved domains (cd03172, cd03171, cd00524, cl00018, cl00014, cd00974) and PRODOM (PD006618, PD330262, PDA2O7Z7, PDA36750, PD985590, PDA36751, PDA63215, PDA7Y161, PDA7Y162, PD511041, PD171746, PD985589, PDA7Y163). All collected sequences were cleaned up to remove redundant and unrelated proteins. This non-redundant, curated dataset was used to investigate the 1237 complete and 1345 draft genomes available in the NCBI database (May 2010) through a series of successive BlastP [44] and tBlastN [45] searches. Orthology (KO K05919 and COG2033) and synteny (IMG neighbourhood interface) were also exploited. To be as comprehensive as possible in the data collection, we performed multiple alignments using both the ClustalW [46, 47] and Muscle [48] algorithms. These alignments showed highly conserved residues in the sequences of active centre I (CX2CX15CC) and centre II (HX5H-CX2H). These conserved patterns were translated into “regular expressions” that were used to perform the final screening of databases.
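
As an illustration of how the two conserved motifs might be expressed as regular expressions, the following is a minimal Python sketch and not the screening code actually used. It assumes that "X" stands for any residue and that the dash in HX5H-CX2H denotes a spacer of variable length; the upper bound on that spacer (MAX_GAP), the function name and the toy sequences are hypothetical and chosen only for the example.

```python
import re

# Centre I motif as written in the text: C-X2-C-X15-C-C ("X" = any residue).
CENTRE_I = re.compile(r"C.{2}C.{15}CC")

# Centre II motif: H-X5-H, a spacer, then C-X2-H.
# ASSUMPTION: the dash in "HX5H-CX2H" is read as a variable-length spacer;
# MAX_GAP is a hypothetical bound, not a value given in the text.
MAX_GAP = 80
CENTRE_II = re.compile(r"H.{5}H.{1,%d}?C.{2}H" % MAX_GAP)


def screen_sequence(seq):
    """Report which of the two motifs occur in a protein sequence."""
    seq = seq.upper()
    return {
        "centre_I": CENTRE_I.search(seq) is not None,
        "centre_II": CENTRE_II.search(seq) is not None,
    }


# Synthetic toy sequences (not real SOR proteins), built to contain one motif each.
toy_centre_I = "MK" + "C" + "GG" + "C" + "A" * 15 + "CC" + "K"
toy_centre_II = "M" + "H" + "A" * 5 + "H" + "G" * 40 + "C" + "AA" + "H"

if __name__ == "__main__":
    print(screen_sequence(toy_centre_I))
    print(screen_sequence(toy_centre_II))
```

A sequence carrying both motifs would be reported as positive for both centres, while a sequence with only the centre II pattern would be positive for centre II alone; the spacer bound would need tuning against real alignments.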