Sunday, July 13, 2008

Reticulate classification of mobile genetic elements

“Reticulate representation of evolutionary and functional relationships between phage genomes.” by Lima-Mendez et al. (2008)

In this paper, the authors note that mobile genetic elements (MGEs) in prokaryotes (such as phages, plasmids, conjugative transposons, and genomic islands) show mosaic structures, indicating the importance of horizontal gene exchange in their evolution. These elements represent unique combinations of modules, each with a different phylogenetic history. Traditional classification schemes cannot be applied to these genetic elements, in part because tree-based methods are intrinsically unable to deal efficiently with mosaicism.

To address this problem, Lima-Mendez et al. (2008) proposed a framework for a reticulate classification of phages based on gene content, i.e., the presence (1) or absence (0) of each protein family. First, the authors built a graph in which nodes represent phages and edges represent similarities in gene content between phages. Then they applied a two-step clustering [Markov clustering (MCL) followed by fuzzy clustering] to this graph to generate a reticulate classification of phages: each phage is represented by a membership vector, which quantitatively characterizes its membership in each of the clusters. Phages within the same MCL cluster are likely descended from a unique combination of modules, and one phage can belong to several clusters (Lawrence et al. 2002); for example, phage lambda belongs almost equally to two different clusters. Lima-Mendez et al. (2008) stated that “The weight of the intracluster connections represents 79% of the total weight of the connections of the network. This number can be taken as a rough estimate of the contribution of vertical evolution in this network. However, phages from different MCL clusters may also be related through vertical evolution, but they might have diverged so much that sequence similarities are no longer recognizable or only some [evolutionary cohesive] modules may have been vertically inherited, whereas others have been replaced through horizontal gene transfer.” Thus, it is still difficult to estimate the relative contributions of vertical inheritance and horizontal gene transfer.
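The graph-and-clustering workflow described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' code or the NeAT implementation: the Jaccard index stands in for the similarity score used in the paper, `min_sim`, `inflation` and `iterations` are hypothetical parameter choices, the five toy phages and their protein families are invented, and the "membership vector" here is simply each phage's share of edge weight pointing into each MCL cluster, a crude stand-in for the paper's fuzzy-clustering step. The last function computes the same kind of quantity as the quoted 79% figure: the fraction of total edge weight that falls inside clusters.

```python
from itertools import combinations

def jaccard(a, b):
    """Assumption: Jaccard index as a stand-in gene-content similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(profiles, min_sim=0.1):
    """Weighted edges {(p, q): weight} between phages sharing protein families."""
    edges = {}
    for p, q in combinations(sorted(profiles), 2):
        w = jaccard(profiles[p], profiles[q])
        if w >= min_sim:  # hypothetical cutoff for drawing an edge
            edges[(p, q)] = w
    return edges

def mcl(nodes, edges, inflation=2.0, iterations=50):
    """Bare-bones Markov clustering: alternate expansion and inflation."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for (p, q), w in edges.items():
        m[idx[p]][idx[q]] = m[idx[q]][idx[p]] = w

    def normalize(mat):  # make every column sum to 1 (column-stochastic)
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        return [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]

    m = normalize(m)
    for _ in range(iterations):
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]                                      # expansion
        m = normalize([[x ** inflation for x in row] for row in m])  # inflation
    clusters = []
    for row in m:  # each attractor row lists the members of its cluster
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

def membership(nodes, edges, clusters):
    """Stand-in membership vector: share of a phage's edge weight per cluster."""
    vectors = {}
    for v in nodes:
        weights = [0.0] * len(clusters)
        for (p, q), w in edges.items():
            if v in (p, q):
                other = q if p == v else p
                for c, members in enumerate(clusters):
                    if other in members:
                        weights[c] += w
        total = sum(weights)
        vectors[v] = [x / total for x in weights] if total else weights
    return vectors

def intracluster_fraction(edges, clusters):
    """Weight of edges whose endpoints share a cluster, over total edge weight."""
    inside = sum(w for (p, q), w in edges.items()
                 if any(p in c and q in c for c in clusters))
    total = sum(edges.values())
    return inside / total if total else 0.0

# Hypothetical toy data: protein families present in five invented phages.
profiles = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f6", "f7", "f8", "f9"},
    "D": {"f6", "f7", "f8", "f10"},
    "M": {"f1", "f2", "f3", "f6"},  # mosaic: mostly A/B-like, one C/D family
}
nodes = sorted(profiles)
edges = build_graph(profiles)
clusters = mcl(nodes, edges)
vectors = membership(nodes, edges, clusters)
frac = intracluster_fraction(edges, clusters)
print([sorted(c) for c in clusters])
print({v: [round(x, 2) for x in vec] for v, vec in vectors.items()})
print(round(frac, 2))
```

In this toy run the mosaic phage "M" should receive a split membership vector rather than a single label, which is the point of a reticulate classification; a hard tree-based grouping would have to place it entirely in one group.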

Kunin and Ouzounis (2003) suggested a framework for inferring the presence or absence of individual protein families at any node of a phylogenetic tree, based on three assumptions: (1) a protein family shared by most members of a clade was vertically transmitted; (2) if a protein family is present in most of the descendants of a particular ancestor but is not found in some subclade, the observed absence normally results from gene loss; and (3) a protein family interspersed across distantly related clades was horizontally transferred. These assumptions cannot detect horizontal gene transfer (HGT) among closely related species, a limitation shared by most methods used to identify HGT (those based on phylogenetic information or on compositional features). However, it is well recognized that phage and plasmid transfers (and consequently HGT) should be more likely among closely related species than among distantly related ones.
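The three rules can be turned into a toy labeling procedure. This is a deliberately simplified reading of the rules as summarized above, not the GeneTRACE algorithm itself: a tree is written as nested Python tuples, "most" is taken to mean more than half of a clade's leaves (the `majority` cutoff is an assumption), and the species names and presence data are invented.

```python
def leaves(node):
    """Leaf names below a node; a tree is a leaf name or a tuple of subtrees."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]

def trace_family(tree, present, majority=0.5):
    """Label each leaf for one protein family using the three rules (sketch)."""
    labels = {}

    def walk(node, inherited):
        if isinstance(node, str):
            if node in present:
                # rule 1 if an ancestor carried the family, else rule 3 (sporadic)
                labels[node] = "vertical" if inherited else "HGT candidate"
            else:
                # rule 2: missing below an ancestor that carried the family
                labels[node] = "loss" if inherited else "absent"
            return
        names = leaves(node)
        share = sum(name in present for name in names) / len(names)
        carried = inherited or share > majority  # hypothetical "most members" cutoff
        for child in node:
            walk(child, carried)

    walk(tree, False)
    return labels

# Hypothetical tree of three clades; species names are invented.
tree = (("E1", "E2", "E3"), ("S1", "S2"), ("B1", "B2", "B3"))
labels = trace_family(tree, present={"E1", "E2", "B3"})
print(labels)
```

The sketch also reproduces the blind spot discussed above: if E3 had in fact acquired the family from its close relative E1 by HGT, calling `trace_family` with E3 added to `present` would label it "vertical", because a transfer between close relatives looks exactly like inheritance under these rules.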

Phylogenetic profiles have been widely applied to bacterial genomes to predict functional links between proteins, on the assumption that proteins that interact in a metabolic pathway or a physical structure tend to co-occur across genomes (Pellegrini et al. 1999). Lima-Mendez et al. (2008) clustered genes by their “phylogenetic profiles” to define “evolutionary cohesive modules.” In virulent phages, these evolutionary modules span several functional categories, whereas in temperate phages they correspond more closely to functional modules, suggesting that the phylogenetic profile method does not work well for predicting protein function in virulent phages. The Lima-Mendez analysis reminds us that we must consider the broader context of an MGE (for example, whether a phage is virulent or temperate), and not only its gene content.
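The profile idea behind Pellegrini et al. (1999) can be illustrated with a tiny grouping sketch. It is a simplified stand-in, not the clustering procedure used by Lima-Mendez et al.: each gene gets a hypothetical 0/1 profile over six phage genomes, and genes whose profiles differ in at most `max_diff` genomes are linked and grouped by single linkage (union-find). The gene names, profiles, and the `max_diff` threshold are all invented for illustration.

```python
def hamming(p, q):
    """Number of genomes in which two 0/1 presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def profile_modules(profiles, max_diff=1):
    """Single-linkage groups of genes whose profiles differ in <= max_diff genomes."""
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if hamming(profiles[g], profiles[h]) <= max_diff:
                parent[find(g)] = find(h)  # merge the two groups
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())

# Hypothetical profiles: presence (1) or absence (0) in six invented genomes.
gene_profiles = {
    "portal":       (1, 1, 1, 0, 0, 0),
    "terminase":    (1, 1, 1, 0, 0, 0),
    "major_capsid": (1, 1, 1, 0, 0, 1),
    "integrase":    (0, 0, 1, 1, 1, 0),
    "repressor":    (0, 0, 1, 1, 1, 1),
}
modules = profile_modules(gene_profiles)
print(modules)
# [['integrase', 'repressor'], ['major_capsid', 'portal', 'terminase']]
```

Here the groups happen to line up with functional categories, as the paragraph above reports for temperate phages; in virulent phages, the same kind of grouping would tend to mix categories, which is why co-occurrence alone is a weaker guide to function there.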

The Lima-Mendez analysis was implemented using Network Analysis Tools (NeAT) (Brohée et al. 2008), available at http://rsat.ulb.ac.be/rsat/index_neat.html.

PRIMARY ARTICLE:
Lima-Mendez G, Van Helden J, Toussaint A, Leplae R. Mol Biol Evol. (2008) 25:762-77. Reticulate representation of evolutionary and functional relationships between phage genomes.

ADDITIONAL REFERENCES:
Lawrence JG, Hatfull GF, Hendrix RW. J Bacteriol. (2002) 184:4891-905. Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches.

Kunin V, Ouzounis CA. Bioinformatics. (2003) 19:1412-6. GeneTRACE-reconstruction of gene content of ancestral species.

Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Proc Natl Acad Sci U S A. (1999) 96:4285-8. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.

Brohée S, Faust K, Lima-Mendez G, Sand O, Janky R, Vanderstocken G, Deville Y, van Helden J. Nucleic Acids Res. (2008) 36(Web Server issue):W444-51. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

Dr. Haruo Suzuki
University of Idaho
