Friday, March 13, 2009

Genome's barcodes

Assigning the short sequence fragments generated by metagenomic studies to their source genomes is an important problem. Zhou et al. (2008) investigated oligonucleotide (k-mer) frequency profiles, which they term 'barcodes', for this purpose.

The barcodes (Figure 1) show that 4-mer frequency distributions are stable along chromosomes, and that phylogenetically closely related species have more similar barcodes than distantly related species. This is in complete agreement with previous studies by Karlin and colleagues (Karlin et al., 1998), which used 2-mer frequencies (dinucleotide relative abundance, termed the 'genomic signature').
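To make the idea of a barcode concrete, here is a minimal Python sketch that tabulates 4-mer frequencies in consecutive windows along a sequence. The function names, the 1000 bp window, the use of overlapping counts, and the handling of ambiguous bases are all assumptions for illustration; the exact procedure of Zhou et al. (for example, how reverse complements are treated) may differ.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k k-mers in seq (overlapping counts).

    Words containing anything other than A, C, G, T are skipped
    (an assumption, not necessarily what the paper does).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def barcode(seq, window=1000, k=4):
    """One row per non-overlapping window, one column per k-mer."""
    return [
        kmer_frequencies(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    toy = "ACGT" * 500  # hypothetical 2000 bp toy sequence
    rows = barcode(toy)
    print(len(rows), "windows x", len(rows[0]), "4-mers")
```

If the rows of such a matrix look alike from one window to the next, the frequencies are stable along the sequence in the sense described above.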

Concerning plasmids, the authors stated that 'The barcodes of all plasmid genomes also tend to have similar characteristics among themselves, possibly due to being under similar selection pressure caused by their frequent transferring among cell cultures.' It isn't clear what this selection pressure is.

One interesting observation is that different classes of genomes (prokaryotes, eukaryotes, plastids, plasmids, and mitochondria) were separated by two features derived from their barcodes (Figure 4). One feature (x-axis) is the average variation of 4-mer frequencies (taken over the whole genome and over all 4-mers), and the other (y-axis) is the overall similarity (in 4-mer frequencies) among all fragments of the genome. Note that neighboring genomes in this feature space do not necessarily have similar barcodes. Although the feature space clearly separated these five classes of genomes, the biological implications of the separation were not described.
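The two features can be sketched from a barcode matrix (one row per genome fragment, one column per 4-mer). The definitions below are stand-ins chosen for illustration, not the formulas from the paper: variation is taken as the standard deviation of each 4-mer's frequency across fragments, averaged over 4-mers, and similarity is approximated by the mean pairwise Euclidean distance between fragments (smaller means more similar). Both function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev


def average_variation(rows):
    """Mean over k-mers of the standard deviation across fragments."""
    sds = [pstdev(column) for column in zip(*rows)]
    return sum(sds) / len(sds)


def mean_pairwise_distance(rows):
    """Mean Euclidean distance over all pairs of fragments.

    Used here as an inverse stand-in for 'overall similarity'.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    total = sum(
        sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
        for r1, r2 in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical two-fragment, two-column toy matrix.
    toy = [[1.0, 0.0], [0.0, 1.0]]
    print(average_variation(toy), mean_pairwise_distance(toy))
```

Each genome then becomes a single point in a two-dimensional feature space, which is why two genomes can sit close together there without having similar barcodes.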

This has inspired us to investigate the factors contributing to variation in barcodes (oligonucleotide frequencies) among different genomes. Possible factors include restriction sites (Abe et al., 2003), synonymous codon usage, amino acid usage, G+C content (Sandberg et al., 2003), and the mosaic structure of the genome.

PRIMARY ARTICLE:
Zhou F, Olman V, Xu Y. BMC Bioinformatics. 2008;9:546. Barcodes for genomes and applications.

ADDITIONAL REFERENCES:
Karlin S, Campbell AM, Mrázek J. Annu Rev Genet. 1998;32:185-225. Comparative DNA analysis across diverse genomes.

Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. Genome Res. 2003 Apr;13(4):693-702. Informatics for unveiling hidden genome signatures.

Sandberg R, Bränden CI, Ernberg I, Cöster J. Gene. 2003 Jun 5;311:35-42. Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content.

Dr. Haruo Suzuki
University of Idaho
