Applied Bioinformatics Group



Exploiting Physico-Chemical Properties in String Kernels


Nora C Toussaint, Christian Widmer, Oliver Kohlbacher, and Gunnar Rätsch

2010

Background: String kernels are commonly used to classify biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when applied to amino acid sequences: when comparing amino acids, they ignore an important piece of information, namely their physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data are scarce. So far, only a few approaches have aimed at combining string kernels with physico-chemical descriptors.

Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels, and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties into string kernels improves performance compared with standard string kernels and with previously proposed non-substring kernels.

Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference.

Availability: Data sets, code and additional information will be available from http://www.fml.mpg.de/raetsch/suppl/aask. The developed kernels will be part of the next Shogun toolbox release.
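The abstract does not give the kernel formulas, so the following is only a rough illustration of the general idea, not the authors' implementation or the Shogun API. It takes the standard weighted-degree (position-specific) string kernel, which compares position-aligned substrings of length 1 to D with weights beta_d = 2(D - d + 1) / (D(D + 1)), and replaces its exact-match test with an RBF kernel on the concatenated descriptor vectors of the two substrings. The descriptors used here (Kyte-Doolittle hydropathy plus a nominal side-chain charge), the helper names, and the width `sigma` are assumptions chosen for illustration.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Nominal side-chain charge near neutral pH (histidine treated as neutral here).
CHARGE = {aa: 0.0 for aa in KD}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def descriptors(aa):
    """Assumed (illustrative) descriptor vector for one residue: (hydropathy, charge)."""
    return (KD[aa], CHARGE[aa])


def exact_match(u, v):
    """Baseline substring comparison: 1 if identical, else 0."""
    return 1.0 if u == v else 0.0


def make_rbf(sigma):
    """Return an RBF kernel on the concatenated descriptors of two equal-length substrings."""
    def rbf(u, v):
        sq_dist = 0.0
        for a, b in zip(u, v):
            for da, db in zip(descriptors(a), descriptors(b)):
                sq_dist += (da - db) ** 2
        return math.exp(-sq_dist / sigma)
    return rbf


def weighted_degree(x, y, max_degree, sub_kernel):
    """Weighted-degree kernel: position-aligned substrings of length 1..max_degree,
    compared with sub_kernel and weighted by beta_d = 2(D - d + 1) / (D(D + 1))."""
    if len(x) != len(y):
        raise ValueError("position-specific kernels need equal-length sequences")
    total = 0.0
    for d in range(1, max_degree + 1):
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        for pos in range(len(x) - d + 1):
            total += beta * sub_kernel(x[pos:pos + d], y[pos:pos + d])
    return total


if __name__ == "__main__":
    peptide = "GILGFVFTL"
    similar = "GLLGFVFTL"     # I -> L: both hydrophobic and uncharged
    dissimilar = "GDLGFVFTL"  # I -> D: hydrophilic and negatively charged
    rbf = make_rbf(sigma=10.0)
    # The exact-match kernel cannot tell the two substitutions apart.
    print(weighted_degree(peptide, similar, 3, exact_match)
          == weighted_degree(peptide, dissimilar, 3, exact_match))  # True
    # The descriptor-based kernel scores the conservative substitution as more similar.
    print(weighted_degree(peptide, similar, 3, rbf)
          > weighted_degree(peptide, dissimilar, 3, rbf))  # True
```

Because identical substrings still score 1 under the RBF, a sequence's self-similarity is unchanged; mismatching substrings merely receive partial credit that decays with descriptor distance, at a rate set by `sigma`. Since the computation still visits the same substrings, this kind of substitution leaves the number of substring comparisons unchanged.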

Journal: BMC Bioinformatics, 11(Suppl 8):S7, October 2010
DOI: 10.1186/1471-2105-11-S8-S7
PubMed ID: 21034432
Article: http://www.biomedcentral.com/1471-2105/11/S8/S7
PDF: http://www.biomedcentral.com/content/pdf/1471-2105-11-S8-S7.pdf