Applied Bioinformatics Group



Carsten Henneges, Marc Röttig, Oliver Kohlbacher, and Andreas Zell (2010)

Graphlet Data Mining of Energetical Interaction Patterns in Protein 3D Structures

In: Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation, pp. 190-195, SciTePress.

Interactions between secondary structure elements (SSEs) in the core of proteins are evolutionarily conserved and define the overall fold of a protein. They can therefore be used to classify protein families. Using a graph representation of SSE interactions and data mining techniques, we identify overrepresented graphlets that can be used for protein classification. In total, we find 627 significant graphlets in the ICGEB Protein Benchmark database (SCOP40mini) and the Super-Secondary Structure database (SSSDB). Based on these graphlets, decision trees predict the four SCOP levels and the SSSDB (sub)motif classes with a mean Area Under Curve (AUC) better than 0.89 in 5-fold cross-validation (CV). Regularized decision trees reveal that about 20 graphlets suffice for reliable predictions in each classification task. Graphlets composed of five secondary structure interactions are the most informative. Finally, we find that graphlets can be predicted from secondary structure using decision trees (5-fold CV), with a Matthews Correlation Coefficient (MCC) of up to 0.7.
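
For readers unfamiliar with the Matthews Correlation Coefficient reported above, the following minimal sketch computes it from the four cells of a binary confusion matrix. It is an illustration of the standard formula only, not the authors' evaluation code; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen only for illustration (not results from the paper).
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(example_mcc, 3))
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike plain accuracy it stays informative when the two classes have very different sizes.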