Applied Bioinformatics Group

Philipp Thiel, Lisa Sach-Peltason, Christian Ottmann, and Oliver Kohlbacher (2014)

Blocked Inverted Indices for Exact Clustering of Large Chemical Spaces

J. Chem. Inf. Model., 54(9):2395-401.

The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of utmost importance for various applications, such as similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable but often prohibitively expensive computationally. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given dataset. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space can thus be done on modern multicore machines within a few days.
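To illustrate the general idea of computing all pairwise Tanimoto similarities with an inverted index, here is a minimal Python sketch. It assumes each 2D fingerprint is given as a set of on-bit positions and uses the standard Tanimoto coefficient T(A, B) = c / (a + b - c), where a and b are the numbers of set bits and c is the number of shared bits. This is not the authors' blocked, multithreaded implementation. The function names `build_inverted_index` and `all_pairs_tanimoto` and the `threshold` parameter are hypothetical. The index maps each bit to the compounds that set it, so shared-bit counts are accumulated only for pairs that actually overlap, instead of comparing every pair bit by bit.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def build_inverted_index(fps: Sequence[Set[int]]) -> Dict[int, List[int]]:
    """Map each fingerprint bit to the list of compound indices that set it.

    Compounds are visited in increasing index order, so every posting list
    is sorted ascending.
    """
    index: Dict[int, List[int]] = defaultdict(list)
    for cid, fp in enumerate(fps):
        for bit in fp:
            index[bit].append(cid)
    return index


def all_pairs_tanimoto(
    fps: Sequence[Set[int]], threshold: float = 0.0
) -> Dict[Tuple[int, int], float]:
    """Return {(i, j): Tanimoto} for all pairs i < j that share at least one bit
    and whose similarity is >= threshold (hypothetical helper, sketch only).

    Pairs sharing no bits have Tanimoto 0 and are never generated.
    """
    index = build_inverted_index(fps)
    sizes = [len(fp) for fp in fps]
    results: Dict[Tuple[int, int], float] = {}
    for i, fp in enumerate(fps):
        shared: Dict[int, int] = defaultdict(int)  # j -> |fp_i & fp_j| for j > i
        for bit in fp:
            postings = index[bit]
            # Skip partners j <= i so each unordered pair is counted once.
            for j in postings[bisect_right(postings, i):]:
                shared[j] += 1
        for j, c in shared.items():
            sim = c / (sizes[i] + sizes[j] - c)  # union is >= 1 because c >= 1
            if sim >= threshold:
                results[(i, j)] = sim
    return results


if __name__ == "__main__":
    toy = [{1, 2, 3}, {2, 3, 4}, {5}, {1, 2, 3}]
    print(sorted(all_pairs_tanimoto(toy).items()))
```

For the toy input, compounds 0 and 3 are identical (similarity 1.0), while each of them shares two of four distinct bits with compound 1 (similarity 0.5); compound 2 shares no bits and produces no pairs. A real implementation for tens of millions of compounds would additionally need compact fingerprint storage, work partitioning across cores, and a way to stream results rather than keep them all in one dictionary.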