Publication | Open Access
Open-source platform to benchmark fingerprints for ligand-based virtual screening
Citations: 381
References: 37
Year: 2013
Similarity‑search methods using molecular fingerprints are widely used for ligand‑based virtual screening, but the performance of the many available fingerprints depends heavily on the validation data sets and similarity measures, and systematic comparison is hampered by the lack of standard benchmarks. This work introduces a standard platform for benchmarking 2D fingerprints. The open‑source platform provides source code, structural data drawn from three public collections, and random query molecule lists to enable statistically valid comparisons of methods. Using the platform, the authors benchmarked 12 standard fingerprints and two baselines across seven evaluation methods, finding high correlations among fingerprints and among evaluation methods, with only the two baselines differing significantly from the others.
Similarity-search methods using molecular fingerprints are an important tool for ligand-based virtual screening. A huge variety of fingerprints exists, and their performance, usually assessed in retrospective benchmarking studies using data sets with known actives and known or assumed inactives, depends largely on the validation data sets and the similarity measure used. Comparing new methods to existing ones in any systematic way is difficult because standard data sets and evaluation procedures are lacking. Here, we present a standard platform for the benchmarking of 2D fingerprints. The open-source platform contains all source code, structural data for the actives and inactives used (drawn from three publicly available collections of data sets), and lists of randomly selected query molecules to be used for statistically valid comparisons of methods. This allows the exact reproduction and comparison of results in future studies. The results for 12 standard fingerprints and two simple baseline fingerprints, assessed by seven evaluation methods, are shown together with the correlations between methods. High correlations were found between the 12 fingerprints, and a careful statistical analysis showed that only the two baseline fingerprints differed from the others in a statistically significant way. High correlations were also found between six of the seven evaluation methods, indicating that despite their seeming differences, many of these methods are similar to each other.
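The retrospective workflow described above (rank a library of known actives and inactives by fingerprint similarity to a query, then score the ranking) can be sketched in a few lines of Python. This is a minimal illustration, not the platform's code: the abstract does not name the similarity measure or the seven evaluation methods, so the Tanimoto coefficient, ROC AUC, and enrichment factor used here are assumed, commonly used choices, and the fingerprints are hypothetical toy bit sets rather than real molecules.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is modelled as the set of its "on" bit positions (toy assumption).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared on-bits divided by total on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(
    query: Fingerprint, library: List[Tuple[Fingerprint, bool]]
) -> List[Tuple[float, bool]]:
    """Return (score, is_active) pairs sorted by decreasing similarity to the query."""
    scored = [(tanimoto(query, fp), active) for fp, active in library]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def roc_auc(ranked: List[Tuple[float, bool]]) -> float:
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    actives = [score for score, active in ranked if active]
    inactives = [score for score, active in ranked if not active]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = sum(
        1.0 if a > i else 0.5 if a == i else 0.0
        for a in actives
        for i in inactives
    )
    return wins / (len(actives) * len(inactives))


def enrichment_factor(ranked: List[Tuple[float, bool]], fraction: float) -> float:
    """Active rate in the top `fraction` of the ranking relative to the overall rate."""
    n_top = max(1, int(round(fraction * len(ranked))))
    total_actives = sum(1 for _, active in ranked if active)
    if total_actives == 0:
        raise ValueError("need at least one active")
    top_actives = sum(1 for _, active in ranked[:n_top] if active)
    return (top_actives / n_top) / (total_actives / len(ranked))


# Hypothetical toy data: one query, two actives, two inactives.
query = frozenset({1, 2, 3, 4})
library = [
    (frozenset({1, 2, 3, 5}), True),
    (frozenset({1, 2, 3, 4, 6}), True),
    (frozenset({7, 8, 9}), False),
    (frozenset({1, 9, 10}), False),
]

ranked = rank_by_similarity(query, library)
auc = roc_auc(ranked)
ef_half = enrichment_factor(ranked, 0.5)
```

In a real benchmark, the scores would be averaged over many randomly selected query molecules per target, which is why a fixed, published list of queries matters for reproducing results exactly.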