Publication | Closed Access
In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening
Citations: 314 | References: 43 | Year: 2019
Machine-learning methods, especially convolutional neural networks, are increasingly applied to structure-based virtual screening and often outperform traditional scoring functions, yet they remain largely black boxes: it is unclear which features drive their predictions and whether those predictions are right for the right reasons. The authors re-evaluated three widely used benchmark data sets with machine-learning models, demonstrated on two literature examples that bias can be learned implicitly and unnoticed from standard benchmarks, and concluded that not all benchmark data sets are suitable for machine learning. On this basis, they provide guidelines for designing bias-controlled validation experiments and outline how new benchmark data sets could be generated.
Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model’s predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.