Publication | Closed Access
Can We Learn To Distinguish between “Drug-like” and “Nondrug-like” Molecules?
Citations: 482 | References: 14 | Year: 1998
Pharmaceutical Science, Engineering, Machine Learning, Molecular Design, Molecular Pharmacology, Data Science, Drug Design, Small Molecule Library, Library Design, Biochemistry, Entire Molecule, “Drug-like”, Pharmacology, Target Prediction, Molecular Property, Computational Biology, Rational Drug Design, Bayesian Neural Network, Systems Biology, Medicine, Drug Discovery, Pharmaceutical Research, Drug Analysis
A Bayesian neural network trained on the CMC (a surrogate for drug-like molecules) and the ACD (a surrogate for nondrug-like molecules) distinguishes drugs from nondrugs using 1D and 2D descriptors. The 1D parameters encode whole-molecule properties such as molecular weight, while the 2D parameters capture specific functional groups. The best models classify over 90% of CMC compounds correctly, label about 10% of ACD molecules as drug-like, and classify roughly 80% of MDDR molecules as drug-like. In a computer experiment selecting a 100-molecule drug-like library from 10 000 molecules, the models give at least a 3 or 4 order-of-magnitude improvement over random methods. The neighborhoods they define differ from those produced by standard Tanimoto similarity calculations, so the authors propose the models as a supplement to standard diversity approaches in combinatorial library design.
We have used a Bayesian neural network to distinguish between drugs and nondrugs. For this purpose, the CMC acts as a surrogate for drug-like molecules while the ACD is a surrogate for nondrug-like molecules. This task is performed by using two different sets of 1D and 2D parameters. The 1D parameters contain information about the entire molecule, such as the molecular weight, and the 2D parameters contain information about specific functional groups within the molecule. Our best results predict correctly on over 90% of the compounds in the CMC while classifying about 10% of the molecules in the ACD as drug-like. Excellent generalization ability is shown by the models in that roughly 80% of the molecules in the MDDR are classified as drug-like. We propose to use the models to design combinatorial libraries. In a computer experiment on generating a drug-like library of size 100 from a set of 10 000 molecules we obtain at least a 3 or 4 order of magnitude improvement over random methods. The neighborhoods defined by our models are not similar to the ones generated by standard Tanimoto similarity calculations. Therefore, new and different information is being generated by our models, and so they can supplement standard diversity approaches to library design.
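For readers unfamiliar with the Tanimoto baseline mentioned above, the following is a minimal sketch of how a Tanimoto coefficient and a similarity-based neighborhood are commonly computed. It assumes each molecule is represented as a set of "on" fingerprint bit indices. The function names, the toy library, and the convention for two empty fingerprints are illustrative assumptions, not details taken from the paper.

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient between two sets of on-bit indices.

    Defined as |A & B| / (|A| + |B| - |A & B|).
    """
    if not a and not b:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(
    query: AbstractSet[int],
    library: Dict[str, AbstractSet[int]],
    k: int,
) -> List[str]:
    """Return the names of the k library molecules most similar to the query."""
    ranked = sorted(
        library.items(),
        key=lambda item: tanimoto(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy fingerprints, for illustration only.
toy_library = {
    "mol_a": {1, 2, 3},
    "mol_b": {2, 3, 4, 5},
    "mol_c": {7, 8},
}
neighbors = nearest_neighbors({1, 2, 3}, toy_library, k=2)
```

A neighborhood defined this way depends only on shared substructure bits. The abstract's claim is that ranking molecules by the trained models' output groups them differently, which is why the two approaches can complement each other.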