Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Abstract

Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches face a significant limitation: they neither ensure that the proposed molecular structures can be feasibly synthesized nor provide synthesis routes for the proposed small molecules, which seriously limits their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate the immense synthetically accessible chemical space by subjecting commercially available small-molecule building blocks to valid chemical reactions at every time step of an iterative, virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to its large state space and its high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in silico proof of concept involving three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm for radically expanding the synthesizable chemical space and automating the drug discovery process.
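The iterative forward-synthesis loop described in the abstract can be illustrated with a minimal toy sketch. It is not the PGFS implementation: the building blocks, their two-dimensional feature vectors, the template names, and the string-based "products" are all hypothetical stand-ins. The nearest-neighbour step that maps a continuous action onto a concrete reactant is likewise an assumption made here to show one way a hierarchical action (first a reaction, then a reactant) could be realized.

```python
import random

# Hypothetical building blocks with made-up 2-D feature vectors.
BUILDING_BLOCKS = {
    "amine": (1.0, 0.0),
    "acid": (0.0, 1.0),
    "halide": (1.0, 1.0),
}

# Hypothetical reaction templates: each lists the building blocks
# that are valid second reactants for that template.
TEMPLATES = {
    "amide_coupling": {"amine", "acid"},
    "alkylation": {"halide"},
}


def nearest_valid_block(proto, valid):
    """Map a continuous proto-action to the closest valid building block."""
    def sq_dist(name):
        x, y = BUILDING_BLOCKS[name]
        return (x - proto[0]) ** 2 + (y - proto[1]) ** 2
    # sorted() makes tie-breaking deterministic.
    return min(sorted(valid), key=sq_dist)


def random_policy(product, rng=random):
    """Placeholder policy: pick a template, then a continuous proto-action."""
    template = rng.choice(sorted(TEMPLATES))
    proto = (rng.random(), rng.random())
    return template, proto


def rollout(start_block, policy, max_steps=3):
    """Run a virtual multi-step synthesis and record the route taken."""
    product = start_block
    route = []
    for _ in range(max_steps):
        # Hierarchical action: the reaction first, then the reactant.
        template, proto = policy(product)
        reactant = nearest_valid_block(proto, TEMPLATES[template])
        route.append((product, template, reactant))
        # Toy "reaction": concatenate names instead of applying chemistry.
        product = f"{product}.{reactant}"
    return product, route


if __name__ == "__main__":
    final_product, synthesis_route = rollout("acid", random_policy)
    print(final_product)
    for step in synthesis_route:
        print(step)
```

Because every step applies a known template to a known building block, the recorded route is itself a candidate synthesis plan for the final product. In a real system the string concatenation would be replaced by a cheminformatics toolkit applying reaction templates, and the random policy by a learned one trained against a reward such as QED.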