Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning

TLDR

Computational modeling of chemical and biological systems at atomic resolution requires a trade-off between the high accuracy of quantum-mechanical methods and the low cost of classical force fields. The authors propose using machine learning to combine the strengths of both approaches. They train a general-purpose neural network potential, ANI-1ccx, first on DFT data and then refine it via transfer learning on a chemically diverse CCSD(T)/CBS dataset. The resulting potential approaches CCSD(T)/CBS accuracy on reaction thermochemistry, isomerization, and drug-like torsion benchmarks. It is broadly applicable across materials science, biology, and chemistry, and is billions of times faster than CCSD(T)/CBS calculations.

Abstract

Computational modeling of chemical and biological systems at atomic resolution is a crucial part of the chemist's toolset. Computer simulations require a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by first training a network on DFT data and then using transfer learning techniques to retrain it on a dataset of gold-standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than CCSD(T)/CBS calculations.
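
The two-stage idea (pretrain on plentiful lower-level data, then retrain on scarce high-level data) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the ANI-1ccx architecture or training protocol: the one-dimensional functions low_fidelity and high_fidelity are hypothetical stand-ins for DFT and CCSD(T)/CBS energies, the network is a single-hidden-layer perceptron, and freezing the hidden layer during retraining is just one common transfer learning choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_fidelity(x):
    # Hypothetical stand-in for an expensive reference energy.
    return np.sin(3.0 * x)

def low_fidelity(x):
    # Hypothetical stand-in for a cheaper method: same shape plus a systematic error.
    return np.sin(3.0 * x) + 0.2 * x + 0.3

def init_params(hidden=16):
    return {
        "W1": rng.normal(0.0, 1.0, (1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def train(params, x, y, steps, lr, trainable):
    # Full-batch gradient descent on 0.5 * mean squared error.
    # Only the parameters named in `trainable` are updated; the rest stay frozen.
    n = len(x)
    for _ in range(steps):
        h, pred = forward(params, x)
        err = pred - y
        dh = (err @ params["W2"].T) * (1.0 - h ** 2)
        grads = {
            "W1": x.T @ dh / n,
            "b1": dh.mean(axis=0),
            "W2": h.T @ err / n,
            "b2": err.mean(axis=0),
        }
        for name in trainable:
            params[name] -= lr * grads[name]
    return params

def rmse(params, x, y):
    return float(np.sqrt(np.mean((forward(params, x)[1] - y) ** 2)))

# Stage 1: pretrain all layers on many cheap low-fidelity labels.
x_low = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
pretrained = train(init_params(), x_low, low_fidelity(x_low),
                   steps=10000, lr=0.02, trainable=("W1", "b1", "W2", "b2"))

# Stage 2: copy the weights, freeze the hidden layer, and retrain only the
# output layer on a small set of expensive high-fidelity labels.
x_high = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
finetuned = {k: v.copy() for k, v in pretrained.items()}
finetuned = train(finetuned, x_high, high_fidelity(x_high),
                  steps=2000, lr=0.02, trainable=("W2", "b2"))

# Compare both models against the high-fidelity target on a held-out grid.
x_test = np.linspace(-1.0, 1.0, 101).reshape(-1, 1)
rmse_before = rmse(pretrained, x_test, high_fidelity(x_test))
rmse_after = rmse(finetuned, x_test, high_fidelity(x_test))
print(f"RMSE vs. high-fidelity target before retraining: {rmse_before:.3f}")
print(f"RMSE vs. high-fidelity target after retraining:  {rmse_after:.3f}")
```

In this sketch the hidden layer learned from the large low-fidelity set is reused unchanged, so the 20 high-fidelity points only have to correct the output layer. That is the sense in which transfer learning lets a small, expensive dataset refine a model trained on a large, cheap one.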