Representation of compounds for machine-learning prediction of physical properties

TLDR

Compound descriptors are essential for building machine‑learning models of physical properties. The study adopts a systematic procedure for generating descriptors from elemental and structural data. The authors apply the procedure to an 18,000‑compound cohesive‑energy set, a 110‑compound lattice‑thermal‑conductivity set, and a 248‑compound melting‑temperature set, then evaluate descriptor performance via kernel ridge regression accuracy and Bayesian optimization efficiency. The kernel ridge model for cohesive energy achieves a prediction error of 0.041 eV/atom, close to chemical accuracy, and the descriptor sets show good predictive performance across the datasets.
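As a quick check on the "chemical accuracy" comparison, the conversion from 1 kcal/mol to eV per particle can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the function name `kcal_per_mol_to_ev` is hypothetical, and the constants are the thermochemical calorie and the exact SI values of the Avogadro constant and the elementary charge.

```python
# Unit conversion: kcal/mol -> eV per particle (illustrative sketch).
KCAL_TO_J = 4184.0           # joules per thermochemical kilocalorie
AVOGADRO = 6.02214076e23     # particles per mole (exact SI value)
EV_TO_J = 1.602176634e-19    # joules per electronvolt (exact SI value)


def kcal_per_mol_to_ev(value):
    """Convert an energy in kcal/mol to eV per particle."""
    return value * KCAL_TO_J / AVOGADRO / EV_TO_J


chemical_accuracy = kcal_per_mol_to_ev(1.0)
reported_error = 0.041  # eV/atom, the prediction error quoted in the text

print(f"1 kcal/mol = {chemical_accuracy:.3f} eV")
print(f"reported error / chemical accuracy = {reported_error / chemical_accuracy:.2f}")
```

The result rounds to 0.043 eV, so the reported 0.041 eV/atom error sits just below the 1 kcal/mol threshold.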

Abstract

The representations of a compound, called "descriptors" or "features", play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a systematic set of descriptors from simple elemental and structural representations. First, the procedure is applied to a large dataset of cohesive energies for about 18,000 compounds computed by density functional theory (DFT) calculations. The resulting kernel ridge prediction model has a prediction error of 0.041 eV/atom, which is close to the "chemical accuracy" of 1 kcal/mol (0.043 eV/atom). The procedure is also applied to two smaller datasets: the lattice thermal conductivity (LTC) of 110 compounds computed by DFT calculations, and the experimental melting temperature of 248 compounds. In addition to the accuracy of the kernel ridge regression models, we examine the effect of the descriptor sets on the efficiency of Bayesian optimization. The descriptor sets exhibit good predictive performance.
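The general idea of building compound descriptors from elemental representations and feeding them to kernel ridge regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the choice of elemental features (atomic number and Pauling electronegativity), the use of composition-weighted means and standard deviations, the Gaussian kernel, the hyperparameters `sigma` and `lam`, and all function names are assumptions made here. The target values are arbitrary placeholders, not data from the study.

```python
import numpy as np

# Assumed elemental representations: (atomic number, Pauling electronegativity).
ELEMENT_FEATURES = {
    "Li": (3.0, 0.98),
    "O": (8.0, 3.44),
    "F": (9.0, 3.98),
    "Na": (11.0, 0.93),
    "Mg": (12.0, 1.31),
    "Cl": (17.0, 3.16),
}


def compound_descriptor(composition):
    """Composition-weighted mean and standard deviation of each elemental feature."""
    elements = list(composition)
    weights = np.array([composition[e] for e in elements], dtype=float)
    weights /= weights.sum()
    feats = np.array([ELEMENT_FEATURES[e] for e in elements])
    mean = weights @ feats
    std = np.sqrt(weights @ (feats - mean) ** 2)
    return np.concatenate([mean, std])


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the kernel ridge coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets for new descriptors from the fitted coefficients."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Toy training set. The targets are PLACEHOLDER numbers, not real property values.
train_compounds = [
    {"Na": 1, "Cl": 1},
    {"Mg": 1, "O": 1},
    {"Li": 1, "F": 1},
    {"Mg": 1, "Cl": 2},
]
X = np.array([compound_descriptor(c) for c in train_compounds])
y = np.array([1.0, 2.0, 1.5, 0.5])

sigma, lam = 5.0, 1e-3  # assumed hyperparameters; normally chosen by cross-validation
alpha = krr_fit(X, y, sigma, lam)

X_new = np.array([compound_descriptor({"Na": 2, "O": 1})])
prediction = krr_predict(X, alpha, X_new, sigma)
print(prediction)
```

In a real application the descriptor columns would typically be standardized, and the kernel width and regularization strength would be selected by cross-validation on the training data.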
