Publication | Open Access
Regression Transformer enables concurrent sequence regression and generation for molecular language modelling
Citations: 121 | References: 64 | Year: 2023
Keywords: Engineering, Machine Learning, Molecular Biology, Sequence Regression, Sequence Alignment, Large Language Model, Natural Language Processing, Data Science, Generative Model, Computational Biochemistry, Machine Translation, Sequence Modelling, Generative Models, Pre-trained Models, Computational Modeling, Molecular Language Modelling, Bioinformatics, Neural Machine Translation, Rt Matches, Concurrent Sequence Regression, Regression Transformer, Natural Sciences, Computational Biology, Systems Biology, Linguistics
Abstract
Despite tremendous progress of generative models in the natural sciences, their controllability remains challenging. One fundamentally missing aspect of molecular or protein generative models is an inductive bias that can reflect continuous properties of interest. To that end, we propose the Regression Transformer (RT), a method that abstracts regression as a conditional sequence modelling problem. This introduces a new direction for multitask language models, seamlessly bridging sequence regression and conditional sequence generation. We demonstrate that, despite using a nominal-scale training objective, the RT matches or surpasses the performance of conventional regression models in property prediction of small molecules, proteins and chemical reactions. Critically, priming the same model with continuous properties yields a competitive conditional generative model that outperforms specialized approaches in a substructure-constrained, property-driven molecule generation benchmark. Our dichotomous approach is facilitated by an alternating training scheme that enables the model to decorate seed sequences on the basis of desired property constraints, for example, to optimize reaction yield. We expect that the RT’s capability to jointly tackle predictive and generative tasks in biochemistry can find applications in property-driven, local exploration of the chemical or protein space. Such multitask approaches will pave the road towards foundation models in materials design.
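The abstract's central idea, treating regression as conditional sequence modelling and training with an alternating scheme, can be illustrated at the data-preparation level. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a continuous property is split into digit tokens tagged with their decimal place, and the `_<digit>_<place>_` token format, the `[MASK]` token, character-level SMILES, the masking fraction and the strict 1:1 alternation are all hypothetical choices. The helper names `encode_property`, `decode_property`, `make_example` and `alternating_schedule` are likewise hypothetical, and the language model itself is not shown. Masking the property digits turns an example into a regression query; masking part of the molecule while the property stays visible turns the same example into a property-conditioned generation query.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token


def encode_property(name: str, value: float, precision: int = 3) -> List[str]:
    """Split a non-negative float into digit tokens tagged with their decimal place.

    Hypothetical format "_<digit>_<place>_", e.g. 0.72 -> "_7_-1_", "_2_-2_".
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    integer, _, fraction = f"{value:.{precision}f}".partition(".")
    tokens = [f"<{name}>"]  # property tag, kept visible in both tasks
    for i, digit in enumerate(integer):
        tokens.append(f"_{digit}_{len(integer) - 1 - i}_")  # places ..., 1, 0
    tokens.append("_._")
    for i, digit in enumerate(fraction):
        tokens.append(f"_{digit}_{-(i + 1)}_")  # places -1, -2, ...
    return tokens


def parse_digit(token: str) -> Optional[Tuple[int, int]]:
    """Return (digit, place) for a numeric token, or None for any other token."""
    parts = token.strip("_").split("_")
    if len(parts) == 2 and parts[0].isdigit():
        return int(parts[0]), int(parts[1])
    return None


def decode_property(tokens: List[str]) -> float:
    """Invert encode_property: sum digit * 10**place over the numeric tokens."""
    value, decimals = 0.0, 0
    for token in tokens:
        parsed = parse_digit(token)
        if parsed is not None:
            digit, place = parsed
            value += digit * 10.0 ** place
            decimals = max(decimals, -place)
    return round(value, decimals)  # rounding removes float noise from 10.0 ** -k


def make_example(prop_tokens: List[str], seq_tokens: List[str], task: str,
                 mask_frac: float = 0.3,
                 rng: Optional[random.Random] = None) -> Tuple[List[str], Dict[int, str]]:
    """Mask one part of [property tokens + sequence tokens] depending on the task."""
    rng = rng or random.Random()
    tokens = prop_tokens + seq_tokens
    if task == "regression":
        # Property prediction: hide the digits, keep the sequence as context.
        positions = [i for i, t in enumerate(prop_tokens) if parse_digit(t) is not None]
    elif task == "generation":
        # Conditional generation: the property stays visible, part of the sequence is hidden.
        n = max(1, int(mask_frac * len(seq_tokens)))
        offset = len(prop_tokens)
        positions = sorted(offset + i for i in rng.sample(range(len(seq_tokens)), n))
    else:
        raise ValueError(f"unknown task: {task}")
    hidden = set(positions)
    masked = [MASK if i in hidden else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}  # what the model must predict
    return masked, targets


def alternating_schedule(pairs: List[Tuple[List[str], List[str]]], steps: int,
                         rng: Optional[random.Random] = None) -> Iterator[tuple]:
    """Yield (task, batch), switching objective every step (assumed 1:1 ratio)."""
    rng = rng or random.Random(0)
    for step in range(steps):
        task = "regression" if step % 2 == 0 else "generation"
        yield task, [make_example(p, s, task, rng=rng) for p, s in pairs]


if __name__ == "__main__":
    qed = encode_property("qed", 0.72)
    print(qed)
    print(make_example(qed, list("CCO"), "regression")[0])
    print(make_example(qed, list("CCO"), "generation", rng=random.Random(1))[0])
```

Because both tasks share one vocabulary and a single token-level classification objective, one masked language model could in principle be trained on the alternating batches, with predicted digit tokens mapped back to a float through `decode_property`. Tagging each digit with its place is one way to let a categorical objective still carry information about numeric magnitude.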