Publication | Open Access
Function-guided protein design by deep manifold sampling
Year: 2021 | Venue: Unknown | Citations: 20 | References: 25
Keywords: Engineering, Machine Learning, Autoencoders, Representation Learning, Data Science, Protein Folding, Self-supervised Learning, Computational Biochemistry, Biophysics, Protein Modeling, Protein Structure Prediction, Computer Science, Computational Modeling, Deep Learning, Medical Image Computing, Bioinformatics, Protein Bioinformatics, Structural Biology, Computational Biology, Deep Manifold Sampling, Function Predictor, Abstract Protein Design, Medicine, Foundation Models
Abstract
Protein design is challenging because it requires searching a vast combinatorial space in which functional sequences are sparse. Self-supervised learning approaches could navigate this space more effectively and thereby accelerate protein engineering. We introduce a sequence denoising autoencoder (DAE) that learns the manifold of protein sequences from a large number of proteins, most of which may be unlabeled. The DAE is combined with a function predictor that guides sampling toward sequences with higher levels of the desired functions. We train the sequence DAE on more than 20M unlabeled protein sequences spanning many evolutionarily diverse protein families, and we train the function predictor on approximately 0.5M sequences with known function labels. At test time, we sample from the model by iteratively denoising a sequence while using the gradients of the function predictor to steer each step. We call this approach "deep manifold sampling" and present a few preliminary protein design case studies that demonstrate its effectiveness: metal binding site addition, function-preserving diversification, and global fold change.
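The test-time loop described in the abstract (corrupt a sequence, denoise it, and bias each step toward higher predicted function) can be sketched as below. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the toy denoiser, the linear function predictor, and the names `guided_sample`, `guidance`, and `noise` are all invented for this example. The guidance step adds the predictor's gradient with respect to the one-hot sequence to the denoiser's per-position logits, a first-order approximation that may differ from how the paper actually combines the two models.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_denoiser_logits(seq):
    """Hypothetical stand-in for a trained sequence DAE.

    Returns per-position logits over AMINO_ACIDS. This toy simply favours
    the residue it observes; a real DAE would predict clean residues from
    the corrupted input using what it learned about the sequence manifold.
    """
    return [[2.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in seq]


def toy_predictor_grad(seq, weights):
    """Gradient of a hypothetical linear predictor f(x) = sum_i sum_a W[a] * x[i][a]
    with respect to the one-hot input x. For a linear model the gradient is
    W[a] at every position, independent of the current sequence."""
    return [[weights.get(aa, 0.0) for aa in AMINO_ACIDS] for _ in seq]


def softmax_sample(logits, rng):
    """Draw one residue from softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    r = rng.random() * sum(exps)
    acc = 0.0
    for aa, e in zip(AMINO_ACIDS, exps):
        acc += e
        if r <= acc:
            return aa
    return AMINO_ACIDS[-1]  # guard against floating-point round-off


def corrupt(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else res for res in seq)


def guided_sample(seq, weights, steps=10, guidance=1.0, noise=0.1, seed=0):
    """Iteratively corrupt, denoise, and shift logits along the predictor gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        noisy = corrupt(seq, noise, rng)
        logits = toy_denoiser_logits(noisy)
        grads = toy_predictor_grad(noisy, weights)
        # Guided logits = denoiser logits + guidance * d f / d x (per position, per residue).
        seq = "".join(
            softmax_sample([l + guidance * g for l, g in zip(pos_l, pos_g)], rng)
            for pos_l, pos_g in zip(logits, grads)
        )
    return seq
```

For example, `guided_sample("MKTAYIAKQRQISFVKSHFS", {"H": 2.0}, guidance=5.0)` uses a hypothetical predictor that rewards histidine (a common metal-coordinating residue), and strong guidance pulls most positions toward H. Because this toy denoiser encodes no real manifold, nothing holds the sequence back; with a trained DAE, the `guidance` weight would instead trade predicted function against staying near plausible protein sequences.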