Publication | Closed Access
Seq2seq Fingerprint with Byte-Pair Encoding for Predicting Changes in Protein Stability upon Single Point Mutation
Year: 2019 | Citations: 10 | References: 27
Engineering, Machine Learning, Genetics, Molecular Biology, Single Point Mutation, Genomics, Sequence Alignment, Gene Recognition, Protein Sequences, Stable Proteins, Computational Genomics, Byte-pair Encoding, Sequence Analysis, Protein Modeling, Protein Structure Prediction, Bioinformatics, Functional Genomics, Protein Bioinformatics, Next-generation Sequencing, Basic Predictor, Computational Biology, Protein Evolution, Seq2seq Fingerprint, Systems Biology, Medicine
The engineering of stable proteins is crucial for various industrial purposes. Several machine learning methods have been developed to predict changes in the stability of proteins corresponding to single point mutations. To improve the prediction accuracy, we propose a new unsupervised descriptor for protein sequences, which is based on a sequence-to-sequence (seq2seq) neural network model combined with a sequence-compression method called byte-pair encoding (BPE). Our results demonstrate that BPE can encode a protein sequence into a sequence of shorter length, thereby enabling efficient training of the seq2seq model. Furthermore, we implement a basic predictor using the proposed descriptor, and our experimental results demonstrate that the predictor achieves state-of-the-art accuracy in tests for proteins that are not included in the training data.
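The compression step described in the abstract can be illustrated with a minimal sketch of generic byte-pair encoding applied to amino-acid strings. This is not the authors' implementation: the function names (`learn_bpe`, `apply_merge`, `encode`), the toy sequences, and the stopping rule are assumptions made for illustration only.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merge rules from a list of sequences.

    Each sequence starts as a list of single residues; the most frequent
    adjacent token pair is merged repeatedly.
    """
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # assumed stopping rule: no pair repeats any more
            break
        merges.append(best)
        corpus = [apply_merge(tokens, best) for tokens in corpus]
    return merges


def encode(seq, merges):
    """Encode one sequence by replaying the learned merges in order."""
    tokens = list(seq)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Toy, made-up sequences (hypothetical, not taken from the paper's data).
toy_sequences = ["MKVLAAGMKVL", "MKVLGG"]
merges = learn_bpe(toy_sequences, num_merges=3)
encoded = encode("MKVLAAGMKVL", merges)
print(len("MKVLAAGMKVL"), "residues ->", len(encoded), "tokens")
```

Because each merge replaces two tokens with one, the encoded sequence is never longer than the original, and joining the tokens recovers the input exactly. A shorter token sequence is what would make a downstream seq2seq model cheaper to train, as the abstract suggests.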