Publication | Closed Access
Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting
Citations: 55
References: 61
Year: 2019
Engineering, Signal Recognition, Molecular Biology, Feature Selection, Data Science, Data Mining, Pattern Recognition, Hub Protein Interfaces, Gradient Boosting, Proteomics, Interactomics, Knowledge Discovery, Protein Modeling, Protein Structure Prediction, Hot Spots, Bioinformatics, Feature Construction, Protein Bioinformatics, Boosting Algorithm, Natural Sciences, Computational Biology, Protein Engineering, Classification, Systems Biology
Proteins are not isolated biological molecules: they have specific three-dimensional structures and interact with other proteins to perform their functions. A small number of residues (hot spots) in protein-protein interactions (PPIs) play a vital role in influencing and controlling biological processes, which makes their identification an important problem in bioinformatics. This paper applies boosting and gradient boosting algorithms, combined with two feature selection strategies, to classify hot spots on three common datasets and two hub protein datasets. First, correlation-based feature selection removes highly correlated features to improve prediction accuracy. Then, recursive feature elimination based on a support vector machine (SVM-RFE) selects the optimal feature subset to improve training performance. Finally, boosting and gradient boosting (G-boosting) methods generate the classification results. Gradient boosting can obtain an excellent model by reducing the loss function along its negative gradient direction, which helps avoid overfitting. Five datasets from different protein databases are used to verify the models in the experiments. The experimental results show that the proposed classification models perform competitively with existing classification methods.
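Two of the stages named in the abstract, correlation-based filtering and gradient boosting, can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation: the function names, the 0.9 correlation threshold, the use of one-split decision stumps as weak learners, the logistic loss, and the toy data are all assumptions made for illustration, and the SVM-RFE stage is omitted. Each boosting round fits a stump to the negative gradient of the log loss (label minus predicted probability) and adds a scaled copy of it to the running score.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a column only if |r| with every kept column is below threshold."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, targets):
    """Least-squares regression stump: returns (feature, threshold, left value, right value)."""
    mean = sum(targets) / len(targets)
    # Fallback: a constant stump (threshold +inf sends every row to the left leaf).
    best = (sum((v - mean) ** 2 for v in targets), 0, float("inf"), mean, mean)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [v for r, v in zip(rows, targets) if r[j] <= t]
            right = [v for r, v in zip(rows, targets) if r[j] > t]
            if not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]

def stump_value(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gradient_boosting(rows, labels, n_rounds=20, learning_rate=0.5):
    """Boost stumps on the log loss; labels are 0/1 and both classes must be present."""
    pos = sum(labels) / len(labels)
    f0 = math.log(pos / (1 - pos))  # initial score: log-odds of the positive class
    scores = [f0] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the current score.
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, r) for s, r in zip(scores, rows)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_value(s, row) for s in stumps))

# Toy data (hypothetical): f_b duplicates f_a up to scale, f_c is uninformative.
f_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
f_b = [2 * x for x in f_a]
f_c = [0.5, 0.1, 0.4, 0.2, 0.3, 0.5, 0.1, 0.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [f_a, f_b, f_c]

kept = drop_correlated(columns)
rows = [[columns[j][i] for j in kept] for i in range(len(labels))]
model = fit_gradient_boosting(rows, labels)
print(kept)
print([round(predict_proba(model, r), 2) for r in rows])
```

In practice, a library such as scikit-learn would typically replace the hand-written stump learner, and a recursive feature elimination step would sit between the filter and the classifier. The sketch only shows how the redundant feature is dropped and how each boosting round follows the negative gradient of the loss.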