Publication | Closed Access
Better prediction of protein cellular localization sites with the k nearest neighbors classifier.
Citations: 440
References: 11
Year: 1997
Engineering, Machine Learning, Subcellular Localization, Gene Recognition, Spatial Omics, Cellular Localization Sites, Classification Method, Data Mining, Pattern Recognition, Localization Problem, Biostatistics, Proteomics, K Nearest Neighbors, Protein Modeling, Protein Structure Prediction, Bioinformatics, Cell Biology, Protein Bioinformatics, Better Prediction, Naïve Bayes Classifier, Data Classification, Computational Biology, Classifier System, Microbiology, Cellular Biochemistry, Systems Biology, Medicine, Cell Detection
Four classifiers—structured probabilistic, k‑nearest neighbors, decision tree, and naïve Bayes—were compared for predicting yeast and E. coli protein localization using sequence‑derived features such as hydrophobicity regions.
We compared four classifiers on the problem of predicting the cellular localization sites of proteins in yeast and E. coli. Each classifier used a set of sequence-derived features, such as regions of high hydrophobicity. The methods compared were a structured probabilistic model specifically designed for the localization problem, the k nearest neighbors classifier, the binary decision tree classifier, and the naïve Bayes classifier. Tests using stratified cross-validation show that the k nearest neighbors classifier performs better than the other methods. For yeast, this difference was statistically significant under a cross-validated paired t test. The k nearest neighbors classifier reaches an accuracy of approximately 60% for 10 yeast classes and 86% for 8 E. coli classes. The best previously reported accuracies for these datasets were 55% and 81%, respectively.
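The evaluation protocol named in the abstract (k nearest neighbors, stratified cross-validation, and a paired t test over per-fold accuracies) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the synthetic two-class data, the choice of k = 5, the five folds, the Euclidean distance, and the majority-class baseline that stands in for a competing classifier. It does not use the yeast or E. coli datasets or their features.

```python
import math
import random
from collections import Counter, defaultdict

def knn_predict(train_X, train_y, x, k):
    """Return the majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def stratified_folds(y, n_folds, seed=0):
    """Partition indices into folds, dealing each class out evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds

def cv_accuracies(X, y, predict, n_folds=5):
    """Per-fold accuracy of predict(train_X, train_y, x) under stratified CV."""
    accs = []
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return accs

def paired_t(a, b):
    """Paired t statistic on per-fold differences (n - 1 degrees of freedom)."""
    d = [u - v for u, v in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0:  # identical differences in every fold
        return math.copysign(math.inf, mean) if mean else 0.0
    return mean / math.sqrt(var / n)

# Synthetic stand-in data (hypothetical): two Gaussian clusters, 30 points each.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
y = ["A"] * 30 + ["B"] * 30

# Both methods are scored on the same folds, so the accuracies are paired.
knn_accs = cv_accuracies(X, y, lambda tX, ty, x: knn_predict(tX, ty, x, k=5))
base_accs = cv_accuracies(X, y, lambda tX, ty, x: Counter(ty).most_common(1)[0][0])
t = paired_t(knn_accs, base_accs)
print("kNN:", knn_accs, "baseline:", base_accs, "t:", t)
```

Because both classifiers are evaluated on identical folds, the per-fold differences are paired, which is what makes a paired t test applicable. The statistic would then be compared against a t distribution with (number of folds - 1) degrees of freedom to judge significance.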