Publication | Open Access
Predicting subcellular localization of proteins for Gram‐negative bacteria by support vector machines based on n‐peptide compositions
Citations: 909 | References: 27 | Year: 2004
Biomolecular Structure Prediction, Subcellular Localization, Molecular Biology, Bacterial Pathogens, Outer Membrane, Support Vector Machines, Proteomics, Gram-negative Bacteria, Protein Modeling, Protein Structure Prediction, Bioinformatics, Protein Bioinformatics, Natural Sciences, Microbial Proteomics, Computational Biology, Microbiology, Systems Biology, Medicine
Gram‑negative bacteria possess five major subcellular compartments: cytoplasm, periplasm, inner membrane, outer membrane, and extracellular space. Knowing a protein’s location informs its function, so an automated, accurate prediction tool becomes increasingly essential as genomic data grow. The study introduces a method to predict the subcellular localization of proteins in Gram‑negative bacteria, using support vector machines trained on multiple n‑peptide composition feature vectors. On a standard dataset of 1,443 proteins, the method achieved an overall accuracy of 89%, which is 14% higher than PSORT‑B and, to the authors’ knowledge, the highest reported at the time. The authors conclude that the method's simplicity makes it suitable for broad, high‑throughput proteomic analysis.
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase in sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
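To make the notion of an n-peptide composition concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the composition is the relative frequency of every contiguous n-residue window over the 20 standard amino acids, and that windows containing other symbols are skipped. The function names, the choice of n = 1 and n = 2, and the handling of non-standard residues are illustrative assumptions; the abstract does not specify the exact feature definitions or how the multiple feature vectors are combined in the classifier.

```python
from collections import Counter
from itertools import product

# Assumption: features are defined over the 20 standard amino acids only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the relative frequency of every possible n-peptide.

    The result has 20**n entries, ordered lexicographically with respect
    to the residue order in AMINO_ACIDS.
    """
    sequence = sequence.upper()
    # All overlapping windows of length n.
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    # Assumption: windows with non-standard symbols (e.g. X, B, Z) are skipped.
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return [counts[k] / total if total else 0.0 for k in keys]


def feature_vectors(sequence, orders=(1, 2)):
    """Return one composition vector per value of n (hypothetical helper)."""
    return {n: n_peptide_composition(sequence, n) for n in orders}


if __name__ == "__main__":
    vectors = feature_vectors("MKKLLPTAAAGLLLLAAQPAMA")
    for n, vec in vectors.items():
        print(n, len(vec), round(sum(vec), 6))
```

Each vector could then be passed to any support vector machine implementation as the input representation of a protein, with the five localization sites as class labels. The vector length grows as 20 to the power n, which is one reason small values of n are the practical choice in a sketch like this.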