Publication | Open Access
DeepLoc 2.0: multi-label subcellular localization prediction using protein language models
Year: 2022
Citations: 641
References: 26
Topics: Engineering, Machine Learning, Subcellular Localization, Molecular Biology, Signal Recognition, Spatial Omics, Data Science, Proteomics, Deeploc 2.0, Proteomics Research, Translational Bioinformatics, Protein Subcellular Localization, Protein Modeling, Omics, Protein Structure Prediction, Deep Learning, Bioinformatics, Functional Genomics, Cell Biology, Protein Bioinformatics, Omics Datasets, Computational Biology, Popular Tool Deeploc, Systems Biology, Medicine
The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
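The abstract names two ideas that a short example can make concrete: multi-label prediction (one protein may have several locations) and an attention output along the sequence. The following is a minimal sketch of how these could fit together, not the DeepLoc 2.0 architecture. It assumes a single learned attention vector, attention-weighted pooling, one sigmoid output per location, and a 0.5 decision threshold. Random vectors stand in for the per-residue embeddings of a protein language model, and all names, sizes, and weights are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_multilabel(embeddings, attn_w, label_w, label_b):
    """Attention pooling followed by independent per-label sigmoids.

    embeddings: L x D per-residue vectors (assumed language-model output)
    attn_w:     D weights of a hypothetical attention scorer
    label_w:    K x D weights, one row per location label
    label_b:    K biases
    """
    # One attention score per residue, normalised along the sequence.
    scores = [sum(a * e for a, e in zip(attn_w, res)) for res in embeddings]
    attention = softmax(scores)

    # Attention-weighted average of the residue embeddings.
    dim = len(attn_w)
    pooled = [
        sum(attention[i] * embeddings[i][d] for i in range(len(embeddings)))
        for d in range(dim)
    ]

    # Sigmoid (not softmax) so that several labels can be active at once.
    probs = [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(label_w, label_b)
    ]
    return attention, probs


# Toy data: random numbers replace real embeddings and trained weights.
rng = random.Random(0)
seq_len, dim, n_labels = 12, 8, 3
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
attn_w = [rng.gauss(0, 1) for _ in range(dim)]
label_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_labels)]
label_b = [0.0] * n_labels

attention, probs = predict_multilabel(embeddings, attn_w, label_w, label_b)
predicted = [k for k, p in enumerate(probs) if p >= 0.5]  # assumed threshold
```

In this sketch, `attention` holds one weight per residue and sums to 1, so it can be plotted along the sequence and compared with annotated sorting-signal positions. `predicted` may contain zero, one, or several label indices.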