Publication | Open Access

Interpretation of Shotgun Proteomic Data

Year: 2005 | Citations: 1K | References: 92

TLDR

Shotgun proteomics digests proteins into peptides and identifies them by tandem mass spectrometry and automated database searching, and it is the preferred method for large-scale protein identification. Its peptide-centric approach creates ambiguities, especially in higher eukaryotes, because shared peptides can map to multiple proteins or isoforms. This article aims to illustrate the challenges of interpreting shotgun proteomic data and to emphasize the need for common nomenclature and transparent informatic practices. The authors discuss the current state of protein sequence databases and their role in analysis, the interpretation of relative peptide quantification when multiple isoforms are present, the integration of proteomic with transcriptional data, and the construction of computational infrastructure to combine diverse datasets.

Abstract

The shotgun proteomic strategy, based on digesting proteins into peptides and sequencing them using tandem mass spectrometry and automated database searching, has become the method of choice for identifying proteins in most large-scale studies. However, the peptide-centric nature of shotgun proteomics complicates the analysis and biological interpretation of the data, especially for higher eukaryotes. The same peptide sequence can be present in multiple different proteins or protein isoforms. Such shared peptides can therefore lead to ambiguities in determining the identities of sample proteins. In this article we illustrate the difficulties of interpreting shotgun proteomic data and discuss the need for common nomenclature and transparent informatic approaches. We also discuss related issues such as the state of protein sequence databases and their role in shotgun proteomic analysis, interpretation of relative peptide quantification data in the presence of multiple protein isoforms, the integration of proteomic and transcriptional data, and the development of a computational infrastructure for the integration of multiple diverse datasets.
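
The shared-peptide ambiguity described above can be illustrated with a small sketch. This is not the authors' method or any particular tool's algorithm; it only shows, under simplifying assumptions, how identified peptides split into those that point to a single protein and those that map to several. The protein names, peptide sequences, and function names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy database: protein names and peptide sequences are invented.
# Each protein is represented by the set of peptides its digestion would yield.
PROTEIN_PEPTIDES = {
    "ISOFORM_A": {"LSEQK", "AVTDR", "GGFLK"},
    "ISOFORM_B": {"LSEQK", "AVTDR", "MNPEK"},
    "PROTEIN_C": {"TTWER"},
}


def map_peptides_to_proteins(protein_peptides, identified):
    """Return {peptide: set of proteins containing it} for identified peptides."""
    mapping = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for peptide in peptides & identified:
            mapping[peptide].add(protein)
    return dict(mapping)


def split_unique_and_shared(mapping):
    """Separate peptides that map to one protein from those shared by several."""
    unique = {pep: next(iter(prots)) for pep, prots in mapping.items() if len(prots) == 1}
    shared = {pep: prots for pep, prots in mapping.items() if len(prots) > 1}
    return unique, shared


# Assumed set of peptides identified by database searching in one experiment.
identified = {"LSEQK", "AVTDR", "TTWER"}
mapping = map_peptides_to_proteins(PROTEIN_PEPTIDES, identified)
unique, shared = split_unique_and_shared(mapping)
```

In this toy case, PROTEIN_C is supported by a peptide found nowhere else, while the two remaining peptides occur in both ISOFORM_A and ISOFORM_B. The peptide evidence alone cannot say whether one isoform, the other, or both were present in the sample, which is the kind of ambiguity the article discusses.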
