Sequence database searches via de novo peptide sequencing by tandem mass spectrometry

TLDR

Graph-theory-based de novo sequencing often yields several alternative peptide sequences from a single tandem mass spectrum. These alternatives are usually similar enough for a homology-based database search, but standard tools such as Blast or FASTA require a single unambiguous query. The authors describe a modified FASTA program that accepts a short list of candidate sequences derived from tandem mass spectra of tryptic peptides, so that protein databases can be searched more effectively despite the sequencing ambiguities. The de novo algorithm that produces the candidate list can also draw on amino acid composition, peptide sequence tags, and incomplete or ambiguous Edman sequence data. © 1997 John Wiley & Sons, Ltd.

Abstract

A method is described for searching protein sequence databases using tandem mass spectra of tryptic peptides. The approach uses a de novo sequencing algorithm to derive a short list of possible sequence candidates which serve as query sequences in a subsequent homology-based database search routine. The sequencing algorithm employs a graph theory approach similar to previously described sequencing programs. In addition, amino acid composition, peptide sequence tags and incomplete or ambiguous Edman sequence data can be used to aid in the sequence determinations. Although sequencing of peptides from tandem mass spectra is possible, one of the frequently encountered difficulties is that several alternative sequences can be deduced from one spectrum. Most of the alternative sequences, however, are sufficiently similar for a homology-based sequence database search to be possible. Unfortunately, the available protein sequence database search algorithms (e.g. Blast or FASTA) require a single unambiguous sequence as input. Here we describe how the publicly available FASTA computer program was modified in order to search protein databases more effectively in spite of the ambiguities intrinsic in de novo peptide sequencing algorithms. © 1997 John Wiley & Sons, Ltd.
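
The idea of querying a database with a short list of candidate sequences, rather than one unambiguous sequence, can be illustrated with a minimal sketch. This is not the FASTA algorithm and not the authors' modification of it. It swaps in a simple ungapped local alignment with identity scoring, and every name in it (`normalize`, `best_ungapped_score`, `search_with_candidates`), the scoring values, and the choice to treat I/L and K/Q as equivalent are assumptions made for illustration. The equivalences reflect the fact that leucine and isoleucine have identical residue masses and that lysine and glutamine differ by only about 0.036 Da.

```python
from typing import Dict, List, Tuple

# Assumption of this sketch: collapse residues that are hard to tell apart
# by mass (I and L are isobaric; K and Q differ by about 0.036 Da).
EQUIVALENT = str.maketrans({"I": "L", "Q": "K"})


def normalize(seq: str) -> str:
    """Upper-case a sequence and collapse mass-equivalent residues."""
    return seq.upper().translate(EQUIVALENT)


def best_ungapped_score(query: str, target: str,
                        match: int = 2, mismatch: int = -1) -> int:
    """Return the best ungapped local alignment score over all diagonals."""
    q, t = normalize(query), normalize(target)
    best = 0
    # Each offset corresponds to one diagonal of the comparison matrix.
    for offset in range(-(len(q) - 1), len(t)):
        running = 0
        for i in range(len(q)):
            j = i + offset
            if 0 <= j < len(t):
                step = match if q[i] == t[j] else mismatch
                # Reset to zero when the running score goes negative.
                running = max(0, running + step)
                best = max(best, running)
    return best


def search_with_candidates(candidates: List[str],
                           database: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Score each database entry by its best-matching candidate query.

    Returns (entry name, score, best candidate) tuples, highest score first.
    """
    hits = []
    for name, protein in database.items():
        score, candidate = max(
            (best_ungapped_score(c, protein), c) for c in candidates
        )
        hits.append((name, score, candidate))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

Taking the maximum over the candidate list means that a database entry is ranked by whichever alternative reading of the spectrum fits it best, so a wrong alternative in the list does not prevent the correct entry from being found.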