Publication | Closed Access
A fast computation of pairwise sequence alignment scores between a protein and a set of single-locus variants of another protein
138
Citations
5
References
2012
Year
Unknown Venue
Genetics, Molecular Biology, Molecular Genetics, Genomics, Sequence Alignment, Gene Recognition, Single-locus Variants, Phylogenetic Analysis, Molecular Ecology, String Processing, Computational Genomics, Systems Biology, Sequence Analysis, Statistical Genetics, Fast Computation, Bioinformatics, Functional Genomics, New Algorithm, Protein Bioinformatics, Natural Sciences, Computational Biology, Protein Evolution, Dynamic Programming, Provean Source Code, Medicine
We recently developed a new algorithm, PROVEAN (<u>Pro</u>tein <u>V</u>ariation <u>E</u>ffect <u>An</u>alyzer), for predicting the functional effect of protein sequence variations, including single amino acid substitutions and small insertions and deletions [2]. The prediction is based on the change, caused by a given variation, in the similarity of the query sequence to a set of its related protein sequences. For this prediction, the algorithm must compute a semi-global pairwise sequence alignment score between the query sequence and each of the related sequences. Using dynamic programming, it takes O(n · m) time to compute the alignment score between the query sequence Q of length n and a related sequence S of length m. Thus, given l different variations in Q, a naive approach would take O(l · n · m) time to compute the alignment scores between each of the variant query sequences and S. In this paper, we present a new approach that efficiently computes the pairwise alignment scores for l variations in O((n + l) · m) time when the length of each variation is bounded by a constant. This approach further exploits the solutions of the overlapping subproblems that the dynamic programming approach already computes. Our algorithm has been used to build a new database of precomputed prediction scores for all possible single amino acid substitutions, single amino acid insertions, and deletions of up to 10 amino acids in about 91K human proteins (including isoforms), where l becomes very large, that is, l = O(n). The PROVEAN source code and web server are available at http://provean.jcvi.org.
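The idea of reusing dynamic programming subproblems across variants can be illustrated with a minimal Python sketch. It is not the PROVEAN implementation: it assumes global (not semi-global) alignment, a linear gap penalty, a toy match/mismatch score, and substitution variants only. All names (`forward_table`, `substitution_scores`, `MATCH`, `MISMATCH`, `GAP`) are hypothetical. One forward table and one backward table are filled in O(n · m) time, after which each substitution is scored in O(m) time by combining a prefix score, the score of the substituted residue, and a suffix score.

```python
# Toy scoring parameters (assumptions for illustration, not PROVEAN's values).
MATCH, MISMATCH, GAP = 1, -1, -2


def sub_score(a, b):
    """Toy substitution score for aligning residue a with residue b."""
    return MATCH if a == b else MISMATCH


def forward_table(q, s):
    """F[i][j] = best global alignment score of q[:i] with s[:j]."""
    n, m = len(q), len(s)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(q[i - 1], s[j - 1]),
                F[i - 1][j] + GAP,  # q[i-1] aligned to a gap
                F[i][j - 1] + GAP,  # s[j-1] aligned to a gap
            )
    return F


def substitution_scores(q, s, variants):
    """Score each (position, residue) substitution of q against s.

    Two O(n*m) tables are built once; each variant then costs O(m).
    """
    n, m = len(q), len(s)
    F = forward_table(q, s)
    # Backward table obtained by aligning the reversed sequences:
    # the best score of q[i:] with s[j:] is R[n - i][m - j].
    R = forward_table(q[::-1], s[::-1])

    def suffix(i, j):
        return R[n - i][m - j]

    scores = []
    for p, c in variants:
        # Case 1: the substituted residue c is aligned to a gap.
        best = max(F[p][j] + GAP + suffix(p + 1, j) for j in range(m + 1))
        # Case 2: c is aligned to residue s[j].
        for j in range(m):
            cand = F[p][j] + sub_score(c, s[j]) + suffix(p + 1, j + 1)
            best = max(best, cand)
        scores.append(best)
    return scores
```

A naive check recomputes the full table for every variant sequence and should give the same scores, for example by comparing `substitution_scores(q, s, [(p, c)])[0]` with `forward_table(q[:p] + c + q[p + 1:], s)[-1][-1]`. Supporting semi-global alignment, affine gap penalties, insertions, and deletions would need additional bookkeeping that this sketch omits.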