Publication | Closed Access
Fast Approximate Motif Statistics
Citations: 17 | References: 12 | Year: 2001
Protein Motifs, Engineering, Genetics, Molecular Biology, Genomics, Sequence Alignment, Sequence Motif, String-searching Algorithm, Proteomics, Approximation Theory, Sequence Analysis, Knowledge Discovery, Computer Science, Fast Approximate Method, Functional Genomics, Bioinformatics, Protein Bioinformatics, Computational Biology, Combinatorial Pattern Matching, Random Text, Systems Biology, Medicine
We present a fast approximate method for computing the statistics of the number of non-self-overlapping matches of motifs in a random text under the nonuniform Bernoulli model. The method is well suited to protein motifs, whose probability of self-overlap is small. For 96% of the PROSITE motifs, the expected numbers of occurrences in a random database of 7 million amino acids computed by the approximate method differ by less than 1% from those given by the exact method. Processing the whole of PROSITE takes about 30 seconds with the approximate method. We apply this new method to a comparison of the C. elegans and S. cerevisiae proteomes.
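As a rough illustration of the quantity being approximated, the Python sketch below computes the first-order estimate (n - m + 1) * P(motif) for a fixed-length, PROSITE-style pattern of length m in a random text of length n drawn from a nonuniform Bernoulli model (each residue independent, with its own probability). This is not the paper's algorithm. It is a simplified sketch under stated assumptions: only single residues, `x`, `x(k)` and `[...]` classes are handled (variable gaps such as `x(2,4)` are rejected), and non-overlapping matches are counted by a greedy left-to-right scan. The helper names (`complete_background`, `parse_fixed_motif`, `approx_expected_count`, `count_non_overlapping`) and the residue frequencies in the usage lines are hypothetical.

```python
from typing import Dict, List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def complete_background(partial: Dict[str, float]) -> Dict[str, float]:
    """Nonuniform Bernoulli model: one probability per residue, summing to 1.
    Residues missing from `partial` share the leftover mass equally."""
    missing = [a for a in AMINO_ACIDS if a not in partial]
    leftover = 1.0 - sum(partial.values())
    bg = dict(partial)
    for a in missing:
        bg[a] = leftover / len(missing)
    return bg


def parse_fixed_motif(pattern: str) -> List[Set[str]]:
    """Parse a fixed-length PROSITE-style pattern such as 'C-x(2)-[DE]-C.'.
    Each returned set lists the residues allowed at that position."""
    positions: List[Set[str]] = []
    for elem in pattern.rstrip(".").split("-"):
        if elem.startswith("x"):
            if "," in elem:
                raise ValueError(f"variable gap not supported: {elem}")
            k = int(elem[2:-1]) if elem.startswith("x(") else 1
            positions.extend(set(AMINO_ACIDS) for _ in range(k))
        elif elem.startswith("[") and elem.endswith("]"):
            cls = set(elem[1:-1])
            if not cls <= set(AMINO_ACIDS):
                raise ValueError(f"unsupported class: {elem}")
            positions.append(cls)
        elif len(elem) == 1 and elem in AMINO_ACIDS:
            positions.append({elem})
        else:
            raise ValueError(f"unsupported element: {elem}")
    return positions


def motif_probability(positions: List[Set[str]], bg: Dict[str, float]) -> float:
    """Probability that a match starts at one given text position:
    the product, over motif positions, of the mass of the allowed residues."""
    p = 1.0
    for allowed in positions:
        p *= sum(bg[a] for a in allowed)
    return p


def approx_expected_count(pattern: str, bg: Dict[str, float], n: int) -> float:
    """First-order estimate (n - m + 1) * P(motif).

    Exact for the expected number of possibly overlapping matches; for
    non-overlapping matches it ignores self-overlap, so it never underestimates."""
    positions = parse_fixed_motif(pattern)
    return max(n - len(positions) + 1, 0) * motif_probability(positions, bg)


def count_non_overlapping(text: str, pattern: str) -> int:
    """Greedy left-to-right count of non-overlapping matches in `text`."""
    positions = parse_fixed_motif(pattern)
    m, i, count = len(positions), 0, 0
    while i + m <= len(text):
        if all(text[i + j] in positions[j] for j in range(m)):
            count += 1
            i += m  # jump past this match so the next one cannot overlap it
        else:
            i += 1
    return count


if __name__ == "__main__":
    # Hypothetical residue frequencies, for illustration only.
    bg = complete_background({"C": 0.017, "D": 0.053, "E": 0.063})
    print(approx_expected_count("C-x(2)-[DE]-C.", bg, 7_000_000))
```

With these made-up frequencies the estimate is about 235 expected matches in 7 million residues. The gap between this estimate and the true expected non-overlapping count comes only from matches that can overlap themselves (for example `C-x(2)-C` matches twice, overlapping, in `CAACAAC`), which is why an approximation of this kind is most accurate when the self-overlap probability is small.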