A Genetic-Based EM Motif-Finding Algorithm for Biological Sequence Analysis

Citations: 18

References: 23

Year: 2007

Chengpeng Bi

Unknown Venue

Abstract

Motif finding in biological sequence analysis remains a challenge in computational biology, and many algorithms and software packages have been developed to address it. Expectation maximization (EM)-type motif algorithms, such as MEME, are among the most popular de novo motif discovery methods. However, as pointed out in the literature, EM algorithms depend heavily on their initialization and are easily trapped in local optima. This paper proposes and implements a genetic-based EM motif-finding algorithm (GEMFA) that aims to overcome these drawbacks. GEMFA first initializes a population of multiple local alignments, each encoded on a chromosome that represents a potential solution. It then performs a heuristic search over the whole alignment space, using minimum description length (MDL), a generalization of maximum log-likelihood, as the fitness function. The genetic algorithm gradually moves the population towards the best alignment, from which the motif model is derived. Analyses of simulated and real biological sequences showed that GEMFA performed better than simple multiple-restart EM motif finding, especially on subtle motifs, and better than other similar algorithms.
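The search loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one motif occurrence per sequence, uppercase ACGT input, and a plain log-likelihood ratio against a uniform background as a stand-in for the MDL fitness. The refinement step is a hard-EM style update rather than full EM. All names (`build_pwm`, `fitness`, `refine`, `gemfa_sketch`) and parameter defaults are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: sequences contain only these uppercase letters

def build_pwm(seqs, starts, width, pseudo=0.5):
    """Column-wise base frequencies (with pseudocounts) for one alignment."""
    pwm = []
    for j in range(width):
        counts = {b: pseudo for b in ALPHABET}
        for s, p in zip(seqs, starts):
            counts[s[p + j]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm

def fitness(seqs, starts, width, background=0.25):
    """Log-likelihood ratio of the aligned sites vs. a uniform background.
    Stand-in for the MDL fitness named in the abstract (an assumption)."""
    pwm = build_pwm(seqs, starts, width)
    return sum(
        math.log(pwm[j][s[p + j]] / background)
        for s, p in zip(seqs, starts)
        for j in range(width)
    )

def refine(seqs, starts, width):
    """One hard-EM style step: re-pick the best-scoring site in each sequence."""
    pwm = build_pwm(seqs, starts, width)
    new_starts = []
    for s in seqs:
        scores = [
            sum(math.log(pwm[j][s[p + j]]) for j in range(width))
            for p in range(len(s) - width + 1)
        ]
        new_starts.append(scores.index(max(scores)))
    return new_starts

def gemfa_sketch(seqs, width, pop_size=30, generations=40, mut_rate=0.1, seed=0):
    """Genetic search over alignments; a chromosome is one start per sequence."""
    rng = random.Random(seed)

    def rand_start(s):
        return rng.randrange(len(s) - width + 1)

    def score(chromosome):
        return fitness(seqs, chromosome, width)

    population = [[rand_start(s) for s in seqs] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [parents[0]]  # elitism: keep the current best alignment
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [
                rand_start(s) if rng.random() < mut_rate else p  # point mutation
                for s, p in zip(seqs, child)
            ]
            children.append(refine(seqs, child, width))
        population = children
    best = max(population, key=score)
    return best, score(best)
```

A call such as `gemfa_sketch(["CCGATTACACC", "GATTACACCCC", "CCCCGATTACA"], width=7)` returns a list of start positions and the score of that alignment. Because the best chromosome is carried over unchanged each generation, the best score in the population never decreases, which is the property that lets the population escape a poor initialization that a single EM run would be stuck with.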
