
Publication | Open Access

Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals

Year: 2004 | Citations: 2.1K | References: 21

TLDR

The authors propose a maximum entropy framework for modeling short sequence motifs. They approximate motif distributions with the maximum entropy distribution constrained by low-order marginals estimated from data, which can include dependencies between nonadjacent as well as adjacent positions. Different models are obtained simply by changing the set of constraints, and comparing their classification performance indicates which positional dependencies matter most. Applied to large RNA splicing datasets, the best models outperform previous probabilistic models at distinguishing human 5′ (donor) and 3′ (acceptor) splice sites from decoys. The authors also discuss mechanistically motivated ways of comparing models.

Abstract

We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5′ (donor) and 3′ (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
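
To make the idea of a maximum entropy distribution under low-order marginal constraints concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses iterative proportional fitting (one standard way to find such a distribution, starting from the uniform distribution), the three-letter sequences are invented toy data rather than real splice sites, and the function names and the smoothing weight `eps` are hypothetical choices made for this example.

```python
import itertools
import math
from collections import defaultdict

ALPHABET = "ACGT"


def smoothed_marginal(seqs, positions, eps=0.05):
    """Empirical marginal over `positions`, mixed with a uniform distribution.

    The mixing weight eps is an assumption of this sketch: it keeps every
    probability positive and keeps marginals of different orders consistent.
    """
    keys = list(itertools.product(ALPHABET, repeat=len(positions)))
    counts = defaultdict(int)
    for s in seqs:
        counts[tuple(s[i] for i in positions)] += 1
    return {k: (1 - eps) * counts[k] / len(seqs) + eps / len(keys) for k in keys}


def model_marginal(p, positions):
    """Marginal of the full joint distribution p over the given positions."""
    m = defaultdict(float)
    for x, px in p.items():
        m[tuple(x[i] for i in positions)] += px
    return m


def fit_maxent(seqs, constraints, n_iter=100):
    """Iterative proportional fitting over all 4**length sequences.

    Starts from the uniform distribution and repeatedly rescales it so that
    each constrained marginal matches its target.
    """
    length = len(seqs[0])
    p = {"".join(t): 1.0 / 4**length
         for t in itertools.product(ALPHABET, repeat=length)}
    targets = [(pos, smoothed_marginal(seqs, pos)) for pos in constraints]
    for _ in range(n_iter):
        for pos, target in targets:
            current = model_marginal(p, pos)
            for x in p:
                key = tuple(x[i] for i in pos)
                p[x] *= target[key] / current[key]
    return p


def log_odds(seq, p_signal, p_decoy):
    """Log2 ratio of signal to decoy probability; higher favors 'signal'."""
    return math.log2(p_signal[seq] / p_decoy[seq])


# Toy data (invented for illustration only).
signals = ["GTA", "GTA", "GTG", "GTA", "GTC", "ATA"]
decoys = ["ACG", "TTC", "CGA", "GAT", "CCT", "AGG"]

# Pairwise constraints, including the nonadjacent pair (0, 2).
pairs = [(0, 1), (1, 2), (0, 2)]
signal_model = fit_maxent(signals, pairs)
decoy_model = fit_maxent(decoys, pairs)

for seq in ("GTA", "ACG"):
    print(seq, round(log_odds(seq, signal_model, decoy_model), 2))
```

Changing the `constraints` list (for example, using only single positions, or dropping the nonadjacent pair) yields a different model from the same data, which mirrors how the abstract describes specifying many models by changing the constraint set. Enumerating all sequences is only feasible for short motifs, since the table has 4^length entries.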
