Publication | Open Access
Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals
Citations: 2.1K
References: 21
Year: 2004
The authors propose a maximum entropy framework for modeling sequence motifs, in which different models are obtained by varying the set of constraints used to specify them. They approximate short motif distributions with the maximum entropy distribution constrained by low-order marginals, use these models to discriminate splice signals from decoys, compare the classification performance of different models to assess which positional dependencies matter, apply the framework to large RNA splicing datasets, and discuss mechanistically motivated ways of comparing models. The best resulting models outperform prior probabilistic approaches in distinguishing human donor and acceptor splice sites from decoys.
We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many different maximum entropy models (MEMs) can be specified simply by changing the set of constraints. Such models can be used to discriminate between signals and decoys. Comparing the classification performance of different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models outperform previous probabilistic models in the discrimination of human 5′ (donor) and 3′ (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
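To make the idea of a maximum entropy distribution under low-order marginal constraints concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses iterative proportional fitting started from the uniform distribution, which converges to the maximum entropy distribution when the marginal constraints are mutually consistent. The function names (`empirical_marginal`, `fit_maxent`, `log_odds`), the toy three-base sequences, the log-odds score against a decoy model, and the `floor` parameter are all hypothetical choices made for this example.

```python
from itertools import product
from math import log2

ALPHABET = "ACGT"


def empirical_marginal(seqs, positions):
    """Frequency of each base tuple observed at the given positions."""
    counts = {}
    for s in seqs:
        key = tuple(s[i] for i in positions)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / len(seqs) for k, v in counts.items()}


def fit_maxent(length, constraints, n_iter=100):
    """Iterative proportional fitting over all sequences of `length`.

    `constraints` is a list of (positions, target_marginal) pairs.
    Starting from the uniform distribution, each pass rescales the model
    so that its marginal on `positions` matches the target.
    """
    space = ["".join(t) for t in product(ALPHABET, repeat=length)]
    p = {s: 1.0 / len(space) for s in space}
    for _ in range(n_iter):
        for positions, target in constraints:
            # Current model marginal on the constrained positions.
            current = {}
            for s, ps in p.items():
                key = tuple(s[i] for i in positions)
                current[key] = current.get(key, 0.0) + ps
            # Rescale each sequence toward the target marginal.
            for s in space:
                key = tuple(s[i] for i in positions)
                if current[key] > 0.0:
                    p[s] *= target.get(key, 0.0) / current[key]
    return p


def log_odds(seq, p_signal, p_decoy, floor=1e-12):
    """Log2 likelihood ratio of signal vs. decoy model (assumed scoring rule).

    `floor` avoids log(0) for sequences with zero model probability.
    """
    return log2(max(p_signal[seq], floor) / max(p_decoy[seq], floor))


# Toy data (made up for illustration, not real splice sites).
toy_signals = ["GTA", "GTA", "GTG", "ATA"]

# First-order constraints only: one marginal per position.
first_order = [((i,), empirical_marginal(toy_signals, (i,))) for i in range(3)]
# Adding adjacent-pair constraints defines a different model.
pairwise = [((i, i + 1), empirical_marginal(toy_signals, (i, i + 1))) for i in range(2)]

p_first = fit_maxent(3, first_order)
p_pair = fit_maxent(3, pairwise)
p_uniform_decoy = fit_maxent(3, [])  # no constraints: uniform distribution
```

With only single-position constraints, the fitted distribution factorizes into independent per-position frequencies, so it behaves like a weight matrix model. Swapping in pairwise constraints, including nonadjacent pairs such as `(0, 2)`, changes the model without changing the fitting code, which is the sense in which different models are specified by changing the constraint set. Enumerating every sequence is feasible only for short motifs, since the state space grows as 4 to the power of the motif length.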