Publication | Closed Access

SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

Citations: 450

References: 11

Year: 1999

TLDR

Discovering sequential patterns is an important data mining problem with applications in medicine, telecommunications, and the Web, yet conventional systems let users specify patterns of interest only through a minimum-support threshold. This paper proposes Regular Expressions (REs) as a flexible constraint tool that lets users focus the mining process. The authors develop SPIRIT, a family of algorithms that differ in how strongly they enforce the RE constraint to prune the search space, and quantify the resulting tradeoffs through experiments on synthetic and real-life data. The work offers insight into what happens when constraints that lack convenient properties such as anti-monotonicity are integrated into the mining process.

Abstract

Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. In this paper, we propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT ‐ Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the tradeoffs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process. A quantitative exploration of these tradeoffs is conducted through an extensive experimental study on synthetic and real-life data sets.
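To make the problem statement concrete, the following is a minimal sketch of the naive baseline that the abstract contrasts against: mine all frequent sequential patterns using minimum support alone, then apply the RE only as a final filter. It is not one of the SPIRIT algorithms, which according to the abstract enforce the RE to varying degrees during the search itself. The function names, the `max_len` cap, and the assumption that every item is a single character (so a pattern can be joined into a string and matched with Python's `re` module) are illustrative choices, not taken from the paper.

```python
import re


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def mine_with_re(database, min_support, regex, max_len=4):
    """Level-wise frequent-pattern mining with the RE applied as a post-filter.

    Assumption: items are single characters. max_len is a hypothetical cap
    that bounds the pattern length explored.
    """
    constraint = re.compile(regex)
    items = sorted({x for s in database for x in s})
    frequent = []
    level = [(x,) for x in items]
    for _ in range(max_len):
        # Keep only candidates that meet the minimum-support threshold.
        level = [c for c in level if support(c, database) >= min_support]
        if not level:
            break
        frequent.extend(level)
        # Support is anti-monotone, so extending only frequent patterns
        # cannot miss a longer frequent pattern.
        level = [p + (x,) for p in level for x in items]
    # The RE constraint plays no part in pruning here; it only filters results.
    return [p for p in frequent if constraint.fullmatch("".join(p))]


db = ["abcd", "abd", "acd", "bcd"]
print(mine_with_re(db, 0.5, "a(b|c)*d"))
```

In this sketch every frequent pattern is generated and counted even when it can never match the RE, which is the wasted work that motivates pushing the constraint into the mining process. An RE such as `a(b|c)*d` is not anti-monotone: a pattern can satisfy it while its subsequences do not, so the RE cannot simply replace the support check in the pruning step.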
