Concepedia

Abstract

We present a probability-density-based data stream clustering approach which requires only the newly arrived data, not the entire historical data, to be saved in memory. This approach incrementally updates the density estimate taking only the newly arrived data and the previously estimated density. The idea roots on a theorem of estimator updating and it works naturally with Gaussian mixture models. We implement it through the expectation maximization algorithm and a cluster merging strategy by multivariate statistical tests for equality of covariance and mean. Our approach is highly efficient in clustering voluminous <i>online</i> data streams when compared to the standard EM algorithm. We demonstrate the performance of our algorithm on clustering a simulated Gaussian mixture data stream and clustering real noisy spike signals extracted from neuronal recordings.

References

YearCitations

Page 1