Publication | Closed Access
Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering
Citations: 127
References: 9
Year: 2005
Cluster Computing, Engineering, Machine Learning, Gaussian Mixture Models, Streaming Algorithm, Streaming Data, Unsupervised Machine Learning, Social Sciences, Data Science, Data Mining, Mixture Analysis, Statistics, Clustering (Nuclear Physics), Estimator Updating, Knowledge Discovery, Computer Science, Mixture Distribution, Computational Neuroscience, Density Estimate, Data Stream Mining, Statistical Inference, Neuroscience, Clustering (Data Mining)
We present a probability-density-based approach to data stream clustering that requires only the newly arrived data, not the entire history, to be kept in memory. The approach incrementally updates the density estimate using only the newly arrived data and the previously estimated density. The idea is rooted in a theorem on estimator updating and works naturally with Gaussian mixture models. We implement it with the expectation-maximization (EM) algorithm and a cluster-merging strategy based on multivariate statistical tests for equality of covariance and mean. Compared with the standard EM algorithm, our approach is highly efficient at clustering voluminous online data streams. We demonstrate the performance of our algorithm by clustering a simulated Gaussian mixture data stream and real, noisy spike signals extracted from neuronal recordings.
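The incremental update described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes each Gaussian component is summarized as a hypothetical `(count, mean, covariance)` tuple, that the components for the new chunk have already been fitted (for example by EM), and it uses a simple Mahalanobis-distance threshold (`close_means`, an illustrative stand-in) in place of the multivariate tests for equality of covariance and mean mentioned above. Only the moment-preserving merge formula is standard.

```python
import numpy as np

def merge_components(n1, mu1, cov1, n2, mu2, cov2):
    """Moment-preserving merge of two Gaussian components, weighted by counts."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d1 = (mu1 - mu)[:, None]
    d2 = (mu2 - mu)[:, None]
    # Pooled within-component covariance plus between-component spread.
    cov = (n1 * (cov1 + d1 @ d1.T) + n2 * (cov2 + d2 @ d2.T)) / n
    return n, mu, cov

def close_means(a, b, tol=3.0):
    """Illustrative stand-in for the statistical tests: a Mahalanobis
    threshold on the means under the count-weighted pooled covariance."""
    pooled = (a[0] * a[2] + b[0] * b[2]) / (a[0] + b[0])
    diff = a[1] - b[1]
    return float(diff @ np.linalg.solve(pooled, diff)) < tol ** 2

def update_mixture(old, new, same_cluster=close_means):
    """Fold components fitted on the newest chunk into the running mixture.

    old, new: lists of (count, mean, cov) tuples. The historical data
    itself is never needed, only these summaries.
    """
    merged = list(old)
    for comp in new:
        for i, ref in enumerate(merged):
            if same_cluster(ref, comp):
                merged[i] = merge_components(*ref, *comp)
                break
        else:
            merged.append(comp)  # no match: treat as a new cluster
    total = sum(c[0] for c in merged)
    weights = [c[0] / total for c in merged]
    return merged, weights
```

In this sketch the per-chunk cost depends only on the number of components, not on the number of points seen so far, which is the property the abstract emphasizes. The merge criterion is the part most likely to differ from the paper.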