Publication | Closed Access
Density-based clustering for real-time stream data
Citations: 560
References: 14
Year: 2007
Unknown Venue
Cluster Computing, Density-based Clustering, Engineering, Streaming Algorithm, Sporadic Grids, Data Streaming Architecture, Streaming Data, Cluster Technology, Data Science, Data Mining, Data Management, Knowledge Discovery, Computer Engineering, Computer Science, Data Stream Mining, Cloud Computing, Clustering Quality, Big Data, Stream Data
Existing data-stream clustering methods such as CluStream rely on k-means. They cannot detect arbitrarily shaped clusters or handle outliers, and they require prior knowledge of k and a user-specified time window. This paper introduces D-Stream, a density-based framework designed to overcome these limitations. An online component maps each input record to a grid; an offline component computes grid densities and clusters the grids by density. A density decay factor captures the dynamics of the stream, and sporadic grids produced by outliers are detected and removed to improve space and time efficiency. Experiments show that D-Stream achieves high-speed clustering with superior quality, finds arbitrarily shaped clusters, and tracks evolving stream behavior without degrading clustering quality.
Existing data-stream clustering algorithms such as CluStream are based on k-means. These algorithms cannot find clusters of arbitrary shapes and cannot handle outliers. Further, they require knowledge of k and a user-specified time window. To address these issues, this paper proposes D-Stream, a framework for clustering stream data using a density-based approach. The algorithm uses an online component, which maps each input data record into a grid, and an offline component, which computes the grid density and clusters the grids based on that density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. Exploiting the intricate relationships between the decay factor, data density, and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped to by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
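The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `GridDensityStream`, all parameter names, the exponential decay rule (density multiplied by `decay` per time unit, plus 1 for each new record), the fixed pruning threshold, and the face-adjacency rule for merging dense grids are assumptions chosen for illustration, since the abstract does not spell out these details.

```python
import math
from collections import deque


class GridDensityStream:
    """Toy density-grid stream clusterer (hypothetical, for illustration only)."""

    def __init__(self, cell_size=1.0, decay=0.9,
                 dense_threshold=2.0, sparse_threshold=0.1):
        self.cell_size = cell_size                # side length of a grid cell (assumed)
        self.decay = decay                        # decay factor in (0, 1) (assumed form)
        self.dense_threshold = dense_threshold    # minimum density of a "dense" grid
        self.sparse_threshold = sparse_threshold  # below this, a grid is dropped
        self.grids = {}                           # cell -> (density, last update time)

    def _cell(self, point):
        # Online step: map a record to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def density(self, cell, now):
        # Decay the stored density lazily, only when it is read.
        stored, last = self.grids[cell]
        return stored * self.decay ** (now - last)

    def insert(self, point, now):
        cell = self._cell(point)
        old = self.density(cell, now) if cell in self.grids else 0.0
        self.grids[cell] = (old + 1.0, now)

    def prune(self, now):
        # Simplified stand-in for sporadic-grid removal: drop grids whose
        # decayed density has fallen below a fixed threshold.
        stale = [c for c in self.grids
                 if self.density(c, now) < self.sparse_threshold]
        for c in stale:
            del self.grids[c]

    def clusters(self, now):
        # Offline step: group dense grids that share a face (breadth-first search).
        dense = {c for c in self.grids
                 if self.density(c, now) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            group, queue = set(), deque([start])
            while queue:
                cell = queue.popleft()
                group.add(cell)
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(group)
        return result
```

Because clusters are unions of adjacent dense grids, their shape is not constrained to be spherical and no k has to be supplied. Storing the last update time with each grid means decay is applied only when a grid is touched, so per-record work stays constant under these assumptions.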