Publication | Open Access
Bursty Feature Representation for Clustering Text Streams
111 Citations | 16 References | Year: 2007
Unknown Venue
Cluster Computing, Engineering, Streaming Algorithm, Text Representation, Corpus Linguistics, Text Mining, Natural Language Processing, Static Text, Information Retrieval, Data Science, Data Mining, Document Classification, Language Studies, Content Analysis, Document Clustering, Classical Text Mining, Knowledge Discovery, Computer Science, Bursty Feature Representation, Topic Model, Data Stream Mining, Keyword Extraction, Linguistics
Text representation plays a crucial role in classical text mining, which has focused primarily on static text. However, well-studied static text representations such as TF-IDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its burstiness at any point in time, and 3) is topic-independent. We evaluated our bursty text representation model against a classical bag-of-words text representation on the task of clustering TDT3 topical text streams. It consistently yielded more cohesive clusters, as measured by cluster purity and cluster/class entropies. This new temporal bursty text representation can be extended to most text mining tasks that involve a temporal dimension, such as modeling online blog pages.
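The idea of amplifying a feature in proportion to its burstiness can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' model. The abstract gives neither the burst-detection method nor the exact weighting formula, so `burst_scores` uses a simple windowed document-frequency ratio as a stand-in. `bursty_vector` then multiplies a feature's normalized term frequency by its burst score when that score reaches a hypothetical `threshold`. `purity` and `weighted_entropy` follow the standard definitions of cluster purity and cluster/class entropy used to compare a clustering against labeled topics. All function names, parameters, and the toy stream are hypothetical.

```python
import math
from collections import Counter, defaultdict


def burst_scores(docs_by_time, t, window=1):
    """Hypothetical burst score for each feature seen near time step t.

    docs_by_time: list indexed by time step; each item is a list of tokenized
    documents (lists of strings). Score = fraction of documents containing the
    feature within [t - window, t + window] divided by that fraction over the
    whole stream. A score above 1 means the feature is unusually frequent at t.
    (Stand-in detector, assumed for illustration only.)
    """
    all_docs = [d for step in docs_by_time for d in step]
    lo, hi = max(0, t - window), min(len(docs_by_time), t + window + 1)
    win_docs = [d for step in docs_by_time[lo:hi] for d in step]
    df_all = Counter(w for d in all_docs for w in set(d))
    df_win = Counter(w for d in win_docs for w in set(d))
    return {w: (df_win[w] / len(win_docs)) / (df_all[w] / len(all_docs))
            for w in df_win}


def bursty_vector(doc, scores, threshold=1.5):
    """Normalized term frequencies; features whose burst score reaches the
    (hypothetical) threshold are multiplied by that score, so the boost is
    proportional to burstiness at the document's time step."""
    counts = Counter(doc)
    vec = {}
    for w, c in counts.items():
        p = c / len(doc)
        s = scores.get(w, 0.0)
        vec[w] = p * s if s >= threshold else p
    return vec


def purity(clusters, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = defaultdict(Counter)
    for k, c in zip(clusters, classes):
        by_cluster[k][c] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(clusters)


def weighted_entropy(groups, labels):
    """Size-weighted mean entropy (in bits) of `labels` inside each group.

    Cluster entropy: weighted_entropy(clusters, classes)
    Class entropy:   weighted_entropy(classes, clusters)
    Lower values indicate more cohesive groupings.
    """
    by_group = defaultdict(Counter)
    for g, lab in zip(groups, labels):
        by_group[g][lab] += 1
    n = len(groups)
    total = 0.0
    for cnt in by_group.values():
        size = sum(cnt.values())
        h = -sum((v / size) * math.log2(v / size) for v in cnt.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    # Toy stream of three time steps; "quake" spikes at step 1.
    stream = [
        [["market", "stocks"], ["weather", "rain"]],
        [["quake", "rescue"], ["quake", "aid"], ["market", "oil"]],
        [["weather", "sun"], ["market", "stocks"]],
    ]
    scores = burst_scores(stream, t=1, window=0)
    # "quake" and "rescue" are boosted from 1/3 to 7/9; "market" stays at 1/3.
    print(bursty_vector(["quake", "rescue", "market"], scores))
    print(purity([0, 0, 1, 1], ["a", "a", "a", "b"]))            # 0.75
    print(weighted_entropy([0, 0, 1, 1], ["a", "a", "a", "b"]))  # 0.5
```

Because the burst scores are computed per time step, the same feature can receive different weights in documents from different points in the stream. A static scheme such as TF-IDF assigns a feature the same weight regardless of when it spikes. Swapping in a proper burst detector only requires replacing `burst_scores`, since `bursty_vector` depends on nothing but the score dictionary.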