Publication | Closed Access
On Summarization and Timeline Generation for Evolutionary Tweet Streams
Year: 2014 | Citations: 77 | References: 34
Engineering, Entity Summarization, Communication, Data Structure, Corpus Linguistics, Text Mining, Automatic Summarization, Natural Language Processing, Computational Social Science, Information Retrieval, Data Science, Computational Linguistics, Online Summaries, Language Studies, Content Analysis, Social Medium Mining, Machine Translation, Knowledge Discovery, Computer Science, Timeline Generation, Historical Summaries, Multi-modal Summarization, Social Medium Data
Short-text messages such as tweets are being created and shared at an unprecedented rate. Tweets in their raw form, while informative, can also be overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain an enormous amount of noise and redundancy. In this paper, we propose a novel continuous summarization framework called Sumblr to alleviate the problem. In contrast to traditional document summarization methods, which focus on static and small-scale data sets, Sumblr is designed to deal with dynamic, fast-arriving, and large-scale tweet streams. Our proposed framework consists of three major components. First, we propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics in a data structure called the tweet cluster vector (TCV). Second, we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Third, we design an effective topic evolution detection method, which monitors summary-based and volume-based variations to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our framework.
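To make the first component more concrete, here is a minimal sketch of single-pass stream clustering that keeps only compact per-cluster statistics instead of the raw tweets. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not define the contents of a TCV, so the `ClusterStats` fields, the whitespace tokenizer, the cosine similarity measure, and the `threshold` value are all hypothetical choices.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClusterStats:
    """Hypothetical per-cluster summary; NOT the paper's TCV definition."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    size: int = 0          # number of tweets absorbed so far
    first_ts: float = 0.0  # timestamp of the first absorbed tweet
    last_ts: float = 0.0   # timestamp of the most recent absorbed tweet

    def add(self, terms: Counter, ts: float) -> None:
        """Fold one tweet into the running statistics."""
        if self.size == 0:
            self.first_ts = ts
        self.term_sum.update(terms)
        self.size += 1
        self.last_ts = ts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(tweets, threshold=0.3):
    """Assign each (timestamp, text) pair to the most similar cluster,
    or open a new cluster when no similarity reaches the assumed threshold."""
    clusters = []
    for ts, text in tweets:
        terms = Counter(text.lower().split())  # naive tokenizer (assumption)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(terms, c.term_sum)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = ClusterStats()
            clusters.append(best)
        best.add(terms, ts)
    return clusters
```

Because each tweet is read once and then discarded, memory grows with the number of clusters rather than the number of tweets, which is the general property that makes this kind of summary structure suitable for fast-arriving streams.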