Publication | Closed Access
Combining semantic and syntactic document classifiers to improve first story detection
Citations: 65
References: 3
Year: 2001
Venue: Unknown
Engineering, Narrative Summarization, Corpus Linguistics, Journalism, Text Mining, Automatic Summarization, Natural Language Processing, Story Detection, Information Retrieval, Data Science, Computational Linguistics, Multiple Document Representations, Document Classification, Language Studies, News Semantics, Syntactic Document Classifiers, Content Analysis, Narrative Extraction, Knowledge Discovery, Information Extraction, TDT Initiative, Topic Model, TDT1 Evaluation Methodology, Linguistics
In this paper we describe a type of data fusion involving the combination of evidence derived from multiple document representations. Our aim is to investigate whether a composite representation can improve the online detection of novel events in a stream of broadcast news stories. This classification process, known as first story detection (FSD), or in the Topic Detection and Tracking (TDT) pilot study as online new event detection [1], is one of three main classification tasks defined by the TDT initiative. Our composite document representation consists of a semantic representation (based on the lexical chains derived from a text) and a syntactic representation (using proper nouns). Using the TDT1 evaluation methodology, we evaluate a number of document representation combinations using these document classifiers.
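The idea of fusing evidence from two document representations can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each story has already been reduced to two bags of terms (a hypothetical "semantic" bag standing in for lexical-chain concepts and a "syntactic" bag standing in for proper nouns), combines them with a simple weighted sum of cosine similarities, and flags a story as a first story when its best match among earlier stories falls below a threshold. The function names, the `weight` and `threshold` parameters, and their default values are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(doc: dict, other: dict, weight: float = 0.5) -> float:
    """Weighted sum of the two per-representation similarities (assumed fusion rule)."""
    sem = cosine(doc["semantic"], other["semantic"])
    syn = cosine(doc["syntactic"], other["syntactic"])
    return weight * sem + (1 - weight) * syn


def detect_first_stories(stream, threshold: float = 0.3, weight: float = 0.5):
    """Process stories in arrival order; True marks a story judged to be novel."""
    seen = []
    flags = []
    for doc in stream:
        # Best match against every earlier story; 0.0 when nothing has been seen yet.
        best = max((combined_similarity(doc, old, weight) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(doc)
    return flags
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to a single-representation detector, which is one simple way to compare a composite representation against each component on its own.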