Publication | Closed Access
Approaches to topic identification on the Switchboard corpus
Citations: 61
References: 12
Year: 2002
Unknown Venue
Engineering, Speech Corpus, Spoken Language Processing, Keyword Event Selection, Topic Modeling, Corpus Linguistics, Language Processing, Text Mining, Speech Recognition, Natural Language Processing, Switchboard Corpus, Information Retrieval, Data Science, Topic Identification, Computational Linguistics, Document Classification, Corpus Analysis, Language Studies, Spoken Language Understanding, Knowledge Discovery, Terminology Extraction, Speech Communication, Speech Analysis, Topic Model, Keyword Extraction, Speech Processing, Linguistics
Topic identification (TID) is the automatic classification of speech messages into one of a known set of possible topics. The TID task can be viewed as having three principal components: 1) event generation, 2) keyword event selection, and 3) topic modeling. Using data from the Switchboard corpus, the authors present experimental results for various approaches to the TID problem and compare the relative effectiveness of each. In addition, they examine the effect of keyword set size on identification accuracy and gauge the loss in performance when mismatched topic modeling and keyword selection schemes are used.
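The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which event generators, selection measures, or topic models were compared. The sketch assumes whitespace word tokens as events, a frequency-weighted KL-divergence score for keyword selection, and a smoothed unigram model per topic. All function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def generate_events(message):
    # 1) Event generation. Assumption: events are lowercase word tokens.
    # A real system would take them from a recognizer or word spotter.
    return message.lower().split()


def select_keywords(train, k):
    # 2) Keyword event selection. Assumption: rank each word by its count
    # times the KL divergence between P(topic | word) and the topic prior.
    word_topic = defaultdict(Counter)  # word -> topic -> count
    topic_total = Counter()            # topic -> total number of events
    for topic, message in train:
        for w in generate_events(message):
            word_topic[w][topic] += 1
            topic_total[topic] += 1
    n = sum(topic_total.values())
    scores = {}
    for w, by_topic in word_topic.items():
        c_w = sum(by_topic.values())
        kl = sum((c / c_w) * math.log((c / c_w) / (topic_total[t] / n))
                 for t, c in by_topic.items())
        scores[w] = c_w * kl
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:k])


def train_topic_models(train, keywords):
    # 3) Topic modeling. Assumption: one unigram model per topic over the
    # selected keywords, with add-one smoothing, stored as log probabilities.
    counts = {topic: Counter() for topic, _ in train}
    for topic, message in train:
        for w in generate_events(message):
            if w in keywords:
                counts[topic][w] += 1
    models = {}
    for topic, c in counts.items():
        total = sum(c.values()) + len(keywords)
        models[topic] = {w: math.log((c[w] + 1) / total) for w in keywords}
    return models


def identify_topic(message, models, keywords):
    # Pick the topic whose model gives the keyword events the highest
    # log-likelihood; non-keyword events are ignored.
    events = [w for w in generate_events(message) if w in keywords]
    return max(sorted(models), key=lambda t: sum(models[t][w] for w in events))


# Toy data for illustration only; not drawn from the Switchboard corpus.
train = [
    ("sports", "the team won the game"),
    ("sports", "the game score team"),
    ("music", "the band played the song"),
    ("music", "the song guitar band"),
]
keywords = select_keywords(train, k=4)
models = train_topic_models(train, keywords)
```

Varying `k` in this sketch corresponds to the keyword set size the abstract mentions, and pairing `select_keywords` with a different topic model would be one way to produce the mismatched condition it describes.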