Publication | Closed Access
Serendip: Topic model-driven visual exploration of text corpora
Citations: 102
References: 34
Year: 2014
Unknown Venue
Engineering, Entity Summarization, Semantic Web, Corpus Linguistics, Text Mining, Automatic Summarization, Natural Language Processing, Interactive Visualization, Information Retrieval, Data Science, Computational Linguistics, Language Studies, Content Analysis, Visual Analytics, Knowledge Discovery, Terminology Extraction, Different Levels, Information Extraction, Large Text Corpus, Visualization Research, Topic Model, Text Corpora, Linguistics
Exploration and discovery in a large text corpus requires investigation at multiple levels of abstraction, from a zoomed-out view of the entire corpus down to close-ups of individual passages and words. At each of these levels, there is a wealth of information that can inform inquiry - from statistical models, to metadata, to the researcher's own knowledge and expertise. Joining all this information together can be a challenge, and there are issues of scale to be combatted along the way. In this paper, we describe an approach to text analysis that addresses these challenges of scale and multiple information sources, using probabilistic topic models to structure exploration through multiple levels of inquiry in a way that fosters serendipitous discovery. In implementing this approach into a tool called Serendip, we incorporate topic model data and metadata into a highly reorderable matrix to expose corpus level trends; extend encodings of tagged text to illustrate probabilistic information at a passage level; and introduce a technique for visualizing individual word rankings, along with interaction techniques and new statistical methods to create links between different levels and information types. We describe example uses from both the humanities and visualization research that illustrate the benefits of our approach.
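As a rough illustration of two ideas named in the abstract, a reorderable document-by-topic matrix and per-topic word rankings, the following is a minimal sketch. It is not Serendip's implementation: the function names (`reorder_by_topic`, `rank_words`), the toy matrices, and the vocabulary are all hypothetical, and the data is assumed to come from an already fitted topic model.

```python
# Illustrative sketch only; names and data are assumptions, not Serendip's code.

def reorder_by_topic(doc_topic, doc_ids, topic):
    """Sort documents (matrix rows) by descending proportion of one topic."""
    order = sorted(range(len(doc_topic)),
                   key=lambda i: doc_topic[i][topic],
                   reverse=True)
    return [doc_ids[i] for i in order], [doc_topic[i] for i in order]


def rank_words(topic_word, vocab, topic, top_n=3):
    """Return the top_n words of a topic, ranked by their weight in that topic."""
    weights = topic_word[topic]
    order = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)
    return [vocab[j] for j in order[:top_n]]


# Toy document-topic proportions (rows: documents, columns: topics).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
doc_ids = ["doc_a", "doc_b", "doc_c"]

# Toy topic-word weights (rows: topics, columns: vocabulary entries).
topic_word = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.2, 0.2, 0.2, 0.4],
]
vocab = ["king", "crown", "love", "night"]

# Corpus-level view: bring documents richest in topic 1 to the top.
ordered_ids, ordered_rows = reorder_by_topic(doc_topic, doc_ids, topic=1)

# Word-level view: the highest-ranked words for topic 0.
top_words = rank_words(topic_word, vocab, topic=0, top_n=2)
```

Sorting rows by a single topic is only one possible ordering; a tool of this kind could equally reorder by metadata or by aggregate statistics.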