Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

TLDR

More than 80% of today's data is stored in unstructured form, much of it natural language, and manual qualitative analysis is infeasible for large text datasets obtained from the Internet. This tutorial discusses the challenges of applying automated text-mining techniques in information systems research. It demonstrates probabilistic topic modeling with Latent Dirichlet Allocation (LDA), combined with a LASSO multinomial logistic regression, to explain user satisfaction with an IT artifact from more than 12,000 online customer reviews. It also offers guidance for IS researchers on conducting and evaluating text-mining studies.

Abstract

It is estimated that more than 80 percent of today's data is stored in unstructured form (e.g., text, audio, image, video), and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.
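
To give a feel for the topic modeling stage named in the abstract, the following is a minimal sketch of Latent Dirichlet Allocation fitted with a collapsed Gibbs sampler, using only the Python standard library. It is not the tutorial's own implementation, which this page does not show. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, the number of topics, and the six toy reviews are all assumptions made for illustration. The LASSO multinomial logistic regression stage is not sketched here.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative assumption, not the
    authors' code). `docs` is a list of token lists. Returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    corpus = [[word_id[w] for w in doc] for doc in docs]

    # Count tables used by the sampler.
    doc_topic = [[0] * n_topics for _ in corpus]           # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word w is in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(corpus):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta = per-document topic proportions,
    # phi = per-topic word probabilities.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(corpus)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


# Hypothetical, pre-tokenized toy reviews (not data from the tutorial).
reviews = [
    "battery dies fast battery drains".split(),
    "battery life short charger slow".split(),
    "screen bright screen sharp display".split(),
    "display crisp screen colors bright".split(),
    "charger slow battery weak".split(),
    "colors vivid display sharp".split(),
]

theta, phi, vocab = fit_lda(reviews, n_topics=2)

for k, row in enumerate(phi):
    top_words = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [word for _, word in top_words])
```

Each row of `theta` holds one review's estimated topic proportions. In a two-stage design like the one the abstract describes, such proportions would plausibly serve as the predictors in the regression on user satisfaction.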
