Incorporating text dispersion into keyword analyses

TLDR

Keyword analysis is widely used to identify the words characteristic of a discourse domain. Conventional keyness calculations, however, treat the corpus as a homogeneous whole and ignore how words are distributed across individual texts, so they yield keywords that are frequent but poorly dispersed and may not truly represent the domain. This study proposes a new keyword analysis method, text dispersion keyness, that bases keyness on how widely a word is dispersed across texts rather than on overall corpus frequency. The authors evaluate the method against four other keyness measures in a series of case studies on online travel blogs. Quantitative and qualitative comparisons indicate that text dispersion keyness yields keyword lists with higher content-generalisability and content-distinctiveness than the other methods.

Abstract

Keyword analysis has become an indispensable tool for discourse analysts, being applied to identify the words that are especially characteristic of the texts in a target discourse domain. But, surprisingly, the statistical computation of keyness makes no reference to those texts. Rather, once a corpus has been constructed, it is treated as a homogeneous whole for the computation of keyness. As a result, the keywords in such lists are relatively frequent in the corpus, but they are often not widely dispersed across the texts of that corpus and are thus not truly representative of the target discourse domain. The purpose of this study is to propose a new method for keyword analysis – text dispersion keyness – that is based on text dispersion, rather than corpus frequency. We compare the effectiveness of this measure to four other methods for computing keyness, carrying out a series of case studies to identify the keywords that are typical of online travel blogs. A variety of quantitative and qualitative analyses are carried out to compare these methods based on their content-generalisability and content-distinctiveness, demonstrating that text dispersion keyness is a superior measure for generating keyword lists.
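The contrast between frequency-based and dispersion-based keyness can be illustrated with a minimal sketch. The abstract does not say which statistic is applied to the text counts, so the sketch assumes a simplified two-term log-likelihood ratio. The function names and the toy corpora are hypothetical and chosen only for illustration; each text is represented as a list of tokens.

```python
import math

def log_likelihood(a, b, n1, n2):
    """Signed two-term log-likelihood ratio (an assumed statistic, not
    necessarily the one used in the study).
    a, b   : observed counts in the target and reference corpus
    n1, n2 : the totals those counts are drawn from
    """
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in target
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    g2 *= 2
    # Positive when the word is relatively more common in the target
    return g2 if a / n1 >= b / n2 else -g2

def frequency_keyness(target, reference, word):
    """Keyness from token frequencies: the corpus is treated as one whole."""
    a = sum(text.count(word) for text in target)
    b = sum(text.count(word) for text in reference)
    n1 = sum(len(text) for text in target)
    n2 = sum(len(text) for text in reference)
    return log_likelihood(a, b, n1, n2)

def text_dispersion_keyness(target, reference, word):
    """Keyness from the number of texts that contain the word."""
    a = sum(1 for text in target if word in text)
    b = sum(1 for text in reference if word in text)
    return log_likelihood(a, b, len(target), len(reference))

# Hypothetical toy corpora: "hostel" is frequent but confined to one text,
# while "trip" occurs once in every target text.
target = [
    ["trip"] + ["hostel"] * 10,
    ["trip", "beach"],
    ["trip", "food"],
    ["trip", "bus"],
]
reference = [
    ["news", "today"],
    ["market", "trip"],
    ["vote", "poll"],
    ["rain", "wind"],
]

for w in ("hostel", "trip"):
    print(w,
          round(frequency_keyness(target, reference, w), 3),
          round(text_dispersion_keyness(target, reference, w), 3))
```

In this toy setting the frequency-based score ranks "hostel" above "trip", because all ten occurrences in a single text count towards the corpus total. The dispersion-based score reverses the ranking, because "trip" appears in every target text and "hostel" in only one.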
