Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity

TLDR

Computational methods are widespread in the social sciences, yet probabilistic language models remain relatively underused. This study introduces language models to a general social science readership and demonstrates their use through an illustrative analysis. The authors explain language models as probabilistic estimators of linguistic units, then apply them to classify names in a large administrative database in order to measure the spatial variation of religiosity. The application shows that language models can effectively classify text containing localized naming variations, and it suggests their broader potential for sociological research beyond classification.

Abstract

Computational methods have become widespread in the social sciences, but probabilistic language models remain relatively underused. We introduce language models to a general social science readership. First, we offer an accessible explanation of language models, detailing how they estimate the probability of a piece of language, such as a word or sentence, on the basis of the linguistic context. Second, we apply language models in an illustrative analysis to demonstrate the mechanics of using these models in social science research. The example application uses language models to classify names in a large administrative database; the classifications are then used to measure a sociologically important phenomenon: the spatial variation of religiosity. This application highlights several advantages of language models, including their effectiveness in classifying text that contains variation around the base structures, as is often the case with localized naming conventions and dialects. We conclude by discussing language models’ potential to contribute to sociological research beyond classification through their ability to generate language.
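To make the idea of estimating the probability of a piece of language from its context concrete, the sketch below trains one small character-level bigram model per class and assigns a name to the class whose model gives it the highest probability. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the model type, so the bigram order, the add-one smoothing, the class labels "x" and "y", the boundary markers, and the invented training strings are all hypothetical.

```python
import math
import string
from collections import Counter

BOS, EOS = "^", "$"  # hypothetical start/end-of-name markers


class CharBigramLM:
    """Character bigram model with add-one smoothing (an assumed, minimal design)."""

    def __init__(self, names, alphabet=string.ascii_lowercase):
        # Symbols that can follow a context: any letter or the end marker.
        self.vocab = set(alphabet) | {EOS}
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        for name in names:
            chars = [BOS] + list(name.lower()) + [EOS]
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name):
        """Log-probability of a name: sum of log P(char | previous char)."""
        chars = [BOS] + list(name.lower()) + [EOS]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def classify(name, models):
    """Return the label whose model assigns the name the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(name))


# Invented toy strings, not drawn from any administrative database.
training = {
    "x": ["marko", "darko", "zarko", "ranko"],
    "y": ["selim", "kerim", "halim", "salim"],
}
models = {label: CharBigramLM(names) for label, names in training.items()}

# Neither test name appears in the training data; each is a variation on a
# pattern that one class's model has seen.
print(classify("janko", models))
print(classify("nalim", models))
```

Because the models score character sequences rather than matching whole strings, a name that never appears in the training data can still be assigned to the class whose typical patterns it resembles, which is the property the abstract highlights for localized naming conventions and dialects.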
