TLDR

The study aims to develop a reliable, cost‑effective method for classifying Internet texts into register categories and apply it to a large web corpus. The authors built a bottom‑up classification system using a decision‑tree survey completed by end users, tested it in ten pilot studies, and then applied it to 53,000 web documents. The analysis demonstrates the method’s effectiveness for web register classification and provides an initial overview of register types and their distribution on the web.

Abstract

This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories and to apply that method to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification in which end users of the web use a decision-tree survey to code the relevant situational characteristics of web documents; these codings yield a bottom-up identification of register and subregister categories. We describe the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of the project, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
