Automatically Neutralizing Subjective Bias in Text

TLDR

Texts such as news, encyclopedias, and social media aim for objectivity, yet bias introduced through subjective framing, presupposition, and doubt remains widespread, eroding trust and fueling conflict. This work introduces a novel testbed for natural language generation, automatically neutralizing biased text, and provides the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs derived from Wikipedia edits that removed framings, presuppositions, and attitudes, and the authors propose two encoder-decoder baselines: a concurrent BERT-based system that flags subjective words during generation, and a modular approach that first classifies problematic words and then edits the encoder's hidden states via a join embedding. Large-scale human evaluation across encyclopedias, news headlines, books, and political speeches suggests that these algorithms are an initial step toward automatically identifying and reducing bias.
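Each corpus entry pairs a biased sentence with its neutralized revision, while a word-level detector needs to know which words the edit touched. One way to recover such labels is a token-level diff. This is a minimal sketch under two assumptions: whitespace tokenization, and that every token the edit deleted or replaced counts as subjective. `bias_labels` is a hypothetical helper, not the authors' preprocessing code, and the example pair is invented rather than taken from the corpus.

```python
from difflib import SequenceMatcher


def bias_labels(biased: str, neutral: str) -> list[tuple[str, int]]:
    """Tag each token of the biased sentence with 1 if the neutralizing edit
    deleted or replaced it, else 0. Whitespace tokenization is an assumption."""
    src = biased.split()
    tgt = neutral.split()
    labels = [0] * len(src)
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "insert" opcodes only add words on the neutral side, so there is
        # no biased-side token to label for them.
        if tag in ("delete", "replace"):
            for i in range(i1, i2):
                labels[i] = 1
    return list(zip(src, labels))


# Invented example pair (not from the corpus).
pair = ("the senator bravely opposed the reckless bill",
        "the senator opposed the bill")
print(bias_labels(*pair))
# [('the', 0), ('senator', 0), ('bravely', 1), ('opposed', 0), ('the', 0), ('reckless', 1), ('bill', 0)]
```

Replaced tokens are flagged as well, so a presupposing verb swapped for a neutral one (for example "claimed" rewritten as "said") still receives label 1.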

Abstract

Texts like news, encyclopedias, and some social media strive for objectivity. Yet bias in the form of inappropriate subjectivity — introducing attitudes via framing, presupposing truth, and casting doubt — remains ubiquitous. This kind of bias erodes our collective trust and fuels social conflict. To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view (“neutralizing” biased text). We also offer the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs and originates from Wikipedia edits that removed various framings, presuppositions, and attitudes from biased sentences. Last, we propose two strong encoder-decoder baselines for the task. A straightforward yet opaque concurrent system uses a BERT encoder to identify subjective words as part of the generation process. An interpretable and controllable modular algorithm separates these steps, using (1) a BERT-based classifier to identify problematic words and (2) a novel join embedding through which the classifier can edit the hidden states of the encoder. Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias.
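The abstract does not spell out how the join embedding lets the classifier edit the encoder's hidden states. A minimal sketch, assuming the simplest additive form: each hidden state h_i is shifted by p_i * v, where p_i is the classifier's probability that token i is subjective and v is a single learned vector. The function name, the plain-list vectors, and this exact formula are illustrative assumptions, not the paper's code.

```python
def join_embedding(hidden, probs, v):
    """Return new hidden states h_i + p_i * v (an assumed additive form).

    hidden: encoder states, one list of floats per token
    probs:  detector probability that each token is subjective, in [0, 1]
    v:      learned join vector, same size as each hidden state
    """
    if len(hidden) != len(probs):
        raise ValueError("expected one probability per token")
    joined = []
    for h, p in zip(hidden, probs):
        if len(h) != len(v):
            raise ValueError("hidden state and join vector sizes differ")
        # Tokens judged neutral (p near 0) stay almost unchanged;
        # likely-subjective tokens are pushed along the direction v.
        joined.append([h_k + p * v_k for h_k, v_k in zip(h, v)])
    return joined


# Toy example: two tokens with 2-dimensional states (made-up numbers).
hidden = [[0.0, 1.0], [2.0, 0.0]]
probs = [0.0, 0.5]  # second token judged 50% likely to be subjective
v = [1.0, 1.0]
print(join_embedding(hidden, probs, v))  # [[0.0, 1.0], [2.5, 0.5]]
```

Because p is an explicit per-token vector, it can be inspected or overridden (for instance, forcing p_i to 1 for a word a user wants rewritten). This is one way to read the abstract's description of the modular system as interpretable and controllable.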
