Publication | Closed Access
Gender Bias and Under-Representation in Natural Language Processing Across Human Languages
Citations: 31 | References: 5 | Year: 2021 | Venue: Unknown
Gendered Perception, Engineering, Corpus Linguistics, Language Processing, Text Mining, Applied Linguistics, Natural Language Processing, Gender Studies, Computational Linguistics, Language Engineering, Language Studies, Machine Translation, Natural Language, Bias in Natural Language Processing, NLP Task, Language Technology, Gender Bias, Gender Stereotype, Language Science, Language Corpus, Linguistics, Gender Bias Measurements
Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems that make crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to bias in whose voices they represent. In this paper, a team including speakers of 9 languages (Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof) reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, including distinct feminine, masculine, and neuter forms of profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.