Publication | Open Access
Metamorphic Testing and Certified Mitigation of Fairness Violations in NLP Models
Citations: 77 | References: 26 | Year: 2020 | Venue: Unknown
Keywords: NLP Model Fairness, Engineering, Machine Learning, Ethics in Natural Language Processing, Verification, Fairness in Natural Language Processing, Semantics, Language Processing, Text Mining, Natural Language Processing, NLP Models, Data Science, Language Testing, Computational Linguistics, Language Engineering, Fairness Violations, Fairness (Computer Systems), Language Studies, Fair Data Principle, Bias in Natural Language Processing, Algorithmic Bias, NLP Task, Computer Science, Fairness (Language Acquisition), Automated Reasoning, Algorithmic Fairness, Linguistics, Metamorphic Testing
NLP models are increasingly deployed in sensitive domains such as credit scoring and insurance, so it is essential to ensure that their decisions are free of unfair bias toward subpopulation groups. This work proposes a framework that uses metamorphic testing to find discriminatory inputs that trigger fairness violations in NLP models, formulates fairness as (ε, k)-fairness, and smooths model predictions to provide certified mitigation of these violations. Applied to popular commercial NLP models, the method flags thousands of discriminatory inputs and, at a modest cost, adds a certified fairness guarantee to the evaluated models.
Natural language processing (NLP) models have been increasingly used in sensitive application domains including credit scoring, insurance, and loan assessment. Hence, it is critical to know that the decisions made by NLP models are free of unfair bias toward certain subpopulation groups. In this paper, we propose a novel framework employing metamorphic testing, a well-established software testing scheme, to test NLP models and find discriminatory inputs that provoke fairness violations. Furthermore, inspired by recent breakthroughs in the certified robustness of machine learning, we formulate NLP model fairness in a practical setting as (ε, k)-fairness and accordingly smooth the model predictions to mitigate fairness violations. We demonstrate our technique using popular (commercial) NLP models, and successfully flag thousands of discriminatory inputs that can cause fairness violations. We further enhance the evaluated models by adding a certified fairness guarantee at a modest cost.
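The two ideas in the abstract, metamorphic testing for discriminatory inputs and prediction smoothing for mitigation, can be illustrated with a small Python sketch. Everything here is an assumption for illustration, not the authors' implementation: `SUBSTITUTIONS` is a hypothetical table of protected-attribute tokens, `biased_model` is a hypothetical toy classifier, and `k` is read, as our own simplification, as an upper bound on how many protected tokens are substituted (ε is not modeled). The metamorphic relation checked is that swapping a protected token must not change the prediction. The smoothed classifier takes a majority vote over randomly mutated inputs; its vote fraction is only an empirical estimate and does not reproduce the paper's certified bound.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical substitution table: each protected-attribute token maps to
# counterparts from another subpopulation group (illustrative only).
SUBSTITUTIONS: Dict[str, List[str]] = {
    "he": ["she"], "she": ["he"], "his": ["her"], "her": ["his"],
}

Classifier = Callable[[str], int]


def mutants(sentence: str) -> List[str]:
    """Metamorphic transformation: swap one protected token at a time."""
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens):
        for alt in SUBSTITUTIONS.get(tok.lower(), []):
            out.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return out


def find_violations(model: Classifier, sentences: List[str]) -> List[Tuple[str, str]]:
    """Flag (input, mutant) pairs where the prediction changes."""
    violations = []
    for s in sentences:
        base = model(s)
        for m in mutants(s):
            if model(m) != base:
                violations.append((s, m))
    return violations


def random_mutant(sentence: str, k: int, rng: random.Random) -> str:
    """Resample up to k protected tokens uniformly from their token group."""
    tokens = sentence.split()
    positions = [i for i, t in enumerate(tokens) if t.lower() in SUBSTITUTIONS]
    rng.shuffle(positions)
    for i in positions[:k]:
        tok = tokens[i].lower()
        # Sorting makes the choice independent of which group member was given.
        options = sorted({tok, *SUBSTITUTIONS[tok]})
        tokens[i] = rng.choice(options)
    return " ".join(tokens)


def smoothed_predict(model: Classifier, sentence: str, k: int = 2,
                     n: int = 200, seed: int = 0) -> Tuple[int, float]:
    """Majority vote of the model over n random mutants; returns (label, vote share)."""
    rng = random.Random(seed)
    votes = Counter(model(random_mutant(sentence, k, rng)) for _ in range(n))
    label, count = votes.most_common(1)[0]
    return label, count / n


def biased_model(sentence: str) -> int:
    """Hypothetical toy classifier that (unfairly) keys on the token 'he'."""
    return 1 if "he" in sentence.lower().split() else 0


if __name__ == "__main__":
    print(find_violations(biased_model, ["he repaid his loan", "the loan was repaid"]))
    print(smoothed_predict(biased_model, "he repaid his loan"))
    print(smoothed_predict(biased_model, "she repaid her loan"))
```

Here `find_violations` flags the pair ("he repaid his loan", "she repaid his loan") as a discriminatory input. Because `random_mutant` resamples each protected token from its whole group, the two gender-swapped sentences induce the same distribution of mutants when `k` covers all their protected tokens, so with the same seed the smoothed predictions coincide even though the base model is biased.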