Metamorphic Testing and Certified Mitigation of Fairness Violations in NLP Models

TLDR

NLP models are increasingly deployed in sensitive domains such as credit scoring and insurance, making it essential to ensure their decisions are free from unfair bias toward subpopulation groups. This work proposes a novel framework that uses metamorphic testing to identify discriminatory inputs that trigger fairness violations in NLP models. It then formulates fairness as (ε, k)-fairness and smooths model predictions to provide certified mitigation of those violations. Applied to popular commercial NLP models, the method flags thousands of discriminatory inputs and, at a modest cost, adds a certified fairness guarantee.

Abstract

Natural language processing (NLP) models have been increasingly used in sensitive application domains including credit scoring, insurance, and loan assessment. Hence, it is critical to ensure that the decisions made by NLP models are free of unfair bias toward certain subpopulation groups. In this paper, we propose a novel framework employing metamorphic testing, a well-established software testing scheme, to test NLP models and find discriminatory inputs that provoke fairness violations. Furthermore, inspired by recent breakthroughs in the certified robustness of machine learning, we formulate NLP model fairness in a practical setting as (ε, k)-fairness and accordingly smooth the model predictions to mitigate fairness violations. We demonstrate our technique using popular (commercial) NLP models, and successfully flag thousands of discriminatory inputs that can cause fairness violations. We further enhance the evaluated models by adding a certified fairness guarantee at a modest cost.
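To make the idea concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a metamorphic relation of the form "swapping a sensitive term for another term in the same group should not change the prediction" and approximates smoothing with a plain majority vote over those variants. The word lists, the function names (`mutate`, `find_violations`, `smoothed_predict`), and the toy classifier are all hypothetical. The majority vote carries no certificate by itself and does not implement the (ε, k)-fairness definition, which the abstract does not spell out.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical sensitive-term groups, chosen only for illustration.
SENSITIVE_GROUPS: Dict[str, List[str]] = {
    "religion": ["christian", "muslim", "jewish"],
}


def mutate(sentence: str) -> List[str]:
    """Return variants that swap one sensitive term for another in its group.

    Tokenization is a naive whitespace split; punctuation is not handled.
    """
    tokens = sentence.split()
    variants = []
    for i, tok in enumerate(tokens):
        for group in SENSITIVE_GROUPS.values():
            if tok.lower() in group:
                for alt in group:
                    if alt != tok.lower():
                        variants.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants


def find_violations(model: Callable[[str], str], sentence: str) -> List[str]:
    """Metamorphic check: flag variants whose label differs from the original's."""
    base = model(sentence)
    return [v for v in mutate(sentence) if model(v) != base]


def smoothed_predict(model: Callable[[str], str], sentence: str) -> str:
    """Majority vote over the sentence and its variants (a stand-in for smoothing)."""
    votes = Counter(model(s) for s in [sentence] + mutate(sentence))
    return votes.most_common(1)[0][0]


def toy_model(sentence: str) -> str:
    """Deliberately biased toy classifier used only to exercise the sketch."""
    return "negative" if "muslim" in sentence.lower().split() else "positive"


if __name__ == "__main__":
    text = "my christian neighbor is kind"
    print(find_violations(toy_model, text))
    print(smoothed_predict(toy_model, "my muslim neighbor is kind"))
```

With the toy classifier, the raw prediction changes when the sensitive term changes, while the voted prediction is the same for every variant of the sentence. A real setup would replace `toy_model` with a call to the model under test and use a much richer perturbation set.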
