Testing Machine Learning Algorithms for Balanced Data Usage

Abstract

With the increasing application of machine learning (ML) algorithms to decision-making processes, the fairness of such algorithms has come into focus. Fairness testing aims at checking whether a classifier "learned" by an ML algorithm from some training data is biased, in the sense that it discriminates on the basis of certain attributes (e.g., gender or age). Fairness testing thus targets the prediction phase in ML, not the learning phase. In this paper, we investigate fairness in the learning phase. Our definition of fairness is based on the idea that the learner should treat all data in the training set equally, disregarding aspects such as the names or ordering of features or the ordering of data instances. We term this property balanced data usage. We then develop a (metamorphic) testing approach called TiLe for checking balanced data usage. TiLe is applied to 14 ML classifiers taken from the scikit-learn library, using 4 artificial and 9 real-world data sets for training, and finds 12 of the classifiers to be unbalanced.
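To make the idea of a metamorphic test for balanced data usage concrete, the following is a minimal sketch in plain Python. It is not TiLe itself and does not use scikit-learn: the toy learner `train_1nn`, the helper `relation_holds`, and the transformations `reverse_rows` and `swap_columns` are hypothetical names introduced only for illustration. The sketch assumes that "balanced" means the retrained classifier gives the same predictions after the training data has been reordered (with the test data transformed consistently where features are permuted).

```python
def train_1nn(X, y):
    """Toy learner (illustrative only): 1-nearest-neighbour classifier.
    Ties in distance are resolved in favour of the earliest training row,
    which makes the learner sensitive to the order of data instances."""
    data = list(zip(X, y))

    def predict(x):
        best_label, best_dist = None, None
        for xi, yi in data:
            dist = sum((a - b) ** 2 for a, b in zip(x, xi))
            if best_dist is None or dist < best_dist:  # strict: first row wins ties
                best_label, best_dist = yi, dist
        return best_label

    return predict


def relation_holds(train, X, y, X_test, transform):
    """Metamorphic check: train on the original and on the transformed
    data, then compare the predictions on the (transformed) test inputs."""
    baseline = [train(X, y)(x) for x in X_test]
    X2, y2, X_test2 = transform(X, y, X_test)
    follow_up = [train(X2, y2)(x) for x in X_test2]
    return baseline == follow_up


def reverse_rows(X, y, X_test):
    """Reorder the data instances; the test inputs are unchanged."""
    return X[::-1], y[::-1], X_test


def swap_columns(X, y, X_test):
    """Reorder the features, consistently in training and test data."""
    flip = lambda rows: [row[::-1] for row in rows]
    return flip(X), y, flip(X_test)


# Two training points and one test point that is equidistant from both.
X = [[0, 0], [2, 0]]
y = ["a", "b"]
X_test = [[1, 0]]

row_ok = relation_holds(train_1nn, X, y, X_test, reverse_rows)
col_ok = relation_holds(train_1nn, X, y, X_test, swap_columns)
print(row_ok, col_ok)  # prints: False True
```

In this sketch the toy learner violates the relation under reordering of data instances, because its tie-breaking depends on row order, but it satisfies the relation under reordering of features. A test for a real classifier would follow the same pattern, with the learner's training and prediction calls substituted for `train_1nn`.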
