
Publication | Closed Access

Testing Machine Learning Algorithms for Balanced Data Usage

Citations: 31

References: 24

Year: 2019

Abstract

With the increased application of machine learning (ML) algorithms to decision-making processes, the fairness of such algorithms has come into focus. Fairness testing aims at checking whether a classifier "learned" by an ML algorithm on some training data is biased in the sense of discriminating on the basis of certain attributes (e.g., gender or age). Fairness testing thus targets the prediction phase in ML, not the learning phase. In this paper, we investigate fairness for the learning phase. Our definition of fairness is based on the idea that the learner should treat all data in the training set equally, disregarding aspects such as the names or ordering of features or the ordering of data instances. We term this property balanced data usage. We then develop a (metamorphic) testing approach called TiLe for checking balanced data usage. TiLe is applied to 14 ML classifiers taken from the scikit-learn library, using 4 artificial and 9 real-world data sets for training, and finds 12 of the classifiers to be unbalanced.
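The idea of a metamorphic test for balanced data usage can be illustrated with a minimal sketch. This is not TiLe itself, and the abstract does not describe its exact transformations or implementation. The two toy learners (`FirstNeighbour`, `MajorityClass`), the helper names, and the tiny data set are all hypothetical and chosen only so the example runs without external libraries. The sketch trains a learner on the original data and on a transformed copy (reordered instances or reordered features), then checks whether the predictions agree.

```python
# Hypothetical sketch of a metamorphic check for "balanced data usage".
# None of these names come from the paper; they are illustrative only.

def permute_columns(X, perm):
    """Reorder the features of every instance according to perm."""
    return [[row[j] for j in perm] for row in X]

class FirstNeighbour:
    """Toy 1-nearest-neighbour learner. Distance ties are broken by
    training order, so it may depend on the order of instances."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        out = []
        for q in X:
            dists = [sum((a - b) ** 2 for a, b in zip(q, row)) for row in self.X]
            out.append(self.y[dists.index(min(dists))])  # first minimum wins
        return out

class MajorityClass:
    """Toy learner predicting the most frequent label. Ties are broken
    by label value, not by the order of instances."""
    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.label = max(sorted(counts), key=lambda l: counts[l])
        return self

    def predict(self, X):
        return [self.label for _ in X]

def check_row_order(make_learner, X, y, X_test, order):
    """True if training on reordered instances yields the same predictions."""
    base = make_learner().fit(X, y).predict(X_test)
    Xp = [X[i] for i in order]
    yp = [y[i] for i in order]
    return make_learner().fit(Xp, yp).predict(X_test) == base

def check_feature_order(make_learner, X, y, X_test, perm):
    """True if reordering features (in training and test data alike)
    yields the same predictions."""
    base = make_learner().fit(X, y).predict(X_test)
    follow = make_learner().fit(permute_columns(X, perm), y)
    return follow.predict(permute_columns(X_test, perm)) == base

# Tiny artificial data set; the test point [1, 0] is equidistant from
# the first two training instances, which carry different labels.
X = [[0, 0], [2, 0], [5, 5]]
y = [0, 1, 1]
X_test = [[1, 0], [5, 4]]
reverse = [2, 1, 0]

print(check_row_order(FirstNeighbour, X, y, X_test, reverse))    # False
print(check_row_order(MajorityClass, X, y, X_test, reverse))     # True
print(check_feature_order(FirstNeighbour, X, y, X_test, [1, 0])) # True
```

In this sketch, a `False` result flags a learner whose output changes under a transformation that should be irrelevant, which is the kind of behaviour the abstract calls unbalanced. A real test would apply the same pattern to actual classifiers and more varied data; how prediction differences are compared there is an open design choice not settled by the abstract.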
