Discrimination-aware data mining

TLDR

In civil rights law, discrimination refers to unfair treatment based on group membership, and rules derived by data mining can perpetuate such bias. The paper introduces and studies the concept of discriminatory classification rules. The authors formulate the redlining problem precisely and relate discriminatory rules to apparently safe ones by means of background knowledge. They show that guaranteeing non-discrimination is non-trivial and that simply removing discriminatory attributes is insufficient when background knowledge is available, and they validate their results empirically on the German credit dataset.
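The summary does not say how the discrimination of a classification rule is measured. The sketch below assumes a ratio-of-confidences measure, extended lift: for a potentially discriminatory itemset A (e.g. a sex value), a context B and a decision C, elift(A,B -> C) = conf(A,B -> C) / conf(B -> C). A rule is then flagged as discriminatory when elift reaches a threshold alpha. The item names, the helper functions and the toy records are hypothetical and are not taken from the German credit dataset.

```python
from typing import FrozenSet, Iterable, List, Tuple

Item = str                     # an "attribute=value" pair, e.g. "sex=female"
Itemset = FrozenSet[Item]
Record = FrozenSet[Item]       # one applicant, encoded as a set of items


def make_db(rows: Iterable[Tuple[int, Iterable[Item]]]) -> List[Record]:
    """Expand (count, items) pairs into a flat list of records."""
    db: List[Record] = []
    for count, items in rows:
        db.extend([frozenset(items)] * count)
    return db


def support(db: List[Record], items: Itemset) -> int:
    """Number of records that contain every item in `items`."""
    return sum(1 for record in db if items <= record)


def confidence(db: List[Record], premise: Itemset, conclusion: Itemset) -> float:
    """conf(premise -> conclusion) = supp(premise + conclusion) / supp(premise)."""
    s = support(db, premise)
    if s == 0:
        raise ValueError("premise does not occur in the data")
    return support(db, premise | conclusion) / s


def elift(db: List[Record], a: Itemset, b: Itemset, c: Itemset) -> float:
    """Extended lift (assumed measure): conf(A,B -> C) / conf(B -> C)."""
    return confidence(db, a | b, c) / confidence(db, b, c)


def is_alpha_discriminatory(db: List[Record], a: Itemset, b: Itemset,
                            c: Itemset, alpha: float) -> bool:
    """Flag the rule when adding A to the context B multiplies confidence by >= alpha."""
    return elift(db, a, b, c) >= alpha


# Hypothetical toy data, for illustration only.
db = make_db([
    (4, {"sex=female", "purpose=car", "credit=bad"}),
    (1, {"sex=female", "purpose=car", "credit=good"}),
    (1, {"sex=male", "purpose=car", "credit=bad"}),
    (4, {"sex=male", "purpose=car", "credit=good"}),
])
A = frozenset({"sex=female"})
B = frozenset({"purpose=car"})
C = frozenset({"credit=bad"})

print(elift(db, A, B, C))                                # 1.6  (0.8 / 0.5)
print(is_alpha_discriminatory(db, A, B, C, alpha=1.5))   # True
```

Measuring A against the context B, rather than against the whole population, is the design choice that matters here: it asks whether membership in A changes the decision among applicants who are otherwise comparable.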

Abstract

In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit. Rules extracted from databases by data mining techniques, such as classification or association rules, can be discriminatory in this sense when used for decision tasks such as benefit or credit approval. In this paper, the notion of discriminatory classification rules is introduced and studied. Providing a guarantee of non-discrimination is shown to be a non-trivial task. A naive approach, such as removing all discriminatory attributes, is shown to be insufficient when other background knowledge is available. Our approach leads to a precise formulation of the redlining problem, along with a formal result relating discriminatory rules to apparently safe ones by means of background knowledge. An empirical assessment of the results on the German credit dataset is also provided.
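Why removing the discriminatory attribute is not enough can be illustrated with plain support counting. Suppose the released data has no sex attribute, yet it yields an apparently safe rule D,B -> C with confidence gamma, where D is a proxy such as a (hypothetical) zip code. If background knowledge, for instance census figures, supplies beta1 = conf(A,B -> D) and beta2 = conf(D,B -> A), then supp(A,B,C) >= supp(D,B) * (gamma + beta2 - 1) and supp(A,B) = supp(D,B) * beta2 / beta1. Together these give a lower bound on elift(A,B -> C). This inequality is derived here only to sketch the idea and may not match the paper's own theorem; the elift measure, the attribute names and the data are assumptions, and the background-knowledge confidences are simulated from the full toy data.

```python
from typing import FrozenSet, Iterable, List, Tuple

Record = FrozenSet[str]  # one applicant as a set of "attribute=value" items


def make_db(rows: Iterable[Tuple[int, Iterable[str]]]) -> List[Record]:
    """Expand (count, items) pairs into a flat list of records."""
    db: List[Record] = []
    for count, items in rows:
        db.extend([frozenset(items)] * count)
    return db


def confidence(db: List[Record], premise: Record, conclusion: Record) -> float:
    """Fraction of records covering `premise` that also contain `conclusion`."""
    covered = [r for r in db if premise <= r]
    if not covered:
        raise ValueError("premise does not occur in the data")
    return sum(1 for r in covered if conclusion <= r) / len(covered)


def elift_lower_bound(gamma: float, delta: float, beta1: float, beta2: float) -> float:
    """Lower bound on elift(A,B -> C) = conf(A,B -> C) / conf(B -> C).

    gamma = conf(D,B -> C)  apparently safe rule, mined without A
    delta = conf(B -> C)    base rule for the context B
    beta1 = conf(A,B -> D)  background knowledge
    beta2 = conf(D,B -> A)  background knowledge
    """
    if beta1 <= 0 or beta2 <= 0 or delta <= 0:
        raise ValueError("beta1, beta2 and delta must be positive")
    conf_lower = beta1 / beta2 * (beta2 + gamma - 1)
    return max(conf_lower, 0.0) / delta  # a negative bound is vacuous


# Hypothetical toy data: A = sex=female, D = zip=100, B = purpose=car, C = credit=bad.
full = make_db([
    (4, {"sex=female", "zip=100", "purpose=car", "credit=bad"}),
    (1, {"sex=female", "zip=200", "purpose=car", "credit=good"}),
    (1, {"sex=male", "zip=100", "purpose=car", "credit=good"}),
    (1, {"sex=male", "zip=200", "purpose=car", "credit=bad"}),
    (3, {"sex=male", "zip=200", "purpose=car", "credit=good"}),
])
A, D = frozenset({"sex=female"}), frozenset({"zip=100"})
B, C = frozenset({"purpose=car"}), frozenset({"credit=bad"})

# The released data drops the sex attribute entirely.
released = [frozenset(i for i in r if not i.startswith("sex=")) for r in full]
gamma = confidence(released, D | B, C)
delta = confidence(released, B, C)

# Background knowledge, simulated here from the full data.
beta1 = confidence(full, A | B, D)
beta2 = confidence(full, D | B, A)

bound = elift_lower_bound(gamma, delta, beta1, beta2)
actual = confidence(full, A | B, C) / delta
print(f"{bound:.2f} <= {actual:.2f}")  # 1.20 <= 1.60
```

In this toy case the rule on zip code alone, combined with the two background confidences, already shows that the hidden rule on sex has an extended lift of at least 1.2, even though no record in the released data mentions sex.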
