Publication | Open Access
Towards fairer datasets
Citations: 218
References: 43
Year: 2020
Unknown Venue
Engineering, Machine Learning, Object Categorization, Key Factors, Image Classification, Image Analysis, Visual Grounding, Data Science, Data Mining, Pattern Recognition, Fair Data, Fair Data Principle, Data Management, Vision Recognition, Machine Vision, Computer Vision Technology, Fair Resource Allocation, Data Privacy, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Algorithmic Fairness, Person Subtree
Computer vision is widely used, yet its datasets represent only a few groups. Because models are trained on manually annotated image collections, the data and label distributions of those collections critically shape model behavior, and reported misbehavior includes offensive predictions and poorer performance for underrepresented groups. The authors examine ImageNet to illuminate the root causes of these concerns and to take first steps toward constructive mitigation. The study focuses on three factors in ImageNet's person subtree that may drive downstream problems: the stagnant concept vocabulary of WordNet, the attempt to illustrate every category exhaustively with images, and unequal representation in the images within concepts.
Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.
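To make factor (3) concrete, the sketch below shows one possible way to quantify how evenly annotated groups are represented among the images of each concept. It is an illustration under stated assumptions, not the method used in the paper: the function names, the toy annotations, the group labels, and the choice of normalized entropy as the evenness score are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def representation_by_concept(annotations):
    """Map each concept to the share of its images per annotated group.

    `annotations` is an iterable of (concept, group) pairs, one per image.
    """
    counts = defaultdict(Counter)
    for concept, group in annotations:
        counts[concept][group] += 1

    shares = {}
    for concept, counter in counts.items():
        total = sum(counter.values())
        shares[concept] = {g: n / total for g, n in counter.items()}
    return shares


def normalized_entropy(share, num_groups):
    """Evenness score: 1.0 when all groups are equally represented,
    0.0 when a single group accounts for every image."""
    if num_groups < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in share.values() if p > 0)
    return entropy / math.log(num_groups)


# Hypothetical toy data: concept names and group labels are placeholders.
annotations = [
    ("concept_a", "group_x"), ("concept_a", "group_x"),
    ("concept_a", "group_x"), ("concept_a", "group_y"),
    ("concept_b", "group_x"), ("concept_b", "group_y"),
]

shares = representation_by_concept(annotations)
scores = {c: normalized_entropy(s, num_groups=2) for c, s in shares.items()}
```

In this toy data, `concept_b` is evenly split and scores highest, while `concept_a` is skewed toward one group and scores lower. Any real analysis would also need to decide which attributes can be annotated responsibly and which concepts are appropriate to annotate at all.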
| Year | Citations |
|---|---|
| 2016 | 214.9K |
| 2009 | 60.2K |
| 2015 | 39.5K |
| 2015 | 27.2K |
| 2009 | 19K |
| 2015 | 7.5K |
| 2015 | 6.2K |
| 2017 | 5.1K |
| 2017 | 3.9K |
| 2012 | 3.3K |