GOGGLES: Automatic Image Labeling with Affinity Coding

Abstract

Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, the data programming paradigm has been proposed to reduce the human cost in labeling training data. However, data programming relies on designing labeling functions which still requires significant domain expertise. Also, it is prohibitively difficult to write labeling functions for image datasets as it is hard to express domain knowledge using raw features for images (pixels).

We propose affinity coding, a new domain-agnostic paradigm for automated training data labeling. The core premise of affinity coding is that the affinity scores of instance pairs belonging to the same class on average should be higher than those of pairs belonging to different classes, according to some affinity functions. We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set.

We compare GOGGLES with existing data programming systems on 5 image labeling tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a minimum of 71% to a maximum of 98% without requiring any extensive human annotation. In terms of end-to-end performance, GOGGLES outperforms the state-of-the-art data programming system Snuba by 21% and a state-of-the-art few-shot learning technique by 5%, and is only 7% away from the fully supervised upper bound.
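
The core premise of affinity coding stated above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: cosine similarity stands in as a hypothetical affinity function (it is not one of the affinity functions designed for GOGGLES), and the feature vectors and labels are toy values. The labels are used here only to check the premise; in the setting the abstract describes, they would be unknown and inferred.

```python
import math
from itertools import combinations


def cosine_affinity(u, v):
    """Hypothetical affinity function: cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_affinities(features, labels, affinity=cosine_affinity):
    """Return (mean same-class affinity, mean different-class affinity) over all pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(features)), 2):
        score = affinity(features[i], features[j])
        # Route the score to the bucket matching the pair's class relationship.
        (same if labels[i] == labels[j] else diff).append(score)
    return sum(same) / len(same), sum(diff) / len(diff)


# Toy data (made up for illustration): two classes of 2-D feature vectors.
features = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels = [0, 0, 1, 1]

same_mean, diff_mean = mean_affinities(features, labels)
print(f"same-class mean affinity: {same_mean:.3f}")
print(f"diff-class mean affinity: {diff_mean:.3f}")
```

If the premise holds for a given affinity function and dataset, the same-class mean should exceed the different-class mean, which is the signal a class inference step could exploit.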
