Best of both worlds: Human-machine collaboration for object annotation

TLDR

Localizing every object in an image remains elusive, manual annotation is costly, and current detectors can reliably detect only a few objects per image. This work proposes a principled framework that combines state‑of‑the‑art object detection with crowd‑engineering techniques to accurately and efficiently localize objects. The system takes an image and desired precision, utility, or human‑cost constraints, and outputs annotations generated by a Markov Decision Process that seamlessly integrates multiple computer‑vision models with diverse human inputs. Experiments on the ILSVRC2014 dataset demonstrate the effectiveness of this human‑in‑the‑loop labeling approach.

Abstract

The long-standing goal of localizing every object in an image remains elusive. Manually annotating objects is quite expensive despite crowd engineering innovations. Current state-of-the-art automatic object detectors can accurately detect at most a few objects per image. This paper brings together the latest advancements in object detection and in crowd engineering into a principled framework for accurately and efficiently localizing objects in images. The input to the system is an image to annotate and a set of annotation constraints: desired precision, utility and/or human cost of the labeling. The output is a set of object annotations, informed by human feedback and computer vision. Our model seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process. We empirically validate the effectiveness of our human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset.
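
As a rough illustration of the input/output contract described in the abstract, the following toy sketch starts from detector confidences, asks a noisy human yes/no question about the most uncertain candidate, and stops once an expected-precision target is met or the human budget is spent. It is a greedy simplification and not the paper's Markov Decision Process. All names (`Candidate`, `annotate`, `ask`), the single question type, and the fixed human `accuracy` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    label: str
    prob: float  # current belief that this detection is correct


def bayes_update(prior: float, answer_yes: bool, accuracy: float) -> float:
    """Posterior belief after one noisy yes/no answer (assumed symmetric noise)."""
    like_true = accuracy if answer_yes else 1.0 - accuracy
    like_false = 1.0 - accuracy if answer_yes else accuracy
    evidence = like_true * prior + like_false * (1.0 - prior)
    return like_true * prior / evidence


def expected_precision(cands: List[Candidate], threshold: float) -> float:
    """Mean belief over the candidates that would currently be output."""
    kept = [c.prob for c in cands if c.prob >= threshold]
    return sum(kept) / len(kept) if kept else 1.0


def annotate(
    cands: List[Candidate],
    ask: Callable[[Candidate], bool],  # hypothetical human verification task
    target_precision: float = 0.9,
    budget: float = 10.0,
    cost_per_question: float = 1.0,
    accuracy: float = 0.9,
    threshold: float = 0.5,
) -> Tuple[List[Candidate], float]:
    """Greedy loop: query humans until the precision target or budget is hit."""
    spent = 0.0
    while spent + cost_per_question <= budget:
        if expected_precision(cands, threshold) >= target_precision:
            break
        # Ask about the candidate whose belief is closest to 0.5.
        target = min(cands, key=lambda c: abs(c.prob - 0.5))
        target.prob = bayes_update(target.prob, ask(target), accuracy)
        spent += cost_per_question
    return [c for c in cands if c.prob >= threshold], spent
```

In this sketch the constraint is a precision target with a hard cost cap. A utility target could be handled the same way by swapping the stopping test.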
