Publication | Closed Access

Focal Loss for Dense Object Detection

Year: 2018
Citations: 9.3K
References: 29

TLDR

Object detectors that first generate a sparse set of candidate regions (e.g., R-CNN) have achieved the highest accuracy, while dense one-stage detectors are potentially faster and simpler but have historically trailed them. The study investigates why one-stage detectors underperform and proposes reshaping the cross-entropy loss to down-weight well-classified examples. The resulting Focal Loss focuses training on a sparse set of hard examples so that the many easy negatives do not overwhelm the detector, and the authors evaluate it by training a simple dense detector called RetinaNet. They identify extreme foreground-background class imbalance during training as the central cause of the accuracy gap, and report that RetinaNet trained with focal loss matches the speed of previous one-stage detectors while surpassing the accuracy of all then-existing state-of-the-art two-stage detectors. Code is available at https://github.com/facebookresearch/Detectron.

Abstract

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
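
The abstract describes the reshaped loss only in words. As a rough illustration, the sketch below uses the commonly cited binary form of the loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The formula, the function names, and the defaults gamma = 2 and alpha = 0.25 do not appear on this page and are assumptions for illustration. This is not the Detectron implementation.

```python
import math

EPS = 1e-12  # assumed guard against log(0); not part of the formula itself


def cross_entropy(p, y):
    """Standard binary cross-entropy for one example.

    p: predicted probability of the foreground class, y: label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, EPS))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (illustrative sketch; defaults are assumptions).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples; alpha_t weights foreground against background.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))


# An easy negative (background scored 0.01) versus a hard one (scored 0.9).
easy_negative = focal_loss(0.01, 0)
hard_negative = focal_loss(0.9, 0)
```

With these assumed defaults, the easy negative keeps only 0.75 * 0.01^2 = 7.5e-5 of its cross-entropy loss, while the hard negative keeps 0.75 * 0.9^2, or about 0.61 of it. That difference is how the loss stops the many easy negatives from dominating training. Setting gamma to 0 reduces the expression to alpha-weighted cross-entropy.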
