A Survey on Performance Metrics for Object-Detection Algorithms

TLDR

Average precision (AP) is a widely used metric for object-detection accuracy, but its many variants, inconsistent implementations across languages and datasets, and the resulting lack of consensus pose challenges for researchers. The study surveys and compares common object-detection metrics, delineates their differences and applications, and proposes a standard implementation that requires minimal annotation adaptation for cross-dataset benchmarking.

Abstract

This work explores and compares the plethora of metrics for the performance evaluation of object-detection algorithms. Average precision (AP), for instance, is a popular metric for evaluating the accuracy of object detectors by estimating the area under the curve (AUC) of the precision × recall relationship. Depending on the point interpolation used in the plot, two different AP variants can be defined and, therefore, different results are generated. AP has six additional variants, which further increase the possibilities for benchmarking. The lack of consensus across different works and AP implementations is a problem faced by the academic and scientific communities. Metric implementations written in different programming languages and platforms are usually distributed with corresponding datasets sharing a given bounding-box description. Such projects do help the community with evaluation tools, but they demand extra work to be adapted to other datasets and bounding-box formats. This work reviews the most used metrics for object detection, highlighting their differences, applications, and main concepts. It also proposes a standard implementation that can be used as a benchmark among different datasets with minimal adaptation of the annotation files.
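
To illustrate how the choice of point interpolation changes the AP value, the following is a minimal sketch of the two commonly described variants: 11-point interpolation and all-point interpolation. It is not the implementation proposed in the paper. The function names (`precision_recall`, `ap_11_point`, `ap_all_point`) are hypothetical, and the sketch assumes that detections are already sorted by descending confidence and already labeled as true or false positives (for example, by an IoU threshold applied beforehand).

```python
from typing import List, Sequence, Tuple


def precision_recall(is_tp: Sequence[bool], n_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative precision and recall.

    Assumption: `is_tp` is ordered by descending detection confidence and
    each flag says whether that detection matched a ground-truth box.
    `n_gt` is the total number of ground-truth boxes (must be > 0).
    """
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precisions.append(tp / rank)   # TP / (TP + FP) so far
        recalls.append(tp / n_gt)      # TP / all ground truths
    return precisions, recalls


def ap_11_point(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """Average the interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: highest precision at any recall >= level.
        total += max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0)
    return total / 11


def ap_all_point(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """Area under the monotonically decreasing envelope of the curve."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Replace each precision with the maximum precision to its right.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


if __name__ == "__main__":
    # Toy data: three detections (hit, miss, hit) against two ground-truth boxes.
    p, r = precision_recall([True, False, True], n_gt=2)
    print(f"11-point AP:  {ap_11_point(p, r):.4f}")
    print(f"all-point AP: {ap_all_point(p, r):.4f}")
```

On the toy data the two variants give different values for the same detections (about 0.85 versus about 0.83), which is the kind of discrepancy the abstract attributes to the interpolation choice.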
