YOLO-SM: A Lightweight Single-Class Multi-Deformation Object Detection Network

Abstract

Recently, object detection has witnessed vast progress with the rapid development of Convolutional Neural Networks (CNNs). However, most object detection networks target multi-class tasks, and few are designed to detect single-class multi-deformation objects. This paper aims to develop a lightweight object detection network for single-class multi-deformation objects to promote the practical application of object detection networks. First, we design a Densely Connected Multi-scale (DCM) module to strengthen the extraction of semantic information from deformed objects. With the DCM module and other strategies incorporated, we design a lightweight backbone structure for object detection, namely DCMNet. Then, we construct a lightweight Neck structure, the Ghost Multi-scale Feature (GMF) module, for feature fusion using a feature linear generation strategy. Finally, with DCMNet and the GMF module, we propose the object detection network YOLO-SM for single-class multi-deformation objects. Extensive experiments demonstrate that our proposed backbone structure, DCMNet, significantly outperforms the state-of-the-art models. YOLO-SM achieves 97.66% mean Average Precision (mAP) on the Barcode public dataset, which is higher than other state-of-the-art object detection models, and reaches an inference speed of 55.45 frames per second (FPS), showing that YOLO-SM offers a good tradeoff between speed and accuracy in detecting single-class multi-deformation objects.
Furthermore, on the single-class multi-deformation Crack public dataset, YOLO-SM achieves an mAP of 86.11%, and on the multi-class dataset Dish20 it obtains an mAP of 99.84%, which is much higher than other state-of-the-art object detection models, showing that YOLO-SM has good generalization ability.
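
To make the reported metrics concrete, the following minimal sketch shows one common way to compute Average Precision for a single class, and how a throughput in FPS converts to per-image latency. It is an illustration only. The abstract does not state the evaluation protocol, so the all-point interpolation used here is an assumption, as is the premise that each detection has already been matched against ground truth at some IoU threshold. The function names are hypothetical and are not taken from the paper.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (assumed protocol, not the paper's)."""
    # Rank detections by descending confidence score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, fp = 0, 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Build the precision envelope: each value becomes the maximum
    # precision found at that rank or any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the mean of per-class AP; with one class it equals that class's AP."""
    return sum(per_class_ap) / len(per_class_ap)


def latency_ms(fps):
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


# Toy example (made-up detections, not data from the paper):
toy_ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
print(toy_ap)                            # 0.625
print(mean_average_precision([toy_ap]))  # same value for a single class
print(round(latency_ms(55.45), 2))       # about 18.03 ms per frame
```

For a single-class dataset such as Barcode or Crack, the mean over classes has only one term, so the reported mAP is simply the AP of that class. A throughput of 55.45 FPS corresponds to roughly 18 ms per image.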
