Publication | Open Access
Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion
Citations: 205
References: 32
Year: 2021
Keywords: Convolutional Neural Network, Engineering, Feature Detection, Machine Learning, Underwater System, Underwater Imaging, Image Classification, Image Analysis, Pattern Recognition, Video Transformer, Machine Vision, Object Detection, Underwater Object Detection, Underwater Detection, Computer Science, Deep Learning, Feature Fusion, Computer Vision, YOLO v4
Underwater object detection is difficult because low visibility, color bias, small targets, and target aggregation degrade image quality, and existing deep-learning approaches rely on large networks that are slow and heavy, limiting real-time deployment. This work proposes a lightweight detection framework that combines MobileNet v2, YOLO v4, and attentional feature fusion to balance accuracy and speed in marine environments. The method reduces parameters by replacing standard convolutions with depth-wise separable convolutions in MobileNet v2 and introduces a Modified Attentional Feature Fusion module to better merge semantic and scale-inconsistent features. Experiments show the model attains 81.67% mAP on PASCAL VOC and 92.65% on the brackish dataset while running at 44.22 FPS, with parameters and model size compressed to 16.76% and 19.53% of YOLO v4's, demonstrating a strong accuracy-speed trade-off.
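The parameter saving from swapping standard convolutions for depth-wise separable ones can be verified with simple counting. The sketch below compares the two for a hypothetical 256-channel 3×3 layer; the channel counts and kernel size are illustrative assumptions, not values taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 256 input channels, 256 output channels, 3 x 3 kernel.
std = conv_params(256, 256, 3)          # 589824 weights
sep = dw_separable_params(256, 256, 3)  # 67840 weights
print(std, sep, round(sep / std, 3))    # separable layer keeps ~11.5% of the weights
```

The reduction factor approaches 1/k² + 1/c_out, which is why a k = 3 separable layer needs only about a ninth of the weights once the channel count is large.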
Underwater object detection is a challenging and attractive task in computer vision. Although object detection techniques achieve good performance on general datasets, low visibility and color bias in the complex underwater environment lead to generally poor image quality; in addition, small targets and target aggregation leave less extractable information, which makes it difficult to achieve satisfactory results. Past research on deep-learning-based underwater object detection has mainly focused on improving detection accuracy with large networks, while lightweight detection for marine environments has rarely received attention; the resulting models are large and slow, whereas applications in marine environments need better real-time and lightweight performance. In view of this, a lightweight underwater object detection method based on MobileNet v2, the You Only Look Once (YOLO) v4 algorithm, and attentional feature fusion is proposed to strike a harmonious balance between accuracy and speed for target detection in marine environments. In our work, a combination of MobileNet v2 and depth-wise separable convolution is proposed to reduce the number of model parameters and the model size. The Modified Attentional Feature Fusion (AFFM) module aims to better fuse semantic and scale-inconsistent features and to improve accuracy. Experiments indicate that the proposed method obtains a mean average precision (mAP) of 81.67% on the PASCAL VOC dataset and 92.65% on the brackish dataset, and reaches a processing speed of 44.22 frames per second (FPS) on the brackish dataset. Moreover, the number of model parameters and the model size are compressed to 16.76% and 19.53% of those of YOLO v4, respectively, achieving a good trade-off between time and accuracy for underwater object detection.
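The abstract does not spell out the internals of the AFFM module, but the general idea behind attentional feature fusion is a learned soft selection between two feature branches: an attention score gates one branch and its complement gates the other. The sketch below is a minimal pure-Python illustration of that gating idea under assumed inputs (the fixed per-channel scores stand in for a learned attention sub-network; nothing here reproduces the paper's actual module).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attentional_fusion(x, y, attn_scores):
    """Softly fuse two same-shaped feature vectors x and y.

    attn_scores is a per-channel coefficient applied to the combined
    features x + y (a stand-in for a learned attention sub-network).
    The gate w in (0, 1) weights x, and (1 - w) weights y, so each
    fused value is a convex combination of the two branches.
    """
    fused = []
    for xi, yi, ai in zip(x, y, attn_scores):
        w = sigmoid(ai * (xi + yi))  # attention derived from the combined features
        fused.append(w * xi + (1.0 - w) * yi)
    return fused

# Example: fuse two 3-channel feature vectors with uniform scores.
x = [1.0, -0.5, 2.0]
y = [0.5, 1.0, -1.0]
fused = attentional_fusion(x, y, [0.5, 0.5, 0.5])
```

Because the gate is a convex weight, every fused channel lies between the corresponding values of the two input branches; in a real network the scores would come from trained layers operating on multi-scale feature maps rather than fixed constants.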