Concepedia

Abstract

CCTV cameras have long been used to enforce security, e.g., to detect fights arising from many different situations. Their effectiveness is questionable, however, because they rely on continuous, specialized human supervision, which calls for automated solutions. Previous work is either too superficial (classification of short clips) or unrealistic (movies, sports, fake fights). None has performed detection of actual fights in long-duration CCTV recordings. In this work, we tackle this problem by first proposing CCTV-Fights, a novel and challenging dataset containing 1,000 videos of real fights, with more than 8 hours of annotated CCTV footage. We then propose a pipeline on which we assess the impact of different feature extractors (a Two-stream CNN, a 3D CNN, and a local interest point descriptor) as well as different classifiers, such as an end-to-end CNN, an LSTM, and an SVM. Results confirm how challenging the problem is and highlight the importance of explicit motion information for improving performance.
