Publication | Closed Access
A large-scale benchmark dataset for event recognition in surveillance video
Citations: 752
References: 13
Year: 2011
Unknown Venue
Engineering, Machine Learning, Video Processing, Evaluation Metrics, Video Surveillance, Video Retrieval, Video Interpretation, CVER Tasks, Image Analysis, Data Science, Pattern Recognition, Video Content Analysis, Video Transformer, Machine Vision, Large-scale Evaluation, Computer Science, Video Understanding, Deep Learning, Computer Vision, Large-scale Benchmark Dataset
Previous datasets for action recognition are unrealistic for real‑world surveillance because they consist of short clips of a single action by one individual, and datasets developed for movies and sports do not reflect the conditions of surveillance videos. We introduce a large‑scale video dataset for continuous visual event recognition in outdoor surveillance and propose evaluation modes and metrics to assess diverse algorithms. The dataset comprises 29 hours of outdoor footage with naturally occurring actions by non‑actors, covering 23 event types, and includes detailed annotations of moving object tracks and event instances. Preliminary experiments demonstrate the dataset’s utility for evaluating visual event recognition algorithms, and we expect it will spur research and advance continuous visual event recognition.
We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms, with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one action by one individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but these actions and scene conditions do not apply effectively to surveillance videos. Our dataset consists of many outdoor scenes with actions occurring naturally by non-actors in continuously captured videos of the real world. The dataset includes large numbers of instances for 23 event types distributed throughout 29 hours of video. The data is accompanied by detailed annotations that include both moving object tracks and event examples, providing a solid basis for large-scale evaluation. Additionally, we propose different types of evaluation modes for visual recognition tasks and evaluation metrics, along with our preliminary experimental results. We believe that this dataset will stimulate diverse aspects of computer vision research and help us to advance the CVER tasks in the years ahead.