Publication | Open Access
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
21 Citations · 0 References · Year: 2016
Keywords: Convolutional Neural Network, Engineering, Machine Learning, Video Retrieval, Video Interpretation, Simple Pipeline, Image Analysis, Data Science, Pattern Recognition, Video Content Analysis, Robot Learning, Temporal Activity Detection, Video Transformer, Machine Vision, Untrimmed Videos, Temporal Pattern Recognition, Computer Science, Video Understanding, Deep Learning, Computer Vision, Video Hallucination
This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network (RNN) that learns to classify video clips of 16 frames. After clip prediction, we post-process the output of the RNN to assign a single activity label to each video and to determine the temporal boundaries of the activity within the video. We show that our system achieves competitive results in both tasks with a simple architecture. We evaluate our method on the ActivityNet Challenge 2016, achieving 0.5874 mAP and 0.2237 mAP in the classification and detection tasks, respectively. Our code and models are publicly available at: https://imatge-upc.github.io/activitynet-2016-cvprw/
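The post-processing step described above — aggregating per-clip RNN predictions into a single video-level label and a temporal extent — can be sketched as follows. This is a hypothetical illustration, not the authors' released code: the function name `localize_activity`, the mean-probability aggregation, and the fixed threshold are assumptions for the sketch; the paper's exact scheme may differ.

```python
import numpy as np

def localize_activity(clip_probs, clip_len=16, fps=30.0, threshold=0.5):
    """Assign one activity label to a video and estimate its temporal
    extent from per-clip class probabilities (hypothetical scheme).

    clip_probs: (num_clips, num_classes) array of softmax outputs,
    one row per 16-frame clip, in temporal order.
    """
    # Video-level label: class with the highest mean probability over clips.
    label = int(np.argmax(clip_probs.mean(axis=0)))
    # Activity extent: span of clips where that class exceeds the threshold.
    active = clip_probs[:, label] > threshold
    if not active.any():
        return label, None
    first, last = np.flatnonzero(active)[[0, -1]]
    clip_seconds = clip_len / fps
    return label, (first * clip_seconds, (last + 1) * clip_seconds)

# Toy example: 4 clips, 3 classes; class 1 dominates clips 1-2.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.9, 0.0],
                  [0.3, 0.4, 0.3]])
label, (start, end) = localize_activity(probs, fps=16.0)
```

With 16 frames per clip at 16 fps each clip covers one second, so the detected segment here runs from the start of clip 1 to the end of clip 2.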