Publication | Open Access
A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition
Citations: 74
References: 69
Year: 2019
Engineering, Machine Learning, MTL Model, Action Quality Assessment, Video Retrieval, Video Interpretation, Image Analysis, Data Science, Pattern Recognition, Multi-task Learning, Multi-task Learning Architecture, Robot Learning, Video Transformer, Machine Vision, Action Recognition, Computer Science, Video Understanding, Deep Learning, Computer Vision, Novel MTL Architecture, MTL Mechanism, Activity Recognition
Multi-task learning (MTL) is a machine learning method that shares knowledge across multiple related tasks by learning them jointly. It has been shown to improve generalization on each task compared with learning one task at a time. In this paper, we propose a novel MTL architecture that first combines 3D convolutional neural networks (3D CNNs) and long short-term memory (LSTM) networks with the MTL mechanism, tailored to sharing information across video inputs. We split each video into several clips and apply the hybrid 3D CNN and LSTM model to extract sequential features from those clips. Our MTL model can therefore share visual knowledge, based on those video-clip features, among different action categories more efficiently. We evaluate our method on three popular public action recognition video datasets. The experimental results show that our MTL method efficiently shares detailed information in video clips among multiple action categories and outperforms other multi-task methods.
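The clip-splitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the clip length, the stride, or how leftover frames are handled, so the 16-frame default, the non-overlapping stride, and the choice to drop an incomplete trailing clip are all assumptions made here for illustration; `split_into_clips` is a hypothetical helper, not the authors' code.

```python
from typing import List, Tuple


def split_into_clips(
    num_frames: int, clip_len: int = 16, stride: int = 16
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index ranges, end exclusive, one per clip.

    Assumptions (not from the paper): fixed-length clips, a default
    length of 16 frames, and dropping any incomplete trailing clip.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    # The last valid start index is num_frames - clip_len; a shorter
    # video yields an empty range and therefore no clips.
    return [
        (start, start + clip_len)
        for start in range(0, num_frames - clip_len + 1, stride)
    ]


# A 50-frame video gives three full clips; the last 2 frames are dropped.
clips = split_into_clips(50)
```

In a pipeline of the kind the abstract describes, each range would index a block of frames passed to the 3D CNN, and the resulting per-clip features would be fed to the LSTM in temporal order. A stride smaller than `clip_len` would produce overlapping clips.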