Publication | Closed Access
Two-Stream Convolution Neural Network with Video-stream for Action Recognition
26
Citations
25
References
2019
Year
Unknown Venue
Artificial Intelligence, Convolutional Neural Network, Image Analysis, Machine Vision, Data Science, Machine Learning, Pattern Recognition, Engineering, Action Recognition, Video Content Analysis, Video Transformer, Video Understanding, Temporal Information, Deep Learning, Video Retrieval, Video Interpretation, Computer Vision
Recently, as applications of convolutional neural networks in artificial intelligence have become increasingly diverse, a growing number of neural network methods have been proposed. Examples include 3D convolution and two-stream convolution based on RGB and optical-flow streams. A convolutional neural network with 3D convolutional kernels can extract spatio-temporal features directly from a set of video sequences for action recognition. However, a 3D convolutional neural network captures only part of the spatio-temporal information, so we propose a new ConvNet architecture called CVDN (Combined Video-stream Deep Network) that extracts more spatio-temporal features from video fragments and thereby makes effective use of the temporal information in the dataset. We evaluate our method on the UCF-101 dataset and obtain a good result. The method proceeds as follows. First, we initialize our models with ResNet models pre-trained on the Kinetics dataset, then train them on the UCF-101 dataset and extract the video-stream features. Next, optical flow images computed from the UCF-101 dataset serve as the input to the optical stream, from which the optical-stream features are extracted. Finally, the features of the two streams are combined and the results are obtained after a Softmax layer. CVDN obtains good results when the linear fusion ratio of video-stream features to optical-stream features is 5:4. With ResNet-101, our method achieves an accuracy of 92.2%.
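The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each stream yields one score per action class, that the 5:4 ratio is applied as normalized weights (5/9 and 4/9), and that Softmax is applied to the fused scores. The abstract does not specify these details, so the function names, the normalization, and the toy scores below are hypothetical and are not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(video_scores, flow_scores, ratio=(5, 4)):
    """Linearly fuse per-class scores from two streams, then apply softmax.

    Assumption: the ratio is normalized so the two weights sum to 1.
    """
    if len(video_scores) != len(flow_scores):
        raise ValueError("both streams must score the same number of classes")
    weight_sum = ratio[0] + ratio[1]
    w_video = ratio[0] / weight_sum
    w_flow = ratio[1] / weight_sum
    fused = [w_video * v + w_flow * f for v, f in zip(video_scores, flow_scores)]
    return softmax(fused)


# Toy per-class scores for three hypothetical action classes.
video_scores = [2.0, 0.5, -1.0]
flow_scores = [0.5, 1.5, -0.5]

probs = fuse_streams(video_scores, flow_scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

Normalizing the weights does not change the predicted class compared with using the raw weights 5 and 4, because it only rescales the fused scores by a positive constant; it does change how sharply peaked the Softmax output is.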