Publication | Closed Access
Towards Good Practices for Action Video Encoding
Citations: 70
References: 24
Year: 2014
Venue: Unknown
Engineering, Machine Learning, Video Coding Format, Biometrics, Communication, Video Retrieval, Video Interpretation, Image Analysis, Data Science, Excellent Accuracy, Pattern Recognition, Virtual Reality, Video Transformer, High Dimensional Representations, Machine Vision, Feature Learning, Video Manipulation, Computer Science, Video Understanding, Deep Learning, Action Video Encoding, Proper Encoding, Computer Vision
High-dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at only negligible computational cost. We empirically evaluate various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed properties of the VLAD data distribution, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves VLAD by large margins in action recognition.
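
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain VLAD encoding as it is commonly described: each local descriptor is hard-assigned to its nearest codeword, the residuals are summed per codeword, and the concatenated result is L2-normalized. This is not the paper's bimodal encoding, whose details are not given here. The function name `vlad_encode`, the toy codebook, and the toy descriptors are illustrative assumptions, and the codebook is assumed to have been learned beforehand (for example with k-means).

```python
import math


def vlad_encode(descriptors, centers):
    """Encode local descriptors as one L2-normalized VLAD vector.

    descriptors: list of D-dimensional lists (local features of one video).
    centers: list of K D-dimensional lists (codebook, assumed precomputed).
    Returns a flat list of length K * D.
    """
    dim = len(centers[0])
    # One residual accumulator per codeword.
    residuals = [[0.0] * dim for _ in centers]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest codeword
        # (squared Euclidean distance).
        k = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])),
        )
        # Accumulate the residual between descriptor and codeword.
        for d in range(dim):
            residuals[k][d] += x[d] - centers[k][d]
    # Concatenate the per-codeword residual sums into one vector.
    flat = [v for row in residuals for v in row]
    # L2-normalize; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example (hypothetical data): two codewords, three 2-D descriptors.
toy_centers = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
toy_code = vlad_encode(toy_descriptors, toy_centers)
```

With K codewords and D-dimensional descriptors the output has K * D entries, which is why such encodings are described as high-dimensional. Practical pipelines often add further normalization steps before classification; those are omitted here to keep the sketch short.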