Publication | Closed Access
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
2006 | 388 Citations
We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words, obtained by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and of the intermediate topics corresponding to human action categories, using latent topic models such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). Because it relies on these probabilistic models, our approach can handle noisy feature points arising from dynamic backgrounds and moving cameras. Given a novel video sequence, the algorithm can categorize and localize the human action(s) it contains. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results show the promise of this simple approach. In addition, our algorithm can recognize and localize multiple actions in long, complex video sequences containing multiple motions.
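The bag-of-words pipeline described above can be illustrated with a minimal sketch of pLSA fitted by expectation-maximization. This is not the authors' implementation, and LDA and action localization are not shown. It assumes that descriptors have already been computed at space-time interest points and that a codebook of spatial-temporal words has been learned beforehand (for example, by k-means). All function names (`quantize`, `histogram`, `plsa`, `categorize`) are hypothetical. Each video becomes a word-count vector n(d, w), EM estimates P(w|z) and P(z|d), and a new video is "folded in" with P(w|z) held fixed, then assigned to its most probable topic.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def quantize(descriptors, codebook):
    """Map each interest-point descriptor to the index of its nearest codeword.

    Assumption: the codebook was learned beforehand (e.g. by k-means).
    """
    words = []
    for desc in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(desc, cw)) for cw in codebook]
        words.append(dists.index(min(dists)))
    return words


def histogram(words, vocab_size):
    """Bag-of-words count vector n(d, w) for one video."""
    counts = [0] * vocab_size
    for w in words:
        counts[w] += 1
    return counts


def _em_step(counts, p_w_z, p_z_d, update_topics):
    """One EM iteration over a document-word count matrix."""
    num_topics, vocab = len(p_w_z), len(p_w_z[0])
    acc_w_z = [[0.0] * vocab for _ in range(num_topics)]
    acc_z_d = [[0.0] * num_topics for _ in counts]
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
            post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(num_topics)])
            for z in range(num_topics):
                acc_w_z[z][w] += n * post[z]
                acc_z_d[d][z] += n * post[z]
    # M-step: renormalize the expected counts.
    new_z_d = [normalize(r) for r in acc_z_d]
    new_w_z = [normalize(r) for r in acc_w_z] if update_topics else p_w_z
    return new_w_z, new_z_d


def plsa(counts, num_topics, iters=200, seed=0):
    """Fit P(w|z) and P(z|d) with EM from a random initialization."""
    rng = random.Random(seed)
    vocab = len(counts[0])
    p_w_z = [normalize([rng.random() for _ in range(vocab)]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() for _ in range(num_topics)]) for _ in counts]
    for _ in range(iters):
        p_w_z, p_z_d = _em_step(counts, p_w_z, p_z_d, update_topics=True)
    return p_w_z, p_z_d


def categorize(new_counts, p_w_z, iters=100):
    """Fold in a new video with P(w|z) fixed and return its most probable topic."""
    num_topics = len(p_w_z)
    p_z_d = [[1.0 / num_topics] * num_topics]
    for _ in range(iters):
        _, p_z_d = _em_step([new_counts], p_w_z, p_z_d, update_topics=False)
    return max(range(num_topics), key=lambda z: p_z_d[0][z])
```

As a toy usage example, with `counts = [[4, 4, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 5, 5]]` and `plsa(counts, num_topics=2)`, the first two "videos" should share one dominant topic and the last two the other. Treating each topic as an action category mirrors the abstract's description, but the number of topics and the iteration counts here are illustrative assumptions.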