TLDR

Lifelong learning is essential for autonomous robotics to acquire and refine knowledge through experience, yet conventional deep neural models for action recognition from video are trained on a fixed batch of data with a predefined number of action classes and samples. The authors aim to build a learning system that incrementally processes perceptual cues and adapts its responses over time. They propose a self-organizing neural architecture of hierarchically arranged, growing recurrent networks that learn action representations without supervision, with spatiotemporal receptive fields that widen at each level of the hierarchy. Growth and adaptation are prediction-driven: each network grows and adapts according to how well it reconstructs temporally ordered input sequences. In classification experiments on two action benchmark datasets, the model is competitive with state-of-the-art batch learning methods even when a significant number of training labels are missing or corrupted, and it adapts to non-stationary input without catastrophic interference.
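The claim that stacking layers yields increasingly large spatiotemporal receptive fields can be checked with simple arithmetic. The sketch below assumes, hypothetically, that each layer feeds the next with stride-1 windows of q consecutive output vectors concatenated into one input. To isolate the window arithmetic it treats each layer's network as the identity map, whereas in the model each layer would first map its inputs to learned neurons. The function names `window_concat` and `receptive_field` and the window sizes are illustrative, not taken from the paper. Under these assumptions, with window sizes q_1, ..., q_L, one top-layer input spans 1 + sum(q_l - 1) frames.

```python
from typing import List, Sequence

Vector = List[float]


def window_concat(seq: Sequence[Vector], q: int) -> List[Vector]:
    """Concatenate every run of q consecutive vectors (stride 1).

    Hypothetical stand-in for how one layer could feed the next:
    output t of this function covers inputs t .. t+q-1.
    """
    if q < 1:
        raise ValueError("window size q must be at least 1")
    return [
        [value for vec in seq[t:t + q] for value in vec]
        for t in range(len(seq) - q + 1)
    ]


def receptive_field(windows: Sequence[int]) -> int:
    """Frames spanned by one top-layer input, given per-layer window sizes q_l."""
    return 1 + sum(q - 1 for q in windows)


# Ten one-dimensional "frames" whose value is their frame index.
frames = [[float(t)] for t in range(10)]
layer1 = window_concat(frames, 3)   # 8 inputs, each spanning 3 frames
layer2 = window_concat(layer1, 3)   # 6 inputs, each spanning 5 frames
print(len(layer1), len(layer2), receptive_field([3, 3]))  # 8 6 5
```

For example, the first second-layer input is built from frames 0 to 4, so two layers with q = 3 each cover 5 consecutive frames per input, compared with 3 after the first layer alone.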

Abstract

Lifelong learning is fundamental in autonomous robotics for the acquisition and fine-tuning of knowledge through experience. However, conventional deep neural models for action recognition from videos do not account for lifelong learning but rather learn a batch of training data with a predefined number of action classes and samples. Thus, there is a need to develop learning systems with the ability to incrementally process available perceptual cues and to adapt their responses over time. We propose a self-organizing neural architecture for incrementally learning to classify human actions from video sequences. The architecture comprises growing self-organizing networks equipped with recurrent neurons for processing time-varying patterns. We use a set of hierarchically arranged recurrent networks for the unsupervised learning of action representations with increasingly large spatiotemporal receptive fields. Lifelong learning is achieved through prediction-driven neural dynamics in which the growth and the adaptation of the recurrent networks are driven by their capability to reconstruct temporally ordered input sequences. Experimental results on a classification task using two action benchmark datasets show that our model is competitive with state-of-the-art methods for batch learning even when a significant number of sample labels are missing or corrupted during training sessions. Additional experiments show the ability of our model to adapt to non-stationary input while avoiding catastrophic interference.
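The growth-and-adaptation loop described in the abstract can be illustrated with a simplified, hypothetical sketch in the style of a Grow When Required (GWR) network whose neurons also store a temporal context vector (a merge-SOM-style recurrence). This is not the authors' implementation: the class name `GrowingRecurrentNet`, the single context vector per neuron, the thresholds, learning rates, and habituation rule are all illustrative assumptions. How well a neuron "reconstructs" the input is approximated by its activation, computed from the weighted error between its (weight, context) pair and the current (input, global context) pair. A new neuron is inserted only when that activation is low and the winning neuron has already habituated, i.e. has been trained often. Edge aging and node removal, common in GWR variants, are omitted for brevity.

```python
import math
from typing import FrozenSet, List, Optional, Set

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def blend(a: Vector, b: Vector, wa: float) -> Vector:
    """Element-wise wa * a + (1 - wa) * b."""
    return [wa * x + (1.0 - wa) * y for x, y in zip(a, b)]


class GrowingRecurrentNet:
    """Hypothetical GWR-style network; each neuron stores a weight and a temporal context."""

    def __init__(self, dim: int, act_threshold: float = 0.85, hab_threshold: float = 0.1,
                 eps_b: float = 0.1, eps_n: float = 0.01, beta: float = 0.5,
                 alpha_w: float = 0.7, alpha_c: float = 0.3,
                 tau_b: float = 0.3, tau_n: float = 0.1, kappa: float = 1.05):
        self.dim = dim
        self.a_t, self.h_t = act_threshold, hab_threshold
        self.eps_b, self.eps_n, self.beta = eps_b, eps_n, beta
        self.alpha_w, self.alpha_c = alpha_w, alpha_c
        self.tau_b, self.tau_n, self.kappa = tau_b, tau_n, kappa
        self.w: List[Vector] = []   # feed-forward weight per neuron
        self.c: List[Vector] = []   # temporal context per neuron
        self.h: List[float] = []    # habituation: 1.0 = fresh, decays each time a neuron fires
        self.edges: Set[FrozenSet[int]] = set()
        self.prev_bmu: Optional[int] = None

    def reset_context(self) -> None:
        """Forget the temporal context, e.g. at the start of a new video."""
        self.prev_bmu = None

    def _global_context(self) -> Vector:
        # Recurrence: blend the previous winner's weight and context.
        if self.prev_bmu is None:
            return [0.0] * self.dim
        b = self.prev_bmu
        return blend(self.w[b], self.c[b], self.beta)

    def _adapt(self, j: int, x: Vector, ctx: Vector, eps: float) -> None:
        rate = eps * self.h[j]  # habituated neurons move less
        self.w[j] = [w + rate * (xi - w) for w, xi in zip(self.w[j], x)]
        self.c[j] = [c + rate * (ci - c) for c, ci in zip(self.c[j], ctx)]

    def _habituate(self, j: int, tau: float) -> None:
        self.h[j] += tau * self.kappa * (1.0 - self.h[j]) - tau

    def step(self, x: Vector) -> float:
        """Process one input vector; return the winner's activation (1.0 = exact match)."""
        ctx = self._global_context()
        if len(self.w) < 2:  # bootstrap: the first two inputs become neurons
            self.w.append(list(x))
            self.c.append(ctx)
            self.h.append(1.0)
            if len(self.w) == 2:
                self.edges.add(frozenset((0, 1)))
            self.prev_bmu = len(self.w) - 1
            return 1.0
        dists = [self.alpha_w * sq_dist(x, w) + self.alpha_c * sq_dist(ctx, c)
                 for w, c in zip(self.w, self.c)]
        b, s = sorted(range(len(dists)), key=dists.__getitem__)[:2]
        activation = math.exp(-math.sqrt(dists[b]))
        if activation < self.a_t and self.h[b] < self.h_t:
            # Poor reconstruction by an already well-trained winner: grow a neuron.
            r = len(self.w)
            self.w.append(blend(self.w[b], x, 0.5))
            self.c.append(blend(self.c[b], ctx, 0.5))
            self.h.append(1.0)
            self.edges.discard(frozenset((b, s)))
            self.edges |= {frozenset((r, b)), frozenset((r, s))}
            self.prev_bmu = r
        else:
            # Otherwise adapt the winner and its topological neighbours.
            self._adapt(b, x, ctx, self.eps_b)
            self._habituate(b, self.tau_b)
            for e in self.edges:
                if b in e:
                    (n,) = e - {b}
                    self._adapt(n, x, ctx, self.eps_n)
                    self._habituate(n, self.tau_n)
            self.edges.add(frozenset((b, s)))
            self.prev_bmu = b
        return activation
```

For example, feeding the same frame `[0.0]` fifty times leaves the sketch at two neurons with activation 1.0; once the winner has habituated, a very different frame such as `[5.0]` yields a low activation and adds a third neuron halfway between the winner and the new input instead of overwriting the existing ones. Growing rather than overwriting is the intuition this sketch offers for limiting catastrophic interference; the paper's own mechanism may differ. Calling `reset_context()` between videos keeps temporal context from leaking across sequences.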
