Concepedia

TLDR

Artificial neural networks excel at fixed classification tasks but suffer catastrophic forgetting when their knowledge is extended beyond the original task; continual learning aims to let networks accumulate knowledge across sequential tasks without retraining from scratch. This survey covers continual learning for task‑incremental classification, presenting a taxonomy, a framework for determining the stability‑plasticity trade‑off, and a comprehensive experimental comparison of 11 methods alongside baselines. The authors evaluate on Tiny ImageNet, iNaturalist, and a sequence of recognition datasets, studying the influence of model capacity, weight decay, dropout, and task order, and qualitatively comparing the methods in terms of required memory, computation time, and storage.

Abstract

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern: (1) a taxonomy and extensive overview of the state-of-the-art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods; and (4) baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
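
To make the task incremental setting concrete, the sketch below shows one common way to summarize a run over sequential tasks: average final accuracy and average forgetting, both computed from a matrix of per-task accuracies. This is an illustration only; the abstract does not state which metrics are used, and the function names and accuracy values here are made-up assumptions rather than results from the survey.

```python
# Hypothetical illustration: the metric definitions and the numbers below are
# assumptions for this sketch, not taken from the abstract.

def average_accuracy(acc):
    """Mean accuracy over all tasks, measured after training on the last task."""
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc):
    """Mean drop per earlier task: accuracy right after learning it minus final accuracy."""
    n_tasks = len(acc)
    drops = [acc[t][t] - acc[-1][t] for t in range(n_tasks - 1)]
    return sum(drops) / len(drops)


# acc[i][j] = accuracy on task j after training sequentially on tasks 0..i.
# Entries with j > i are unused placeholders (task j has not been seen yet).
acc = [
    [0.90, 0.00, 0.00],
    [0.70, 0.85, 0.00],
    [0.60, 0.75, 0.80],
]

print(f"average accuracy:   {average_accuracy(acc):.3f}")
print(f"average forgetting: {average_forgetting(acc):.3f}")
```

In this reading, a method that leans towards stability would show low forgetting but may reach lower accuracy on newly added tasks, while a method that leans towards plasticity would show the opposite pattern.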
