Non-IID data and Continual Learning processes in Federated Learning: A long road ahead

TLDR

Federated Learning enables collaborative model training while preserving data privacy, yet it is vulnerable to statistical heterogeneity across devices and over time, which can hinder convergence. Many heterogeneity-aware methods have been proposed, but they often do not specify which type of heterogeneity they address. This study formally classifies data statistical heterogeneity, reviews prominent Federated Learning strategies that address it, and presents methods from other machine learning frameworks, notably Continual Learning, that could be readily adapted to Federated settings to improve performance. Empirical results on various non-IID data scenarios confirm that different types of non-IID data negatively affect Federated Learning performance, illustrating the practical impact of heterogeneity.

Abstract

Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private. This decentralized approach is prone to suffering the consequences of data statistical heterogeneity, both across the different entities and over time, which may lead to a lack of convergence. To avoid such issues, different methods have been proposed in the past few years. However, data may be heterogeneous in many different ways, and current proposals do not always specify the kind of heterogeneity they are considering. In this work, we formally classify data statistical heterogeneity and review the most remarkable Federated Learning strategies that are able to face it. At the same time, we introduce approaches from other machine learning frameworks. In particular, Continual Learning strategies are worthy of special attention, since they are able to handle common kinds of data heterogeneity. Throughout this paper, we present many methods that could be easily adapted to Federated Learning settings to improve its performance. Apart from theoretically discussing the negative impact of data heterogeneity, we examine it empirically and show results using different types of non-IID data.
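The abstract refers to "different types of non-IID data" without giving a concrete construction. One widely used way to simulate a single type, label distribution skew across clients, is to split each class over the clients with proportions drawn from a symmetric Dirichlet distribution. The following is a minimal, stdlib-only sketch under that assumption. The helpers `dirichlet` and `label_skew_partition` and the `alpha` parameter are hypothetical names introduced here for illustration, and this is not presented as the partitioning scheme used in the paper.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha) vector of length k from normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow when alpha is very small
        return [1.0 / k] * k
    return [d / total for d in draws]


def label_skew_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: assign sample indices to clients so that each class
    is split according to its own Dirichlet(alpha) draw over the clients."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):  # fixed class order keeps results reproducible
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn the proportions into cumulative cut points over this class's indices.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        start = 0
        for c, end in enumerate(cuts + [len(idxs)]):
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # toy dataset: 10 balanced classes
    parts = label_skew_partition(labels, num_clients=5, alpha=0.5)
    for c, part in enumerate(parts):
        # Print each client's size and its three most frequent labels.
        print(c, len(part), Counter(labels[i] for i in part).most_common(3))
```

In this sketch, a smaller `alpha` concentrates each class on fewer clients (stronger skew), while a large `alpha` makes the per-class proportions nearly uniform and the split approaches IID. Other kinds of heterogeneity, such as drift over time, would need a different construction.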
