Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

TLDR

Federated Learning trains visual models in a privacy-preserving way on data from distributed mobile devices, but the data statistics are likely to differ significantly across devices. This study investigates how such non-identical data distributions affect visual classification performance in Federated Learning. The authors synthesize datasets with a continuous range of identicalness, measure the performance of the Federated Averaging algorithm on them, and propose server momentum as a mitigation strategy. Performance degrades as the distributions differ more; on CIFAR-10, server momentum improves classification accuracy from 30.1% to 76.9% in the most skewed settings.

Abstract

Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices are likely to differ significantly. In this work, we look at the effect such non-identical data distributions have on visual classification via Federated Learning. We propose a way to synthesize datasets with a continuous range of identicalness and provide performance measures for the Federated Averaging algorithm. We show that performance degrades as distributions differ more, and propose a mitigation strategy via server momentum. Experiments on CIFAR-10 demonstrate improved classification performance over a range of non-identicalness, with classification accuracy improved from 30.1% to 76.9% in the most skewed settings.
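
To make the server momentum idea concrete, the following is a minimal sketch of one aggregation round. The abstract does not spell out the update rule, so this assumes a standard heavy-ball momentum applied to the difference between the current global weights and the size-weighted client average. The function names (`fedavg_aggregate`, `server_momentum_step`) and the parameters `beta` and `server_lr` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fedavg_aggregate(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes)


def server_momentum_step(global_w, client_weights, client_sizes,
                         velocity, beta=0.9, server_lr=1.0):
    """One server round with momentum (assumed heavy-ball form).

    delta acts as a pseudo-gradient: the direction from the averaged
    client weights back to the current global weights.
    """
    avg_w = fedavg_aggregate(client_weights, client_sizes)
    delta = global_w - avg_w
    # Accumulate the pseudo-gradient across rounds.
    velocity = beta * velocity + delta
    new_w = global_w - server_lr * velocity
    return new_w, velocity


if __name__ == "__main__":
    # Toy example: two clients holding 1 and 3 examples respectively.
    w = np.array([1.0, 2.0])
    v = np.zeros_like(w)
    clients = [np.array([0.0, 2.0]), np.array([2.0, 4.0])]
    w, v = server_momentum_step(w, clients, [1, 3], v)
    print(w, v)
```

Under these assumptions, setting `beta=0` and `server_lr=1` reduces the step to plain Federated Averaging, so momentum can be read as an add-on to the baseline rather than a replacement for it.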
