Defending Bit-Flip Attack through DNN Weight Reconstruction

Abstract

Recent studies show that adversarial attacks on neural network weights, aka, Bit-Flip Attack (BFA), can degrade Deep Neural Network’s (DNN) prediction accuracy severely. In this work, we propose a novel weight reconstruction method as a countermeasure to such BFAs. Specifically, during inference, the weights are reconstructed such that the weight perturbation due to BFA is minimized or diffused to the neighboring weights. We have successfully demonstrated that our method can significantly improve the DNN robustness against random and gradient-based BFA variants. Even under the most aggressive attacks (i.e., greedy progressive bit search), our method maintains a test accuracy of 60% on ImageNet after 5 iterations while the baseline accuracy drops to below 1%.

References

Page 1

	Year	Citations

Page 1