Improving Adversarial Robustness Requires Revisiting Misclassified Examples

TLDR

Deep neural networks are vulnerable to imperceptible adversarial perturbations. Adversarial training is the most effective known defense, but adversarial examples are only defined on correctly classified natural examples, and some training examples will inevitably be misclassified. This study investigates how misclassified and correctly classified examples each affect the final robustness of adversarial training, and introduces Misclassification Aware adveRsarial Training (MART) to address the gap. MART explicitly differentiates misclassified from correctly classified examples during training, and a semi-supervised variant further exploits unlabeled data to improve robustness. The experiments show that misclassified examples have a significant impact on final robustness, that the choice of maximization technique on misclassified examples may have negligible influence while the choice of minimization technique is crucial, and that MART and its variant can significantly improve on state-of-the-art adversarial robustness.

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by imperceptible perturbations. A range of defense techniques have been proposed to improve DNN robustness to adversarial examples, among which adversarial training has been demonstrated to be the most effective. Adversarial training is often formulated as a min-max optimization problem, with the inner maximization for generating adversarial examples. However, there exists a simple, yet easily overlooked fact that adversarial examples are only defined on correctly classified (natural) examples, but inevitably, some (natural) examples will be misclassified during training. In this paper, we investigate the distinctive influence of misclassified and correctly classified examples on the final robustness of adversarial training. Specifically, we find that misclassified examples indeed have a significant impact on the final robustness. More surprisingly, we find that different maximization techniques on misclassified examples may have a negligible influence on the final robustness, while different minimization techniques are crucial. Motivated by the above discovery, we propose a new defense algorithm called Misclassification Aware adveRsarial Training (MART), which explicitly differentiates the misclassified and correctly classified examples during the training. We also propose a semi-supervised extension of MART, which can leverage the unlabeled data to further improve the robustness. Experimental results show that MART and its variant could significantly improve the state-of-the-art adversarial robustness.
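
The abstract refers to the min-max formulation without writing it out. As a reading aid, the standard form of adversarial training and the split between correctly classified and misclassified natural examples can be sketched as below. All notation here is an assumption and does not come from the abstract: h_\theta is the classifier, \ell a classification loss, \epsilon the perturbation budget under an L_p norm, and S^{+}, S^{-} are illustrative names for the two subsets. This is not the MART objective itself, which the abstract does not state.

```latex
% Standard min-max adversarial training (assumed notation):
% the inner max crafts the adversarial example x_i',
% the outer min updates the model parameters \theta.
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \max_{\|x_i' - x_i\|_p \le \epsilon} \ell\bigl(h_\theta(x_i'),\, y_i\bigr)

% Split of the natural training examples under the current model:
S^{+}_{\theta} = \{\, i : h_\theta(x_i) = y_i \,\}    % correctly classified
\qquad
S^{-}_{\theta} = \{\, i : h_\theta(x_i) \ne y_i \,\}  % misclassified
```

In this notation, the abstract's finding is that how the examples in S^{-} are handled in the outer minimization matters for final robustness, while the choice of inner maximization on them may matter little.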
