Robust Physical-World Attacks on Deep Learning Visual Classification

TLDR

Recent studies show that state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples. As these models are increasingly deployed in safety-critical physical systems, such perturbations could cause dangerous misclassifications. The paper studies adversarial examples in the physical world as a step toward developing resilient learning algorithms. The authors introduce Robust Physical Perturbations (RP2), a general attack algorithm that generates visual perturbations robust to varying physical conditions, and propose a two-stage evaluation methodology consisting of lab and field tests on real objects. Using only black-and-white stickers on a real stop sign, RP2 causes targeted misclassification in 100% of the images obtained in lab settings and in 84.8% of the video frames captured from a moving vehicle in the field test.

Abstract

Recent studies show that state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including different viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
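
The abstract gives no equations, so the following is only a minimal sketch of the general idea it describes: optimizing one perturbation so that it causes a targeted misclassification across several sampled conditions at once. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: the toy two-class linear softmax classifier (`W`, `b`), brightness scaling (`scales`) as a stand-in for varying physical conditions, the binary `mask` as a stand-in for the sticker region, the L2 penalty weight `lam`, and the function name `masked_targeted_perturbation`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict(W, b, x):
    """Class index with the highest score under the toy linear classifier."""
    return int(np.argmax(W @ x + b))

def masked_targeted_perturbation(W, b, x, mask, target, scales,
                                 steps=200, lr=0.5, lam=1e-3):
    """Hypothetical sketch: projected gradient descent on the mean targeted
    cross-entropy over brightness-scaled copies of (x + mask * delta),
    plus a small L2 penalty on the masked perturbation."""
    delta = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        # Gradient of (lam / 2) * ||mask * delta||^2 (mask is binary).
        grad = lam * mask * delta
        for s in scales:
            p = softmax(W @ (s * (x + mask * delta)) + b)
            # Gradient of the cross-entropy toward `target` w.r.t. delta.
            grad += s * mask * (W.T @ (p - onehot)) / len(scales)
        delta = delta - lr * grad
        # Project back so that x + mask * delta stays a valid image in [0, 1].
        delta = np.clip(x + mask * delta, 0.0, 1.0) - x
    return mask * delta

# Toy setup (all values assumed): four bright pixels, class 0 favours bright
# inputs, class 1 is the attack target, and only the first two pixels may change.
W = np.array([[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]])
b = np.array([-3.0, 3.0])
x = np.ones(4)
mask = np.array([1.0, 1.0, 0.0, 0.0])
scales = [0.8, 1.0, 1.2]

delta = masked_targeted_perturbation(W, b, x, mask, target=1, scales=scales)
preds_clean = [predict(W, b, s * x) for s in scales]
preds_adv = [predict(W, b, s * (x + delta)) for s in scales]
```

In this toy case the optimizer drives the two masked pixels to black and leaves the rest untouched, and the perturbed input is assigned the target class at every sampled brightness. Averaging the loss over sampled conditions is the design choice that matters: a perturbation tuned to a single condition has no reason to survive the others.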
