Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

TLDR

Deep neural networks excel at many tasks yet are easily fooled by adversarial examples, and existing defenses have shown limited success or required expensive computation. This work introduces feature squeezing, a strategy that hardens DNNs by detecting adversarial inputs. Feature squeezing shrinks the adversary's search space by coalescing many distinct feature vectors into a single sample; the two squeezers explored are color bit-depth reduction and spatial smoothing. Comparing the model's prediction on the original input with its prediction on the squeezed input detects adversarial examples with high accuracy and few false positives, and the inexpensive squeezers can be combined in a joint framework to achieve high detection rates against state-of-the-art attacks.

Abstract

Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
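
The detection idea described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the choice of a median filter for spatial smoothing, the L1 distance between prediction vectors, taking the maximum over squeezers, and the threshold are all assumptions, since the abstract does not specify them. `predict` stands for any hypothetical callable that maps an image to a probability vector.

```python
import numpy as np


def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to 2**bits levels per channel."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def median_smooth(x, size=3):
    """Median-filter a 2-D (H, W) image with an odd size x size window.

    The median filter is an assumed choice of spatial smoothing.
    """
    pad = size // 2
    padded = np.pad(x, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def squeeze_score(predict, x, squeezers):
    """Largest L1 distance between the prediction on x and on each squeezed x.

    The L1 distance and the max over squeezers are assumptions of this sketch.
    """
    original = predict(x)
    return max(np.abs(original - predict(s(x))).sum() for s in squeezers)


def is_adversarial(predict, x, squeezers, threshold):
    """Flag x when squeezing changes the prediction by more than threshold."""
    return squeeze_score(predict, x, squeezers) > threshold


# Hypothetical stand-in for a DNN: a two-class "model" driven by mean intensity.
def toy_predict(x):
    m = float(np.mean(x))
    return np.array([m, 1.0 - m])


squeezers = [lambda x: reduce_bit_depth(x, 1), median_smooth]

clean = np.zeros((5, 5))
perturbed = clean.copy()
perturbed[2, 2] = 1.0  # a single-pixel distortion that smoothing removes

clean_flag = is_adversarial(toy_predict, clean, squeezers, threshold=0.05)
perturbed_flag = is_adversarial(toy_predict, perturbed, squeezers, threshold=0.05)
```

In this toy setup the clean image yields identical predictions before and after squeezing, while the single-pixel distortion disappears under smoothing and shifts the prediction, so only the perturbed input exceeds the threshold. In practice the threshold would be tuned on legitimate inputs to keep the false-positive rate low.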
