Publication | Open Access
Gotta Catch'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks
Citations: 66
References: 50
Year: 2020
Venue: Unknown
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Numerous efforts either try to patch weaknesses in trained models or try to make it difficult or costly to compute adversarial examples that exploit them. In our work, we explore a new "honeypot" approach to protecting DNN models. We intentionally inject trapdoors, which are honeypot weaknesses in the classification manifold that attract attackers searching for adversarial examples. Attackers' optimization algorithms gravitate towards trapdoors, leading them to produce attacks that resemble trapdoors in the feature space. Our defense then identifies attacks by comparing the neuron activation signatures of inputs to those of trapdoors.
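The detection step described above, comparing an input's neuron activation signature to that of a trapdoor, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the use of cosine similarity as the comparison metric, the averaging of activations into a single trapdoor signature, the fixed threshold, and the function names `cosine_similarity`, `trapdoor_signature`, and `is_flagged`. The activation vectors are toy values, not measurements from any model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two activation vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def trapdoor_signature(activations: List[Sequence[float]]) -> List[float]:
    """Hypothetical signature: the element-wise mean of activations
    recorded for inputs that carry the trapdoor."""
    n = len(activations)
    return [sum(column) / n for column in zip(*activations)]


def is_flagged(activation: Sequence[float],
               signature: Sequence[float],
               threshold: float) -> bool:
    """Flag an input whose activations are close to the trapdoor signature."""
    return cosine_similarity(activation, signature) >= threshold


# Toy activation vectors (assumed values, for illustration only).
trapdoor_acts = [[0.9, 0.1, 0.8], [1.1, 0.0, 0.7], [1.0, 0.2, 0.9]]
signature = trapdoor_signature(trapdoor_acts)

benign = [0.1, 1.0, 0.0]      # activations far from the signature
suspect = [0.95, 0.1, 0.85]   # activations close to the signature
THRESHOLD = 0.9               # assumed cutoff; in practice it would be calibrated

print(is_flagged(benign, signature, THRESHOLD))
print(is_flagged(suspect, signature, THRESHOLD))
```

In a real deployment the activation vectors would come from a chosen layer of the protected model, and the threshold would likely be calibrated on benign inputs to bound the false positive rate; both choices are left open here.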