Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

Abstract

With the prevalent use of Deep Neural Networks (DNNs) in many applications, security of these networks is of importance. Pre-trained DNNs may contain backdoors that are injected through poisoned training. These trojaned models perform well when regular inputs are provided, but misclassify to a target output label when the input is stamped with a unique pattern called trojan trigger. Recently various backdoor detection and mitigation systems for DNN based AI applications have been proposed. However, many of them are limited to trojan attacks that require a specific patch trigger. In this paper, we introduce composite attack, a more flexible and stealthy trojan attack that eludes backdoor scanners using trojan triggers composed from existing benign features of multiple labels. We show that a neural network with a composed backdoor can achieve accuracy comparable to its original version on benign data and misclassifies when the composite trigger is present in the input. Our experiments on 7 different tasks show that this attack poses a severe threat. We evaluate our attack with two state-of-the-art backdoor scanners. The results show none of the injected backdoors can be detected by either scanner. We also study in details why the scanners are not effective. In the end, we discuss the essence of our attack and propose possible defense.

References

Page 1

	Year	Citations

Page 1