Publication | Open Access
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
Citations: 122 | References: 47 | Year: 2021
Hardware Trojan, Convolutional Neural Network, Engineering, Machine Learning, Evasion Technique, Information Forensics, Image Classifiers, Hardware Security, Data Science, Pattern Recognition, Adversarial Machine Learning, Feature Learning, Threat Detection, Computer Science, Neural Networks, Deep Learning, Trojan Attacks, Data Security, Deep Neural Networks, Controlled Detoxification
A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor is activated when a normal input is stamped with a certain pattern, called the trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a solid-color polygon) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
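To make the notion of "stamping" a trigger concrete, the following is a minimal sketch of the simple input-space patch trigger that the abstract describes as typical of existing attacks. It is not the deep feature space attack proposed in the paper. The names `stamp_patch` and `poison`, the grayscale list-of-rows image representation, and the patch position and size are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of 0-255 pixel values.
Image = List[List[int]]


def stamp_patch(image: Image, top: int, left: int, size: int, value: int = 255) -> Image:
    """Return a copy of `image` with a solid square patch (the trigger) stamped on it."""
    height, width = len(image), len(image[0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("patch does not fit inside the image")
    stamped = [row[:] for row in image]  # copy rows so the clean input is untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            stamped[r][c] = value
    return stamped


def poison(image: Image, target_label: int) -> Tuple[Image, int]:
    """Build one poisoned training pair: a triggered image relabeled to the target class.

    Hypothetical helper: the 2x2 patch in the top-left corner is an arbitrary choice.
    """
    return stamp_patch(image, top=0, left=0, size=2), target_label


if __name__ == "__main__":
    clean = [[0] * 4 for _ in range(4)]
    triggered, label = poison(clean, target_label=7)
    for row in triggered:
        print(row)
    print("label:", label)
```

A fixed, solid patch like this is the same in every poisoned input, which is consistent with the abstract's point that such simple triggers are susceptible to backdoor detection algorithms.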