Publication | Closed Access
Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information
Citations: 26
References: 35
Year: 2020
Unknown Venue
Artificial Intelligence, Reward Hacking, Engineering, Machine Learning, Data Science, Deep Reinforcement Learning, Training Methods, Attack Model, Adversarial Machine Learning, AI Safety, Sequential Decision Making, Computer Science, Multi-agent Learning, Robot Learning, Blackbox Attacks, Deep Learning, RL Agents
Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the sequence-to-sequence model to our RL agents, they often outperform random Gaussian noise only marginally. Third, we propose a novel use for adversarial samples in Black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This potentially enables an attacker to use devices controlled by RL agents as time bombs.
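The transfer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's approximation model is a sequence-to-sequence network, whereas this sketch swaps in a tiny linear-softmax surrogate over a flattened observation window, and it assumes a fast-gradient-sign (FGSM) perturbation, which the abstract does not name. All function names, weights, and values are hypothetical. It shows a perturbation crafted on the surrogate alongside a Gaussian-noise baseline with the same per-feature budget.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights, window):
    """Hypothetical surrogate: one linear layer plus softmax over actions."""
    logits = [sum(w * x for w, x in zip(row, window)) for row in weights]
    return softmax(logits)


def fgsm_perturbation(weights, window, action, epsilon):
    """epsilon * sign(dL/dx), where L is cross-entropy for `action`.

    For a linear-softmax model, dL/dx_j = sum_k (p_k - 1[k == action]) * W[k][j].
    """
    probs = predict_probs(weights, window)
    grad = [
        sum((probs[k] - (1.0 if k == action else 0.0)) * weights[k][j]
            for k in range(len(weights)))
        for j in range(len(window))
    ]
    return [epsilon * ((g > 0) - (g < 0)) for g in grad]


def gaussian_perturbation(n, epsilon, rng):
    """Baseline: Gaussian noise clipped to the same per-feature budget."""
    return [max(-epsilon, min(epsilon, rng.gauss(0.0, epsilon))) for _ in range(n)]


# Illustrative values only; a real surrogate would be trained on observed
# (observation, action) sequences collected from the target agent.
weights = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
window = [1.0, 0.5, -0.3, 0.8]
epsilon = 0.5
rng = random.Random(0)

clean_probs = predict_probs(weights, window)
action = clean_probs.index(max(clean_probs))

delta = fgsm_perturbation(weights, window, action, epsilon)
adv_probs = predict_probs(weights, [x + d for x, d in zip(window, delta)])

noise = gaussian_perturbation(len(window), epsilon, rng)
noisy_probs = predict_probs(weights, [x + d for x, d in zip(window, noise)])

print("clean:", clean_probs[action])
print("adversarial:", adv_probs[action])
print("gaussian:", noisy_probs[action])
```

In a black-box attack, the perturbed observation would then be fed to the real agent. Whether it changes the agent's behaviour depends on how well the surrogate matches the agent, which is the transferability question the abstract reports on.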