Publication | Closed Access
Safe Reinforcement Learning for Single Train Trajectory Optimization via Shield SARSA
Year: 2022 | Citations: 31 | References: 36
Railway Traffic, Trajectory Planning, Machine Learning, Reinforcement Learning (Computer Engineering), Engineering, Safe Reinforcement Learning, Computer Engineering, Train Timetable Optimization, Systems Engineering, Education, Train Control, Speed Profile Optimization, Reinforcement Learning (Educational Psychology), Learning Control, Trajectory Optimization, Shield Sarsa, Dynamic Optimization, Linear Optimization
Single train trajectory optimization, also known as speed profile optimization (SPO), is a classical problem that aims to minimize the traction energy consumption of trains. Reinforcement learning (RL) has been used as an optimization method to solve the SPO problem. In the learning process of a common RL algorithm, a soft constraint (a penalty) is commonly used to keep the agent away from unsafe states. However, a soft constraint can neither guarantee nor explain the safety of the result. For the SPO problem, this means that the optimized speed profile obtained by a plain RL algorithm may exceed the speed limit, which is unacceptable in practice. This paper proposes a protection mechanism called Shield and constructs a Shield SARSA ($S$-SARSA) algorithm to protect the learning process of the high-speed train. Four different reward functions are used to compare the protective efficacy of the proposed algorithm with that of the soft constraint. Numerical experiments based on line data from Wuxi East to Suzhou North verify the protective efficacy and effectiveness of the proposed algorithm.
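The difference between a soft constraint and a shield can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical toy model in which the track is split into segments, speed takes integer levels, each action changes the speed by at most one level, and the limit changes by at most one level between adjacent segments. The names `safe_actions`, `choose`, `train`, and `greedy_profile`, as well as the reward, are illustrative assumptions. The shield removes any action that would exceed the next segment's limit before the SARSA agent selects one, so unsafe states are never visited rather than merely penalized.

```python
import random

ACTIONS = (-1, 0, 1)  # brake, coast, accelerate (speed-level change; assumed)


def safe_actions(pos, speed, limits):
    """Shield: keep only actions whose resulting speed level stays
    between 1 and the limit of the next segment (toy model)."""
    nxt = min(pos + 1, len(limits) - 1)
    return [a for a in ACTIONS if 1 <= speed + a <= limits[nxt]]


def choose(q, pos, speed, limits, eps, rng):
    """Epsilon-greedy selection restricted to the shielded action set."""
    allowed = safe_actions(pos, speed, limits)
    if rng.random() < eps:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q.get((pos, speed, a), 0.0))


def train(limits, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Tabular SARSA in which every action passes through the shield."""
    rng = random.Random(seed)
    q = {}
    last = len(limits) - 1
    for _ in range(episodes):
        pos, speed = 0, 1
        a = choose(q, pos, speed, limits, eps, rng)
        while pos < last:
            n_pos, n_speed = pos + 1, speed + a
            # Illustrative reward: energy cost for accelerating plus a
            # running-time cost that shrinks as speed grows.
            r = -max(a, 0) - 1.0 / n_speed
            if n_pos == last:
                n_a, target = None, r
            else:
                n_a = choose(q, n_pos, n_speed, limits, eps, rng)
                target = r + gamma * q.get((n_pos, n_speed, n_a), 0.0)
            key = (pos, speed, a)
            q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
            pos, speed, a = n_pos, n_speed, n_a
    return q


def greedy_profile(q, limits):
    """Roll out the greedy (still shielded) policy; return speed levels."""
    rng = random.Random(0)
    pos, speed, profile = 0, 1, [1]
    while pos < len(limits) - 1:
        a = choose(q, pos, speed, limits, 0.0, rng)
        pos, speed = pos + 1, speed + a
        profile.append(speed)
    return profile


if __name__ == "__main__":
    limits = [2, 3, 4, 4, 3, 2, 2, 1]  # hypothetical per-segment limits
    q_table = train(limits)
    print(greedy_profile(q_table, limits))
```

Because the shield filters actions during both exploration and exploitation, the speed limit holds in every training episode regardless of the reward function, whereas a penalty only discourages violations after they occur. Under less restrictive assumptions (for example, limits that drop by more than one level), a shield would need to look further ahead to keep the safe action set non-empty.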