Publication | Open Access
Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window
Citations: 30 | References: 27 | Year: 2024
Artificial Intelligence, Engineering, Machine Learning, Multi-agent Learning, Intelligent Systems, Gradient Propagation, Operations Research, Vehicle Routing, Time Window, Data Science, Systems Engineering, Robot Learning, Combinatorial Optimization, Multi-agent Planning, Agent Model, Computer Science, Autonomous Driving, Route Choice, Real-time Decision-making, Route Planning, Vehicle Routing Problem
The vehicle routing problem with time windows (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, and it faces three major challenges: (a) directly optimizing the goal under several practical constraints; (b) efficiently handling individual time-window limits; and (c) modeling the cooperation among the vehicle fleet. In this article, we present an end-to-end reinforcement learning (RL) framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and applies a harsh policy to the output when generating deterministic results. Second, we design a time-penalty-augmented reward to model the time-window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines and generates solutions for instances within seconds while maintaining solution quality, whereas existing heuristic baselines take minutes. Moreover, we analyze our solution thoroughly and draw meaningful implications from its real-time response ability.
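To make the idea of a time-penalty-augmented reward concrete, the following is a minimal sketch for a single vehicle route. It is not the authors' formulation: the function name `route_reward`, the linear lateness penalty, the `penalty_weight` and `speed` parameters, Euclidean travel, zero service time, and the wait-if-early rule are all assumptions made for illustration.

```python
import math


def route_reward(depot, stops, penalty_weight=1.0, speed=1.0):
    """Illustrative reward: negative travel distance minus weighted lateness.

    depot: (x, y) start and end point of the route.
    stops: list of (x, y, earliest, latest) tuples, visited in the given order.
    All modeling choices here are assumptions, not the paper's definition.
    """
    x, y = depot
    clock = 0.0      # current time along the route
    distance = 0.0   # total distance travelled so far
    lateness = 0.0   # accumulated time-window violation

    for sx, sy, earliest, latest in stops:
        leg = math.hypot(sx - x, sy - y)
        distance += leg
        clock += leg / speed
        clock = max(clock, earliest)           # assumed: wait if arriving early
        lateness += max(0.0, clock - latest)   # assumed: linear penalty if late
        x, y = sx, sy

    # Close the route by returning to the depot.
    distance += math.hypot(depot[0] - x, depot[1] - y)
    return -distance - penalty_weight * lateness


if __name__ == "__main__":
    stops = [(3.0, 4.0, 0.0, 10.0), (3.0, 0.0, 0.0, 6.0)]
    print(route_reward((0.0, 0.0), stops))
```

In this sketch a soft penalty, rather than a hard feasibility check, keeps the reward informative for routes that miss a window, which is one plausible reason to fold time-window limits into the reward signal used for policy-gradient training.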