Publication | Closed Access
Resource Allocation Based on Deep Reinforcement Learning in IoT Edge Computing
Citations: 288
References: 32
Year: 2020
Engineering, Machine Learning, Edge Device, Education, Reinforcement Learning (Educational Psychology), Reinforcement Learning (Computer Engineering), Data Science, Internet Of Things, Computer Engineering, Resource Allocation Policy, Computer Science, Mobile Computing, Deep Learning, Edge Architecture, Markov Decision Process, Deep Reinforcement Learning, Edge Computing, Cloud Computing, IoT Edge Computing, Multi-access Edge Computing, Mobile Edge Computing, Resource Allocation, Resource Optimization
Mobile edge computing (MEC) enables IoT data to be processed at the network edge, but its limited virtual resources are shared and contested by edge applications. The authors propose a resource-allocation policy that minimizes the long-term weighted sum of the average job completion time and the average number of requested resources. They formulate the allocation problem as a Markov decision process and solve it with deep reinforcement learning, introducing an improved DQN that stores experiences with small mutual influence in separate replay memories. Simulations show that the improved DQN converges better than the original DQN and that the learned policy achieves lower job completion times with fewer requested resources than the reference policies.
By leveraging mobile edge computing (MEC), the huge amount of data generated by Internet of Things (IoT) devices can be processed and analyzed at the network edge. However, an MEC system usually has only limited virtual resources, which IoT edge applications share and compete for. Thus, we propose a resource allocation policy for the IoT edge computing system to improve the efficiency of resource utilization. The objective of the proposed policy is to minimize the long-term weighted sum of the average completion time of jobs and the average number of requested resources. The resource allocation problem in the MEC system is formulated as a Markov decision process (MDP), and a deep reinforcement learning approach is applied to solve it. We also propose an improved deep Q-network (DQN) algorithm to learn the policy, in which multiple replay memories separately store the experiences with small mutual influence. Simulation results show that the proposed algorithm has better convergence performance than the original DQN algorithm, and that the corresponding policy outperforms the other reference policies, achieving lower completion time with fewer requested resources.
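The two ideas named in the abstract, a weighted-sum objective and multiple replay memories, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how experiences are grouped, how a memory is chosen at sampling time, or how the weights are defined, so the `group` index, the uniform choice among non-empty memories, the single `weight` parameter, and the names `MultiReplayMemory`, `Transition`, and `weighted_cost` are all assumptions made for illustration.

```python
import random
from collections import deque, namedtuple

# One stored experience; the field names are illustrative, not from the paper.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class MultiReplayMemory:
    """Several independent FIFO replay buffers, one per experience group.

    Assumption: the caller decides which group a transition belongs to.
    """

    def __init__(self, num_memories, capacity_per_memory, seed=None):
        self.memories = [
            deque(maxlen=capacity_per_memory) for _ in range(num_memories)
        ]
        self.rng = random.Random(seed)

    def push(self, group, transition):
        # Store the transition in its own memory; the oldest entry in that
        # memory is evicted automatically once the memory is full.
        self.memories[group].append(transition)

    def sample(self, batch_size):
        # Assumption: pick one non-empty memory uniformly at random, then
        # draw the whole minibatch from that memory only.
        candidates = [m for m in self.memories if len(m) > 0]
        if not candidates:
            raise ValueError("no experiences stored yet")
        memory = self.rng.choice(candidates)
        k = min(batch_size, len(memory))
        return self.rng.sample(list(memory), k)

    def __len__(self):
        return sum(len(m) for m in self.memories)


def weighted_cost(avg_completion_time, avg_requested_resources, weight=0.5):
    """Weighted sum of the two averages named in the abstract.

    Assumption: a single weight in [0, 1] trades off the two terms; an agent
    could use the negative of this cost as its reward.
    """
    return (weight * avg_completion_time
            + (1.0 - weight) * avg_requested_resources)
```

In a DQN training loop, `push` would be called after each environment step and `sample` before each gradient update. Keeping each minibatch inside one memory is one possible way to keep unrelated experiences from being mixed in the same update.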