iRAF: A Deep Reinforcement Learning Approach for Collaborative Mobile Edge Computing IoT Networks

TLDR

Data-driven AI methods have markedly improved complex problem solving for IoT services. The paper proposes iRAF, an intelligent resource allocation framework for collaborative mobile edge computing (CoMEC) networks. iRAF employs a multitask deep reinforcement learning algorithm that makes allocation decisions from network states and task characteristics, uses self-play training to optimize latency and power consumption, and trains a deep neural network to predict its own actions via self-supervised learning on data generated by Monte Carlo tree search (MCTS), which simulates future trajectories from a root state to identify the best action. Numerical experiments show that iRAF improves service latency by 59.27% over greedy search and 51.71% over deep Q-learning.

Abstract

With the recent development of artificial intelligence (AI), data-driven AI methods have shown remarkable performance in solving complex problems to support the Internet of Things (IoT) world with its massive resource-consuming and delay-sensitive services. In this paper, we propose an intelligent resource allocation framework (iRAF) to solve the complex resource allocation problem for the collaborative mobile edge computing (CoMEC) network. The core of iRAF is a multitask deep reinforcement learning algorithm that makes resource allocation decisions based on network states and task characteristics, such as the computing capability of edge servers and devices, communication channel quality, resource utilization, and the latency requirements of the services. The proposed iRAF automatically learns the network environment and, through self-play training, generates resource allocation decisions that optimize performance in terms of latency and power consumption. iRAF becomes its own teacher: a deep neural network (DNN) is trained to predict iRAF's resource allocation actions in a self-supervised manner, where the training data is generated by the search process of the Monte Carlo tree search (MCTS) algorithm. A major advantage of MCTS is that it simulates trajectories into the future, starting from a root state, and selects the best action by evaluating the resulting reward values. Numerical results show that the proposed iRAF achieves 59.27% and 51.71% improvements in service latency compared with the greedy-search and deep Q-learning-based methods, respectively.
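
The MCTS step described above can be illustrated with a minimal sketch. This is not the iRAF implementation: it is plain UCT search with random rollouts (no DNN guidance), applied to a hypothetical toy offloading problem in which each task either runs locally or is offloaded to a single edge server. All names and numbers (LOCAL_LATENCY, EDGE_LATENCY, QUEUE_PENALTY, the exploration constant c) are assumptions made for illustration only.

```python
import math
import random

# Hypothetical toy setting (not from the paper): each task is either run
# locally (action 0) or offloaded to one edge server (action 1). Offloading
# gets slower as more tasks are already queued at the edge.
LOCAL_LATENCY = [1.0, 6.0, 3.0, 5.0]  # assumed per-task local latency
EDGE_LATENCY = [2.0, 2.0, 2.0, 2.0]   # assumed per-task base edge latency
QUEUE_PENALTY = 1.5                   # assumed extra latency per queued task


def total_latency(actions):
    """Total latency of a complete or partial assignment of tasks."""
    offloaded, latency = 0, 0.0
    for i, a in enumerate(actions):
        if a == 1:
            latency += EDGE_LATENCY[i] + QUEUE_PENALTY * offloaded
            offloaded += 1
        else:
            latency += LOCAL_LATENCY[i]
    return latency


class Node:
    def __init__(self, actions, parent=None):
        self.actions = actions   # tuple of decisions made so far
        self.parent = parent
        self.children = {}       # action -> child Node
        self.visits = 0
        self.value = 0.0         # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.actions) == len(LOCAL_LATENCY)

    def fully_expanded(self):
        return len(self.children) == 2


def uct_child(node, c):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(actions, rng):
    """Simulate a trajectory into the future with random decisions."""
    actions = list(actions)
    while len(actions) < len(LOCAL_LATENCY):
        actions.append(rng.randint(0, 1))
    return -total_latency(actions)  # reward: lower latency is better


def mcts(root_actions=(), iterations=2000, c=2.0, seed=0):
    """Return the most-visited action at the root state."""
    rng = random.Random(seed)
    root = Node(tuple(root_actions))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and node.fully_expanded():
            node = uct_child(node, c)
        # 2. Expansion: add one untried child, if any.
        if not node.is_terminal():
            a = rng.choice([x for x in (0, 1) if x not in node.children])
            node.children[a] = Node(node.actions + (a,), parent=node)
            node = node.children[a]
        # 3. Simulation: evaluate the reward of a random completion.
        reward = rollout(node.actions, rng)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)
```

Calling mcts() returns the decision for the first task; calling it again with the chosen actions as root_actions extends the plan one task at a time. In a self-supervised setup such as the one the abstract describes, the visit statistics gathered at the root would serve as training targets for the DNN, which this sketch omits.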
