GAN-Powered Deep Distributional Reinforcement Learning for Resource Management in Network Slicing

TLDR

Network slicing is a key 5G technology that dynamically allocates shared physical resources to diverse services, making demand-aware resource allocation critical. This study addresses resource allocation among multiple slices in a radio access network by modeling service demand as the environment state and resource allocation as the action. We propose a GAN-powered deep distributional Q-network (GAN-DDQN), together with a reward-clipping mechanism and a dueling generator, to learn the action-value distribution and mitigate the randomness in the observed service level agreement (SLA) satisfaction ratio and spectrum efficiency. Simulation results confirm that GAN-DDQN and its dueling variant outperform baseline methods in resource allocation performance.

Abstract

Network slicing is a key technology in 5G communication systems. Its purpose is to dynamically and efficiently allocate resources for diversified services with distinct requirements over a common underlying physical infrastructure. Demand-aware resource allocation is therefore of significant importance to network slicing. In this paper, we consider a scenario with several slices in a radio access network whose base stations share the same physical resources (e.g., bandwidth or slots). We leverage deep reinforcement learning (DRL) to solve this problem, treating the varying service demands as the environment state and the allocated resources as the action. To reduce the effects of the randomness and noise embedded in the received service level agreement (SLA) satisfaction ratio (SSR) and spectrum efficiency (SE), we propose a generative adversarial network-powered deep distributional Q network (GAN-DDQN), which learns the action-value distribution by minimizing the discrepancy between the estimated and the target action-value distributions. We put forward a reward-clipping mechanism to stabilize GAN-DDQN training against the effects of widely spanning utility values. We further develop Dueling GAN-DDQN, which uses a specially designed dueling generator to learn the action-value distribution by estimating the state-value distribution and the action advantage function. Finally, we verify the performance of the proposed GAN-DDQN and Dueling GAN-DDQN algorithms through extensive simulations.
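
As a rough illustration of two ideas named in the abstract, the sketch below shows one way reward clipping and a dueling-style combination of a state-value distribution with per-action advantages could look. It is not the paper's implementation: the function names, the clipping bounds of [-1, 1], the use of a plain list of samples to stand for a distribution, and the standard mean-subtracted dueling aggregation are all assumptions made for illustration.

```python
from statistics import fmean


def clip_reward(utility, low=-1.0, high=1.0):
    """Clamp a raw utility value to [low, high] (bounds are assumed, not from the paper)."""
    return max(low, min(high, utility))


def dueling_samples(state_value_samples, advantages):
    """Build one list of action-value samples per action.

    Uses the common dueling aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a),
    applied to every sample of the state-value distribution.
    """
    mean_adv = fmean(advantages)
    return [[v + (a - mean_adv) for v in state_value_samples] for a in advantages]


def greedy_action(action_value_samples):
    """Pick the action whose sampled action-value distribution has the largest mean."""
    means = [fmean(samples) for samples in action_value_samples]
    return max(range(len(means)), key=means.__getitem__)


# Hypothetical numbers: three samples of the state value, three candidate allocations.
v_samples = [0.0, 1.0, 2.0]
advantages = [1.0, 3.0, 2.0]
q_samples = dueling_samples(v_samples, advantages)
best = greedy_action(q_samples)
```

In this toy setup the advantages shift the whole sampled distribution for each action, so the action with the largest advantage is selected; in a trained model both the samples and the advantages would come from the generator network rather than fixed lists.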
