
TLDR

Reinforcement learning, particularly Q-learning, faces a fundamental challenge in balancing exploration and exploitation. This work proposes a fidelity-based probabilistic Q-learning (FPQL) algorithm to address that trade-off and applies it to learning control of quantum systems. The algorithm iteratively updates the probability of selecting each action in a given state, using fidelity as a guiding signal within a probabilistic Q-learning framework, and is tested on a spin-1/2 system and a Λ-type atomic system. The reported results on both examples indicate that FPQL strikes a better exploration–exploitation balance, avoids locally optimal policies, and speeds up learning.

Abstract

The balance between exploration and exploitation is a key problem for reinforcement learning methods, especially for Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to address this problem naturally, and it is applied to learning control of quantum systems. In this approach, fidelity is used to help direct the learning process, and the probability of selecting each action in a given state is updated iteratively as learning proceeds. This leads to a natural exploration strategy rather than one imposed through manually configured parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. The FPQL algorithm is then presented for learning control of quantum systems. Two examples (a spin-1/2 system and a Λ-type atomic system) are used to test the performance of the FPQL algorithm. The results show that FPQL attains a better balance between exploration and exploitation, and can also avoid locally optimal policies and accelerate the learning process.
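
The idea of probabilistic action selection guided by fidelity can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not give the update rules, so the probability reinforcement rule below (add a reward-plus-fidelity bonus to the chosen action, then renormalize), the class name `ProbabilisticQLearner`, and the parameters `alpha`, `gamma`, and `k` are all assumptions chosen for illustration. Only the Q-value update is the standard Q-learning rule, and `fidelity` is the usual pure-state overlap.

```python
import numpy as np


def fidelity(psi, target):
    """Fidelity |<target|psi>|^2 between two normalized pure states."""
    # np.vdot conjugates its first argument, giving <target|psi>.
    return float(abs(np.vdot(target, psi)) ** 2)


class ProbabilisticQLearner:
    """Illustrative sketch: Q-learning with per-state action probabilities.

    The probability update is an assumed rule, not the one from the paper.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, k=0.5, seed=0):
        self.q = np.zeros((n_states, n_actions))
        # Start from a uniform distribution over actions in every state.
        self.p = np.full((n_states, n_actions), 1.0 / n_actions)
        self.alpha, self.gamma, self.k = alpha, gamma, k
        self.rng = np.random.default_rng(seed)

    def select_action(self, s):
        # Sample an action from the learned distribution for state s,
        # so exploration comes from the distribution itself.
        return int(self.rng.choice(self.p.shape[1], p=self.p[s]))

    def update(self, s, a, reward, s_next, fid=0.0):
        # Standard one-step Q-learning update.
        td_target = reward + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])
        # Assumed rule: reinforce the chosen action by a non-negative bonus
        # that grows with reward and fidelity, then renormalize the row.
        self.p[s, a] += self.k * max(reward + fid, 0.0)
        self.p[s] /= self.p[s].sum()


if __name__ == "__main__":
    zero = np.array([1.0, 0.0], dtype=complex)
    plus = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
    agent = ProbabilisticQLearner(n_states=2, n_actions=2)
    a = agent.select_action(0)
    agent.update(0, a, reward=1.0, s_next=1, fid=fidelity(plus, zero))
    print(agent.p[0], agent.q[0])
```

In this sketch no exploration parameter such as an epsilon or a temperature is set by hand: actions that lead to higher reward and fidelity are sampled more often simply because their probabilities grow, while the others keep a nonzero chance of being tried.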
