Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

TLDR

Formal exploration approaches in model-based reinforcement learning typically estimate model accuracy from data quantity or prior assumptions, ignoring the empirical prediction error. The study proposes extensions that drive exploration solely from empirical estimates of the learner's accuracy and learning progress. The authors analyze the extensions theoretically in the standard stationary finite state-action setting. Experiments show the exploration measures remain robust in non-stationary environments and when wrong domain assumptions mislead the original methods.

Abstract

Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the currently learned model without considering the empirical prediction error. For example, PAC-MDP approaches such as R-MAX base their model certainty on the amount of collected data, while Bayesian approaches assume a prior over the transition dynamics. We propose extensions to such approaches which drive exploration solely based on empirical estimates of the learner's accuracy and learning progress. We provide a sanity-check theoretical analysis, discussing the behavior of our extensions in the standard stationary finite state-action case. We then provide experimental studies demonstrating the robustness of these exploration measures in non-stationary environments and in cases where the original approaches are misled by wrong domain assumptions.
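
To make the contrast concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single state-action pair with a small discrete outcome set, a Laplace-smoothed tabular model, and a hypothetical progress measure defined as the drop in held-out log loss between a model fitted on older data and one fitted on more recent data. The names `rmax_known`, `mean_log_loss`, `learning_progress`, and the parameters `m`, `alpha`, and `lag` are illustrative assumptions.

```python
import math
from collections import Counter


def rmax_known(count, m=5):
    """Count-based certainty in the style of R-MAX: 'known' after m visits."""
    return count >= m


def mean_log_loss(train, test, n_outcomes, alpha=1.0):
    """Average negative log-likelihood of `test` under a Laplace-smoothed
    categorical model fitted to `train` (assumed model class)."""
    counts = Counter(train)
    total = len(train) + alpha * n_outcomes
    return -sum(math.log((counts[o] + alpha) / total) for o in test) / len(test)


def learning_progress(outcomes, n_outcomes, lag=5):
    """Hypothetical empirical progress estimate: how much the loss on the
    latest `lag` samples drops when the model also sees the `lag` samples
    that precede them."""
    if len(outcomes) < 3 * lag:
        return float("inf")  # too little data: treat as worth exploring
    test = outcomes[-lag:]
    newer = outcomes[:-lag]
    older = outcomes[:-2 * lag]
    return (mean_log_loss(older, test, n_outcomes)
            - mean_log_loss(newer, test, n_outcomes))


# Stationary case: the outcome is always 0.
stationary = [0] * 30
# Non-stationary case: the dynamics switch from outcome 0 to outcome 1.
changed = [0] * 20 + [1] * 10

# The count-based test cannot tell the two cases apart.
print(rmax_known(len(stationary)), rmax_known(len(changed)))
# The empirical estimate is near zero in the first case and large in the second.
print(learning_progress(stationary, n_outcomes=2))
print(learning_progress(changed, n_outcomes=2))
```

In this sketch the visit count marks both state-action pairs as known, while the empirical estimate stays close to zero for the stationary data and becomes large after the switch, which is the kind of signal an exploration bonus could be tied to.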
