Publication | Closed Access
An analytic solution to discrete Bayesian reinforcement learning
Citations: 294
References: 14
Year: 2006
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Value Function Approximation, Bayesian Reinforcement Learning, Online Learning, Multi-agent Learning, Intelligent Systems, Stochastic Game, Uncertainty Quantification, Management, Robot Learning, Decision Theory, Effective Online Learning, Online Algorithm, Sequential Decision Making, Probability Theory, Computer Science, Markov Decision Process, Exploration V Exploitation
Reinforcement learning was originally designed for online learning, but existing algorithms require exploration that is often too costly or too time-consuming, so it is mostly used offline in simulated environments. The authors introduce BEETLE, an online learning algorithm that is computationally efficient and minimizes the amount of exploration. BEETLE takes a Bayesian model-based approach that frames reinforcement learning as a partially observable Markov decision process. The authors derive analytically that the optimal value function is the upper envelope of a set of multivariate polynomials, and they give a point-based value iteration algorithm that exploits this parameterization.
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms fall short of achieving this goal because the amount of exploration required is often too costly and/or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
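The "upper envelope of polynomials" idea can be illustrated with a small sketch. This is not the BEETLE implementation. It assumes a toy setting with a single unknown transition probability theta, a Beta(a, b) belief over it, and a hand-picked set of polynomials in theta. The function names (`beta_moment`, `expected_poly`, `upper_envelope`, `observe`) are hypothetical and chosen only for this example. The value at a belief is taken as the maximum, over the polynomials, of each polynomial's expectation under that belief.

```python
from fractions import Fraction

def beta_moment(a, b, k):
    """E[theta^k] under a Beta(a, b) belief: prod_{i<k} (a+i)/(a+b+i)."""
    m = Fraction(1)
    for i in range(k):
        m *= Fraction(a + i, a + b + i)
    return m

def expected_poly(coeffs, a, b):
    """Expectation of sum_k coeffs[k] * theta^k under Beta(a, b)."""
    return sum(Fraction(c) * beta_moment(a, b, k) for k, c in enumerate(coeffs))

def upper_envelope(polys, a, b):
    """Value at the belief: the largest expected polynomial (assumed toy form)."""
    return max(expected_poly(p, a, b) for p in polys)

def observe(a, b, success):
    """Bayesian update of the Beta belief after one observed transition."""
    return (a + 1, b) if success else (a, b + 1)

# Hypothetical polynomials: alpha_1(theta) = theta, alpha_2(theta) = 1/2.
polys = [[0, 1], [Fraction(1, 2)]]
belief = (1, 1)                      # uniform prior over theta
value_before = upper_envelope(polys, *belief)
belief = observe(*belief, success=True)
value_after = upper_envelope(polys, *belief)
```

Under these assumptions the belief stays in closed form after each observation, which is why evaluating the envelope only requires Beta moments and no numerical integration.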