
Abstract

A multi-armed bandit problem is investigated in which the reward obtained from pulling any arm depends on the states of the other arms as well as on the state of the arm pulled. A Dynamic Allocation Index is defined for this class of problems, and it is shown that this index leads to optimal policies.
