Publication | Closed Access
Learning evaluation functions for global optimization
Citations: 60
References: 50
Year: 1998
Venue: Unknown
In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of "value function approximation" (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways. First, it