Concepedia
Publication | Closed Access
Technical Note: Q-Learning
Citations: 3.6K
References: 9
Year: 1992
Learning from delayed rewards
Chris Watkins
OpenGrey (Institut de l'Information Scientifique et Technique)
Artificial Intelligence, Engineering, Machine Learning, Stochastic Game, Game Theory +3
1989
5.5K
Learning to Predict by the Methods of Temporal Differences
Richard S. Sutton
Machine Learning
Engineering, Machine Learning, Data Science, Temporal Differences, Predictive Analytics +6
1988
3.9K
Applied Dynamic Programming.
Marshall Freimer, Richard Bellman, Stuart E. Dreyfus
Journal of the American Statistical Association
Engineering, Dynamic Environment, Dynamic Programming, Computer Science, Applied Dynamic Programming +3
1964
2.2K
Self-improving reactive agents based on reinforcement learning, planning and teaching
Long-Ji Lin
Artificial Intelligence, Engineering, Reinforcement Learning (Computer Engineering), Agent Decision-making, Autonomous Learning +10
1.6K
Introduction to Stochastic Dynamic Programming.
Donald A. Berry, Sheldon M. Ross
Stochastic Simulation, Engineering, Stochastic Processes, Dynamic Programming, Systems Engineering +4
1986
1.2K
Temporal credit assignment in reinforcement learning
ScholarWorks@UMassAmherst (University of Massachusetts Amherst)
1984
778
Input generalization in delayed reinforcement learning: an algorithm and performance comparisons
David Chapman, Leslie Pack Kaelbling
1991
249
Automatic programming of behavior-based robots using reinforcement learning
Sridhar Mahadevan, Jonathan H. Connell
Artificial Intelligence
Artificial Intelligence, Engineering, Robotic Agent, Automation, Action Model Learning +6
150
Learning control of finite Markov chains with an explicit trade-off between estimation and control
Mitsuo Satō, K. Abe, Hiroshi Takeda
IEEE Transactions on Systems Man and Cybernetics
Engineering, Machine Learning, Value Function Approximation, Learning Control, Lifelong Reinforcement Learning +15
34