CodeBus
www.codebus.net
Search - iteration - List
[LabView] WindyGridWorldQLearning
DL: 0
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
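To make the update the description refers to concrete, below is a minimal sketch of tabular Q-learning on a windy grid world. It is written in Python rather than LabVIEW, and the grid size, wind strengths, start/goal cells, and learning parameters are illustrative assumptions (modelled on the classic windy-gridworld exercise), not values taken from the WindyGridWorldQLearning download itself.

# Minimal tabular Q-learning on an assumed windy grid world (illustrative only).
import random
from collections import defaultdict

ROWS, COLS = 7, 10                            # assumed grid dimensions
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]         # assumed upward wind per column
START, GOAL = (3, 0), (3, 7)                  # assumed start and goal cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 1.0, 0.1         # assumed learning parameters

def step(state, action):
    """Apply the action, then push the agent upward by the current column's wind."""
    r, c = state
    dr, dc = action
    r = min(max(r + dr - WIND[c], 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    return (r, c), -1                         # reward of -1 per step until the goal

def epsilon_greedy(Q, state):
    """Explore with probability EPSILON, otherwise take the greedy action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

Q = defaultdict(float)                        # action-value table, one entry per (state, action)
for episode in range(200):
    state = START
    while state != GOAL:
        action = epsilon_greedy(Q, state)
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

After training, following the greedy action at each state traces a wind-compensated path from start to goal; running more episodes or decaying EPSILON over time would sharpen the learned values further.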
Date: 2026-01-01
Size: 2kb
User: amin