Lecture 7: Introduction to Reinforcement Learning and Q-learning

(Big slides) (Small slides) (Recording)

After an introduction, this lecture gives a general formulation of dynamic programming in discrete time.

AlphaZero, off-line training and on-line play
Stochastic dynamic programming
Stochastic shortest path problems
Q-learning
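
As a rough illustration of the last two topics, the sketch below runs tabular Q-learning on a tiny shortest path problem. It is not taken from the lecture: the five-state chain, the unit step cost, the step size `alpha`, the deterministic transitions and the names `step` and `q_learning` are all assumptions chosen to keep the example small. Costs are minimized, so the update uses a min over next-state Q-values.

```python
import random

# Hypothetical toy problem (not from the lecture): a chain of states 0..4,
# where state 4 is a cost-free terminal state and every move costs 1.
N_STATES = 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
STEP_COST = 1.0


def step(state, action):
    """Deterministic transition; moving left from state 0 stays in state 0."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    return nxt, STEP_COST


def q_learning(episodes=2000, alpha=0.5, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    terminal = N_STATES - 1
    for _ in range(episodes):
        state = 0
        while state != terminal:
            action = rng.choice(ACTIONS)          # explore at random (off-policy)
            nxt, cost = step(state, action)
            target = cost + min(q[nxt])           # Q-values at the terminal state stay 0
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy (cost-minimizing) action in each non-terminal state.
greedy = [min(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

In this toy problem the learned Q-values should approach the exact costs-to-go (for example, 4 for moving right from state 0), and the greedy policy should move right in every state. A stochastic version would replace `step` with a random transition and typically need a decreasing step size.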

Read sections 1-2 in Bertsekas' tutorial paper Lessons from AlphaZero, plus sections 4.1-4.2 (before Assumption 4.2.1) and 5.4 (before 5.4.1) in his book.