Lecture 5: Q-learning
(Big slides) (Small slides) (Recording)
After a general introduction, this lecture formulates dynamic programming in discrete time. Topics covered:
- AlphaZero, off-line training and on-line play
- Stochastic dynamic programming
- Stochastic shortest path problems
- Q-learning
Read Sections 1-2 of Bertsekas' tutorial paper Lessons from AlphaZero, plus Sections 4.1-4.2 (up to Assumption 4.2.1) and Section 5.4 (up to 5.4.1) in his book.
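To make the Q-learning topic concrete, here is a minimal sketch of tabular Q-learning on a small stochastic shortest path problem. The example is not taken from the lecture or from Bertsekas' text: the five-state chain, the two actions ("step" and "jump"), their costs, the success probability and the step-size rule are all assumptions chosen for illustration. Costs are minimized, and the Q-factors of the cost-free terminal state are held at zero. A model-based value iteration on the same toy problem is included only as a reference to compare against.

```python
import random

# Hypothetical toy problem (not from the lecture): states 0..3 plus a
# cost-free terminal state GOAL. All numbers below are illustrative.
GOAL = 4
ACTIONS = (0, 1)              # 0 = "step", 1 = "jump"
STEP_COST, JUMP_COST = 1.0, 2.0
P_STEP = 0.8                  # probability that a step actually advances


def transitions(state, action):
    """Return [(probability, next_state, cost), ...] for the toy model."""
    if action == 0:
        return [(P_STEP, state + 1, STEP_COST),
                (1.0 - P_STEP, state, STEP_COST)]
    return [(1.0, min(state + 2, GOAL), JUMP_COST)]


def sample(state, action, rng):
    """Simulate one transition; Q-learning only sees these samples."""
    u, acc = rng.random(), 0.0
    for prob, nxt, cost in transitions(state, action):
        acc += prob
        if u < acc:
            return nxt, cost
    return nxt, cost  # guard against floating-point round-off


def exact_q(sweeps=200):
    """Q-factors by model-based value iteration (reference only)."""
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(sweeps):
        q = [[sum(p * (c + min(q[n])) for p, n, c in transitions(s, a))
              for a in ACTIONS] for s in range(GOAL)] + [[0.0, 0.0]]
    return q


def q_learning(episodes=20000, seed=0):
    """Tabular Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]      # Q(GOAL, .) stays 0
    visits = [[0, 0] for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state = rng.randrange(GOAL)                # random non-terminal start
        while state != GOAL:
            action = rng.choice(ACTIONS)           # explore every pair
            nxt, cost = sample(state, action, rng)
            visits[state][action] += 1
            # Diminishing step size (an assumed, common choice).
            alpha = 10.0 / (10.0 + visits[state][action])
            # Move Q(s, a) toward the sampled cost plus best next Q-factor.
            target = cost + min(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


if __name__ == "__main__":
    learned, reference = q_learning(), exact_q()
    for s in range(GOAL):
        print(s, [round(v, 2) for v in learned[s]],
              [round(v, 2) for v in reference[s]])
```

The behavior policy here picks actions uniformly at random, so every state-action pair is visited often; the learned Q-factors can then be compared with the reference values, and a greedy policy is read off by taking the minimizing action in each state.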