Lecture 7: Introduction to Reinforcement Learning and Q-learning
(Big slides) (Small slides) (Recording)
After an introduction to reinforcement learning, this lecture formulates dynamic programming in discrete time in general terms.
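For reference, the finite-horizon stochastic DP algorithm can be written as below. This assumes Bertsekas' usual notation, which the lecture may not use exactly: the system is x_{k+1} = f_k(x_k, u_k, w_k) with random disturbance w_k, g_k is the stage cost, g_N is the terminal cost, and U_k(x_k) is the set of allowed controls. The Q-factors in the last line are the quantities that Q-learning estimates.

```latex
\begin{aligned}
J_N^*(x_N) &= g_N(x_N), \\
J_k^*(x_k) &= \min_{u_k \in U_k(x_k)} \; E_{w_k}\Big\{ g_k(x_k,u_k,w_k) + J_{k+1}^*\big(f_k(x_k,u_k,w_k)\big) \Big\}, \qquad k = N-1,\dots,0, \\
Q_k^*(x_k,u_k) &= E_{w_k}\Big\{ g_k(x_k,u_k,w_k) + J_{k+1}^*\big(f_k(x_k,u_k,w_k)\big) \Big\}, \qquad J_k^*(x_k) = \min_{u_k \in U_k(x_k)} Q_k^*(x_k,u_k).
\end{aligned}
```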
- AlphaZero, off-line training and on-line play
- Stochastic dynamic programming
- Stochastic shortest path problems
- Q-learning
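The last two topics can be illustrated with a minimal tabular Q-learning sketch for a stochastic shortest path problem. The toy problem (states 0 and 1 plus a cost-free terminal state T), the uniformly random exploration policy, and the per-pair stepsize 1/n are all assumptions made for illustration and are not taken from the lecture. Each update moves Q(i,u) toward the sampled target g(i,u,j) + min_v Q(j,v), with Q fixed at 0 at the terminal state.

```python
import random
from collections import defaultdict

TERMINAL = "T"

# Hypothetical toy SSP (illustrative assumption):
# TRANSITIONS[state][action] = list of (probability, next_state, cost)
TRANSITIONS = {
    0: {0: [(1.0, TERMINAL, 2.0)],                      # safe: cost 2, terminate
        1: [(0.5, TERMINAL, 1.0), (0.5, 1, 1.0)]},      # risky: cost 1, may reach state 1
    1: {0: [(1.0, TERMINAL, 1.0)],
        1: [(1.0, TERMINAL, 3.0)]},
}


def sample_transition(rng, state, action):
    """Draw (next_state, cost) according to the transition probabilities."""
    r = rng.random()
    cumulative = 0.0
    for prob, next_state, cost in TRANSITIONS[state][action]:
        cumulative += prob
        if r < cumulative:
            return next_state, cost
    # Guard against floating-point round-off in the cumulative sum
    _, next_state, cost = TRANSITIONS[state][action][-1]
    return next_state, cost


def min_q(q, state):
    """min over actions of Q(state, action); the terminal state has cost 0."""
    if state == TERMINAL:
        return 0.0
    return min(q[(state, a)] for a in TRANSITIONS[state])


def q_learning(num_episodes=20000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q-factors, initialized to 0
    visits = defaultdict(int)  # visit counts, used for the stepsize 1/n
    for _ in range(num_episodes):
        state = 0
        while state != TERMINAL:
            # Explore with a uniformly random action (off-policy learning)
            action = rng.choice(list(TRANSITIONS[state]))
            next_state, cost = sample_transition(rng, state, action)
            visits[(state, action)] += 1
            step = 1.0 / visits[(state, action)]
            # Sampled Bellman target: stage cost plus best Q-factor at next state
            target = cost + min_q(q, next_state)
            q[(state, action)] += step * (target - q[(state, action)])
            state = next_state
    return dict(q)


if __name__ == "__main__":
    q = q_learning()
    for (s, a), value in sorted(q.items()):
        print(f"Q({s}, {a}) = {value:.3f}")
```

For this toy problem the exact Q-factors are Q*(1,0) = 1, Q*(1,1) = 3, Q*(0,0) = 2 and Q*(0,1) = 1 + 0.5 * J*(1) = 1.5. The learned values should approach these, so the greedy policy chooses action 1 in state 0 and action 0 in state 1. With the stepsize 1/n, each Q-factor is the running average of its sampled targets.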
Read sections 1-2 of Bertsekas' tutorial paper Lessons from AlphaZero. In his book, read sections 4.1-4.2 up to Assumption 4.2.1 and section 5.4 up to subsection 5.4.1.