Lecture 7: Introduction to Reinforcement Learning and Q-learning

(Big slides) (Small slides) (Recording)

After an introductory overview, this lecture gives a general formulation of dynamic programming in discrete time. Topics covered:

  • AlphaZero, off-line training and on-line play
  • Stochastic dynamic programming
  • Stochastic shortest path problems
  • Q-learning
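
As a rough illustration of how the last three topics fit together, here is a minimal sketch of tabular Q-learning on a small stochastic shortest path problem. The problem itself (a four-state chain with a cost-free goal state, a "step" action that succeeds with probability 0.8, and a more expensive "jump" action) is a made-up toy example, not one taken from the lecture, and all names and parameter values are assumptions chosen for illustration. Costs are minimized, following the convention in Bertsekas' texts, and the dynamic programming solution is computed alongside for comparison.

```python
import random

# Hypothetical toy problem: states 0, 1, 2 plus a cost-free goal state 3.
GOAL = 3
STATES = [0, 1, 2]
ACTIONS = ["step", "jump"]
P_STEP = 0.8  # probability that "step" moves one state forward


def simulate(x, u, rng):
    """Sample one transition and return (stage cost, next state)."""
    if u == "step":
        # Cost 1; moves forward with probability P_STEP, otherwise stays put.
        return 1.0, (x + 1 if rng.random() < P_STEP else x)
    # "jump": cost 3, moves two states forward (capped at the goal).
    return 3.0, min(x + 2, GOAL)


def min_q(Q, x):
    """Cost-to-go estimate at x: zero at the goal, else min over actions."""
    return 0.0 if x == GOAL else min(Q[x, u] for u in ACTIONS)


def q_learning(num_samples=30000, seed=0):
    """Tabular Q-learning with uniformly sampled state-action pairs."""
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in STATES for u in ACTIONS}
    visits = {key: 0 for key in Q}
    for _ in range(num_samples):
        x, u = rng.choice(STATES), rng.choice(ACTIONS)
        g, x_next = simulate(x, u, rng)
        visits[x, u] += 1
        alpha = 1.0 / visits[x, u]  # diminishing step size
        # Move Q(x, u) toward the sampled cost plus estimated cost-to-go.
        Q[x, u] += alpha * (g + min_q(Q, x_next) - Q[x, u])
    return Q


def exact_q(num_iters=200):
    """Q-factors from value iteration, using the known transition model."""
    Q = {(x, u): 0.0 for x in STATES for u in ACTIONS}
    for _ in range(num_iters):
        new_Q = {}
        for x in STATES:
            new_Q[x, "step"] = (1.0 + P_STEP * min_q(Q, x + 1)
                                + (1 - P_STEP) * min_q(Q, x))
            new_Q[x, "jump"] = 3.0 + min_q(Q, min(x + 2, GOAL))
        Q = new_Q
    return Q


if __name__ == "__main__":
    learned, exact = q_learning(), exact_q()
    for key in sorted(exact):
        print(key, round(learned[key], 3), round(exact[key], 3))
```

The point of the comparison is that value iteration needs the transition probabilities, whereas the Q-learning update only uses sampled transitions from `simulate`. Sampling state-action pairs uniformly is the simplest way to ensure every pair is visited often; it is a design choice for this sketch, not the only possible exploration scheme.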

Read sections 1-2 in Bertsekas' tutorial paper Lessons from AlphaZero, plus sections 4.1-4.2 (up to Assumption 4.2.1) and section 5.4 (up to 5.4.1) in his book.