Laboratory 2: Reinforcement Learning

The aim of Lab 2 is to familiarize you with some of the main concepts of (deep) Reinforcement Learning. Since this topic is quite complex, you will not build any of the components yourself: everything you need is already written for you in Jupyter notebooks. All you have to do is run the notebooks, study the concepts, and show up for a short oral examination to discuss what you have learned.

More precisely, we expect you to run the model that is already implemented in notebook E10 and to explore different combinations of its parameters to see whether you can improve the behavior of the agent.

Steps for running the lab

- Log in to one of the lab computers at heron-xx.control.lth.se or philon-xx.control.lth.se.
- Choose frtn75-rl and version 0.2.2.
- Go through all of the available notebooks. Yes, that means everything from E05 to E10.
- The problem statement for the lab is at the end of exercise E10. Run this model with at least 3 different combinations of parameters and write down your findings (the graphs are there to help you). Use whatever format you prefer: you will not submit your notes, but you will present them orally to the TA.

Lab examination

At the lab session, I expect you to sit down in pairs and spend about 15 minutes talking to your TA about the notebooks. You will be asked questions about the exercise notebooks to see if you have actually gone through them. Here are some example questions:

- Talk about the policy and Q-function used in E10:
  - What are they?
  - How do we represent them?
  - How did we parametrize them?
- Which algorithm was used in E10?
  - What is an actor?
  - What is a critic?
- Which combinations of parameters did you try?
  - What effect did they have on the model?

Keep in mind that we also expect you to run notebooks E05-E09, so questions from those are fair game as well. Some examples include:

- What is gradient descent?
- What is overfitting/underfitting?
- What is a neural network?
- What is ADAM?

A general heuristic: if something is the title of a section, you should be able to define it and explain how we use it.

Time and place

Come to Lab B in the basement of KC:M. You could sign up for a time slot at this link. Sign-ups are now closed, but you can still use the link to find your slot.

The final examination will be held on Thursday 2023-02-23 and Friday 2023-02-24.

Warning

Training the E10 model takes a long time (4 to 6 hours per run). Since you need at least 3 runs, that adds up to 12 to 18 hours of training alone, so start working on this exercise at least a week before your slot. It took me (your TA) three full working days to go through notebooks E05-E10 and train all of the models properly.

There is no way to "speedrun" the lab. One training session of the final model takes at least 4 hours no matter how you run it (unless, for some reason, you have a scientific computing cluster at home). Give yourself enough time, or you will not manage to train the model enough times.

Contact

Please send questions about the lab to luka.bakovic@control.lth.se, with '[FRTN75]' at the start of the subject line. If your question is about a Jupyter notebook, tell me which notebook and which part of it you are asking about.