Olle - Atari mania

Presentation

Download project_pres.pdf

Quick recap: I trained DQN and PPO agents on Atari games. I used the OpenAI Gym environments, training on GameName-v4 (i.e. pixels as inputs), and the tf-agents implementations of agents, networks, replay buffers, and drivers.

Major learned points

Save checkpoints and policies

Tutorial:

https://www.tensorflow.org/agents/tutorials/10_checkpointer_policysaver_tutorial?hl=fi_FI

Checkpoints (model, replay buffer, etc.): https://www.tensorflow.org/agents/api_docs/python/tf_agents/utils/common/Checkpointer?hl=fi_FI

Policies (lighter version): https://www.tensorflow.org/agents/api_docs/python/tf_agents/policies/policy_saver/PolicySaver
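The two savers above can be wired up roughly as follows. This is a sketch based on the tf-agents tutorial linked above, not my exact training code: the `agent`, `replay_buffer`, and `global_step` arguments are assumed to come from the usual tf-agents training setup, and the directory names are placeholders.

```python
import tensorflow as tf
from tf_agents.policies import policy_saver
from tf_agents.utils import common


def make_savers(agent, replay_buffer, global_step, ckpt_dir='checkpoint'):
    # Checkpointer: saves the full training state (agent, replay buffer, step),
    # so training can be resumed exactly where it left off.
    train_checkpointer = common.Checkpointer(
        ckpt_dir=ckpt_dir,
        max_to_keep=1,
        agent=agent,
        policy=agent.policy,
        replay_buffer=replay_buffer,
        global_step=global_step,
    )
    # Restores the latest checkpoint if one exists, otherwise initializes fresh.
    train_checkpointer.initialize_or_restore()

    # PolicySaver: saves only the policy (much lighter, enough for evaluation
    # and gif generation).
    saver = policy_saver.PolicySaver(agent.policy)
    return train_checkpointer, saver


# During/after training (placeholder directory name):
#   train_checkpointer.save(global_step)
#   saver.save('policy')
# Later, load the saved policy without rebuilding the agent:
#   saved_policy = tf.saved_model.load('policy')
```

Restoring via initialize_or_restore() is what lets the training scripts below pick up where they left off.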

Generating Gifs

  1. Use separate, fresh environments
  2. Ensure that the agents don't get stuck in an infinite loop; you could
    1. Restart and fire on life lost (Breakout)
    2. Limit the number of steps (might be a pity if you have a really good policy)
  3. Use Python's string formatting to automatically generate gif filenames.
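Point 3 above can be sketched like this. The helper name and naming scheme are mine (chosen to match the result files below), built with Python's str.format:

```python
# Hypothetical helper: build a gif filename from the game, the agent type,
# and the training step, using str.format.
def gif_name(game, algo, step):
    return "{}_{}_{}.gif".format(game.lower(), algo.lower(), step)


print(gif_name("Assault", "DQN", 90000))  # assault_dqn_90000.gif
```

Calling this once per evaluation run gives uniquely named gifs without any manual bookkeeping.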

The tutorial https://www.tensorflow.org/agents/tutorials/10_checkpointer_policysaver_tutorial?hl=fi_FI has the code I used to generate gifs.

Results

A .zip archive with Python files to train models, save them, and generate .gifs: scripts.zip

Note that the scripts trainDqn and trainPPO will restore the previously trained models and continue where they left off.

Some results:

assault_dqn_1

assault_dqn_10000

assault_dqn_90000

breakout_1

breakout_90000