Stage #1
In this stage I will use a single game, “Breakout”, to test the whole pipeline.
- Figure out the observation format returned by OpenAI Gym (see the first sketch after this list).
- Complete the preprocessing step (same sketch).
- Build the model architecture (sketched below).
- Implement the target-network parameter replacement mechanism (sketched below).
- Implement the original experience replay (sketched below).
- Implement epsilon decay (sketched below).
- Link all parts together.
- Add debug information output.
- Save session parameters (checkpoints).
- Random start (no-op actions at episode start; sketched below).
- Test the algorithm on “Breakout”.
- Set up the configuration for the algorithm.
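For reference, Gym's Atari environments return raw RGB frames of shape (210, 160, 3). Below is a minimal preprocessing sketch in the spirit of the Nature DQN paper (grayscale, resize to 84×84, stack the 4 most recent frames); the function name is mine and cv2 is an assumed dependency, so treat it as an illustration rather than the repo's actual code.

```python
import gym
import numpy as np
import cv2  # assumed dependency, used only for color conversion and resizing

env = gym.make('Breakout-v0')
obs = env.reset()
print(obs.shape, obs.dtype)  # (210, 160, 3) uint8: one raw RGB frame

def preprocess(frame):
    """Reduce one raw frame to the 84x84 grayscale input used in the paper."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)

# The network input is a stack of the 4 most recent preprocessed frames.
state = np.stack([preprocess(obs)] * 4, axis=-1)  # shape (84, 84, 4)
```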
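The model architecture in the Nature paper is three convolutional layers followed by one fully connected hidden layer. A sketch in tf.keras (one possible translation, not necessarily how this repo builds it):

```python
import tensorflow as tf

def build_q_network(n_actions):
    """Nature DQN architecture: Q(s, a) for all actions from a 4-frame stack."""
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(84, 84, 4)),
        tf.keras.layers.Lambda(lambda x: tf.cast(x, tf.float32) / 255.0),  # scale pixels
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(n_actions),  # one Q-value per action
    ])
```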
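The parameter replacement mechanism just copies the online network's weights into the target network every C steps (C = 10000 in the Nature paper). A sketch reusing the hypothetical build_q_network above:

```python
q_online = build_q_network(n_actions=4)
q_target = build_q_network(n_actions=4)

def maybe_update_target(step, period=10_000):
    """Overwrite the target network with the online weights every `period` steps."""
    if step % period == 0:
        q_target.set_weights(q_online.get_weights())
```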
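A minimal sketch of the original (uniform) experience replay; the 1M capacity and batch size of 32 are the paper's settings, the class name is mine:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay from the original DQN paper."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Draw a uniformly random minibatch of stored transitions."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```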
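Epsilon decay in the paper is a linear anneal from 1.0 to 0.1 over the first million steps, fixed afterwards:

```python
def epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    """Linearly annealed exploration rate, as in the Nature DQN paper."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```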
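Random start means taking up to 30 no-op actions after each reset so episodes do not always begin from the identical state. A sketch assuming the classic 4-tuple Gym step API and that action 0 is NOOP (true for ALE games):

```python
import random

def reset_with_noops(env, max_noops=30):
    """Reset, then take a random number of no-ops before the agent takes over."""
    obs = env.reset()
    for _ in range(random.randint(1, max_noops)):
        obs, _, done, _ = env.step(0)  # action 0 is NOOP in ALE
        if done:
            obs = env.reset()
    return obs
```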
Stage #2
In this stage I will continue building additional components for DQN.
- Add TensorFlow TensorBoard support (a logging sketch follows this list).
- Display weights.
- Test the algorithm on several games. (Aborted: the algorithm takes too long to converge.)
- Test on all games mentioned in the paper. (Same reason.)
- Generate the last three columns of Extended Data Table 2. (Same reason.)
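For the TensorBoard item, weight histograms can be logged roughly as below (TF 2-style summary API; the TF 1.x equivalent would be tf.summary.histogram plus a FileWriter). The log directory is an assumption:

```python
import tensorflow as tf

writer = tf.summary.create_file_writer('logs/dqn')  # assumed log directory

def log_weights(model, step):
    """Write one histogram per trainable variable for display in TensorBoard."""
    with writer.as_default():
        for var in model.trainable_variables:
            tf.summary.histogram(var.name, var, step=step)
```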
Stage #3
- Implement another replay memory mechanism.
- Double DQN implementation.
- Prioritized Experience Replay (sketched after this list).
- Dueling network (sketched after this list).
- DQfD.
- Store the scores reported in the papers in the repo, and display them.
- Model evaluation.
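For Prioritized Experience Replay, transitions are sampled with probability P(i) = p_i^α / Σ_k p_k^α and corrected with importance weights w_i = (N · P(i))^(−β). A naive O(N) sketch (a real implementation would use a sum-tree; all names are mine, α = 0.6 and β = 0.4 are the paper's proportional-variant defaults):

```python
import numpy as np

class NaivePrioritizedBuffer:
    """Proportional prioritized replay, without the sum-tree optimization."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:  # evict the oldest transition
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling weights
        weights /= weights.max()                            # normalize as in the paper
        return [self.data[i] for i in idx], idx, weights
```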
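For the dueling network, only the head changes: a value stream V(s) and an advantage stream A(s, a) are combined as Q(s, a) = V(s) + A(s, a) − mean over a' of A(s, a'). A sketch of the head alone, where `features` is assumed to be the flattened conv output:

```python
import tensorflow as tf

def dueling_head(features, n_actions):
    """Combine value and advantage streams into Q-values (dueling DQN head)."""
    value = tf.keras.layers.Dense(1)(
        tf.keras.layers.Dense(512, activation='relu')(features))
    advantage = tf.keras.layers.Dense(n_actions)(
        tf.keras.layers.Dense(512, activation='relu')(features))
    # Subtracting the mean advantage makes the decomposition identifiable.
    return value + advantage - tf.reduce_mean(advantage, axis=1, keepdims=True)
```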
Double DQN
- Algorithm implementation (the target computation is sketched below).
- Test training.
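Double DQN changes only the target: the online network selects the action and the target network evaluates it, y = r + γ · Q_target(s', argmax_a Q_online(s', a)). A sketch of the target computation, reusing the hypothetical q_online/q_target from Stage #1:

```python
import numpy as np

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    next_actions = np.argmax(q_online.predict(next_states), axis=1)  # select with online net
    next_q = q_target.predict(next_states)                           # evaluate with target net
    chosen_q = next_q[np.arange(len(next_actions)), next_actions]
    return rewards + gamma * chosen_q * (1.0 - np.asarray(dones, dtype=np.float32))
```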
DQfD
- Change Gym to ALE, because each Gym action is repeatedly performed for a duration of k frames (k chosen at random). Fix: Gym provides another environment variant that has no frame skip (see the sketch at the end of this list). Link
- Implement storage for experience replay and demonstration replay.
- Display replay files in pygame.
- Test the algorithm.
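On the frame-skip point above, this is all the switch amounts to:

```python
import gym

# Default Atari env: each action is repeated for a random k in {2, 3, 4} frames.
env = gym.make('Breakout-v0')

# NoFrameskip variant: one env step == one emulator frame, so frame skipping
# can be controlled manually (useful for recording/replaying demonstrations).
env = gym.make('BreakoutNoFrameskip-v4')
```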