YADQN Note #2

Reminder

  1. The game “Breakout” is used here. Its observation is an array of shape (210, 160, 3).
  2. Because the preprocessing step needs to record several images, which requires a queue, it cannot be just one function. While browsing the TensorFlow documentation, I found a queue construct in TensorFlow. Maybe I can try that.
  3. The queues in tensorflow can be found here.
  4. One more clear example about queues in tensorflow is here.
  5. If you run TensorFlow GPU code in PyCharm, you will get the error ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory. That’s because PyCharm doesn’t have the path variables set correctly. They should look like this:

    PATH=/usr/local/cuda/bin:$PATH
    CUDA_HOME=/usr/local/cuda
    LD_LIBRARY_PATH=/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH

    Locations that need to be set:
    Settings->Build, Execution, Deployment->Console->Python Console
    Run->Edit Configurations: Defaults->Python and Defaults->Python tests->Unittests. Delete the non-default settings, and PyCharm will regenerate them based on the default settings.

  6. Use tf.stack to stack images. It will convert a numpy.ndarray into a tensor.
  7. Although the observation from gym may seem empty when you print the variable, it actually is not. And you don’t have to render the game UI.
  8. You can use tf.image.convert_image_dtype to convert an image’s dtype.
  9. The Y channel in the paper just means converting the RGB image into a grayscale image.
  10. PIL.Image.show() doesn’t show any window. Solution: sudo apt-get install imagemagick. Reference. Image.fromarray needs a uint8 image.
  11. Maybe using a queue is not a good idea, because we need to get 4 images at a time.
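    The conversion can be sketched in plain NumPy. The ITU-R BT.601 luma weights below are my assumption of what extracting the “Y channel” means; the paper’s preprocessing also downsamples the result to 84x84 afterwards:

```python
import numpy as np

def rgb_to_y(frame):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) luma (grayscale)
    image using the ITU-R BT.601 weights 0.299, 0.587, 0.114."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (frame.astype(np.float32) @ weights).astype(np.uint8)

frame = np.zeros((210, 160, 3), dtype=np.uint8)  # a blank Breakout observation
print(rgb_to_y(frame).shape)  # (210, 160)
```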
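    Instead of a TensorFlow queue, a small Python-side buffer that always exposes the last 4 frames may be simpler. A minimal sketch (class and method names are my own):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keeps the last k preprocessed frames and returns them
    stacked as one (84, 84, k) state."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, repeat the first frame k times.
        for _ in range(self.k):
            self.frames.append(frame)
        return self.state()

    def push(self, frame):
        self.frames.append(frame)
        return self.state()

    def state(self):
        return np.stack(self.frames, axis=-1)

fs = FrameStack()
s = fs.reset(np.zeros((84, 84), dtype=np.uint8))
print(s.shape)  # (84, 84, 4)
```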
  12. There is a timeline module which can be imported with from tensorflow.python.client import timeline. Reference.
  13. You cannot use the tf.layers.conv2d function because it doesn’t have a collection-name parameter. This causes some trouble when trying to replace network parameters.
  14. Here is some code about epsilon decay. It also contains experience replay code.
  15. A strange thing: when the action is 0, the OpenAI gym environment doesn’t change; when the action is 1, the agent waits.
  16. This repo contains DQN and A3C.
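    Linear epsilon decay itself is simple. A sketch using the schedule from the DQN paper (1.0 annealed to 0.1 over the first million steps), independent of the linked code:

```python
import random

def epsilon_by_step(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    """Anneal epsilon linearly from eps_start to eps_end, then hold it fixed."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step):
    """Epsilon-greedy action selection over a list of Q-values."""
    if random.random() < epsilon_by_step(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```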
  17. Python with statement, contextmanager and yield, link.
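    A quick illustration of how @contextmanager and yield interact with the with statement (a generic example, not taken from the linked page):

```python
from contextlib import contextmanager

events = []

@contextmanager
def session(name):
    events.append(f"enter {name}")     # runs when the `with` block is entered
    try:
        yield name                     # control passes to the with-block body
    finally:
        events.append(f"exit {name}")  # runs even if the body raises

with session("train") as s:
    events.append(f"body {s}")

print(events)  # ['enter train', 'body train', 'exit train']
```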
  18. Avoiding unnecessary sess.run calls will significantly improve performance.
  19. env.render() raises an error, because you cannot run TensorFlow initialization before env.render(). Reference.
  20. In TensorFlow, you have to define all ops at the beginning; otherwise, memory usage will increase continuously. Use sess.graph.finalize() to check whether any ops are defined after finalization.
  21. TensorFlow only updates variables after sess.run(ops), no matter how the ops in ops are arranged.
  22. You have to execute some operations before others.

    self._sess.run(self._max_img_update, feed_dict={self._input: input_img})
    self._sess.run(self._update, feed_dict={self._input: input_img})

    If you execute the maximization over two images and the storing of the last image at the same time, _max_img and _last will end up with the same value.

  23. After defining all operations before sess.graph.finalize(), the code runs much faster than before, and there is no memory-leak problem.
  24. While using a deque as the experience replay buffer, there is a problem related to the Python version. There is no problem with Python 2.7 or 3.5, but you will get an error with Python 3.4. Updating my environment to Python 3.5 fixed this problem, but some dependencies still needed to be installed.

    sudo apt-get install libssl-dev
    sudo apt-get install make build-essential libssl-dev zlib1g-dev libbz2-dev libsqlite3-dev
  25. Running this code requires about 6GB of memory for the experience replay, so I upgraded my PC build.

  26. While training, the loss always stays low, but the performance of the network is not good.
  27. An example of how to set the speed of the environment in OpenAI gym. The configure method has been removed, but you can set mode in the render method. Also, disabling game rendering speeds up the training process.
  28. A deque with more than 65535 entries may cause memory to explode.
  29. The strange thing is that I have waited for about 100000 runs of the game, and there is no improvement in performance.
  30. One transition in the experience replay takes $4 \times (84 \times 84 \times 4 \times 2 + 1 + 1) = 225800$ bytes, which is about 220KB.
  31. Some repositories about DQN use another memory mechanism: they store (s, a, r, is_terminal) in memory, where s contains just one image. This significantly reduces the memory requirement of experience replay, so I intend to implement it in my code.
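    A sketch of that single-frame storage scheme (names and details are my own; a real implementation also needs care around episode boundaries when reassembling states):

```python
import numpy as np

class CompactReplay:
    """Stores one 84x84 frame per step in a ring buffer instead of two full
    4-frame states per transition; states are rebuilt on demand."""
    def __init__(self, capacity=100_000, history=4):
        self.capacity, self.history = capacity, history
        self.frames = np.zeros((capacity, 84, 84), dtype=np.uint8)
        self.actions = np.zeros(capacity, dtype=np.uint8)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=bool)
        self.index = self.size = 0

    def store(self, frame, action, reward, terminal):
        i = self.index
        self.frames[i], self.actions[i] = frame, action
        self.rewards[i], self.terminals[i] = reward, terminal
        self.index = (i + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def state(self, i):
        """Stack the `history` frames ending at buffer index i (wrapping)."""
        idxs = [(i - j) % self.capacity for j in reversed(range(self.history))]
        return np.stack([self.frames[k] for k in idxs], axis=-1)
```

    Per step this stores one uint8 frame (about 7KB) plus a few scalars, instead of the roughly 220KB per transition computed above.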
  32. I got this warning today: The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations. I searched the net; some say it means building TensorFlow from source can speed up CPU computations. I might try building it one day.
  33. To use the lzma package, you need to compile Python with lzma support: sudo apt install build-essential zlib1g-dev libbz2-dev libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev libgdbm-dev liblzma-dev

Changes

  1. Squared gradient momentum and min squared gradient momentum cannot be found in TensorFlow, so these two values are not present in the code.
  2. Because the code requires too much memory, I changed the experience replay size from 1000000 to 100000 and the replay start size from 50000 to 5000; everything else remains the same.
  3. My new implementation of the memory significantly reduces physical memory usage, and enables us to use the original replay size of DQN.
Contents
  1. Reminder
  2. Changes