Reminder
- The game “Breakout” is here. The observation is an array of shape (210, 160, 3).
- Because the preprocessing step needs to record several images, which requires a queue, it cannot be just one function. Browsing the TensorFlow documentation, I think I found a queue construct in TensorFlow. Maybe I can try that.
- The queues in TensorFlow can be found here.
- A clearer example of queues in TensorFlow is here.
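For reference, a minimal `tf.FIFOQueue` sketch of the idea (my own illustration; the 84x84 frame shape is an assumption):

```python
import numpy as np
import tensorflow as tf

# Enqueue preprocessed frames one by one, then dequeue four at once as a state.
queue = tf.FIFOQueue(capacity=100, dtypes=tf.float32, shapes=(84, 84))
frame = tf.placeholder(tf.float32, shape=(84, 84))
enqueue_op = queue.enqueue(frame)
state = queue.dequeue_many(4)  # Tensor of shape (4, 84, 84)

with tf.Session() as sess:
    for _ in range(4):
        sess.run(enqueue_op, feed_dict={frame: np.zeros((84, 84), np.float32)})
    print(sess.run(state).shape)  # (4, 84, 84)
```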
- If you test TensorFlow GPU code in PyCharm, you will get the error `ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory`. That's because PyCharm doesn't have the path variables set correctly. The path variables should look like this:

```
PATH=/usr/local/cuda/bin:$PATH
CUDA_HOME=/usr/local/cuda
LD_LIBRARY_PATH=/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
```

The locations that need to be set are `Settings->Build, Execution, Deployment->Console->Python Console`, and under `Run->Edit Configurations`: `Defaults->Python` and `Defaults->Python tests->Unittests`. Delete the non-default settings, and PyCharm will regenerate them based on the default settings.
- Use `tf.stack` to stack images. It will convert a `numpy.ndarray` into a tensor.
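A tiny example of that conversion (shapes are illustrative):

```python
import numpy as np
import tensorflow as tf

# tf.stack accepts a list of numpy arrays and returns a single tensor.
frames = [np.zeros((84, 84), dtype=np.float32) for _ in range(4)]
state = tf.stack(frames, axis=2)  # Tensor of shape (84, 84, 4)
```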
- Although the observation from gym may look empty when you print the variable, it actually is not empty. And you don't have to render the game UI.
- You can use `tf.image.convert_image_dtype` to convert images.
- The Y channel in the paper just means converting the RGB image into a grayscale image.
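A minimal preprocessing sketch combining the two points above (my own illustration, not the paper's exact pipeline):

```python
import tensorflow as tf

raw_frame = tf.placeholder(tf.uint8, shape=(210, 160, 3))   # one Breakout frame
gray = tf.image.rgb_to_grayscale(raw_frame)                 # Y channel, still uint8
as_float = tf.image.convert_image_dtype(gray, tf.float32)   # scales [0, 255] to [0, 1]
processed = tf.image.resize_images(as_float, [84, 84])      # (84, 84, 1)
```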
- `PIL.Image.show()` doesn't open any window. Solution: `sudo apt-get install imagemagick`. Reference.
- `Image.fromarray` needs a uint8 image.
- Maybe using a queue is not a good idea, because we need to get 4 images at a time.
- There is a timeline module, which can be imported using `from tensorflow.python.client import timeline`. Reference.
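A short profiling sketch (my own illustration): trace one `sess.run` and dump a Chrome-trace file that can be opened at `chrome://tracing`.

```python
import tensorflow as tf
from tensorflow.python.client import timeline

x = tf.random_normal([1000, 1000])
y = tf.matmul(x, x)

with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(y, options=run_options, run_metadata=run_metadata)
    # Convert the collected step stats into a Chrome trace file.
    trace = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(trace.generate_chrome_trace_format())
```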
- You cannot use the `tf.layers.conv2d` function, because it doesn't have a `collection_name` parameter. That causes some trouble when trying to replace network parameters.
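One workaround sketch (my own, not the author's actual code): build the online and target networks in separate variable scopes and copy variables by scope instead of relying on a collections parameter.

```python
import tensorflow as tf

def build_net(scope):
    # A deliberately tiny stand-in for the DQN network.
    with tf.variable_scope(scope):
        x = tf.placeholder(tf.float32, (None, 84, 84, 4))
        h = tf.layers.conv2d(x, filters=32, kernel_size=8, strides=4,
                             activation=tf.nn.relu)
        return x, h

build_net('online')
build_net('target')

online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='online')
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='target')
# Running these assign ops copies the online parameters into the target network.
update_target = [t.assign(o) for o, t in zip(online_vars, target_vars)]
```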
- Here is some code about epsilon decay; it also contains experience replay code.
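For reference, a minimal linear epsilon-decay sketch of my own (the constants loosely follow the paper's schedule):

```python
EPSILON_START, EPSILON_END, DECAY_STEPS = 1.0, 0.1, 1000000

def epsilon_at(step):
    """Linearly anneal epsilon from EPSILON_START to EPSILON_END."""
    fraction = min(float(step) / DECAY_STEPS, 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)
```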
- A strange thing: when the action is `0`, the OpenAI Gym environment doesn't change; when the action is `1`, the agent's action is to wait.
- This repo contains DQN and A3C.
- Python's `with` statement, `contextmanager`, and `yield`: link.
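A small illustration of how the three fit together (my own example):

```python
from contextlib import contextmanager

@contextmanager
def scope(name):
    print('entering %s' % name)     # runs on `with` entry, before the yield
    try:
        yield name                  # the value bound by `with ... as`
    finally:
        print('leaving %s' % name)  # runs on exit, even after exceptions

with scope('train') as s:
    print('inside %s' % s)
```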
- Avoiding `sess.run` calls will significantly improve performance.
- `env.render()` raises an error, because you cannot run TensorFlow initialization before `env.render()`. Reference.
- In TensorFlow, you have to define all ops at the beginning, otherwise memory usage will increase continuously. Use `sess.graph.finalize()` to check whether any ops are defined after the finalize call.
- TensorFlow only updates variables after `sess.run(ops)`, no matter how the ops inside `ops` are arranged. You have to execute some operations before others:
```python
self._sess.run(self._max_img_update, feed_dict={self._input: input_img})
self._sess.run(self._update, feed_dict={self._input: input_img})
```

If you execute the maximization over the two images and the storing of the last image at the same time, `_max_img` and `_last` will end up with the same value.
- After defining all operations before `sess.graph.finalize()`, the code runs much faster than before, and there is no memory-leak problem.
- While using a deque as the experience replay buffer, there is a problem related to the Python version: Python 2.7 and 3.5 work fine, but Python 3.4 raises an error. Updating my environment to Python 3.5 fixed the problem, though some dependencies still needed to be installed:

```
sudo apt-get install libssl-dev
sudo apt-get install make build-essential libssl-dev zlib1g-dev libbz2-dev libsqlite3-dev
```

Running this code requires about 6 GB of memory for the experience replay, so I upgraded my PC build.
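For reference, a minimal deque-based replay buffer sketch (class name and defaults are my own choices):

```python
import random
from collections import deque

class ReplayBuffer(object):
    def __init__(self, capacity=100000):
        # deque drops the oldest transitions automatically once full.
        self._buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, is_terminal):
        self._buffer.append((state, action, reward, next_state, is_terminal))

    def sample(self, batch_size=32):
        return random.sample(self._buffer, batch_size)
```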
- While training, the loss always stays low, but the network's performance is not good.
- An example of how to set the speed of the environment in OpenAI Gym. The `configure` method has been removed, but you can set `mode` in the `render` method. Also, disabling game rendering speeds up the training process.
- A deque size larger than 65535 may cause memory to explode.
- The strange thing is that I have waited for about 100000 runs of the game, and there is still no improvement in performance.
- One transition in experience replay takes $4 \times (84 \times 84 \times 4 \times 2 + 1 + 1) = 225800$ bytes (two stacked $84 \times 84 \times 4$ states plus one action and one reward, each stored as 4-byte values), which is about 220 KB.
- Some DQN repositories use another memory mechanism: they store `(s, a, r, is_terminal)` in memory, where `s` contains just one image, and the stacked state is rebuilt from consecutive frames at sampling time. This reduces the memory requirement of experience replay significantly, so I intend to implement it in my code.
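A sketch of that single-frame storage idea (class and method names are my own; it ignores episode boundaries and buffer wrap-around for brevity):

```python
import random
import numpy as np

class FrameReplay(object):
    def __init__(self, capacity=1000000, history=4):
        # One uint8 frame per step instead of two stacked float states.
        self._frames = np.empty((capacity, 84, 84), dtype=np.uint8)
        self._actions = np.empty(capacity, dtype=np.uint8)
        self._rewards = np.empty(capacity, dtype=np.float32)
        self._terminals = np.empty(capacity, dtype=np.bool_)
        self._capacity, self._history = capacity, history
        self._next, self._count = 0, 0

    def add(self, frame, action, reward, is_terminal):
        i = self._next
        self._frames[i], self._actions[i] = frame, action
        self._rewards[i], self._terminals[i] = reward, is_terminal
        self._next = (i + 1) % self._capacity
        self._count = min(self._count + 1, self._capacity)

    def _state(self, index):
        # Rebuild the stacked state from the `history` frames ending at `index`.
        return np.stack([self._frames[index - j]
                         for j in reversed(range(self._history))], axis=2)

    def sample(self, batch_size=32):
        # Assumes the buffer already holds more than `history + 1` frames.
        batch = []
        for _ in range(batch_size):
            i = random.randrange(self._history, self._count - 1)
            batch.append((self._state(i), self._actions[i], self._rewards[i],
                          self._state(i + 1), self._terminals[i]))
        return batch
```

Storing each uint8 frame once, instead of two stacked float states per transition, is where the big memory saving comes from.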
- I got this warning today: `The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.` I searched the net; some say it means that building TensorFlow from source can speed up CPU computations. I might try building it one day.
- To use the lzma package, you need to compile Python with lzma support:

```
sudo apt install build-essential zlib1g-dev libbz2-dev libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev libgdbm-dev liblzma-dev
```
Changes
- The `squared gradient momentum` and `min squared momentum` hyperparameters cannot be found in TensorFlow, so these two values are not present in the code (a hedged guess at an approximate mapping is sketched at the end of this section).
- Because the code requires too much memory, I changed the experience replay size from 1000000 to 100000 and the replay start size from 50000 to 5000; everything else remains the same.
- My new implementation of the memory significantly reduces physical memory usage and lets us use the original replay size of DQN.
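About the two missing optimizer values above: if one wanted to approximate them anyway, here is a hedged guess (entirely my assumption, not something the TensorFlow docs state) at how they might map onto `tf.train.RMSPropOptimizer`:

```python
import tensorflow as tf

# Assumed mapping: 'squared gradient momentum' roughly plays the role of
# `decay`, and 'min squared momentum' roughly plays the role of `epsilon`.
# The learning rate follows the paper.
optimizer = tf.train.RMSPropOptimizer(
    learning_rate=0.00025,
    decay=0.95,
    momentum=0.95,
    epsilon=0.01)
```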