tl;dr: The authors propose PQN, a simplified deep online Q-learning algorithm that uses very small replay buffers. Normalization and parallelized sampling from vectorized environments stabilize training without the need for huge replay buffers. PQN is competitive with more complex methods such as Rainbow, PPO-RNN, and QMix, while being 50x faster than traditional DQN.
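
To make the two ingredients in the tl;dr concrete, here is a minimal PyTorch sketch (not the authors' implementation) of a Q-network with layer normalization and a TD update computed directly on a batch of fresh transitions gathered from many parallel environments, so that parallelism rather than a large replay buffer supplies batch diversity. All names (`QNetwork`, `td_step`, the batch keys) are illustrative assumptions, and the one-step TD target is a simplification of the actual return used in the paper.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """MLP Q-network with LayerNorm after each hidden layer (assumed architecture)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def td_step(q_net: QNetwork, optimizer: torch.optim.Optimizer,
            batch: dict, gamma: float = 0.99) -> float:
    """One Q-learning update on transitions collected this step from parallel envs.

    batch holds tensors of shape (num_envs, ...) produced by a vectorized
    environment, so no replay buffer is sampled.
    """
    obs, actions, rewards, next_obs, dones = (
        batch["obs"], batch["actions"], batch["rewards"],
        batch["next_obs"], batch["dones"],
    )
    with torch.no_grad():
        # Bootstrap from the same online network (a simplifying assumption here).
        next_q = q_net(next_obs).max(dim=-1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    q = q_net(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the batch is simply the latest transition from each of the vectorized environments, which is why normalization inside the network has to do the stabilizing work that a large, decorrelating replay buffer would otherwise provide.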