tl;dr The authors propose PQN, a simplified deep online Q-learning algorithm that uses very small replay buffers. Normalization and parallelized sampling from vectorized environments stabilize training without the need for huge replay buffers. PQN is competitive with more complex methods such as Rainbow, PPO-RNN, and QMix while being 50x faster than traditional DQN.| VITALab
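A minimal PyTorch sketch of the core idea as described in the summary: a Q-network with layer normalization trained on batched TD targets computed directly from transitions gathered by vectorized environments, rather than from a large replay buffer. The network shape, function names, and hyperparameters below are illustrative assumptions, not the authors' code.

```python
# Sketch only: batched TD(0) Q-learning with LayerNorm, no large replay buffer.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def td_loss(q_net, obs, actions, rewards, next_obs, dones, gamma=0.99):
    # One update computed over a batch of transitions collected in parallel
    # by N vectorized environments (shapes: obs [N, obs_dim], actions [N], ...).
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1.0 - dones) * q_net(next_obs).max(dim=1).values
    return nn.functional.mse_loss(q, target)
```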
At a high level, all reinforcement learning (RL) approaches can be categorized into two main types: model-based and model-free. One might think this refers to whether or not we're using an ML model; however, it actually refers to whether we have a model of the environment. We'll discuss this distinction further in this blog post.| Dilith Jayakody
Q-learning and SARSA are two algorithms one generally encounters early when learning reinforcement learning. Despite the strong similarity between the two, Q-learning often performs better in practice. In this blog post, we'll discuss the similarities and differences between these algorithms, as well as why one tends to be stronger than the other.| Dilith Jayakody
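To make the comparison concrete, here is a hedged tabular sketch of the two update rules (array names and hyperparameters are illustrative, not taken from the post): Q-learning bootstraps from the greedy action in the next state, while SARSA bootstraps from the action the agent actually takes there.

```python
# Illustrative tabular updates; Q is a [n_states, n_actions] NumPy array.
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: target uses the best action in the next state.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: target uses the action actually taken in the next state.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```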
Introduced in 2017 by John Schulman et al., Proximal Policy Optimization (PPO) still stands out as a reliable and effective reinforcement learning algorithm. In this blog post, we’ll explore the fundamentals of PPO, its evolution from Trust Region Policy Optimization (TRPO), how it works, and its challenges.| Dilith Jayakody
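As a rough illustration of the mechanism the post explores, here is a sketch of PPO's clipped surrogate loss; the tensor names and clip range below are assumptions for the example, not taken from the post.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipping keeps the update close to the old policy, echoing TRPO's trust region.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```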
Trust Region Policy Optimization (TRPO) is a Policy Gradient method that addresses many of the issues of Vanilla Policy Gradients (VPG). Despite no longer being state-of-the-art, it paved the way for more robust algorithms like Proximal Policy Optimization (PPO).| Dilith Jayakody
Deep reinforcement learning is a powerful technique for creating effective decision-making systems, but its complexity has hindered widespread adoption. Despite the perceived cost of RL, a wide range of interesting applications are already feasible with current techniques. The main barrier to broader use of RL is now the lack of accessible tooling and infrastructure. In […]| Clemens' Blog
The rapid progress in deep reinforcement learning (RL) over the last few years holds the promise of fixing the shortcomings of computer opponents in video games and of unlocking entirely new regions in game design space. However, the exorbitant engineering effort and hardware investments required to train neural networks that master complex real-time strategy games […]| Clemens' Blog
Within these pages are recorded my attempts to wield the highest arcane art and conjure minds that play the game of CodeCraft. Humble Beginnings As with all advanced AI technologies, our tale begins with hacky plumbing that lets our game speak in the serpent’s tongue and links its fleeting worlds with magic mirrors made of chrome […]| Clemens' Blog
I spent a good chunk of my time over the last two years applying deep reinforcement learning techniques to create an AI that can play the CodeCraft real-time strategy game. My primary motivation was to learn how to tackle nontrivial problems with machine learning and become proficient with modern auto-differentiation frameworks. Thousands of experiment runs […]| Clemens' Blog
The capabilities of game-playing AIs have grown rapidly over the last few years. This trend culminated in the defeat of top human players in the complex real-time strategy (RTS) games of DoTA 2 [1] and StarCraft II [2] in 2019. Alas, the exorbitant engineering and compute resources employed by these projects have made their replication difficult. […]| Clemens' Blog