The Kalman Filtering Process
Sharpness-Aware Minimization (SAM) is an optimization technique that minimizes both the loss and the sharpness of a given objective function. It was proposed by P. Foret et al. in their paper titled “Sharpness-Aware Minimization for Efficiently Improving Generalization” during their time at Google. The technique offers several benefits such as improved efficiency, generalization, and robustness to label noise. Furthermore, the algorithm is easier to implement due to the absence of second-order derivatives.
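To make the idea concrete, here is a minimal NumPy sketch of a single SAM update, assuming a generic `loss_grad(w)` function that returns the gradient at weights `w` (the function name `sam_step` and the hyperparameter values are illustrative placeholders, not taken from the paper or the post):

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.01, rho=0.05):
    """One sharpness-aware update: perturb toward the locally worst weights, then descend."""
    g = loss_grad(w)                              # gradient at the current weights
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # first-order approximation of the worst-case perturbation
    g_sharp = loss_grad(w + eps)                  # gradient evaluated at the perturbed weights
    return w - lr * g_sharp                       # ordinary descent step using the sharpness-aware gradient
```

Note that only first-order gradients appear, which is what keeps the method simple to implement.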
Model Predictive Path Integral (MPPI) Control is a powerful algorithm introduced in a 2015 publication titled “Model Predictive Path Integral Control Using Covariance Variable Importance Sampling.” Developed by Grady Williams and colleagues at the Georgia Institute of Technology, MPPI offers a sampling-based approach to controlling nonlinear systems subject to stochastic disturbances.
The reparameterization trick is an ingenious method for sidestepping the challenge of backpropagating through a random or stochastic node within a neural network. It has found prominence particularly in the context of Variational Autoencoders (VAEs). In this blog post, we will discuss what the reparameterization trick is and what problem it solves.
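As a rough, self-contained PyTorch sketch (the tensor values below are arbitrary and only meant to illustrate the mechanics): instead of sampling z directly from N(mu, sigma^2), we sample the noise separately and express z as a deterministic function of mu and sigma, so gradients can flow through them.

```python
import torch

# Illustrative encoder outputs for a 2-dimensional latent Gaussian.
mu = torch.tensor([0.5, -1.0], requires_grad=True)
log_var = torch.tensor([0.1, 0.2], requires_grad=True)

# Reparameterization: the randomness lives in eps, which needs no gradient;
# z becomes a deterministic, differentiable function of mu and log_var.
eps = torch.randn_like(mu)                  # eps ~ N(0, I)
z = mu + torch.exp(0.5 * log_var) * eps     # z ~ N(mu, sigma^2)

loss = (z ** 2).sum()                       # stand-in for a downstream loss
loss.backward()                             # gradients reach mu and log_var despite the sampling step
print(mu.grad, log_var.grad)
```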
At a high level, all reinforcement learning (RL) approaches can be categorized into two main types: model-based and model-free. One might think this refers to whether or not we’re using an ML model; however, it actually refers to whether we have a model of the environment. We’ll discuss this further in this blog post.
Importance Sampling is a tool that helps us tackle a common challenge: calculating expectations. While that might sound like a straightforward task, it often becomes a formidable problem, especially when dealing with high-dimensional data. In this blog post, we’ll delve into the intricacies of this technique and explore its significance.
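As a quick illustration (a toy example of my own, not from the post): to estimate a rare-event expectation under a distribution p, we can sample from a proposal q that covers the interesting region and reweight each sample by p(x)/q(x).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Target: E_p[f(X)] with p = N(0, 1) and f(x) = 1[x > 3], a small tail probability (~0.00135).
f = lambda x: (x > 3).astype(float)

# Proposal q = N(4, 1) concentrates samples where f is nonzero.
x = rng.normal(loc=4.0, scale=1.0, size=100_000)
w = norm.pdf(x, loc=0, scale=1) / norm.pdf(x, loc=4, scale=1)   # importance weights p(x)/q(x)

print(np.mean(f(x) * w))   # importance-sampling estimate of the expectation
```

A naive Monte Carlo estimate from p would need vastly more samples to see even a handful of points past 3, which is exactly the kind of situation importance sampling is designed for.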
Q-learning and SARSA are two algorithms that one generally encounters early in the journey of learning reinforcement learning. Despite the strong similarity between the two, Q-learning often takes prominence in practice in terms of performance. In this blog post, we’ll discuss the similarities and differences between these two algorithms, as well as why one tends to outperform the other.
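The core difference is visible in their update rules. Here is a minimal sketch assuming a tabular `Q` array indexed as `Q[state, action]` (the function names and hyperparameter values are illustrative, not from the post):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the best action in the next state, regardless of what is actually taken.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the behavior policy actually takes next.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```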
Introduced in 2017 by John Schulman et al., Proximal Policy Optimization (PPO) still stands out as a reliable and effective reinforcement learning algorithm. In this blog post, we’ll explore the fundamentals of PPO, its evolution from Trust Region Policy Optimization (TRPO), how it works, and its challenges.
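At the heart of PPO is the clipped surrogate objective. Below is a minimal PyTorch sketch of that loss, assuming precomputed log-probabilities and advantage estimates (the function name and the clipping value are illustrative defaults):

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective, negated so it can be minimized with a standard optimizer."""
    ratio = torch.exp(log_probs_new - log_probs_old)                     # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                         # pessimistic bound on improvement
```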
Transformers are a class of models that has gained a lot of traction over the years, especially in the domain of natural language processing and understanding.
Trust Region Policy Optimization (TRPO) is a Policy Gradient method that addresses many of the issues of Vanilla Policy Gradients (VPG). While no longer state-of-the-art, it paved the way for more robust algorithms like Proximal Policy Optimization (PPO).