Auditing| sqlmesh.readthedocs.io
Relevance feedback: from ancient history to LLMs. Why relevance feedback techniques are good on paper but not popular in neural search, and what we can do about it.| qdrant.tech
When I started learning about loss functions, I could always understand the intuition behind them. For example, the mean squared error (MSE) for regression seemed logical—penalizing large deviations from the ground-truth makes sense. But one thing always bothered me: I could never come up with those loss functions on my own. Where did they come from? Why do we use these specific formulas and not something else? This frustration led me to dig deeper into the mathematical and probabilistic fo...| Rish's AI Notes
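A worked sketch of the kind of answer the post is after (standard maximum-likelihood reasoning, not quoted from the post): assume each target is the model's prediction plus Gaussian noise; the negative log-likelihood then reduces to a scaled MSE, so minimizing MSE is maximum likelihood under Gaussian noise.

```latex
y_i = f_\theta(x_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)
\;\Longrightarrow\;
-\log p(\mathbf{y} \mid \mathbf{x}, \theta)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2
  + \frac{n}{2} \log\bigl(2\pi\sigma^2\bigr)
```

For fixed $\sigma$ the second term is constant, so the minimizer of the negative log-likelihood coincides with the minimizer of the MSE.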
Meta's had a tough time getting people to buy into their VR Metaverse over the past couple of years, but all's not lost. The company has been making up for some of their missteps by offering free access to pre-trained models and open-source artificial intelligence tools. Their move to introduce MusicGen in June 2023 was a direct challenge to Google's MusicLM team. Meta proved they could give users simple and controllable music generation based on any text description. During mid-March 2024, the music cop...| AudioCipher
[Updated on 2019-07-18: add a section on VQ-VAE & VQ-VAE-2.] [Updated on 2019-07-26: add a section on TD-VAE.] The autoencoder was invented to reconstruct high-dimensional data using a neural network model with a narrow bottleneck layer in the middle (oops, this is probably not true for the variational autoencoder, and we will investigate it in detail in later sections). A nice byproduct is dimension reduction: the bottleneck layer captures a compressed latent encoding.| lilianweng.github.io
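For concreteness, a minimal bottleneck autoencoder can be sketched as follows (assuming PyTorch; the 784/32 layer sizes are illustrative, not taken from the post):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Narrow-bottleneck autoencoder: compress, then reconstruct."""
    def __init__(self, input_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        # Encoder squeezes the input through the bottleneck layer.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),  # compressed latent encoding
        )
        # Decoder tries to reconstruct the original input.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(16, 784)                    # a dummy batch
loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
```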
[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.] [Updated on 2018-09-30: add a new policy gradient method, TD3.] [Updated on 2019-02-09: add SAC with automatically adjusted temperature]. [Updated on 2019-06-26: Thanks to Chanseok, we have a version of this post in Korean]. [Updated on 2019-09-12: add a new policy gradient method SVPG.] [Updated on 2019-12-22: add a new policy gradient method IMPALA.] [Updated on 2020-10-15: add a new policy gradient method PPG & som...| lilianweng.github.io
Professor Naftali Tishby passed away in 2021. I hope this post can introduce his cool idea of the information bottleneck to more people. Recently I watched the talk “Information Theory in Deep Learning” by Prof. Naftali Tishby and found it very interesting. He presented how to apply information theory to study the growth and transformation of deep neural networks during training. Using the Information Bottleneck (IB) method, he proposed a new learning bound for deep neural networks (DNNs), as ...| lilianweng.github.io
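The central object in that line of work is the IB Lagrangian (standard formulation, restated here for context): a representation $T$ of the input $X$ is chosen to be maximally compressive while staying predictive of the label $Y$,

```latex
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
```

where $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ sets the compression/prediction trade-off.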
[Updated on 2018-09-30: thanks to Yoonju, we have this post translated in Korean!] [Updated on 2019-04-18: this post is also available on arXiv.] Generative adversarial networks (GANs) have shown great results in many generative tasks, replicating rich real-world content such as images, human language, and music. They are inspired by game theory: two models, a generator and a critic, compete with each other while making each other stronger at the same time.| lilianweng.github.io
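That game is usually written as the minimax objective from the original GAN paper (standard formulation, included here for reference):

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr] +
\mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator $D$ pushes $V$ up by telling real from fake; the generator $G$ pushes it down by fooling $D$.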
There are a few mathematical results that any researcher in applied mathematics uses on a daily basis. One of them is Jensen’s inequality, which allows bounding expectations of functions of random variables. It comes up a lot in probabilistic arguments, but also as a tool to generate inequalities and optimization algorithms. In this blog post, I will present a collection of fun facts about the inequality, from very classical to more obscure. If you know other cool ones, please ad...| Machine Learning Research Blog
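For reference, the inequality itself together with its simplest consequence (a standard statement, not from the post):

```latex
f\bigl(\mathbb{E}[X]\bigr) \le \mathbb{E}\bigl[f(X)\bigr]
\quad \text{for convex } f;
\qquad \text{e.g. } f(x) = x^2 \;\Rightarrow\; \bigl(\mathbb{E}[X]\bigr)^2 \le \mathbb{E}\bigl[X^2\bigr]
```

The $x^2$ case is just the non-negativity of the variance.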
In last month's blog post, I presented the von Neumann entropy. It is defined as a spectral function on positive semi-definite (PSD) matrices, and leads to a Bregman divergence called the von Neumann relative entropy (or matrix Kullback-Leibler divergence), with interesting convexity properties and applications in optimization (mirror descent, or smoothing) and probability (concentration inequalities for matrices).| Machine Learning Research Blog
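For readers who missed that earlier post, the two definitions in play (standard ones, restated here) for PSD matrices $A, B$:

```latex
H(A) = -\operatorname{tr}(A \log A), \qquad
D(A \,\|\, B) = \operatorname{tr}\bigl(A \log A - A \log B - A + B\bigr)
```

$D(A \,\|\, B)$ is the Bregman divergence generated by the negative entropy $A \mapsto \operatorname{tr}(A \log A - A)$, which is where the convexity properties mentioned above come from.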
Symmetric positive semi-definite (PSD) matrices come up in a variety of places in machine learning, statistics, and optimization, and more generally in most domains of applied mathematics. When estimating or optimizing over the set of such matrices, several geometries can be used. The most direct one is to consider PSD matrices as a convex set in the vector space of all symmetric matrices and thus to inherit the Euclidean geometry. Like for the positive orthant (which corresponds to diagonal ...| Machine Learning Research Blog
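The "most direct" Euclidean geometry mentioned above is the one induced by the Frobenius norm on symmetric matrices (restated for concreteness):

```latex
d(A, B) = \|A - B\|_F = \sqrt{\operatorname{tr}\bigl((A - B)^2\bigr)}
```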
When our Python code is too slow, we, like most others, switch to C and often get 100x speed boosts, just as when we replaced SciPy distance computations with SimSIMD. But imagine going 100x faster than C code! It sounds crazy, especially for number-crunching tasks that are “data-parallel” and easy for compilers to optimize. In such spots the compiler will typically “unroll” the loop, vectorize the code, and use SIMD instructions to process multiple data elements in parallel.| ashvardanian.com
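The SciPy-to-SimSIMD swap mentioned above looks roughly like this (a sketch assuming the simsimd Python bindings as shown in its README; actual speedups depend on hardware and vector width):

```python
import numpy as np
from scipy.spatial import distance
import simsimd  # pip install simsimd

a = np.random.rand(1536).astype(np.float32)
b = np.random.rand(1536).astype(np.float32)

d_scipy = distance.cosine(a, b)       # generic NumPy/SciPy code path
d_simd = float(simsimd.cosine(a, b))  # hand-tuned SIMD kernels

assert abs(d_scipy - d_simd) < 1e-3   # same distance, different speed
```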
Back to foundations! The origins of RLHF, sociology's influence on it, the tension between human vs synthetic data, and emerging research in the field| www.latent.space
Variational autoencoders (VAEs) are a family of deep generative models with use cases that span many applications, from image processing to bioinformatics. There are two complementary ways of viewing the VAE: as a probabilistic model that is fit using variational Bayesian inference, or as a type of autoencoding neural network. In this post, we present the mathematical theory behind VAEs, which is rooted in Bayesian inference, and how this theory leads to an emergent autoencoding algorithm. We...| Matthew N. Bernstein
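The hinge between those two views is the evidence lower bound (ELBO), stated here in its standard form for context:

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
- \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

The expectation term is the reconstruction (autoencoding) objective; the KL term regularizes the encoder toward the prior, which is where the variational-Bayes view enters.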
The Kullback-Leibler| blog.alexalemi.com
Diffusion models have made quite a splash, especially after the open-source release of Stable Diffusion. What are diffusion models, where does the loss come from, and what does a simple example look like? I've recently helped open-source a simple, pedagogical, self-contained...| blog.alexalemi.com
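For orientation, the simplified training objective that most introductions arrive at is the noise-prediction loss of Ho et al. (standard DDPM formulation, included here for reference):

```latex
\mathcal{L}_{\text{simple}} =
\mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
\Bigl[ \bigl\| \epsilon - \epsilon_\theta(x_t, t) \bigr\|^2 \Bigr],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
```

The network $\epsilon_\theta$ simply learns to predict the noise that was added at step $t$.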
We’re on a journey to advance and democratize artificial intelligence through open source and open science.| huggingface.co