A deep dive into DeepSeek’s Multi-Head Latent Attention, including the mathematics and implementation details. The layer is recreated in Julia using Flux.jl.| liorsinai.github.io
The Martinez-Rueda algorithm computes Boolean operations between polygons. It can be used for polygon intersections (polygon clipping), unions, differences a...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 5 shows how the MicroGrad.jl code can be used for a machine learning framework like Flux.jl. The working...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 4 extends part 3 to handle maps, getfield and anonymous functions. It creates a generic gradient descent...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 3 uses metaprogramming based on IRTools.jl to generate a modified (primal) forward pass and to reverse d...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 2 uses metaprogramming to generate a modified (primal) forward pass and to reverse differentiate it into...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 1 provides an overview and defines explicit chain rules.| liorsinai.github.io
Quantifying how likely it is that each birthday is present (covered) in some large group of people.| liorsinai.github.io
A transformer for generating text in Julia, trained on Shakespeare’s plays. This model can be used as a Generative Pre-trained Transformer (GPT) with further...| liorsinai.github.io
A radix tree in Julia, built following Test Driven Development (TDD).| liorsinai.github.io
Description of the Weiler-Atherton polygon clipping algorithm.| liorsinai.github.io
How to calculate the statistical distance between two 2D distributions of points. But first a lesson in bad statistics, the pitfalls of visual solutions and ...| liorsinai.github.io