A look at extending pre-trained representations with document retrieval to better solve downstream tasks.| machine learning musings
Augmenting transformer language models with sparse access of large memory matrices| machine learning musings
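The blurb above doesn't spell out the mechanism, but the general pattern is easy to sketch: score a query against a large table of memory keys, then read only the top-k value rows, so cost scales with k rather than with the size of the memory. A minimal numpy illustration (a generic top-k read, not necessarily the post's actual method):

```python
import numpy as np

# Sparse memory read in miniature: many slots, but each query
# touches only the k best-matching ones.
n_slots, d = 100_000, 32
rng = np.random.default_rng(1)
keys = rng.standard_normal((n_slots, d))
values = rng.standard_normal((n_slots, d))

def sparse_read(query, k=8):
    scores = keys @ query                    # (n_slots,) relevance scores
    top = np.argpartition(scores, -k)[-k:]   # indices of the k best slots
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                             # softmax over the selected slots
    return w @ values[top]                   # weighted read, shape (d,)

out = sparse_read(rng.standard_normal(d))
```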
Leveraging the knowledge locked away in language models by reframing categorical tasks as constrained text generation.| machine learning musings
A foray into numeric precision reduction, operation fusion, pruning, knowledge distillation, and module replacement.| machine learning musings
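Of those techniques, numeric precision reduction is the simplest to show in isolation. A sketch of symmetric int8 post-training quantization (illustrative only; production toolchains add zero points, per-channel scales, and calibration data):

```python
import numpy as np

# Quantize float32 weights to int8 with a single scale, then
# dequantize for use. The round trip loses a little precision,
# which is the trade quantization makes for 4x smaller weights.
w = np.random.randn(256, 256).astype(np.float32)

scale = np.abs(w).max() / 127.0          # symmetric: one scale, no zero point
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale

print("max abs error:", np.abs(w - w_dequant).max())
```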
Put on your headphones, jam out to some funky 80s rock and read about an equally funky variation on multi-head attention.| machine learning musings
A practical, code-first look at DeepMind's new haiku library.| machine learning musings
Gesture and sign recognition is a growing field in computer vision, powering accessibility tools and natural user interfaces. Most beginner projects rely on hand landmarks or small CNNs, but these often miss the bigger picture because gestures are no...| freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Near the end of its life, legendary toy manufacturer Takatoku Toys approached ARTMIC and essentially asked, “Got any cool robots?”| ZIMMERIT - Anime | Manga | Garage Kits | Doujin
Attention powers “transformers” - the seemingly complex architecture behind large language models (LLMs) like ChatGPT. But what does attention even mean?| Maharshi's blog
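A useful anchor for that question is the textbook scaled dot-product formulation, which fits in a few lines of numpy (the names here are illustrative, not from the post):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays. Scores say how strongly each
    # query position should attend to each key position.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Softmax over the key axis turns scores into mixing weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ v

q = k = v = np.random.randn(4, 8)
out = scaled_dot_product_attention(q, k, v)   # shape (4, 8)
```

Much of a transformer block is plumbing around this weighted-average step.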
This question comes up a lot, usually in regard to films like "Jurassic Park" (1993) and "Transformers" (2007), especially when referring to franchise films. Some folks feel that the visual effects of a successful movie's sequels are "worse" than the original film's, even though the "technology is better". The problem with the premise of this question is that it disregards the human and creative aspects of filmmaking, instead defaulting to "technology is better, why aren't the images better?...| FXRant
Our visual effects work for "Transformers" (2007) is still being lauded to this day, which is a testament to the amazing talents of the visual effects teams at Industrial Light & Magic under the supervision of Scott Farrar, Russell Earl and Scott Benza.| FXRant
A deep dive into DeepSeek’s Multi-Head Latent Attention, including the mathematics and implementation details. The layer is recreated in Julia using Flux.jl.| liorsinai.github.io
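Stripped of the RoPE decoupling the post covers, MLA's core trick is to cache one small latent per token and expand it into keys and values on the fly. A numpy miniature with made-up dimensions (the post's Julia/Flux.jl version is the real reference):

```python
import numpy as np

d_model, d_latent, d_head, n_heads = 64, 16, 8, 4
rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_model)
W_uk = rng.standard_normal((n_heads * d_head, d_latent)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((n_heads * d_head, d_latent)) / np.sqrt(d_latent)

h = rng.standard_normal((10, d_model))    # hidden states for 10 tokens

# The KV cache stores only the latents c: (10, d_latent) instead of
# (10, 2 * n_heads * d_head) -- the memory saving MLA is after.
c = h @ W_down.T
k = (c @ W_uk.T).reshape(10, n_heads, d_head)
v = (c @ W_uv.T).reshape(10, n_heads, d_head)
```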
The past three years have seen significant interest in applying language models to the task of visual document understanding – integrating spatial, textual, and visual signals to make sense of PDFs and scanned documents.| machine learning musings
Last night was, I believe, the fourth night of our recently reconstituted raid team's journey through the content that has been added to Final Fantasy XIV in our absence. Having made our way through the 5th through 7th stages of the Alexander raid, we found ourselves facing what most of our group referred to as Voltron. I know who he really is, though. He's Bruticus.| Thalen Speaks
A new shapeshifting robot inspired by the ancient Japanese paper-folding art of origami could represent the future of space travel.| The Debrief
A series on automatic differentiation in Julia. Part 5 shows how the MicroGrad.jl code can be used for a machine learning framework like Flux.jl. The working...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 4 extends part 3 to handle maps, getfield and anonymous functions. It creates a generic gradient descent...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 3 uses metaprogramming based on IRTools.jl to generate a modified (primal) forward pass and to reverse differentiate it...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 2 uses metaprogramming to generate a modified (primal) forward pass and to reverse differentiate it into...| liorsinai.github.io
A series on automatic differentiation in Julia. Part 1 provides an overview and defines explicit chain rules.| liorsinai.github.io
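For a taste of what "explicit chain rules" means in Part 1: each primitive returns its value plus a pullback that maps an output gradient back to an input gradient, and differentiation is just composing pullbacks in reverse. The series does this in Julia; the Python rendering below is only a sketch of the same idea:

```python
import math

def rrule_sin(x):
    # value plus pullback: d/dx sin(x) = cos(x)
    return math.sin(x), lambda dy: dy * math.cos(x)

def rrule_square(x):
    # value plus pullback: d/dx x^2 = 2x
    return x * x, lambda dy: dy * 2 * x

# Differentiate f(x) = sin(x)^2 by chaining pullbacks in reverse
# order of the forward pass.
x = 0.7
s, back_sin = rrule_sin(x)
y, back_square = rrule_square(s)
dx = back_sin(back_square(1.0))        # dy/dx = 2 sin(x) cos(x)
print(y, dx, math.sin(2 * x))          # 2 sin x cos x == sin 2x, as a check
```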
A transformer for generating text in Julia, trained on Shakespeare’s plays. This model can be used as a Generative Pre-trained Transformer (GPT) with further...| liorsinai.github.io
The IEA reports that grid investment is expected to reach $400 billion in 2024, with Europe, the US, China, and parts of Latin America leading the way.| Smart Energy International
Film Tourists in Los Angeles - From Cinema Scope Magazine| Cinema Scope
How we created a song for the AI Song Contest 2021 with the help of transformers and other music generation techniques.| Wingedsheep: Artificial Intelligence Blog
Introduction to Tokenization in NLP - OpenAI ChatGPT| Ankur | NLP Enthusiast
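At bottom, tokenization maps text to integer ids over a fixed vocabulary. A toy greedy longest-match sketch (the vocabulary and matching rule are made up for illustration; real tokenizers such as BPE learn subword merges from data):

```python
# Toy tokenizer: greedy longest match against a hand-written vocabulary.
vocab = {"<unk>": 0, "token": 1, "ization": 2, "is": 3, "fun": 4}

def tokenize(text):
    ids, i = [], 0
    while i < len(text):
        # Try the longest remaining span first, shrinking until a
        # vocabulary entry matches.
        for j in range(len(text), i, -1):
            piece = text[i:j].strip()
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            i += 1  # no match: skip this character
    return ids

print(tokenize("tokenization is fun"))  # [1, 2, 3, 4]
```

Real vocabularies run to tens of thousands of entries, and the id sequence, not the raw characters, is what the model actually sees.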
Decoding the transformer network| Ankur | NLP Enthusiast
Hugging Face transformers on a MacBook Pro M1 GPU| Ankur | NLP Enthusiast
Optimal Transport, the Sinkhorn Transformer, and Charmin Ultra-Soft| machine learning musings
Exploring 6 noteworthy approaches for incorporating longer-term context in transformer models.| machine learning musings