The hierarchical highlight journalling system is as follows: Near the end of the day, write a single sentence containing your highlight of that day. At the end of the week, look back at your daily highlights. Select your favourite to be your highlight of the week. At the end of the month, look back at the weekly highlights. Select your favourite to be your highlight of the month. At the end of the year, look back at your monthly highlights. Select your favourite to be your highlight of the year.
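The roll-up structure of the system can be sketched in a few lines of Python. This is purely illustrative (the post describes a manual journal, not code), and `pick_favourite` is a hypothetical stand-in for the human judgement step:

```python
def pick_favourite(highlights):
    # Stand-in for the manual step: in practice you re-read the
    # entries and choose one; here we just take the first.
    return highlights[0]

# One sentence per day (a 28-day toy month).
daily = [f"Day {d} highlight." for d in range(1, 29)]

# Weekly: the favourite from each 7-day window of daily highlights.
weekly = [pick_favourite(daily[i:i + 7]) for i in range(0, len(daily), 7)]

# Monthly: the favourite among the weekly highlights.
monthly = pick_favourite(weekly)
```

Each level only ever reviews a handful of entries from the level below, which is what keeps the year-end review manageable.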
It seems that ChatGPT has memorised copyrighted text, but it can be difficult to get the model to output this text because of some kind of copyright detection that OpenAI have implemented.
Gridnotes (working title) is an infinite 2D text editor.
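The demo's internals aren't shown here, but an "infinite" 2D text surface is commonly backed by a sparse map from cell coordinates to characters, so only occupied cells cost memory. A minimal sketch of that idea (an assumption, not Gridnotes' actual implementation):

```python
class Grid:
    """Sparse infinite 2D character grid: only occupied cells are stored."""

    def __init__(self):
        self.cells = {}  # (x, y) -> single character

    def write(self, x, y, text):
        # Lay the string out horizontally starting at (x, y).
        for i, ch in enumerate(text):
            self.cells[(x + i, y)] = ch

    def read(self, x, y):
        # Unoccupied cells read as blank space.
        return self.cells.get((x, y), " ")


g = Grid()
g.write(1_000_000, -5, "hello")  # coordinates can be arbitrarily large
```

Because the grid is a dictionary rather than a 2D array, writing at coordinate one million costs the same as writing at the origin.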
Link to the demo.
The Reversal Curse (Sep 2023, Berglund et al.) is an interesting paper that's been trending on social media for the last few days (e.g. a Twitter thread by Neel Nanda here, and a Hacker News discussion here).
A walkthrough of BPE, with a worked example and Python implementations.
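The core of BPE training is a simple loop: count adjacent token pairs, merge the most frequent pair everywhere, repeat. A minimal sketch of that loop (my own condensed version, not the post's code), using the classic worked example string "aaabdaaabac":

```python
from collections import Counter


def most_frequent_pair(tokens):
    # Count every adjacent pair; max() breaks ties by first occurrence.
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)


def merge_pair(tokens, pair):
    # Replace every occurrence of the pair with a single merged token.
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = list("aaabdaaabac")
for _ in range(3):  # learn three merges
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
```

After three merges the sequence becomes `["aaab", "d", "aaab", "a", "c"]`: "aa" is merged first, then "aa"+"a", then "aaa"+"b". A real tokenizer records the learned merges so they can be replayed on new text.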
This post introduces the concept of learning per example (LPE). LPE is a measure of how much a deep learning model has learned about each example in a given training batch.
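The post's exact definition isn't reproduced here, but one natural reading of "how much was learned about each example" is the drop in that example's loss across a single training step. A toy sketch under that assumed formulation, using a one-parameter linear model so no libraries are needed:

```python
# Hypothetical illustration: LPE as per-example loss decrease across
# one gradient step (an assumed formulation, not necessarily the post's).

def loss(w, x, y):
    # Squared error of the linear model y_hat = w * x.
    return (w * x - y) ** 2


def train_step(w, batch, lr=0.01):
    # One step of gradient descent on the mean batch loss.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 0.0)]  # third point is an outlier
w0 = 0.0
w1 = train_step(w0, batch)

# Learning per example: how much each example's own loss dropped.
lpe = [loss(w0, x, y) - loss(w1, x, y) for x, y in batch]
```

In this toy batch the first two examples end up with positive LPE, while the outlier's LPE is negative: the shared update that helps the majority actively hurts the example that disagrees with them.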
I trained a BERT model (Devlin et al., 2019) from scratch on my desktop PC (which has an Nvidia 3060 Ti 8GB GPU). The model architecture, tokenizer, and trainer all came from Hugging Face libraries, and my contribution was mainly setting up the code, setting up the data (~20GB uncompressed text), and leaving my computer running. (And making sure it was working correctly, with good GPU utilization.)
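The training loop itself lives in the Hugging Face libraries, but the heart of BERT's masked-language-modelling objective is the data preparation: per Devlin et al., 15% of tokens are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% left unchanged. That masking scheme can be sketched without any libraries (my own sketch, not the Hugging Face collator):

```python
import random


def mask_tokens(tokens, vocab, mask_token="[MASK]", p=0.15, rng=None):
    """BERT-style MLM masking. Returns (inputs, labels); labels is the
    original token at selected positions and None elsewhere."""
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < p:          # select ~15% of positions
            labels.append(tok)        # the model must predict this token
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)       # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)              # 10%: keep unchanged
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


inputs, labels = mask_tokens(["the"] * 1000, vocab=["cat", "dog"])
```

Keeping some selected tokens unchanged (rather than always masking) reduces the mismatch between pre-training, where `[MASK]` appears, and fine-tuning, where it never does.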
The article, Analyzing Data 180,000x Faster with Rust, first presents some unoptimized Python code, and then shows the process of rewriting and optimizing the code in Rust, resulting in a 180,000x speed-up. The author notes: