Exporting Nomic’s Mixture of Experts model to ONNX
Motivation
Getting the candidates
Generative language models trained to predict what comes next have been shown to be a very useful foundation for models that can perform a wide variety of traditionally difficult language tasks. Perplexity is the standard measure of how well such a model can predict the next word on a given text, and it’s very closely related to cross-entropy and bits-per-byte. It’s a measure of how effective the language model is on the text, and in certain settings aligns with how well the model perform...
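The relationship between these quantities fits in a few lines of code. A minimal sketch, assuming you already have the natural-log probability the model assigned to each token (the helper name and inputs here are hypothetical, not from the post):

```python
import math

def perplexity_stats(token_logprobs, n_bytes):
    """Relate cross-entropy, perplexity and bits-per-byte for one text.

    token_logprobs: natural-log probability the model assigned to each token
    n_bytes: length of the text in bytes
    """
    n_tokens = len(token_logprobs)
    cross_entropy = -sum(token_logprobs) / n_tokens   # nats per token
    perplexity = math.exp(cross_entropy)              # PPL = e^CE
    bits_per_token = cross_entropy / math.log(2)      # convert nats to bits
    bits_per_byte = bits_per_token * n_tokens / n_bytes
    return perplexity, bits_per_byte
```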
Lossless compression is the process of taking some data and squishing it into a smaller space without losing any information. It is ubiquitous in data storage and transfer, for example this web page was almost certainly compressed as it was sent from my server to your browser. It is somewhat surprising there are general algorithms that effectively compress most real files whether they store text, audio, images or binary files. However there is one kind of redundancy that’s very common in al...
Lossless compression algorithms are almost magical; you can take some data source, squish it down into a smaller space, and then restore it back to its full size later. They let us store and transfer large amounts of data for a fraction of the cost of the raw data. The algorithms aren’t really magical though, they just exploit redundancies in the data, like repeated substrings and some strings being used much more frequently than others. For example, suppose I need to store a large list of E...
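A quick way to see the role of redundancy is to compress a highly repetitive buffer next to a random buffer of the same size. A small sketch using Python’s standard-library zlib (the data here is made up purely for illustration):

```python
import os
import zlib

repetitive = b"the quick brown fox " * 500   # lots of repeated substrings
random_bytes = os.urandom(len(repetitive))   # no redundancy to exploit

for name, data in [("repetitive", repetitive), ("random", random_bytes)]:
    compressed = zlib.compress(data, level=9)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes")
# The repetitive buffer shrinks to a small fraction of its size, while
# the random buffer barely shrinks at all (it can even grow slightly).
```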
Deconstructing the model
In 2023 I completed the Stanford AI Professional Program to deepen my understanding of Artificial Intelligence, especially with natural language. The courses I took were great and definitely broadened my perspective on AI, but I could have obtained a lot of it without paying the course fee and I don’t think the digital credential means a lot. There were some aspects that were better as part of a course, in particular the assignments and course project for Natural Language Understanding, but if...
Removing footers and headers
Regular expression generation
Reading the Catalog
Weights and biases
Walking through the model
Cloud GPUs are the best way to start with Deep Learning, but with enough use it can become cheaper to buy your own hardware. Cheap or free web notebook based solutions like Google Colab, Kaggle Notebooks, and Paperspace Gradient are a great way to get started with Deep Learning with little investment, but they’re a bit clunky to use and limit the kinds of GPU you can use. The next step up is with large or specialist cloud providers where you pay by the hour for compute and by the month for storage (Full Stac...
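The rent-versus-buy trade-off comes down to simple arithmetic. A back-of-envelope sketch; the prices below are made-up placeholders, not quotes from any provider:

```python
# Hypothetical numbers purely for illustration -- substitute your own quotes.
cloud_rate_per_hour = 1.10   # $/hour to rent a comparable GPU instance
purchase_price = 1600        # $ to buy a consumer GPU outright

break_even_hours = purchase_price / cloud_rate_per_hour
print(f"Buying pays off after ~{break_even_hours:.0f} GPU-hours")
# Roughly 1450 hours here: at a few hours a day that's about a year of use,
# ignoring electricity, the rest of the machine, and resale value.
```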
Starting AWS EC2 Compute Instances from the Command Line
Makemore Subreddits - Part 2 Multilayer Perceptron
Loading the Data
Centroid Spherical Polygon
Cleaned Text
Understanding the history, and hopefully the future, requires taking a deep, thin slice; a good example is the history of Machine Translation. Machine translation is a complex topic and I’ll likely get the details wrong, but here’s a plausible narrative. Prior to computers most translation was done by people who could understand both languages and the content of the text; learning another language takes significant time and effort (although less in specialised domains such as acade...
Linear Stacking Cosine Embeddings