Locality sensitive hashing is a super useful trick. Most people use it for near-neighbor search but it’s also helpful for sketching algorithms and high-dimen...| randorithms.com
At the time of writing, the most highly visited article on my website is about magnetic nanoparticles. This is funny because I haven’t touched anything stronger than a refrigerator magnet in nearly six years (I work in machine learning / AI). But I still get tons of questions about magnets - mostly from applied scientists who want to build a simluation that models their lab experiments.| Randorithms
If done right, undergraduate research is a mutually beneficial collaboration that leads to great results. If done wrong, it can make people feel undervalued, discouraged or even disgusted with research.| Randorithms
Locality-sensitive hash functions are becoming increasingly common components of machine learning systems.1 However, Tensorflow does not have good hash function implementations. To address this, I’ve released my own implementations on GitHub. This post describes some of the design considerations. High-profile conferences such as NeurIPS and ICML have published several algorithms to train networks with LSH (SLIDE and MONGOOSE), perform negative sampling with LSH, improve transformers (the Re...| Randorithms
The histogram is a data summary that is widely used across science, engineering, finance and other areas. Histograms are often the first thing you look at when exploring a new dataset or problem. You can use them to visualize the distribution, identify outliers and do all sorts of other useful things.1 I don’t remember who taught me this, but histograms are one of the first things to check when debugging a machine learning system. When things go wrong, it’s extremely helpful to have histo...| Randorithms
Minimizing discrepancy is a core part of several recent proposals for efficient machine learning and dataset summarization. Unfortunately, the problem is NP hard, and we are forced to use approximate solutions.| Randorithms
The Taylor series is a widely-used method to approximate a function, with many applications. Given a function \(y = f(x)\), we can express \(f(x)\) in terms of powers of x.| Randorithms
Edo Liberty and Zohar Karnin introduced a new way to construct coresets for many problems in a recent COLT paper. Their algorithm is interesting because it departs from well-established ways to construct coresets. Most coreset constructions are based on (approximate) importance sampling and sensitivity scores. Theirs is based on dividing the dataset into pieces to satisfy a discrepancy criterion.| Randorithms
Everyone knows that cryptocurrency is expensive, and not just from an investment perspective. The blockchain network uses a massive amount of electricity, so much that the environmental side-effects have recently come under public scrutiny. This is part of the reason why Tesla reversed their decision to accept Bitcoin in May 2021, mere months after establishing the policy in March. Estimates abound for the true sustainability cost of cryptocurrency, but most researchers agree that it’s expe...| Randorithms
Graph spanners are important for graph applications where we want to reduce the number of edges in a graph without affecting the navigability of the graph.| Randorithms
There are too many near neighbor problem statements. There. I said it. When I first tried to read about this, it took me forever to understand all the differ...| randorithms.com
Rendezvous hashing is an algorithm to solve the distributed hash table problem - a common and general pattern in distributed systems. There are three parts o...| randorithms.com
People often summarize a “bag of items” by adding together the embeddings for each individual item. For example, graph neural networks summarize a section of...| randorithms.com