Much of my work is in pursuit of “data dignity”, an idea that stems in part from scholars arguing that we should sometimes think of “data as labor”. | dataleverage.substack.com
More on why you're an expert language model trainer | dataleverage.substack.com
I've previously written about the idea that we're all "Expert Language Model Trainers". The basic idea: large language models rely on our blog posts, Wikipedia articles, Reddit votes, arXiv papers, and more, so it's very likely that much of the Internet-using population has contributed in some fashion to the training data underlying these models. Of course, everyone's "marginal impact" is small (most models would be nearly unchanged if just one person's data disappeared), but many peo... | nmvg.mataroa.blog