Countless studies have found that “bias” – typically with respect to race and gender – pervades the embeddings and predictions of the black-box models that dominate natural language processing (NLP). For example, the language model GPT-3, of OpenAI fame, can generate racist rants when given the right prompt. Attempts to detect hate speech can itself harm minority populations, whose dialect is more likely to be flagged as hateful.