I recently threw out a random thought on Twitter, wondering if there might be room for something I called semi-local inference. This wouldn't be on-device processing, but something like using a WiFi router to run powerful large language models (LLMs). I was curious about the potential benefits in speed, cost, and privacy over using APIs to power and control smart home or office devices. Here's the tweet that started it all: