Being able to run AI models locally is one of the coolest tech things 2023 brought to the table. But given the extremely heterogeneous software stack for AI accelerators, doing so efficiently on your own laptop is not always as easy as it should be. For instance, if you're running macOS on an Apple Silicon machine, you can easily build llama.cpp with its Metal backend and offload the inference work to the M-series GPU.
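As a rough sketch of what that looks like in practice (binary names and defaults have shifted across llama.cpp versions, and the model path below is just a placeholder), the build-and-run flow is something like:

```sh
# Clone and build llama.cpp; on Apple Silicon the Metal backend
# is enabled by default in recent versions, so no extra flag is needed.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run inference, offloading the model layers to the GPU via Metal.
# The GGUF path is a placeholder: point -m at any model you have
# downloaded. Older builds name the binary "main" instead of "llama-cli".
./build/bin/llama-cli \
  -m models/llama-2-7b.Q4_K_M.gguf \
  -p "Hello from the GPU" \
  -ngl 99  # -ngl = number of layers to offload to the GPU
```

With `-ngl` set high enough to cover the whole model, essentially all of the matrix work runs on the GPU, and you can watch it light up in Activity Monitor while tokens stream out.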