Running LLMs in Kubernetes is quite different from running web apps or APIs. Recently I was digging into the benefits of the Inference Extensions for the Kubernetes Gateway API, and I needed to generate load against the backend LLMs I had deployed (Llama, Qwen, etc.). I ended up building an LLM load generation tool because my use case needed some specific controls over how the test was run. In the end, I think about 90% of what I built was fairly generic for an LLM load test to...