Large language model (LLM) processing dominates many AI discussions today. The broad, rapid adoption of any application often brings an urgent need for scalability. GPU devotees are discovering that where one GPU may execute an LLM well, interconnecting many GPUs often doesn’t scale as hoped since latency starts piling up with noticeable effects on user…