Local AI inference at ConSol combines GPT‑OSS with vLLM on OpenShift, delivering high‑throughput, low‑latency model serving on NVIDIA RTX PRO 6000 GPUs. Running the workload locally keeps costs under control, preserves data sovereignty, and gives us full control over performance tuning. The deployment leverages persistent storage, offline mode, and air‑gapped (no‑egress) networking for a secure, production‑ready solution.
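Because vLLM exposes an OpenAI‑compatible API, any in‑cluster workload can talk to the deployment with the standard `openai` client. The sketch below illustrates this under assumptions: the Service URL and the model ID (`openai/gpt-oss-20b`) are placeholders, and no API key is configured on the server.

```python
# Minimal sketch: querying a vLLM OpenAI-compatible endpoint from inside the
# cluster. The Service URL and model ID below are illustrative assumptions --
# substitute the actual Service/Route and the GPT-OSS variant you deployed.
from openai import OpenAI

client = OpenAI(
    base_url="http://vllm.gpt-oss.svc.cluster.local:8000/v1",  # hypothetical in-cluster Service
    api_key="EMPTY",  # vLLM ignores the key unless it was started with --api-key
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # assumed model ID; must match what vLLM serves
    messages=[
        {"role": "user", "content": "Summarize the benefits of local inference."}
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Keeping the endpoint internal to the cluster fits the air‑gapped design: clients reach the model over the Service network, while the pods themselves have no route to the outside.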