Measuring state-of-the-art GPU performance compared to vLLM on Modular's MAX 24.6 | www.modular.com
Learn how to use our benchmarking script to measure the performance of MAX. | docs.modular.com
This benchmark explores how GPU memory saturation affects LLM inference performance and cost, comparing NVIDIA H100 and AMD MI300X. | dstack.ai
I am publishing this because many people are asking me how I did it, so I will explain. https://huggingface.co/ehartford/WizardLM-30B-Uncensored https://huggingface.co/ehartford/WizardLM-13B-Uncensored https://huggingface.co/ehartford/WizardLM-7B-Unc... | Cognitive Computations