Introduction

The aim of 8-bit quantization is to reduce the memory usage of the model parameters by using lower-precision types than full (float32) or half (bfloat16) precision. In other words, 8-bit quantization compresses models with billions of parameters, such as Llama 2 or SDXL, so that they require less memory. Thankfully, Lightning Fabric makes quantization straightforward to enable.
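As a minimal sketch of what this looks like in practice, the snippet below enables 8-bit quantization through Fabric's bitsandbytes-backed precision plugin (`BitsandbytesPrecision`). It assumes a CUDA GPU and the `bitsandbytes` package are available; the toy model and layer sizes are illustrative, not part of the original post.

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.plugins import BitsandbytesPrecision

# Store Linear-layer weights in int8 instead of float32/bfloat16,
# roughly a 4x (vs. float32) or 2x (vs. bfloat16) reduction in parameter memory.
precision = BitsandbytesPrecision(mode="int8", dtype=torch.bfloat16)
fabric = Fabric(accelerator="cuda", devices=1, plugins=precision)

with fabric.init_module():
    # Any torch.nn.Module works; Linear layers are created as
    # bitsandbytes 8-bit variants inside this context.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    )

# Moving the model to the device via Fabric triggers the actual quantization.
model = fabric.setup_module(model)
```

The same plugin also accepts 4-bit modes such as "nf4"; "int8" is shown here because it matches the 8-bit focus of this post.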