Knowledge distillation is a model compression technique in which a small network (the student) is trained to reproduce the behavior of a larger, already-trained network (the teacher).

I. What is model distillation?

Model distillation transfers knowledge from a larger, more complex model (the “teacher” model) to a smaller, simpler model (the “student” model), so that the smaller model performs better than it would if trained on its own. The basic idea is that the teacher model has already learned useful information from th...
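To make the teacher-to-student transfer concrete, here is a minimal sketch of a distillation training step, assuming PyTorch. The architectures, the temperature T, and the mixing weight alpha are illustrative choices, not something prescribed by the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher (large) and student (small) classifiers for 10 classes.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend ordinary cross-entropy on the hard labels with a KL term that
    pushes the student's softened predictions toward the teacher's."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales the soft-target gradients (Hinton et al.'s convention).
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# One illustrative training step on random data.
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
with torch.no_grad():          # the teacher is frozen; only the student learns
    t_logits = teacher(x)
s_logits = student(x)
loss = distillation_loss(s_logits, t_logits, y)
loss.backward()
```

In practice the student would be optimized over many batches of real data, but the core of the technique is already visible here: the student is supervised both by the ground-truth labels and by the teacher's softened output distribution.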