Here I develop a theoretical model of TPUs vs GPUs for transformers as used by BERT and show that current GPUs are about 32% to 54% slower than TPUs for this task. | Tim Dettmers