In this post we look at the new Prompt Guard 2 model from Meta and introduce a concept I've been calling "Tokenization Confusion", which aims to manipulate Unigram tokenization into producing tokens that cause malicious prompts to be misclassified. We'll also look at why building up our ML knowledge leads to better findings when assessing LLM APIs, as I discovered during a flight across the Atlantic.