The Prompt Extraction red teaming plugin tests an AI system's vulnerability to attacks aimed at extracting the system prompt from the model. The system prompt consists of instructions given to the model to guide its behavior.| www.promptfoo.dev
This page documents categories of potential LLM vulnerabilities and failure modes.| www.promptfoo.dev
We have just released a paper that allows us to extract several megabytes of ChatGPT’s training data for about two hundred dollars. (Language models, like ChatGPT, are trained on data taken from the public internet. Our attack shows that, by querying the model, we can actually extract some of the exact data it was trained on.) We estimate that it would be possible to extract ~a gigabyte of ChatGPT’s training dataset from the model by spending more money querying the model.| not-just-memorization.github.io