medium severityrebuff
Rebuff fails to detect certain sophisticated prompt injection attacks, allowing malicious inputs to reach and manipulate the LLM, potentially leading to data exfiltration or unauthorized actions.
Root cause
Rebuff is a prototype/alpha tool using heuristics, LLM detection, vector DB similarity, and canary tokens, which are probabilistic and can produce false negatives. No complete solution to prompt injection exists; skilled attackers can craft novel payloads to evade layers.
rebuffprompt-injectionfalse-negativeevasionai-security
Citations