Agentifact assessment — independently scored, not sponsored. Last verified Mar 6, 2026.

MCP ServerFULL AUTO

Rebuff

Open-source prompt injection detector from ProtectAI with a four-layer defense: heuristics to filter suspicious inputs, an LLM-based classifier, a vector database of known attack embeddings, and canary tokens to detect prompt leakage. Integrates via Python SDK. Currently a prototype suitable for research and early-stage agent hardening. Free and self-hosted.

Visit RebuffVerified · March 6, 2026

✓ Our Verdict

Use with care — notable gaps remain

Use Case

You're building an LLM agent that accepts user input and need to block prompt injection attacks before they reach your model, without relying on a single detection method.

SolutionRebuff runs inputs through four defense layers—heuristics, LLM-based classification, vector similarity matching against known attacks, and canary token detection—so attacks that slip past pattern matching get caught by semantic analysis or historical embeddings.

SetupInstall the Python SDK, provide OpenAI API credentials, optionally configure Pinecone for vector storage (or self-host). Wrap your input validation in `rb.detect_injection(user_input)` calls. ~30 minutes for basic integration.

Fast heuristic checks run first (no API cost), then LLM-based detection kicks in for sophisticated attacks. False positives are possible on legitimate edge-case inputs. Canary token leakage detection requires you to monitor model outputs. The system learns from attacks you log, but you're responsible for feeding it domain-specific attack patterns.

Maturity (alpha stage) is the limiting factor—production guarantees don't exist yet.

Use Case

You need to detect when sensitive information (API keys, system prompts, internal data) is being exfiltrated through prompt injection or model output manipulation.

SolutionRebuff's canary token feature embeds hidden markers in your prompts; if they appear in the model's response, you know the model was tricked into leaking context. The framework logs these incidents and updates its vector database to prevent similar attacks.

SetupUse `rb.add_canaryword(prompt_template)` to inject markers, then check outputs with `rb.is_canary_word_leaked()`. Requires monitoring every model response. ~1 hour to integrate into your LLM chain.

Canary tokens are a honeypot—they don't prevent exfiltration, they detect it after the fact. You still need to decide what to do when leakage is detected (alert, block user, retrain). Works best when you control the prompt template; less effective against indirect injection vectors.

Detection-only approach; not a prevention mechanism.

Limitation — major

No complete defense against prompt injection

Rebuff's own documentation states there are no known complete solutions to prompt injection. Skilled attackers can discover new vectors or bypass all four layers. The tool raises the bar but doesn't eliminate risk.

Limitation — major

Alpha-stage maturity with false positives/negatives

Rebuff is explicitly in alpha. The framework may produce false positives (blocking legitimate inputs) or false negatives (missing real attacks). No production SLA or stability guarantees. Expect breaking changes and API shifts.

Caution

LLM-based detection adds latency and cost

The second defense layer calls an LLM (OpenAI by default) to classify each input. This adds ~500ms–2s per request and costs ~$0.001–0.01 per detection depending on input length. For high-volume agents, this becomes a bottleneck. Heuristics run free and fast, but sophisticated attacks require the expensive layer.

Trust Breakdown

56

Trust scoreCaution

AGENT

Autonomous workflow delegation

TRUST

Transparency & verification

INTEROP

Protocol compatibility breadth

SECURITY

Security controls & audit trail

DOCS

Documentation completeness

How these scores are calculated →

What It Actually Does

In Plain English

Detects when users try to manipulate AI agents with malicious prompts by checking incoming text against known attack patterns and suspicious language signatures. You integrate it into your agent's input pipeline to block these attacks before they reach the core system.

Free and self-hosted.

Fit Assessment

Best for

✓prompt-injection-detection
✓llm-security

56

Rebuff

Caution · 56/100

Visit Rebuff

Score Breakdown

AGENT

Autonomous workflow delegation

TRUST

Transparency & verification

INTEROP

Protocol compatibility breadth

SECURITY

Security controls & audit trail

DOCS

Documentation completeness

Protocol Support

MCP—

A2A—

A2H—

REST API✓

Agent-callable—

Capabilities

Transaction capable—

ACP support—

Audit trace—

Governance

input-validation

Pricing

Freemium

Free, open source (self-hosted); managed API available

Workflow Fit

prompt-injection-detectionllm-security

Related Concepts

Browse full Lexicon →

Related Categories

Ready to evaluate Rebuff in your stack?

FULL AUTO

Visit Rebuff