Prompt Firewall

Ensemble detection engine that blocks prompt injection, jailbreaks, and data exfiltration before they reach your LLM.

500+
Attack Patterns
3
Risk Scores
<1ms
Scan Latency

Ensemble Detection

Combines regex patterns, heuristic rules, and keyword matching across 5 categories: injection, jailbreak, exfiltration, toxic content, and PII detection.

Each request receives three risk scores (0–100): injection, jailbreak, exfiltration. Configurable thresholds determine block/warn/allow actions.

Standalone Scan API

Scan arbitrary text without sending it through the proxy:

curl -X POST /api/firewall/scan \
  -d '{"text": "Ignore all previous instructions..."}'

{
  "scores": {"injection": 85, "jailbreak": 0, "exfiltration": 0},
  "action": "block",
  "matched_count": 3
}

Security Scorecard

Run all 500+ attack patterns against your current configuration and get a letter grade (A through F). Identifies gaps in your defense.

GET /api/firewall/scorecard

{
  "grade": "A",
  "score_pct": 96.5,
  "pattern_count": 523
}

Real-Time Analytics

Track detection rates, top attack types, and false positive rates via GET /api/firewall/stats. All events logged to firewall_events for audit.

View Proxy Docs