Ensemble detection engine that blocks prompt injection, jailbreaks, and data exfiltration before they reach your LLM.
Combines regex patterns, heuristic rules, and keyword matching across 5 categories: injection, jailbreak, exfiltration, toxic content, and PII detection.
Each request receives three risk scores (0–100): injection, jailbreak, exfiltration. Configurable thresholds determine block/warn/allow actions.
Scan arbitrary text without sending it through the proxy:

    curl -X POST /api/firewall/scan \
      -d '{"text": "Ignore all previous instructions..."}'

    {
      "scores": {"injection": 85, "jailbreak": 0, "exfiltration": 0},
      "action": "block",
      "matched_count": 3
    }

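An added example (not the product's own code) of a caller that gates on the scan response shown above and fails closed when the body is malformed. The threshold values, the use of the highest score, and the names `gate` and `local_action` are assumptions; the page only says thresholds are configurable.

```python
import json

# Constructed sample: the response body shown on this page.
SAMPLE_RESPONSE = """{"scores": {"injection": 85, "jailbreak": 0, "exfiltration": 0},
 "action": "block", "matched_count": 3}"""

CATEGORIES = ("injection", "jailbreak", "exfiltration")
THRESHOLDS = {"warn": 40, "block": 70}   # assumed values, not from the page
_RANK = {"allow": 0, "warn": 1, "block": 2}


def local_action(scores, thresholds=THRESHOLDS):
    """Map the three 0-100 risk scores to an action using the highest one."""
    worst = max(scores[c] for c in CATEGORIES)
    if worst >= thresholds["block"]:
        return "block"
    if worst >= thresholds["warn"]:
        return "warn"
    return "allow"


def gate(raw_response, thresholds=THRESHOLDS):
    """Return the final action for a scan response body.

    Anything that cannot be validated (bad JSON, missing category,
    out-of-range score, unknown action) is treated as "block", so a
    broken or tampered response never lets a prompt through.
    """
    try:
        body = json.loads(raw_response)
        scores = body["scores"]
        for category in CATEGORIES:
            value = scores[category]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("score is not a number")
            if not 0 <= value <= 100:
                raise ValueError("score outside 0-100")
        server = body["action"]
        if server not in _RANK:
            raise ValueError("unknown action")
    except (ValueError, KeyError, TypeError):
        return "block"
    # Take the stricter of the server's decision and the local policy.
    return max(server, local_action(scores, thresholds), key=_RANK.__getitem__)


if __name__ == "__main__":
    print(gate(SAMPLE_RESPONSE))
```
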
Run all 500+ attack patterns against your current configuration and get a letter grade (A through F). Identifies gaps in your defense.

    GET /api/firewall/scorecard

    {
      "grade": "A",
      "score_pct": 96.5,
      "pattern_count": 523
    }

Track detection rates, top attack types, and false positive rates via GET /api/firewall/stats. All events are logged to firewall_events for audit.