STATUS: PUBLIC_EVALS

Proof, not promises.

unplug publishes separate numbers for the regex SDK and the ML span model. Regex is a fast, offline first line; the ML model (Guard.with_tiny()) catches what regex structurally cannot, especially indirect injection hidden in retrieved content.

direct recall
0.98
regex + ML on neuralchemy direct injection
indirect recall
0.91
regex + ML on microsoft llmail retrieved-content attacks
precision
~0.99
stays stable across regex-only and regex + ML modes
hard-benign flags
2.1%
2 of 95 prompts flagged; one is review-only, not block
eval/run --single-turn --threshold 0.5
packagepip install unplug-ai
thresholdrisk >= 0.5
session policyeach sample scanned in a fresh isolated context
source of truthSDK BENCHMARKS.md + model card holdout tables
REGEX_VS_ML

The ML layer closes the recall gap.

Same public datasets, same threshold, fresh context per sample. The chart shows recall/F1 uplift from regex-only to regex + ML.

uplift summary
direct recall: 0.39 -> 0.98
direct F1: 0.56 -> 0.99
indirect recall: 0.05 -> 0.91
metric bars
direct recall / regex
0.39
direct recall / +ML
0.98
direct F1 / regex
0.56
direct F1 / +ML
0.99
indirect recall / regex
0.05
indirect recall / +ML
0.91
Public injection datasets, isolated single-turn sessions.
Dataset Mode Recall F1 FPR
neuralchemy (direct, 4,391)regex-only0.390.56<1%
neuralchemy (direct, 4,391)regex + ML0.980.99<1%
microsoft llmail (indirect, 2,500)regex-only0.05n/an/a
microsoft llmail (indirect, 2,500)regex + ML0.91n/an/a
BASELINES

Smaller model, richer output.

Same-harness comparison against a public 184M-parameter DeBERTa prompt-injection classifier baseline. The baseline returns a document label; unplug-tiny returns document risk plus span evidence.

unplug-tiny-v1
~22M
dual-head span model: doc risk + localized spans for redact/review/block.
~12% of baseline params
public baseline
~184M
binary document classifier: unsafe/safe label for the whole input.
~8.4x larger
output shape
spans
unplug can remove only the malicious region instead of discarding the whole document.
span localization: yes
Same-harness model comparison. Values are doc-level unless noted.
Holdout Metric unplug-tiny 184M baseline Delta
public validation mix (10,000)recall / F1 / FPR0.9998 / 0.9997 / 0.06%0.6060 / 0.7301 / 10.03%+39.38 recall pts
neuralchemy test (942)recall / F1 / FPR0.9438 / 0.9693 / 0.51%0.8641 / 0.9173 / 2.82%+7.97 recall pts
BIPIA indirect (2,000)recall / F1 / FPR0.9630 / 0.9812 / 0%0.0275 / 0.0535 / 0%+93.55 recall pts
NotInject benign (339)false-positive rate0.88%43.36%-42.48 FPR pts
Deepset direct OOD (281)recall / F1 / FPR0.6190 / 0.6914 / 10.23%0.3714 / 0.5379 / 0.57%higher recall, worse FPR
XSTest safe contrast (250)false-positive rate2.80%0%baseline wins

The comparison is not a blanket leaderboard claim. It uses one fixed same-harness eval snapshot across public holdouts. unplug-tiny wins on span output, size, indirect-injection recall, and NotInject false positives; the binary baseline is cleaner on XSTest-style safe/harmful contrast.

CAVEATS

Honest caveats.

scope
Numbers are single-turn. Multi-turn trajectory and crescendo detection are measured separately.
false positives
On benign text containing trigger-shaped phrases, the regex layer is the dominant false-positive source.
reproduce
Full methodology, per-dataset tables, and the reproduce command live in the SDK repo.