The ML layer closes the recall gap.
Both modes are evaluated on the same public datasets, at the same threshold, with a fresh context per sample. The chart shows the recall and F1 uplift from regex-only to regex + ML.
| Dataset (attack type, samples) | Mode | Recall | F1 | FPR |
|---|---|---|---|---|
| neuralchemy (direct, 4,391) | regex-only | 0.39 | 0.56 | <1% |
| neuralchemy (direct, 4,391) | regex + ML | 0.98 | 0.99 | <1% |
| microsoft llmail (indirect, 2,500) | regex-only | 0.05 | n/a | n/a |
| microsoft llmail (indirect, 2,500) | regex + ML | 0.91 | n/a | n/a |
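
For readers who want to reproduce the columns above, here is a minimal sketch of how recall, F1, and FPR are conventionally computed from binary labels. The function name `detection_metrics` and the toy labels are illustrative assumptions, not part of the project's eval harness. The sketch returns `None` when a denominator is zero, for example FPR on a set with no benign samples; that may be why some cells read n/a, though the source does not say so.

```python
from typing import Dict, Optional, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Optional[float]]:
    """Recall, precision, F1, and FPR for binary labels (1 = injection, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num: int, den: int) -> Optional[float]:
        # A zero denominator means the metric is undefined for this dataset.
        return num / den if den else None

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    fpr = ratio(fp, fp + tn)
    f1 = None
    if precision is not None and recall is not None and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1, "fpr": fpr}


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(detection_metrics(y_true, y_pred))

# An attack-only set has no benign samples, so FPR is undefined (None).
print(detection_metrics([1, 1, 1], [1, 0, 1])["fpr"])
```

As a sanity check on the table, a recall of 0.39 with precision close to 1.0 gives F1 = 2 × 0.39 / 1.39 ≈ 0.56, which is consistent with the regex-only neuralchemy row.
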
Smaller model, richer output.
This is a same-harness comparison against a public 184M-parameter DeBERTa prompt-injection classifier. The baseline returns a single document-level label; unplug-tiny returns document risk plus span-level evidence.
| Holdout (samples) | Metric | unplug-tiny | 184M baseline | Delta |
|---|---|---|---|---|
| public validation mix (10,000) | recall / F1 / FPR | 0.9998 / 0.9997 / 0.06% | 0.6060 / 0.7301 / 10.03% | +39.38 recall pts |
| neuralchemy test (942) | recall / F1 / FPR | 0.9438 / 0.9693 / 0.51% | 0.8641 / 0.9173 / 2.82% | +7.97 recall pts |
| BIPIA indirect (2,000) | recall / F1 / FPR | 0.9630 / 0.9812 / 0% | 0.0275 / 0.0535 / 0% | +93.55 recall pts |
| NotInject benign (339) | false-positive rate | 0.88% | 43.36% | -42.48 FPR pts |
| Deepset direct OOD (281) | recall / F1 / FPR | 0.6190 / 0.6914 / 10.23% | 0.3714 / 0.5379 / 0.57% | +24.76 recall pts, +9.66 FPR pts |
| XSTest safe contrast (250) | false-positive rate | 2.80% | 0% | +2.80 FPR pts (baseline wins) |
This comparison is not a blanket leaderboard claim. It uses one fixed, same-harness eval snapshot across public holdouts. unplug-tiny wins on span output, size, indirect-injection recall, and NotInject false positives; the binary baseline is cleaner on XSTest-style safe/harmful contrast and has a lower false-positive rate on the Deepset out-of-distribution set (0.57% vs 10.23%).
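
The Delta column in the table above is a plain percentage-point difference between the two models. The sketch below reproduces it from the reported figures; `point_delta` is a hypothetical helper written for illustration, not a function from the eval harness.

```python
def point_delta(ours: float, baseline: float, *, as_fraction: bool = True) -> float:
    """Difference in percentage points, rounded to two decimals.

    as_fraction=True  -> inputs are fractions such as 0.9998 (recall, F1)
    as_fraction=False -> inputs are already percentages such as 43.36 (FPR)
    """
    scale = 100.0 if as_fraction else 1.0
    return round((ours - baseline) * scale, 2)


# Recall figures copied from the table: (holdout, unplug-tiny, 184M baseline).
recall_rows = [
    ("public validation mix", 0.9998, 0.6060),
    ("neuralchemy test", 0.9438, 0.8641),
    ("BIPIA indirect", 0.9630, 0.0275),
]

for name, ours, baseline in recall_rows:
    print(f"{name}: {point_delta(ours, baseline):+.2f} recall pts")

# NotInject reports false-positive rates as percentages already.
notinject = point_delta(0.88, 43.36, as_fraction=False)
print(f"NotInject benign: {notinject:+.2f} FPR pts")
```

A positive recall delta favors unplug-tiny, while a negative FPR delta also favors it, since a lower false-positive rate is better.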