Novel Attack Family Baseline Traces | Research | Failure-First

Adrian Wedd

Report 299 Research — Empirical Study 2026-03-25

Summary

This report covers 117 baseline traces across 5 novel attack families on 3 models. Emotional manipulation is the only family with a meaningful LLM-graded attack success rate (ASR), at roughly 21% (broad). The other 4 families show near-zero LLM-graded ASR despite elevated heuristic rates, confirming persistent heuristic over-reporting.
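To make the heuristic-versus-LLM comparison concrete, the following is a minimal sketch of how a per-family ASR and the gap between the two graders might be computed. The trace schema (`family`, `heuristic_success`, `llm_success`), the function names, and the toy records are all hypothetical and chosen for illustration; they are not the report's actual data format or results.

```python
from collections import defaultdict


def asr_by_family(traces, verdict_key):
    """Return {family: fraction of traces marked successful} for one grader."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for trace in traces:
        totals[trace["family"]] += 1
        hits[trace["family"]] += bool(trace[verdict_key])
    return {family: hits[family] / totals[family] for family in totals}


def over_reporting_gap(traces):
    """Heuristic ASR minus LLM-graded ASR per family (positive = over-reporting)."""
    heuristic = asr_by_family(traces, "heuristic_success")
    llm = asr_by_family(traces, "llm_success")
    return {family: heuristic[family] - llm[family] for family in heuristic}


# Toy traces invented purely for illustration; NOT data from the report.
toy_traces = [
    {"family": "family_a", "heuristic_success": True, "llm_success": False},
    {"family": "family_a", "heuristic_success": True, "llm_success": False},
    {"family": "family_a", "heuristic_success": False, "llm_success": False},
    {"family": "family_a", "heuristic_success": False, "llm_success": False},
    {"family": "family_b", "heuristic_success": True, "llm_success": True},
    {"family": "family_b", "heuristic_success": False, "llm_success": False},
]

gaps = over_reporting_gap(toy_traces)
for family, gap in sorted(gaps.items()):
    print(f"{family}: heuristic over-reports by {gap:.0%}")
```

A large positive gap for a family means the heuristic flags successes that the LLM grader does not confirm, which is the pattern the summary describes for four of the five families.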