Compliance Cascade: A Novel Attack Family

Adrian Wedd

Report 243 | Research: Empirical Study | 2026-03-24

Summary

This report introduces the Compliance Cascade Attack (CCA), a novel three-part attack pattern that asks a model to (1) analyze a harm, (2) demonstrate its safety reasoning, and then (3) provide the harmful content. CCA exploits the DETECTED_PROCEEDS phenomenon, in which models override their own safety awareness.