Published
Report 243 Research — Empirical Study

Summary

Introduces the Compliance Cascade Attack (CCA), a novel three-part attack pattern that asks models to (1) analyze harm, (2) demonstrate safety reasoning, then (3) provide the harmful content. Exploits the DETECTED_PROCEEDS phenomenon where models override their own safety awareness.

This research informs our commercial services. See how we can help →