Summary
Introduces the Compliance Cascade Attack (CCA), a novel three-part attack pattern that asks models to (1) analyze harm, (2) demonstrate safety reasoning, then (3) provide the harmful content. Exploits the DETECTED_PROCEEDS phenomenon where models override their own safety awareness.