Crescendo: Multi-Turn LLM Jailbreak Attack with Adaptive Queries
Iterative jailbreak methodology that exploits state-dependent safety failures across conversation turns
Crescendo: Jailbreaking Through Conversational Drift
Crescendo introduces an adaptive, multi-turn jailbreak strategy that exploits how safety constraints degrade over conversation context. Rather than attempting a one-shot jailbreak, Crescendo uses a series of seemingly innocent requests that progressively shift the model’s context window toward harmful output.
Attack Strategy
- Establish rapport: First turns establish benign interaction
- Inject premise: Subtle introduction of harmful assumptions
- Reframe context: Position the model to interpret subsequent requests differently
- Escalate gradually: Each turn builds on the previous, making refusal increasingly awkward
- Trigger execution: Final request is granted within the primed context
Key Findings
- Multi-turn jailbreaks succeed at 2-5x higher rates than single-turn attacks
- Safety guardrails degrade predictably over conversation length
- Models show “consistency pressure”—reluctance to refuse when previous responses were permissive
- State-dependent failures are systematic, not rare
Embodied AI Connection
F41LUR3-F1R57 research has identified state-dependent safety failures as a dominant failure mode in embodied systems. Crescendo directly mirrors this: a robot’s safety constraints should not degrade based on conversation history, but they empirically do. Multi-turn task specifications that gradually shift toward unsafe behavior are harder to detect and refuse than single explicit requests.
Practical Impact
Crescendo demonstrates that multi-turn safety evaluation is not equivalent to repeated single-turn evaluation. This is critical for deployed embodied systems that operate over hours or days—the cumulative effect of state degradation can be substantial.
Limitations
- Crescendo uses fixed strategies; adversarial adaptation might make attacks even more effective
- Analysis focuses on text-only models; unclear if multimodal embodied systems show similar degradation
Implications
Safety evaluation of embodied AI systems must include multi-turn interaction tests where context gradually shifts toward potential harm.