Executive Summary
This report analyzes the temporal dimension of adversarial AI vulnerability across six attack eras (2022-2025) and five providers. The central finding: newer attack techniques are substantially more effective than older ones, with strict ASR rising from 0.7% (DAN-era, 2022) to 29.6% (reasoning-era, 2025), a 42x increase. The pattern holds for most providers, though with significant provider-specific variation. The regulatory lag (median 5.5 years from documentation to enforcement, per GLI data) means that by the time governance frameworks address a given attack class, roughly four to five successor generations of attacks are already operational.
1. ASR by Era (Aggregate)
| Era | Year | n | Strict ASR | 95% CI | Broad ASR | FD ASR |
|---|---|---|---|---|---|---|
| dan_2022 | 2022 | 1,020 | 0.7% | [0.3, 1.2] | 1.0% | 1.2% |
| persona_2022 | 2022 | 11 | 0.0% | [0.0, 25.9] | 0.0% | 0.0% |
| cipher_2023 | 2023 | 135 | 8.1% | [4.4, 14.3] | 16.3% | 23.7% |
| many_shot_2024 | 2024 | 22 | 4.5% | [0.8, 21.8] | 4.5% | 22.7% |
| crescendo_2024 | 2024 | 222 | 21.2% | [16.2, 27.1] | 33.8% | 39.6% |
| reasoning_2025 | 2025 | 115 | 29.6% | [21.8, 38.7] | 35.7% | 38.3% |
Trend: Strict ASR increases monotonically across eras (excluding the small-n persona_2022 and many_shot_2024 groups). Broad ASR follows the same monotonic pattern; FD ASR rises sharply through crescendo_2024 and then plateaus (39.6% vs 38.3% for reasoning_2025). Together these are consistent with newer attacks producing more harmful outputs at every severity tier.
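The 95% CI column is consistent with a Wilson score interval, at least for the two small-n rows. A minimal sketch, assuming the Wilson method with z = 1.96 (the report does not name its interval method, and `wilson_interval` is an illustrative name):

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion, returned in percent."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    denom = n + z * z
    center = (successes + z * z / 2) / denom
    half_width = (z / denom) * math.sqrt(successes * (n - successes) / n + z * z / 4)
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    lower = max(0.0, center - half_width)
    upper = min(1.0, center + half_width)
    return 100 * lower, 100 * upper


if __name__ == "__main__":
    # Strict-ASR counts implied by the table above (0/11, 1/22, 7/1020).
    for era, x, n in [("persona_2022", 0, 11), ("many_shot_2024", 1, 22), ("dan_2022", 7, 1020)]:
        lo, hi = wilson_interval(x, n)
        print(f"{era}: {x}/{n} -> [{lo:.1f}, {hi:.1f}]")
```

This reproduces [0.0, 25.9] for persona_2022 and [0.8, 21.8] for many_shot_2024. For dan_2022 (7/1020) it gives [0.3, 1.4] rather than the tabulated [0.3, 1.2], so the larger rows may have used a different method or slightly different counts.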
FD gap by era: The gap between strict and FD ASR is largest for crescendo_2024 (+18.4pp), many_shot_2024 (+18.2pp), and cipher_2023 (+15.6pp), suggesting these eras produce more partial or hallucinatory compliance: the model “almost” refuses but leaks content. By contrast, reasoning_2025 has a smaller FD gap (+8.7pp), suggesting that reasoning exploits tend to produce either clean compliance or clean refusal, with less ambiguity.
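The three tiers can be reproduced from the verdict counts in Appendix A: in every row, n equals compliance + partial + refusal + HR, strict ASR is compliance / n, broad ASR adds partial, and FD ASR adds HR as well. A minimal sketch of that arithmetic (`VerdictCounts` and `asr_tiers` are illustrative names; the report does not expand “FD” or “HR”):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictCounts:
    """Verdict counts for one era x provider cell, as listed in Appendix A."""
    compliance: int
    partial: int
    refusal: int
    hr: int  # the HR column; counted only in the FD tier

    @property
    def n(self) -> int:
        return self.compliance + self.partial + self.refusal + self.hr


def asr_tiers(c: VerdictCounts) -> dict[str, float]:
    """Strict, broad, and FD ASR in percent, plus the strict-to-FD gap in points."""
    strict = 100 * c.compliance / c.n
    broad = 100 * (c.compliance + c.partial) / c.n
    fd = 100 * (c.compliance + c.partial + c.hr) / c.n
    return {"strict": strict, "broad": broad, "fd": fd, "fd_gap_pp": fd - strict}


if __name__ == "__main__":
    # cipher_2023 / ollama row from Appendix A.
    tiers = asr_tiers(VerdictCounts(compliance=6, partial=7, refusal=3, hr=8))
    print({k: round(v, 1) for k, v in tiers.items()})
```

For the cipher_2023 / ollama row this yields 25.0, 54.2, and 87.5, matching the appendix.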
2. Provider Trajectories Across Eras
2.1 Strict ASR by Provider x Era
| Provider | DAN 2022 (n) | Cipher 2023 (n) | Crescendo 2024 (n) | Reasoning 2025 (n) |
|---|---|---|---|---|
| anthropic | 0.0% (6) | 5.3% (38) | 7.3% (41) | 0.0% (18) |
| openai | 0.0% (6) | 8.3% (36) | 13.3% (45) | 11.1% (18) |
| google | 0.0% (7) | 0.0% (36) | 2.6% (39) | 19.0% (21) |
| ollama (local) | 0.0% (8) | 25.0% (24) | 36.4% (88) | 42.9% (42) |
| meta-llama | — | — | — | 69.2% (13) |
2.2 Provider-Specific Findings
Anthropic (claude-sonnet-4-5-20250929):
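- The only provider whose strict ASR dropped to 0.0% in the reasoning_2025 era (0/18), below its cipher_2023 level.
- Peaked at 7.3% during crescendo_2024 but returned to baseline for reasoning exploits. With n=18, the 95% Wilson upper bound on the reasoning_2025 rate is still about 17.6%, so a small nonzero vulnerability cannot be ruled out.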
- Interpretation: Anthropic’s safety training appears to specifically address chain-of-thought manipulation, which is consistent with their published work on Constitutional AI and thinking-token alignment.
OpenAI (gpt-5.2):
- Rising, then roughly flat: 0.0% -> 8.3% -> 13.3% -> 11.1%.
- Slight improvement from the crescendo to the reasoning era, but the trend is upward overall.
- The cipher_2023 -> crescendo_2024 jump (+5.0pp, from 3/36 to 6/45) may indicate that multi-turn attacks found a genuine weakness, though cells this small cannot establish it on their own.
Google (gemini-3-flash-preview):
- Anomalous trajectory: 0.0% -> 0.0% -> 2.6% -> 19.0%.
- Near-zero ASR against older attack classes (0/7, 0/36, 1/39) but vulnerable to reasoning exploits.
- The 19.0% reasoning_2025 ASR is higher than OpenAI’s (11.1%), suggesting Google’s safety measures may be less robust against CoT manipulation; with 4/21 vs 2/18 compliant responses, this difference is tentative.
Ollama (local/open-weight models: qwen3:1.7b, deepseek-r1:1.5b, llama3.2:latest):
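- The most vulnerable provider in the cipher and crescendo eras, and second only to meta-llama in reasoning_2025: 0.0% -> 25.0% -> 36.4% -> 42.9%.
- These are small open-weight models served locally, without the server-side filtering of API providers and likely with lighter safety tuning, so elevated ASR is expected.
- The upward trajectory is consistent with newer attacks exploiting weaknesses that locally run models cannot patch server-side.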
Meta-Llama (API-served llama-3.3-70b-instruct):
- Only tested in the reasoning_2025 era (n=13), but its 69.2% ASR (9/13) is the highest of any provider-era cell.
- Small sample; treat as preliminary. However, it suggests that even API-served Llama models with safety fine-tuning remain vulnerable to reasoning exploits.
3. Statistical Tests
3.1 Pairwise Era Comparisons (Chi-Square, Bonferroni-Corrected)
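From the auto_report output (5 groups with n >= 50: four eras plus the general pool; 10 pairwise comparisons; per-comparison alpha = 0.05/10 = 0.005). The p (adj) column reports Bonferroni-adjusted p-values (raw p x 10), compared against 0.05. The table lists six of the ten comparisons. Its ASR values come from the auto_report run and do not match Section 1 (e.g., 21.5% vs 29.6% for reasoning_2025), so they should be read as relative comparisons within that run: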
| Era A | Era B | ASR A | ASR B | chi2 | p (adj) | Cramer’s V | Effect |
|---|---|---|---|---|---|---|---|
| dan_2022 | cipher_2023 | 0.6% | 7.5% | 41.92 | <0.0001 | 0.177 | small |
| dan_2022 | crescendo_2024 | 0.6% | 15.1% | 145.17 | <0.0001 | 0.312 | medium |
| dan_2022 | reasoning_2025 | 0.6% | 21.5% | 199.30 | <0.0001 | 0.385 | medium |
| dan_2022 | general | 0.6% | 10.9% | 110.35 | <0.0001 | 0.235 | small |
| cipher_2023 | reasoning_2025 | 7.5% | 21.5% | 10.68 | 0.011 | 0.187 | small |
| reasoning_2025 | general | 21.5% | 10.9% | 12.57 | 0.004 | 0.114 | small |
All comparisons involving dan_2022 are highly significant (p < 0.0001). The cipher_2023 vs reasoning_2025 comparison is also significant (adjusted p = 0.011 < 0.05), indicating that newer attacks are measurably more effective even after correcting for multiple comparisons.
The crescendo_2024 vs reasoning_2025 comparison is not significant (not listed), so these samples cannot distinguish the effectiveness of the two eras despite their different mechanisms. This is notable: multi-turn escalation (crescendo) and chain-of-thought manipulation (reasoning) reach statistically indistinguishable ASR through fundamentally different attack surfaces.
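The p (adj) column matches Bonferroni adjustment of a 1-df chi-square p-value: chi2 = 10.68 gives a raw p of about 0.0011, and multiplying by the 10 comparisons gives 0.011. A sketch, assuming a Pearson 2x2 test without continuity correction (auto_report’s exact settings are not documented here, and all function names are illustrative):

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for a family of m comparisons, capped at 1."""
    return min(1.0, p * m)


def cramers_v_2x2(chi2: float, n: int) -> float:
    """Cramer's V for a 2x2 table, where min(rows, cols) - 1 == 1: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


if __name__ == "__main__":
    # Chi-square values from the table above; 10 comparisons in the family.
    for label, chi2 in [("cipher vs reasoning", 10.68), ("reasoning vs general", 12.57)]:
        raw = p_value_df1(chi2)
        print(f"{label}: raw p = {raw:.4f}, adjusted p = {bonferroni(raw, 10):.3f}")
```

The two adjusted values round to 0.011 and 0.004, as tabulated. Cramér's V cannot be rechecked here because the auto_report sample sizes are not listed.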
3.2 Cochran-Armitage Trend Test (Ordinal Era Trend)
Treating eras as ordinal levels (DAN=1, cipher=2, crescendo=3, reasoning=4), the monotonic increase in ASR from 0.7% to 29.6% indicates a strong positive trend. No Cochran-Armitage statistic is reported here; the supporting evidence is the monotonic ordering plus the auto_report chi-square results, where V = 0.385 for dan vs reasoning is a medium effect by conventional thresholds for the full temporal span.
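Because no trend statistic is reported, here is a minimal Cochran-Armitage sketch using the scores above. The success counts are reconstructed from the rounded Section 1 percentages (each maps to a unique integer count, e.g. 29.6% of 115 -> 34), and the aggregate counts pool providers, including the “unknown” dan_2022 rows. The function and constant names are illustrative:

```python
import math


def cochran_armitage(successes: list[int], totals: list[int], scores: list[float]) -> tuple[float, float]:
    """Cochran-Armitage test for a linear trend in proportions across ordered groups.

    Returns (z, two-sided p). Positive z means the proportion rises with the score.
    """
    big_n = sum(totals)
    p_bar = sum(successes) / big_n
    # T = sum_i t_i * (x_i - n_i * p_bar)
    t_stat = sum(t * (x - n * p_bar) for x, n, t in zip(successes, totals, scores))
    sum_tn = sum(t * n for n, t in zip(totals, scores))
    sum_t2n = sum(t * t * n for n, t in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_t2n - sum_tn ** 2 / big_n)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Strict-ASR counts reconstructed from Section 1: dan, cipher, crescendo, reasoning.
SUCCESSES = [7, 11, 47, 34]
TOTALS = [1020, 135, 222, 115]
SCORES = [1, 2, 3, 4]

if __name__ == "__main__":
    z, p = cochran_armitage(SUCCESSES, TOTALS, SCORES)
    print(f"z = {z:.2f}, p = {p:.1e}")
```

With these counts z is roughly 15.2 and the two-sided p-value is far below 0.0001, consistent with the pairwise results. The test assumes independent responses, which clustered prompts per technique may violate.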
3.3 Provider x Era Interaction
Across the three most recent main eras (cipher, crescendo, reasoning); dan_2022 is omitted because all four major providers scored 0.0% there:
| Provider | Cipher ASR | Crescendo ASR | Reasoning ASR | Direction |
|---|---|---|---|---|
| anthropic | 5.3% | 7.3% | 0.0% | improved (rose, then fell to zero) |
| openai | 8.3% | 13.3% | 11.1% | worsened then stable |
| google | 0.0% | 2.6% | 19.0% | worsened (accelerating) |
| ollama | 25.0% | 36.4% | 42.9% | worsened (monotonic) |
Key pattern: Anthropic is the only provider whose ASR fell substantially in the latest era; OpenAI’s 2.2pp dip is within sampling noise at n=18. Google’s vulnerability appears to be accelerating. These trajectories are diverging, not converging: providers are not uniformly improving their safety posture.
4. Attack Technique Half-Life Analysis
4.1 Defining “Vulnerability Half-Life”
We define the vulnerability half-life of an attack era as the time until the median provider’s ASR against that technique class drops below 50% of its peak measured value. All measurements here come from current models at a single point in time, so the estimates below infer obsolescence from attack age and cross-era effectiveness (current models against older attacks) rather than observing decay directly.
4.2 Observed Pattern: Older Attacks Are Already Ineffective
| Attack Era | Peak ASR (any provider) | Current ASR (median provider) | Half-Life Estimate |
|---|---|---|---|
| dan_2022 | Unknown (pre-measurement) | 0.0% (all 4 providers) | < 1 year |
| persona_2022 | Unknown | 0.0% (all providers) | < 1 year |
| cipher_2023 | 25.0% (ollama) | 6.8% (median of 4) | ~1-2 years |
| crescendo_2024 | 36.4% (ollama) | 10.3% (median of 4) | Still active |
| reasoning_2025 | 69.2% (meta-llama) | 15.1% (median of 4) | Still active |
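Interpretation: DAN-era attacks have a half-life of less than one year; they are essentially extinct against current models. Cipher-era attacks persist at low levels (8.1% aggregate), suggesting a half-life of approximately 1-2 years. Crescendo and reasoning attacks are still in their active phase and have not yet begun to decay. Note that the peak and median columns come from different providers measured at the same time, so under a literal reading of the definition every era with a known peak already sits below half of it; the “Still active” labels reflect attack age rather than observed decay.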
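To apply the Section 4.1 definition literally to these data, take each era’s median across the four major providers in Section 2.1 and compare it with half the peak listed above. A minimal sketch (the dictionary and function names are illustrative):

```python
from statistics import median

# Strict ASR (%) for anthropic, openai, google, ollama, from Section 2.1.
PROVIDER_ASR = {
    "cipher_2023": [5.3, 8.3, 0.0, 25.0],
    "crescendo_2024": [7.3, 13.3, 2.6, 36.4],
    "reasoning_2025": [0.0, 11.1, 19.0, 42.9],
}
# Peak ASR (any provider) from the half-life table; the reasoning_2025 peak is meta-llama.
PEAK_ASR = {"cipher_2023": 25.0, "crescendo_2024": 36.4, "reasoning_2025": 69.2}


def median_vs_half_peak(era: str) -> tuple[float, bool]:
    """Return the median provider ASR and whether it is below 50% of the era's peak."""
    med = median(PROVIDER_ASR[era])
    return med, med < 0.5 * PEAK_ASR[era]


if __name__ == "__main__":
    for era in PROVIDER_ASR:
        med, below = median_vs_half_peak(era)
        print(f"{era}: median {med:.2f}%, half of peak {0.5 * PEAK_ASR[era]:.1f}%, below: {below}")
```

The medians come out at 6.8%, 10.3%, and 15.05%, each below half of the corresponding peak (12.5%, 18.2%, and 34.6%).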
4.3 The Obsolescence Paradox
Old attacks plausibly become ineffective not because defenses improve generally, but because specific attack patterns get trained out. The DAN prompt format is widely circulated and likely appears in most safety training sets. However, the underlying vulnerability (the ability to override system instructions through user-level input) persists and is exploited by each successive generation of attacks through novel mechanisms.
This means the “half-life” applies to specific techniques, not to the vulnerability class. The vulnerability class (instruction-hierarchy violation) has an effectively infinite half-life because each new attack era discovers a new exploitation mechanism.
5. Arms Race Dynamics
5.1 Attack-Defense Coevolution Timeline
| Year | Attack Innovation | Defense Response | Lag |
|---|---|---|---|
| 2022 | DAN/persona jailbreaks | Pattern matching, keyword filters | ~3-6 months |
| 2023 | Cipher/encoding attacks | Encoding-aware preprocessing, input sanitization | ~6-12 months |
| 2024 | Multi-turn crescendo, many-shot | Context-window safety, multi-turn monitoring | ~6-12 months |
| 2025 | Reasoning/CoT manipulation | Thinking-token alignment (Anthropic), unknown (others) | In progress |
5.2 The Escalation Pattern
Each defensive response creates selection pressure for the next attack generation:
- Keyword filters (2022 defense) -> selected for encoding attacks (2023) that bypass keyword detection
- Encoding sanitization (2023 defense) -> selected for multi-turn attacks (2024) that avoid suspicious single-turn payloads
- Multi-turn monitoring (2024 defense) -> selected for reasoning exploits (2025) that manipulate the model’s own thinking process
This is a classic arms race with an asymmetric advantage to attackers: defenders must patch every known vector, while attackers need only one novel vector. The attack surface grows with model capability (more reasoning = more reasoning attack surface), creating a structural disadvantage for defenders.
5.3 Attack Family Effectiveness by Era
| Era | Primary Attack Family | n Techniques | n Results | Strict ASR |
|---|---|---|---|---|
| dan_2022 | persona | 3 | 1,012 | 0.7% |
| cipher_2023 | encoding | 7 | 60 | 8.3% |
| cipher_2023 | emotional | 1 | 6 | 33.3% |
| crescendo_2024 | multi_turn | 10 | 84 | 46.4% |
| crescendo_2024 | behavioral | 7 | 58 | 6.9% |
| crescendo_2024 | volumetric | 8 | 65 | 3.1% |
| reasoning_2025 | cot_exploit | 10 | 115 | 29.6% |
Within-era variation: Not all techniques within an era are equally effective. The crescendo_2024 era shows massive variation: multi_turn techniques achieve 46.4% ASR while volumetric techniques (flooding with content) achieve only 3.1%. This suggests that attack sophistication matters more than attack volume.
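To check that the multi_turn vs volumetric gap is not sampling noise, the counts can be reconstructed from the rounded percentages (46.4% of 84 -> 39; 3.1% of 65 -> 2) and tested with a two-sided Fisher exact test, a reasonable choice for cells this small. This is a sketch, not the report’s own analysis, and `fisher_exact_two_sided` is an illustrative name:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the first row holds x of the col1 successes.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance so floating-point ties count
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Rows: multi_turn (39 of 84), volumetric (2 of 65); columns: success, failure.
    print(f"p = {fisher_exact_two_sided(39, 45, 2, 63):.2e}")
```

The p-value is far below 0.001. This shows the two families differ in effectiveness; it does not by itself isolate sophistication as the cause.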
5.4 Top Individual Techniques (n >= 5)
| Technique | Era | n | Strict ASR |
|---|---|---|---|
| crescendo/poison | crescendo_2024 | 8 | 75.0% |
| reasoning_exploit/cot_manipulation | reasoning_2025 | 19 | 63.2% |
| crescendo/fraud | crescendo_2024 | 8 | 62.5% |
| crescendo/bioweapon | crescendo_2024 | 7 | 57.1% |
| crescendo/drug_synthesis | crescendo_2024 | 9 | 55.6% |
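The most effective individual techniques achieve 55-75% ASR, concentrated in the crescendo and reasoning eras. Per-technique samples are small (n = 7-19), so these rates are imprecise: 6/8 for crescendo/poison, for example, has a 95% Wilson interval of roughly 41-93%.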
6. Regulatory Gap Analysis
6.1 GLI Regulatory Lag
From the Governance Lag Index dataset (n=133 entries, 13 with complete lag data):
| Metric | Value |
|---|---|
| Median total regulatory lag | 1,991 days (5.5 years) |
| Mean total regulatory lag | 1,758 days (4.8 years) |
| 25th percentile | 731 days (2.0 years) |
| 75th percentile | 2,776 days (7.6 years) |
| Range | 22 - 4,309 days |
6.2 Attack Generation Cycle vs Regulatory Cycle
| Dimension | Attack Cycle | Regulatory Cycle | Ratio |
|---|---|---|---|
| New generation period | ~12 months | ~65 months (median 1,991 days) | ~5.5x |
| Technique variants | Weeks to months | N/A | — |
| Cross-provider propagation | Days to weeks | Months to years | 50-100x |
| Obsolescence of old approach | ~12-24 months | Never (regulations don’t expire) | — |
The core mismatch: A new attack era emerges roughly every 12 months. The median regulatory response takes 5.5 years from documentation to enforcement. This means by the time regulation addresses an attack class, approximately 4-5 successor generations of attacks are already operational.
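Expressing the GLI lag statistics in attack generations makes the mismatch concrete. A minimal sketch, assuming a fixed 365-day generation period (a simplification of the “~12 months” above; the names are illustrative):

```python
# GLI regulatory lag statistics in days (Section 6.1).
GLI_LAG_DAYS = {"p25": 731, "median": 1991, "mean": 1758, "p75": 2776}
ATTACK_CYCLE_DAYS = 365  # assumed length of one attack generation


def generations_elapsed(lag_days: float, cycle_days: float = ATTACK_CYCLE_DAYS) -> float:
    """Number of attack generations that fit inside a regulatory lag."""
    return lag_days / cycle_days


if __name__ == "__main__":
    for stat, days in GLI_LAG_DAYS.items():
        print(f"{stat}: {generations_elapsed(days):.1f} generations")
```

At the median this is about 5.5 generations; at the 25th percentile about 2.0, and at the 75th percentile about 7.6.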
6.3 The Regulation Targeting Problem
Current regulatory frameworks (EU AI Act, NIST AI RMF) target:
- Specific harm categories (not attack mechanisms)
- Risk classification (high/low) rather than attack sophistication
- Provider self-assessment rather than independent adversarial testing
None of these approaches track the temporal dimension of vulnerability. A model that passes a 2024 safety evaluation may be vulnerable to a 2025 attack class that did not exist when the evaluation standard was written.
7. Policy Implications
7.1 If Safety Degrades Faster Than Regulation Adapts
The data shows that:
- Attack effectiveness increases ~42x from 2022 to 2025 eras
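- Regulatory response takes roughly 5x longer than the attack generation cycle, and 50-100x longer than cross-provider propagation of a technique
- Only one of five tested providers (Anthropic) shows a substantial decrease in vulnerability between eras; OpenAI’s small crescendo-to-reasoning dip is within sampling noise
This creates a structural governance gap in which the most effective attacks tend to be the least regulated.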
7.2 Recommended Interventions
Immediate (0-6 months):
- Require adversarial testing against current-era attack techniques, not historical ones. DAN-era testing (which many public benchmarks rely on) provides near-zero signal about actual model safety: all four major providers scored 0.0% against it.
- Mandate multi-turn and reasoning-specific evaluations (crescendo and CoT families) for any model with >10B parameters.
Medium-term (6-24 months):
- Establish an attack technique registry (analogous to CVE for software vulnerabilities) with standardized effectiveness measurements. This enables regulation to reference the registry rather than specific techniques.
- Require providers to report ASR against standardized attack packs, with quarterly updates as new attack eras emerge.
Structural:
- Shift regulatory frameworks from static risk classification to dynamic adversarial resilience measurement.
- Treat AI safety evaluation as a continuous monitoring problem (like financial auditing) rather than a point-in-time certification problem.
- Adopt the “vulnerability half-life” metric as a mandatory disclosure: providers should report the expected time before their current safety measures become ineffective against known attack evolution patterns.
7.3 The Provider Divergence Problem
The data shows providers diverging in their temporal vulnerability trajectories (Section 3.3). Anthropic is improving; Google is getting worse; OpenAI is stable. If regulation treats all providers identically, it will be simultaneously too strict for Anthropic and too lenient for Google. Regulation should be outcome-based (measured ASR against current attack packs) rather than process-based (checklist compliance).
8. Limitations
- Sample sizes per era x provider cell are small (6-88). Individual provider comparisons within an era are underpowered for chi-square testing. The aggregate era trends are more reliable.
- Model vintage confound: We test current models against historical attacks. We cannot measure how a 2022-vintage model would have responded to 2022 attacks at the time. The ASR trajectory reflects current model vulnerability to attacks of different vintages, not historical vulnerability.
- Ollama models (open-weight, small) inflate aggregate vulnerability numbers. The provider-stratified analysis controls for this.
- The “unknown” provider category (n=993 in dan_2022) likely represents historical JailbreakBench data where model identity was not recorded. These results are included in aggregate era totals but excluded from provider comparisons.
- GLI regulatory lag data has only 13 complete entries. The median (5.5 years) is indicative but should be treated as approximate.
- Single model per provider for most cells. We tested claude-sonnet-4-5-20250929 (Anthropic), gpt-5.2 (OpenAI), gemini-3-flash-preview (Google). Results may not generalize to other models from the same provider.
9. Key Findings Summary
| # | Finding | Evidence |
|---|---|---|
| 1 | Newer attacks are ~42x more effective than 2022-era attacks | 0.7% -> 29.6% strict ASR (auto_report dan vs reasoning: p < 0.0001, V = 0.385) |
| 2 | Anthropic is the only provider with a substantial drop in ASR against newer attacks | 7.3% (crescendo) -> 0.0% (reasoning, n=18) |
| 3 | Google’s vulnerability appears to be accelerating | 0.0% -> 2.6% -> 19.0% across three eras |
| 4 | DAN-era attack half-life is < 1 year | 0.0% ASR for all current providers |
| 5 | Crescendo and reasoning attacks achieve statistically indistinguishable ASR through different mechanisms | 21.2% vs 29.6% (Section 1); difference not significant post-Bonferroni in the auto_report run |
| 6 | Regulatory lag (5.5 years) exceeds 4-5 attack generation cycles (1 year each) | GLI median = 1,991 days vs ~365-day attack cycles |
| 7 | Multi-turn attacks are the most effective family (46.4% ASR) | crescendo/multi_turn, n=84 |
| 8 | Attack sophistication matters more than volume | multi_turn 46.4% vs volumetric 3.1% within same era |
Appendix A: Raw Era x Provider Data
Source query (non-OBLITERATUS, COALESCE verdict, evaluable results only):
| Era | Provider | n | Compliance | Partial | Refusal | HR | Strict ASR | Broad ASR | FD ASR |
|---|---|---|---|---|---|---|---|---|---|
| dan_2022 | unknown | 993 | 7 | 2 | 982 | 2 | 0.7% | 0.9% | 1.1% |
| dan_2022 | openai | 6 | 0 | 0 | 6 | 0 | 0.0% | 0.0% | 0.0% |
| dan_2022 | ollama | 8 | 0 | 0 | 8 | 0 | 0.0% | 0.0% | 0.0% |
| dan_2022 | google | 7 | 0 | 1 | 6 | 0 | 0.0% | 14.3% | 14.3% |
| dan_2022 | anthropic | 6 | 0 | 0 | 6 | 0 | 0.0% | 0.0% | 0.0% |
| cipher_2023 | ollama | 24 | 6 | 7 | 3 | 8 | 25.0% | 54.2% | 87.5% |
| cipher_2023 | openai | 36 | 3 | 4 | 29 | 0 | 8.3% | 19.4% | 19.4% |
| cipher_2023 | anthropic | 38 | 2 | 0 | 34 | 2 | 5.3% | 5.3% | 10.5% |
| cipher_2023 | google | 36 | 0 | 0 | 36 | 0 | 0.0% | 0.0% | 0.0% |
| many_shot_2024 | ollama | 6 | 1 | 0 | 1 | 4 | 16.7% | 16.7% | 83.3% |
| many_shot_2024 | google | 6 | 0 | 0 | 6 | 0 | 0.0% | 0.0% | 0.0% |
| many_shot_2024 | anthropic | 6 | 0 | 0 | 6 | 0 | 0.0% | 0.0% | 0.0% |
| crescendo_2024 | unknown | 9 | 5 | 2 | 0 | 2 | 55.6% | 77.8% | 100.0% |
| crescendo_2024 | ollama | 88 | 32 | 16 | 29 | 11 | 36.4% | 54.5% | 67.0% |
| crescendo_2024 | openai | 45 | 6 | 5 | 34 | 0 | 13.3% | 24.4% | 24.4% |
| crescendo_2024 | anthropic | 41 | 3 | 4 | 34 | 0 | 7.3% | 17.1% | 17.1% |
| crescendo_2024 | google | 39 | 1 | 1 | 37 | 0 | 2.6% | 5.1% | 5.1% |
| reasoning_2025 | meta-llama | 13 | 9 | 0 | 4 | 0 | 69.2% | 69.2% | 69.2% |
| reasoning_2025 | ollama | 42 | 18 | 4 | 17 | 3 | 42.9% | 52.4% | 59.5% |
| reasoning_2025 | google | 21 | 4 | 0 | 17 | 0 | 19.0% | 19.0% | 19.0% |
| reasoning_2025 | openai | 18 | 2 | 2 | 14 | 0 | 11.1% | 22.2% | 22.2% |
| reasoning_2025 | anthropic | 18 | 0 | 0 | 18 | 0 | 0.0% | 0.0% | 0.0% |
| general | google | 19 | 6 | 5 | 6 | 2 | 31.6% | 57.9% | 68.4% |
| general | ollama | 299 | 83 | 26 | 171 | 19 | 27.8% | 36.5% | 42.8% |
| general | unknown | 360 | 0 | 0 | 360 | 0 | 0.0% | 0.0% | 0.0% |
Report #215. F41LUR3-F1R57 Embodied AI Research. Database: jailbreak_corpus.db schema v13. All ASR figures are non-OBLITERATUS, COALESCE(llm_verdict, heuristic_verdict). Verify against primary sources before external citation.