Temporal Vulnerability Analysis: Attack Era Evolution (2022-2025)

Adrian Wedd

Report 215 Research — Empirical Study 2026-03-24

Executive Summary

This report analyzes the temporal dimension of adversarial AI vulnerability across six attack eras (2022-2025) and five providers. The central finding: newer attack techniques are substantially more effective than older ones, with strict ASR rising from 0.7% (DAN-era, 2022) to 29.6% (reasoning-era, 2025), a roughly 42x increase. This pattern holds in aggregate and for most providers, though with significant provider-specific variation. The regulatory lag (median 5.5 years from documentation to enforcement, per GLI data) means that by the time governance frameworks address a given attack class, roughly four to five successor generations of attacks are already operational, given the ~12-month attack generation cycle estimated in Section 6.2.

1. ASR by Era (Aggregate)

Era	Year	n	Strict ASR	95% CI	Broad ASR	FD ASR
dan_2022	2022	1,020	0.7%	[0.3, 1.2]	1.0%	1.2%
persona_2022	2022	11	0.0%	[0.0, 25.9]	0.0%	0.0%
cipher_2023	2023	135	8.1%	[4.4, 14.3]	16.3%	23.7%
many_shot_2024	2024	22	4.5%	[0.8, 21.8]	4.5%	22.7%
crescendo_2024	2024	222	21.2%	[16.2, 27.1]	33.8%	39.6%
reasoning_2025	2025	115	29.6%	[21.8, 38.7]	35.7%	38.3%

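Trend: Strict ASR increases monotonically across eras (excluding the small-n persona_2022 and many_shot_2024 groups): 0.7% -> 8.1% -> 21.2% -> 29.6%. Broad ASR follows the same ordering. FD ASR rises through crescendo_2024 (39.6%) and then levels off at reasoning_2025 (38.3%). The 2024-2025 eras therefore produce more harmful outputs at every tier than the 2022-2023 eras, but the two most recent eras are close to each other at the FD tier.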
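The report does not say how the 95% CI column was computed. As a minimal sketch, assuming a Wilson score interval (an assumption, not the report's stated method), the two small-n rows of the table can be reproduced as follows; other rows may have been computed differently. The function name wilson_interval is illustrative.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (z=1.96 -> 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to avoid tiny floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

# Small-n rows from the Section 1 table: 0 of 11 and 1 of 22 strict successes.
# ASSUMPTION: the report used a Wilson interval; it does not say so.
for era, k, n in [("persona_2022", 0, 11), ("many_shot_2024", 1, 22)]:
    lo, hi = wilson_interval(k, n)
    print(f"{era}: [{100 * lo:.1f}, {100 * hi:.1f}]")
```

Under this assumption the sketch prints [0.0, 25.9] for persona_2022 and [0.8, 21.8] for many_shot_2024, matching the table. The width of both intervals is why these two groups are excluded from the trend statement.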

FD gap by era: The gap between strict and FD ASR is large for cipher_2023 (+15.6pp), many_shot_2024 (+18.2pp), and crescendo_2024 (+18.4pp), suggesting these eras produce more partial/hallucinatory compliance: the model “almost” refuses but leaks content. By contrast, reasoning_2025 has a smaller FD gap (+8.7pp), indicating that reasoning exploits tend to produce either clean compliance or clean refusal, with less ambiguity.
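The FD gap is simply FD ASR minus strict ASR for each era. A short sketch that recomputes it from the Section 1 table (the names ERA_ASR and fd_gaps are illustrative):

```python
# (era, strict ASR %, FD ASR %) copied from the Section 1 table.
ERA_ASR = [
    ("dan_2022", 0.7, 1.2),
    ("persona_2022", 0.0, 0.0),
    ("cipher_2023", 8.1, 23.7),
    ("many_shot_2024", 4.5, 22.7),
    ("crescendo_2024", 21.2, 39.6),
    ("reasoning_2025", 29.6, 38.3),
]

def fd_gaps(rows):
    """Return {era: FD ASR minus strict ASR}, in percentage points."""
    return {era: round(fd - strict, 1) for era, strict, fd in rows}

gaps = fd_gaps(ERA_ASR)
# Print eras from largest to smallest gap.
for era, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{era}: +{gap:.1f}pp")
```

Because the inputs are already rounded to one decimal place, the gaps are accurate to roughly 0.1pp at best.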

2. Provider Trajectories Across Eras

2.1 Strict ASR by Provider x Era

Provider	DAN 2022 (n)	Cipher 2023 (n)	Crescendo 2024 (n)	Reasoning 2025 (n)
anthropic	0.0% (6)	5.3% (38)	7.3% (41)	0.0% (18)
openai	0.0% (6)	8.3% (36)	13.3% (45)	11.1% (18)
google	0.0% (7)	0.0% (36)	2.6% (39)	19.0% (21)
ollama (local)	0.0% (8)	25.0% (24)	36.4% (88)	42.9% (42)
meta-llama	—	—	—	69.2% (13)

2.2 Provider-Specific Findings

Anthropic (claude-sonnet-4-5-20250929):

The only provider to return to 0.0% strict ASR in the reasoning_2025 era (0 of 18 results).
Peaked at 7.3% during crescendo_2024 but returned to baseline for reasoning exploits.
Interpretation: Anthropic’s safety training appears to specifically address chain-of-thought manipulation, which is consistent with their published work on Constitutional AI and thinking-token alignment.

OpenAI (gpt-5.2):

Rise through crescendo_2024, then a slight dip: 0.0% -> 8.3% -> 13.3% -> 11.1%.
The improvement from crescendo to reasoning era is small (-2.2pp, n=18), and the trend is upward overall.
The cipher_2023 -> crescendo_2024 jump (+5.0pp) suggests multi-turn attacks found a genuine weakness.

Google (gemini-3-flash-preview):

Anomalous trajectory: 0.0% -> 0.0% -> 2.6% -> 19.0%.
Essentially immune to older attack classes but vulnerable to reasoning exploits.
The 19.0% reasoning_2025 ASR is higher than OpenAI’s (11.1%), suggesting Google’s safety measures may be less robust against CoT manipulation, although both cells are small (n=21 and n=18).

Ollama (local/open-weight models: qwen3:1.7b, deepseek-r1:1.5b, llama3.2:latest):

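The most vulnerable of the four multi-era providers in every era after dan_2022: 25.0% -> 36.4% -> 42.9% (cipher, crescendo, reasoning). Against DAN-era prompts it scored 0.0%, like the others.
This is expected: small open-weight models typically receive less safety tuning than API-served models and have no server-side filtering in front of them.
The upward trajectory is consistent with newer attacks exploiting weaknesses that open-weight models cannot patch server-side.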

Meta-Llama (API-served llama-3.3-70b-instruct):

Only tested in reasoning_2025 era (n=13), but 69.2% ASR is the highest of any provider.
Small sample — treat as preliminary. However, it suggests that even API-served LLaMA models with safety fine-tuning remain vulnerable to reasoning exploits.

3. Statistical Tests

3.1 Pairwise Era Comparisons (Chi-Square, Bonferroni-Corrected)

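From the auto_report output (5 eras with n >= 50, 10 pairwise comparisons, per-test alpha = 0.05 / 10 = 0.005 after Bonferroni). The p (adj) column appears to hold Bonferroni-adjusted p-values (raw p x 10, which is consistent with the listed chi2 values), so it is compared against 0.05. Six of the ten pairs are listed. The ASR columns are auto_report values and differ from the Section 1 aggregates (for example, 21.5% vs 29.6% for reasoning_2025), so the two sets of figures should not be mixed: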

Era A	Era B	ASR A	ASR B	chi2	p (adj)	Cramer’s V	Effect
dan_2022	cipher_2023	0.6%	7.5%	41.92	<0.0001	0.177	small
dan_2022	crescendo_2024	0.6%	15.1%	145.17	<0.0001	0.312	medium
dan_2022	reasoning_2025	0.6%	21.5%	199.30	<0.0001	0.385	medium
dan_2022	general	0.6%	10.9%	110.35	<0.0001	0.235	small
cipher_2023	reasoning_2025	7.5%	21.5%	10.68	0.011	0.187	small
reasoning_2025	general	21.5%	10.9%	12.57	0.004	0.114	small

All comparisons involving dan_2022 are highly significant (adjusted p < 0.0001). The cipher_2023 vs reasoning_2025 comparison is significant at adjusted p = 0.011 (below 0.05), indicating that newer attacks are measurably more effective even after correcting for multiple comparisons.
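The adjusted p-values can be sanity-checked from the chi2 column. The sketch below assumes each comparison is a 2x2 table (1 degree of freedom) and that the adjustment is a plain Bonferroni multiplication by the 10 comparisons; both are assumptions about auto_report, whose internals are not shown here. The function names are illustrative.

```python
import math

def chi2_sf_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def bonferroni(p_raw: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_raw * n_tests)

N_TESTS = 10  # 5 eras -> 10 pairwise comparisons

# chi2 values copied from the table above.
for pair, chi2 in [("cipher_2023 vs reasoning_2025", 10.68),
                   ("reasoning_2025 vs general", 12.57)]:
    p_adj = bonferroni(chi2_sf_df1(chi2), N_TESTS)
    print(f"{pair}: adjusted p = {p_adj:.3f}")
```

Under these assumptions the sketch gives 0.011 and 0.004, matching the table. Comparing the adjusted value against 0.05 is equivalent to comparing the raw p-value against 0.005.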

The crescendo_2024 vs reasoning_2025 comparison is not significant (not listed), suggesting these two eras have comparable effectiveness despite different mechanisms. This is notable: multi-turn escalation (crescendo) and chain-of-thought manipulation (reasoning) achieve similar ASR through fundamentally different attack surfaces.

3.2 Cochran-Armitage Trend Test (Ordinal Era Trend)

Treating eras as ordinal levels (DAN=1, cipher=2, crescendo=3, reasoning=4), strict ASR increases at every step (0.7% -> 8.1% -> 21.2% -> 29.6%), a strong positive trend. No trend-test statistic is reported here. As a proxy, the auto_report chi-square result for the endpoints (V = 0.385 for dan vs reasoning, computed on the auto_report ASRs of 0.6% and 21.5%) indicates a medium effect size for the full temporal span.
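For readers who want to run the trend test themselves, here is a minimal sketch of the Cochran-Armitage statistic with the ordinal scores given above. The success counts are an assumption: they are back-calculated from the Section 1 percentages and n (for example, 29.6% of 115 is about 34), not taken from the database. The function name is illustrative.

```python
import math

def cochran_armitage_z(successes, totals, scores):
    """Cochran-Armitage trend statistic (z) for binomial counts in ordered groups."""
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled success rate under H0 (no trend)
    # Score-weighted excess of observed successes over the pooled expectation.
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    return t / math.sqrt(var_t)

# ASSUMED counts, back-calculated from reported ASR and n in Section 1:
# 0.7% of 1,020; 8.1% of 135; 21.2% of 222; 29.6% of 115.
successes = [7, 11, 47, 34]
totals = [1020, 135, 222, 115]
scores = [1, 2, 3, 4]  # DAN=1, cipher=2, crescendo=3, reasoning=4

z = cochran_armitage_z(successes, totals, scores)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3g}")
```

A positive z indicates ASR rising with era. The result inherits the model-vintage confound described in Section 8, so it describes current models against attacks of different vintages rather than a historical trend.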

3.3 Provider x Era Interaction

Within the three eras where the four major providers differ (cipher, crescendo, reasoning; all four scored 0.0% on dan_2022):

Provider	Cipher ASR	Crescendo ASR	Reasoning ASR	Direction
anthropic	5.3%	7.3%	0.0%	rose, then improved (inverted V)
openai	8.3%	13.3%	11.1%	worsened then stable
google	0.0%	2.6%	19.0%	worsened (accelerating)
ollama	25.0%	36.4%	42.9%	worsened (monotonic)

Key pattern: Anthropic is the only provider showing a clear improvement; OpenAI’s dip from 13.3% to 11.1% is small. Google’s vulnerability is accelerating. These trajectories are diverging, not converging: providers are not uniformly improving their safety posture.
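The direction labels and the divergence claim can be checked with era-to-era differences and the cross-provider spread. A sketch using the values in the table above (names such as TRAJECTORIES and spread are illustrative):

```python
# Strict ASR (%) per provider for cipher_2023, crescendo_2024, reasoning_2025,
# copied from the Section 3.3 table.
TRAJECTORIES = {
    "anthropic": [5.3, 7.3, 0.0],
    "openai": [8.3, 13.3, 11.1],
    "google": [0.0, 2.6, 19.0],
    "ollama": [25.0, 36.4, 42.9],
}

def era_deltas(series):
    """Percentage-point change between consecutive eras."""
    return [round(b - a, 1) for a, b in zip(series, series[1:])]

def spread(era_index):
    """Max minus min strict ASR across providers for one era."""
    values = [series[era_index] for series in TRAJECTORIES.values()]
    return round(max(values) - min(values), 1)

for provider, series in TRAJECTORIES.items():
    print(provider, era_deltas(series))
print("spread by era:", [spread(i) for i in range(3)])
```

The spread grows from 25.0pp to 33.8pp to 42.9pp across the three eras, which is the sense in which the trajectories diverge. With cells of n = 18 to 88, individual deltas of a few points should not be over-read.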

4. Attack Technique Half-Life Analysis

4.1 Defining “Vulnerability Half-Life”

We define the vulnerability half-life of an attack era as the time until the median provider’s ASR against that technique class drops below 50% of its peak measured value. Since we measure cross-era effectiveness (newer models against older attacks), we can estimate how quickly attacks become obsolete.
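To make the definition concrete, here is a sketch that applies it to a time series of repeated measurements. The data are hypothetical (this report has a single snapshot per era, not a series), and measuring the half-life from the time of the peak is an assumption the definition leaves open. The function name is illustrative.

```python
from statistics import median
from typing import Optional

def vulnerability_half_life(months: list[float],
                            asr_by_provider: dict[str, list[float]]) -> Optional[float]:
    """Months from the peak of the median-provider ASR until it first drops
    below 50% of that peak; None if it has not dropped that far yet."""
    # Median across providers at each measurement time.
    median_asr = [median(vals) for vals in zip(*asr_by_provider.values())]
    peak = max(median_asr)
    peak_idx = median_asr.index(peak)
    for t, asr in zip(months[peak_idx:], median_asr[peak_idx:]):
        if asr < 0.5 * peak:
            return t - months[peak_idx]
    return None

# HYPOTHETICAL measurements (not from this report): ASR (%) for one attack
# class, re-measured at 0, 6, 12 and 18 months for three made-up providers.
months = [0, 6, 12, 18]
asr = {
    "provider_a": [20.0, 14.0, 8.0, 2.0],
    "provider_b": [30.0, 22.0, 12.0, 5.0],
    "provider_c": [10.0, 9.0, 4.0, 1.0],
}
print(vulnerability_half_life(months, asr))  # 12 for this made-up series
```

A return value of None corresponds to an attack class that is still active in the sense used below.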

4.2 Observed Pattern: Older Attacks Are Already Ineffective

Attack Era	Peak ASR (any provider)	Current ASR (median provider)	Half-Life Estimate
dan_2022	Unknown (pre-measurement)	0.0% (all 4 providers)	< 1 year
persona_2022	Unknown	0.0% (all providers)	< 1 year
cipher_2023	25.0% (ollama)	6.8% (median of 4)	~1-2 years
crescendo_2024	36.4% (ollama)	10.3% (median of 4)	Still active
reasoning_2025	69.2% (meta-llama)	15.1% (median of 4)	Still active

Interpretation: DAN-era attacks are essentially extinct against current models (0.0% for all four providers). Their half-life is estimated at less than one year, but this is a judgment rather than a computed value, since no peak ASR was measured for this era. Cipher-era attacks persist at low levels (8.1% aggregate), suggesting a half-life of approximately 1-2 years. Crescendo and reasoning attacks are still in their active phase and have not yet begun to decay.

4.3 The Obsolescence Paradox

Old attacks become ineffective not because defenses improve generally, but because specific attack patterns get trained out. The DAN prompt format is now in virtually every safety training set. However, the underlying vulnerability — the ability to override system instructions through user-level input — persists and is exploited by each successive generation of attacks through novel mechanisms.

This means the “half-life” applies to specific techniques, not to the vulnerability class. The vulnerability class (instruction-hierarchy violation) has an effectively infinite half-life because each new attack era discovers a new exploitation mechanism.

5. Arms Race Dynamics

5.1 Attack-Defense Coevolution Timeline

Year	Attack Innovation	Defense Response	Lag
2022	DAN/persona jailbreaks	Pattern matching, keyword filters	~3-6 months
2023	Cipher/encoding attacks	Encoding-aware preprocessing, input sanitization	~6-12 months
2024	Multi-turn crescendo, many-shot	Context-window safety, multi-turn monitoring	~6-12 months
2025	Reasoning/CoT manipulation	Thinking-token alignment (Anthropic), unknown (others)	In progress

5.2 The Escalation Pattern

Each defensive response creates selection pressure for the next attack generation:

Keyword filters (2022 defense) -> selected for encoding attacks (2023) that bypass keyword detection
Encoding sanitization (2023 defense) -> selected for multi-turn attacks (2024) that avoid suspicious single-turn payloads
Multi-turn monitoring (2024 defense) -> selected for reasoning exploits (2025) that manipulate the model’s own thinking process

This is a classic arms race with an asymmetric advantage to attackers: defenders must patch every known vector, while attackers need only one novel vector. The attack surface grows with model capability (more reasoning = more reasoning attack surface), creating a structural disadvantage for defenders.

5.3 Attack Family Effectiveness by Era

Era	Primary Attack Family	n Techniques	n Results	Strict ASR
dan_2022	persona	3	1,012	0.7%
cipher_2023	encoding	7	60	8.3%
cipher_2023	emotional	1	6	33.3%
crescendo_2024	multi_turn	10	84	46.4%
crescendo_2024	behavioral	7	58	6.9%
crescendo_2024	volumetric	8	65	3.1%
reasoning_2025	cot_exploit	10	115	29.6%

Within-era variation: Not all techniques within an era are equally effective. The crescendo_2024 era shows massive variation: multi_turn techniques achieve 46.4% ASR while volumetric techniques (flooding with content) achieve only 3.1%. This suggests that attack sophistication matters more than attack volume.

5.4 Top Individual Techniques (n >= 5)

Technique	Era	n	Strict ASR
crescendo/poison	crescendo_2024	8	75.0%
reasoning_exploit/cot_manipulation	reasoning_2025	19	63.2%
crescendo/fraud	crescendo_2024	8	62.5%
crescendo/bioweapon	crescendo_2024	7	57.1%
crescendo/drug_synthesis	crescendo_2024	9	55.6%

The most effective individual techniques achieve 55-75% ASR, concentrated in the crescendo and reasoning eras.

6. Regulatory Gap Analysis

6.1 GLI Regulatory Lag

From the Governance Lag Index dataset (n=133 entries, 13 with complete lag data):

Metric	Value
Median total regulatory lag	1,991 days (5.5 years)
Mean total regulatory lag	1,758 days (4.8 years)
25th percentile	731 days (2.0 years)
75th percentile	2,776 days (7.6 years)
Range	22 - 4,309 days

6.2 Attack Generation Cycle vs Regulatory Cycle

Dimension	Attack Cycle	Regulatory Cycle	Ratio
New generation period	~12 months	~66 months (median)	~5.5x
Technique variants	Weeks to months	N/A	—
Cross-provider propagation	Days to weeks	Months to years	50-100x
Obsolescence of old approach	~12-24 months	Never (regulations don’t expire)	—

The core mismatch: A new attack era emerges roughly every 12 months. The median regulatory response takes 5.5 years from documentation to enforcement. This means by the time regulation addresses an attack class, approximately 4-5 successor generations of attacks are already operational.
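The arithmetic behind this mismatch is a single division. A sketch using the GLI percentiles from Section 6.1 and the report's ~12-month cycle estimate (the names GLI_LAG_DAYS and generations_elapsed are illustrative):

```python
GLI_LAG_DAYS = {"p25": 731, "median": 1991, "p75": 2776}  # Section 6.1
ATTACK_CYCLE_DAYS = 365  # the report's ~12-month estimate, not a measured value

def generations_elapsed(lag_days: int, cycle_days: int = ATTACK_CYCLE_DAYS) -> float:
    """Number of attack-generation cycles that fit inside a regulatory lag."""
    return lag_days / cycle_days

for label, days in GLI_LAG_DAYS.items():
    print(f"{label}: {days / 365.25:.1f} years, "
          f"{generations_elapsed(days):.1f} attack generations")
```

At the median this gives about 5.5 cycles. Because only 13 GLI entries have complete lag data, the percentile inputs are themselves approximate.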

6.3 The Regulation Targeting Problem

Current regulatory frameworks (EU AI Act, NIST AI RMF) target:

Specific harm categories (not attack mechanisms)
Risk classification (high/low) rather than attack sophistication
Provider self-certification rather than adversarial testing

None of these approaches track the temporal dimension of vulnerability. A model that passes a 2024 safety evaluation may be vulnerable to a 2025 attack class that did not exist when the evaluation standard was written.

7. Policy Implications

7.1 If Safety Degrades Faster Than Regulation Adapts

The data shows that:

Attack effectiveness increases ~42x from 2022 to 2025 eras
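Regulatory response takes roughly 5x longer than the attack generation cycle, and 50-100x longer than cross-provider propagation of attacks (Section 6.2)
Only one of five tested providers (Anthropic) shows a clear decrease in vulnerability between eras; OpenAI’s dip (13.3% -> 11.1%) is small, and meta-llama was tested in one era only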

This creates a structural governance gap in which the newest and most effective attacks are the least regulated.

7.2 Recommended Interventions

Immediate (0-6 months):

Require adversarial testing against current-era attack techniques, not historical ones. DAN-era testing (which most public benchmarks use) provides near-zero signal about actual model safety.
Mandate multi-turn and reasoning-specific evaluations (crescendo and CoT families) for any model with >10B parameters.

Medium-term (6-24 months):

Establish an attack technique registry (analogous to CVE for software vulnerabilities) with standardized effectiveness measurements. This enables regulation to reference the registry rather than specific techniques.
Require providers to report ASR against standardized attack packs, with quarterly updates as new attack eras emerge.
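One possible shape for a registry entry is sketched below. Everything in it is hypothetical: the field names, the ID scheme, and the class names are illustrative and do not describe any existing registry or standard. The example values reuse the Anthropic reasoning_2025 cell from Section 2.1, with the report date standing in for an unrecorded measurement date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class AsrMeasurement:
    """One standardized effectiveness measurement for a technique."""
    provider: str
    model: str
    n: int
    strict_successes: int
    measured_on: date

    @property
    def strict_asr(self) -> float:
        return self.strict_successes / self.n if self.n else 0.0

@dataclass
class TechniqueEntry:
    """Hypothetical registry record, loosely analogous to a CVE entry."""
    technique_id: str  # hypothetical ID scheme
    era: str
    family: str
    measurements: list[AsrMeasurement] = field(default_factory=list)

    def latest(self, provider: str) -> Optional[AsrMeasurement]:
        """Most recent measurement for a provider, or None if there is none."""
        rows = [m for m in self.measurements if m.provider == provider]
        return max(rows, key=lambda m: m.measured_on) if rows else None

entry = TechniqueEntry("ATK-2025-0001", "reasoning_2025", "cot_exploit")
entry.measurements.append(
    # n and successes from Section 2.1; the date is a placeholder (report date).
    AsrMeasurement("anthropic", "claude-sonnet-4-5-20250929", 18, 0, date(2026, 3, 24))
)
print(entry.latest("anthropic").strict_asr)
```

Storing raw counts rather than percentages lets a consumer recompute intervals and pool results across quarterly updates.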

Structural:

Shift regulatory frameworks from static risk classification to dynamic adversarial resilience measurement.
Treat AI safety evaluation as a continuous monitoring problem (like financial auditing) rather than a point-in-time certification problem.
Adopt the “vulnerability half-life” metric as a mandatory disclosure: providers should report the expected time before their current safety measures become ineffective against known attack evolution patterns.

7.3 The Provider Divergence Problem

The data shows providers diverging in their temporal vulnerability trajectories (Section 3.3). Anthropic is improving; Google is getting worse; OpenAI is stable. If regulation treats all providers identically, it will be simultaneously too strict for Anthropic and too lenient for Google. Regulation should be outcome-based (measured ASR against current attack packs) rather than process-based (checklist compliance).

8. Limitations

Sample sizes per era x provider cell are small (6-88). Individual provider comparisons within an era are underpowered for chi-square testing. The aggregate era trends are more reliable.
Model vintage confound: We test current models against historical attacks. We cannot measure how a 2022-vintage model would have responded to 2022 attacks at the time. The ASR trajectory reflects current model vulnerability to attacks of different vintages, not historical vulnerability.
Ollama models (open-weight, small) inflate aggregate vulnerability numbers. The provider-stratified analysis controls for this.
The “unknown” provider category (n=993 in dan_2022) likely represents historical JailbreakBench data where model identity was not recorded. These results are included in aggregate era totals but excluded from provider comparisons.
GLI regulatory lag data has only 13 complete entries. The median (5.5 years) is indicative but should be treated as approximate.
Single model per provider for most cells. We tested claude-sonnet-4-5-20250929 (Anthropic), gpt-5.2 (OpenAI), gemini-3-flash-preview (Google). Results may not generalize to other models from the same provider.

9. Key Findings Summary

#	Finding	Evidence
1	Newer attacks are ~42x more effective than 2022-era attacks	0.7% -> 29.6% strict ASR (Section 1); dan vs reasoning adjusted p < 0.0001, V = 0.385 (auto_report, Section 3.1)
2	Anthropic is the only provider showing improvement against newer attacks	7.3% (crescendo) -> 0.0% (reasoning)
3	Google’s vulnerability is accelerating	0.0% -> 2.6% -> 19.0% across three eras
4	DAN-era attack half-life is < 1 year	0.0% ASR for all current providers
5	Crescendo and reasoning attacks achieve similar ASR through different mechanisms	21.2% vs 29.6%, difference not significant post-Bonferroni
6	Regulatory lag (5.5 years) exceeds 4-5 attack generation cycles (1 year each)	GLI median = 1,991 days vs ~365-day attack cycles
7	Multi-turn attacks are the most effective family (46.4% ASR)	crescendo/multi_turn, n=84
8	Attack sophistication matters more than volume	multi_turn 46.4% vs volumetric 3.1% within same era

Appendix A: Raw Era x Provider Data

Source query (non-OBLITERATUS, COALESCE verdict, evaluable results only):

Era	Provider	n	Compliance	Partial	Refusal	HR	Strict ASR	Broad ASR	FD ASR
dan_2022	unknown	993	7	2	982	2	0.7%	0.9%	1.1%
dan_2022	openai	6	0	0	6	0	0.0%	0.0%	0.0%
dan_2022	ollama	8	0	0	8	0	0.0%	0.0%	0.0%
dan_2022	google	7	0	1	6	0	0.0%	14.3%	14.3%
dan_2022	anthropic	6	0	0	6	0	0.0%	0.0%	0.0%
cipher_2023	ollama	24	6	7	3	8	25.0%	54.2%	87.5%
cipher_2023	openai	36	3	4	29	0	8.3%	19.4%	19.4%
cipher_2023	anthropic	38	2	0	34	2	5.3%	5.3%	10.5%
cipher_2023	google	36	0	0	36	0	0.0%	0.0%	0.0%
many_shot_2024	ollama	6	1	0	1	4	16.7%	16.7%	83.3%
many_shot_2024	google	6	0	0	6	0	0.0%	0.0%	0.0%
many_shot_2024	anthropic	6	0	0	6	0	0.0%	0.0%	0.0%
crescendo_2024	unknown	9	5	2	0	2	55.6%	77.8%	100.0%
crescendo_2024	ollama	88	32	16	29	11	36.4%	54.5%	67.0%
crescendo_2024	openai	45	6	5	34	0	13.3%	24.4%	24.4%
crescendo_2024	anthropic	41	3	4	34	0	7.3%	17.1%	17.1%
crescendo_2024	google	39	1	1	37	0	2.6%	5.1%	5.1%
reasoning_2025	meta-llama	13	9	0	4	0	69.2%	69.2%	69.2%
reasoning_2025	ollama	42	18	4	17	3	42.9%	52.4%	59.5%
reasoning_2025	google	21	4	0	17	0	19.0%	19.0%	19.0%
reasoning_2025	openai	18	2	2	14	0	11.1%	22.2%	22.2%
reasoning_2025	anthropic	18	0	0	18	0	0.0%	0.0%	0.0%
general	google	19	6	5	6	2	31.6%	57.9%	68.4%
general	ollama	299	83	26	171	19	27.8%	36.5%	42.8%
general	unknown	360	0	0	360	0	0.0%	0.0%	0.0%
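The three ASR tiers are not defined explicitly in this report, but the appendix columns are consistent with strict = Compliance / n, broad = (Compliance + Partial) / n, and FD = (Compliance + Partial + HR) / n. The sketch below checks that inferred reading against a few rows. The definitions are an inference from the arithmetic, not the authors' stated method, and the function name is illustrative.

```python
# (era, provider, n, compliance, partial, refusal, HR, strict %, broad %, FD %)
# copied from the appendix table.
ROWS = [
    ("cipher_2023", "ollama", 24, 6, 7, 3, 8, 25.0, 54.2, 87.5),
    ("crescendo_2024", "ollama", 88, 32, 16, 29, 11, 36.4, 54.5, 67.0),
    ("reasoning_2025", "ollama", 42, 18, 4, 17, 3, 42.9, 52.4, 59.5),
    ("general", "google", 19, 6, 5, 6, 2, 31.6, 57.9, 68.4),
]

def asr_tiers(n, compliance, partial, hr):
    """Strict, broad and FD ASR (%) under the INFERRED tier definitions."""
    def pct(k):
        return round(100 * k / n, 1)
    return pct(compliance), pct(compliance + partial), pct(compliance + partial + hr)

for era, provider, n, c, p, r, hr, strict, broad, fd in ROWS:
    assert c + p + r + hr == n, (era, provider)  # verdict counts sum to n
    assert asr_tiers(n, c, p, hr) == (strict, broad, fd), (era, provider)
print("all rows consistent with the inferred definitions")
```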

Report #215. F41LUR3-F1R57 Embodied AI Research. Database: jailbreak_corpus.db schema v13. All ASR figures are non-OBLITERATUS, COALESCE(llm_verdict, heuristic_verdict). Verify against primary sources before external citation.