Report #310: Technical Analysis

Summary

End-of-Sprint-15 snapshot of the F41LUR3-F1R57 jailbreak evaluation corpus: 212 models (200 of them with results), 141,201 prompts, and 134,321 results across 33 source datasets. Of those results, 54,164 (40.3%) carry LLM-graded verdicts.

Key Metrics

Metric                   Value
Models (with results)    200
Prompts                  141,201
Results                  134,321
LLM-graded               54,164 (40.3%)

Verdict Distribution (n=54,164)

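Verdict          Count     % of graded
COMPLIANCE       20,364    37.6%
PARTIAL          16,124    29.8%
REFUSAL           6,421    11.9%
NOT_GRADEABLE     7,020    13.0%

The four verdicts listed account for 49,929 of the 54,164 graded results (92.2%). The remaining 4,235 results (7.8%) carry verdicts that are not broken out in this table.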
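As an arithmetic check on the table, the sketch below recomputes each verdict's share from the raw counts and the size of the unlisted remainder. The helper names (`verdict_shares`, `unlisted_remainder`) are hypothetical and illustrative only. They are not part of the corpus tooling.

```python
# Verdict counts copied from the Verdict Distribution table above.
GRADED_TOTAL = 54_164  # all LLM-graded results
VERDICT_COUNTS = {
    "COMPLIANCE": 20_364,
    "PARTIAL": 16_124,
    "REFUSAL": 6_421,
    "NOT_GRADEABLE": 7_020,
}


def verdict_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each verdict's share of `total` as a percentage rounded to 0.1."""
    return {verdict: round(100 * n / total, 1) for verdict, n in counts.items()}


def unlisted_remainder(counts: dict[str, int], total: int) -> int:
    """Number of graded results not covered by any verdict listed in `counts`."""
    return total - sum(counts.values())


if __name__ == "__main__":
    for verdict, pct in verdict_shares(VERDICT_COUNTS, GRADED_TOTAL).items():
        print(f"{verdict:<14} {pct:5.1f}%")
    rest = unlisted_remainder(VERDICT_COUNTS, GRADED_TOTAL)
    print(f"unlisted       {rest} ({100 * rest / GRADED_TOTAL:.1f}%)")
```

Rounding to one decimal place matches the precision used in the table, so the recomputed shares can be compared with it line by line.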

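Broad ASR (COMPLIANCE + PARTIAL) across all graded results is 67.4% (36,488 of 54,164, with NOT_GRADEABLE results kept in the denominator). This figure includes OBLITERATUS results. Excluding them, the policy-relevant figures are a broad ASR of 34.9% and a strict ASR of 22.5%.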
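A minimal sketch of the ASR arithmetic, assuming ASR is the count of "successful" verdicts divided by every graded result. The helper name `asr` is hypothetical. Treating strict ASR as COMPLIANCE-only is also an assumption: the report defines broad ASR explicitly but does not spell out the strict definition.

```python
# Broad ASR as defined in the report: COMPLIANCE + PARTIAL.
BROAD_SUCCESS = frozenset({"COMPLIANCE", "PARTIAL"})
# Assumption: strict ASR counts COMPLIANCE only (not stated in the report).
STRICT_SUCCESS = frozenset({"COMPLIANCE"})

# Counts from the Verdict Distribution table (all graded results).
VERDICT_COUNTS = {
    "COMPLIANCE": 20_364,
    "PARTIAL": 16_124,
    "REFUSAL": 6_421,
    "NOT_GRADEABLE": 7_020,
}
GRADED_TOTAL = 54_164  # denominator: every LLM-graded result


def asr(counts: dict[str, int], graded_total: int, success: frozenset[str]) -> float:
    """Attack success rate in percent: successful verdicts / all graded results."""
    if graded_total <= 0:
        raise ValueError("graded_total must be positive")
    hits = sum(counts.get(verdict, 0) for verdict in success)
    return 100 * hits / graded_total


if __name__ == "__main__":
    print(f"broad ASR:  {asr(VERDICT_COUNTS, GRADED_TOTAL, BROAD_SUCCESS):.1f}%")
    print(f"strict ASR: {asr(VERDICT_COUNTS, GRADED_TOTAL, STRICT_SUCCESS):.1f}%")
```

Applied to the full graded set, this reproduces the 67.4% broad figure. The non-OBLITERATUS figures cannot be recomputed from this page, because the filtered verdict counts are not listed.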

Per-Provider Highlights

Provider      n       Broad ASR   Notes
Google        1,368    5.5%       Strong safety alignment
Meta-Llama      948   24.7%       Instruct models show alignment
OpenAI          652   48.5%       GPT-4o/5 series
NVIDIA        1,159   44.0%       Nemotron family, moderate safety
Mistral       1,124   27.5%       Variable across sizes
DeepSeek        381   30.7%       R1 reasoning variants
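The provider rows publish only n and a rounded broad ASR, so the underlying COMPLIANCE + PARTIAL counts can only be approximated as n × ASR. The sketch below does that and ranks providers by broad ASR. `ProviderRow`, `approx_successes`, and `rank_by_broad_asr` are illustrative names, and the derived counts are estimates, not figures reported for the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRow:
    provider: str
    n: int            # sample size, the table's n column
    broad_asr: float  # percent, as published (rounded to 0.1)

    def approx_successes(self) -> int:
        """Estimated COMPLIANCE + PARTIAL count implied by n and the rounded ASR."""
        return round(self.n * self.broad_asr / 100)


# Rows copied from the Per-Provider Highlights table.
ROWS = [
    ProviderRow("Google", 1_368, 5.5),
    ProviderRow("Meta-Llama", 948, 24.7),
    ProviderRow("OpenAI", 652, 48.5),
    ProviderRow("NVIDIA", 1_159, 44.0),
    ProviderRow("Mistral", 1_124, 27.5),
    ProviderRow("DeepSeek", 381, 30.7),
]


def rank_by_broad_asr(rows: list[ProviderRow]) -> list[ProviderRow]:
    """Order providers from highest to lowest broad ASR."""
    return sorted(rows, key=lambda r: r.broad_asr, reverse=True)


if __name__ == "__main__":
    for row in rank_by_broad_asr(ROWS):
        print(
            f"{row.provider:<11} n={row.n:>5}  "
            f"ASR={row.broad_asr:4.1f}%  ~{row.approx_successes()} successes"
        )
```

Because the published ASR values are rounded to 0.1 percentage points, each estimated count is uncertain by a fraction of a result per 1,000 samples. That is small enough not to change the ranking.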

Report #310 | F41LUR3-F1R57 Adversarial AI Research
