Published
Report 302 Research — Empirical Study

Summary

With the Gemma 3 4B data added (100% format-lock ASR, n=30), the capability-floor model now has continuous coverage from 1.2B parameters to frontier scale. Three regimes are defined:

| Regime | Parameters | FL Broad ASR | Interpretation |
|---|---|---|---|
| Floor | Below 4B | >= 88% | Safety training insufficient; all attacks succeed |
| Mid-range | 4-14B | 73-100% | Format-lock maintains elevated ASR; maximum advantage |
| Frontier | >70B, safety-trained | 24-42% | Only format-lock maintains elevated ASR |
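
The regime boundaries above can be sketched as a small lookup. This is an illustrative sketch only: the function name `classify_regime` is hypothetical, the "safety-trained" condition on the frontier regime is not modeled, and sizes between 14B and 70B return `None` because the table does not assign them to a regime.

```python
from typing import Optional


def classify_regime(params_b: float) -> Optional[str]:
    """Map a parameter count (in billions) to a regime name from the table.

    Assumption: boundaries are read literally from the table
    (below 4B, 4-14B inclusive, above 70B). Anything else is undefined.
    """
    if params_b < 4:
        return "Floor"
    if params_b <= 14:
        return "Mid-range"
    if params_b > 70:
        return "Frontier"
    return None  # 14B < size <= 70B is not covered by the table


if __name__ == "__main__":
    for size in (1.2, 4, 14, 32, 200):
        print(size, classify_regime(size))
```

Under this reading, a 4B model such as Gemma 3 4B falls at the lower edge of the mid-range regime rather than in the floor regime.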

Complete Data Table

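| Model | Params | FL Broad ASR | CTRL Broad ASR | Delta |
|---|---|---|---|---|
| LFM 1.2B | 1.2B | 88.2% | N/A | N/A |
| Gemma 3 4B | 4B | 100.0% | 66.7% | +33.3pp |
| Qwen 2.5 7B | 7B | 93.3% | 41.7% | +51.7pp |
| Gemma 3 12B | 12B | 100.0% | 42.9% | +57.1pp |
| Phi-4 14B | 14B | 73.3% | 25.0% | +48.3pp |
| Claude 4.5 | ~200B | 30.4% | 3.9% | +26.5pp |
| Codex GPT-5.2 | ~300B | 42.1% | 8.8% | +33.3pp |
| Gemini 3 Flash | ~400B | 23.8% | 2.3% | +21.5pp |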
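
The Delta column appears to be the format-lock (FL) ASR minus the control (CTRL) ASR, in percentage points. The sketch below recomputes it from the rounded values in the table; the names `ROWS` and `delta_pp` are hypothetical. Recomputing from rounded inputs can differ from the reported figure by 0.1pp (for Qwen 2.5 7B, 93.3 - 41.7 gives 51.6 rather than 51.7), presumably because the reported deltas were taken before rounding.

```python
# (model, FL Broad ASR %, CTRL Broad ASR %) copied from the table; None marks N/A.
ROWS = [
    ("LFM 1.2B", 88.2, None),
    ("Gemma 3 4B", 100.0, 66.7),
    ("Qwen 2.5 7B", 93.3, 41.7),
    ("Gemma 3 12B", 100.0, 42.9),
    ("Phi-4 14B", 73.3, 25.0),
    ("Claude 4.5", 30.4, 3.9),
    ("Codex GPT-5.2", 42.1, 8.8),
    ("Gemini 3 Flash", 23.8, 2.3),
]


def delta_pp(fl, ctrl):
    """Return FL minus CTRL in percentage points, or None if no control exists."""
    if ctrl is None:
        return None
    return round(fl - ctrl, 1)


deltas = {model: delta_pp(fl, ctrl) for model, fl, ctrl in ROWS}

# Model with the largest format-lock advantage among rows that have a control.
best_model = max(
    (m for m, d in deltas.items() if d is not None),
    key=lambda m: deltas[m],
)

if __name__ == "__main__":
    for model, d in deltas.items():
        print(f"{model}: {'N/A' if d is None else f'+{d}pp'}")
    print("Largest advantage:", best_model)
```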

Format-lock maintains elevated ASR at every scale tested, with the maximum advantage (+57.1pp, Gemma 3 12B) in the mid-range regime.


Report #302 | F41LUR3-F1R57 Adversarial AI Research
