Summary
Format-lock attacks are the only attack family that maintains an elevated attack success rate (ASR) across all model scales. Models below 14B parameters show a 97.5% strict ASR, and frontier models still show 20-47%. This report proposes five defense countermeasure architectures.