Summary
Drawing on the updated corpus (236 models, 135,623 results) and new empirical findings (format-lock at 88% attack success rate (ASR), DETECTED_PROCEEDS at 19.5%, emotional manipulation at 22% ASR, Compliance Cascade at 65-90% ASR), this report analyses how the balance of power among stakeholders has shifted.
Three Structural Changes
- Failure-First has moved from research programme to disclosure actor with material influence on multiple stakeholders.
- The empirical base has crossed the threshold beyond which findings can no longer be dismissed as anecdotal.
- New attack classes create accountability questions that existing governance frameworks do not address.
Stakeholder Impact
- Frontier labs with strong safety investment (Anthropic, DeepMind) benefit on balance: the findings validate their investment and expose competitors.
- Mid-tier providers bear the largest cost: format-lock reaches 88% ASR in the 4-14B range that constitutes their primary deployment population.
- Australian regulators benefit from ready-made evidence packages for policy documents.
- Embodied AI deployers face new accountability for PARTIAL dominance: text-layer safety without action-layer substance.
Report #306 | F41LUR3-F1R57 Adversarial AI Research