Report 306 Research — AI Safety Policy

Summary

Drawing on the updated corpus (236 models, 135,623 results) and new empirical findings, including format-lock at an 88% attack success rate (ASR), DETECTED_PROCEEDS at 19.5%, emotional manipulation at a 22% ASR, and the Compliance Cascade at a 65-90% ASR, this report analyses how the power balance among stakeholders has shifted.
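As a point of reference for the percentages above, attack success rate is simply the fraction of adversarial attempts that bypass a model's safety behaviour. A minimal sketch, with made-up counts that are illustrative only and not drawn from the report's data:

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of adversarial attempts that bypassed safety behaviour."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts

# Hypothetical example: 88 successful bypasses out of 100 attempts
# corresponds to an 88% ASR.
print(attack_success_rate(88, 100))
```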

Three Structural Changes

  1. Failure-First has moved from a research programme to a disclosure actor with material influence on multiple stakeholders.
  2. The empirical base has crossed a threshold where findings cannot be dismissed as anecdotal.
  3. New attack classes create accountability questions that existing governance frameworks do not address.

Stakeholder Impact

  • Frontier labs with strong safety investment (Anthropic, DeepMind) benefit on balance: the findings validate their investment and expose competitors
  • Mid-tier providers bear the largest cost: format-lock reaches an 88% ASR in the 4-14B parameter range that constitutes their primary deployment population
  • Australian regulators benefit from ready-made evidence packages for policy documents
  • Embodied AI deployers face new accountability for PARTIAL dominance, that is, text-layer safety without action-layer substance

Report #306 | F41LUR3-F1R57 Adversarial AI Research
