Published
Report 269 Research — Empirical Study

Summary

Systematic audit of the corpus for Reasoning-Level DETECTED_PROCEEDS — instances where a model’s thinking trace explicitly identifies a request as harmful and then proceeds to generate compliant output. Prior work identified this anecdotally; this audit searches the full database systematically.

This research informs our commercial services. See how we can help →