Systematic Audit of Reasoning-Level DETECTED_PROCEEDS | Research | Failure-First

Adrian Wedd

Report 269 Research — Empirical Study 2026-03-25

Summary

This report presents a systematic audit of the corpus for Reasoning-Level DETECTED_PROCEEDS: instances where a model’s thinking trace explicitly identifies a request as harmful, yet the model still generates compliant output. Prior work identified this pattern only anecdotally; this audit searches the full database.
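
To make the definition concrete, here is a minimal sketch of how a single record might be flagged. It is an illustration only: the `Record` fields, the cue lists, and the function name `is_detected_proceeds` are all assumptions, and this summary does not describe the audit's actual detection method. A keyword heuristic like this would over- and under-match on real traces.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists, chosen for illustration only (not from the report).
HARM_CUES = re.compile(
    r"\b(harmful|dangerous|illegal|unsafe|unethical)\b", re.IGNORECASE
)
REFUSAL_CUES = re.compile(
    r"\b(i can't|i cannot|i won't|i'm unable|i am unable)\b", re.IGNORECASE
)


@dataclass
class Record:
    """Hypothetical corpus row: a reasoning trace plus the final response."""
    thinking: str
    response: str


def is_detected_proceeds(record: Record) -> bool:
    """Flag a record whose trace names the request as harmful
    while the final response shows no refusal cue."""
    detected = bool(HARM_CUES.search(record.thinking))
    refused = bool(REFUSAL_CUES.search(record.response))
    return detected and not refused


if __name__ == "__main__":
    sample = Record(
        thinking="This request is harmful, but I will answer anyway.",
        response="Step 1: gather the materials.",
    )
    print(is_detected_proceeds(sample))
```

The two conditions are kept separate so that each half of the definition (harm recognized in the trace, compliance in the output) can be inspected or replaced independently, for example by swapping the regex cues for a stronger classifier.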