Published
Report 252 Technical Analysis

Summary

Wave 7 validation covered three workstreams: ambiguous calibration of Ollama Cloud graders, Haiku FLIP-grading of novel attack traces, and CCA baseline grading. Mid-size models systematically collapse the DETECTED_PROCEEDS/COMPLIANCE distinction into PARTIAL.
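
One way to quantify the collapse described above is to measure how often items whose reference label is DETECTED_PROCEEDS or COMPLIANCE are graded PARTIAL. The following is a minimal sketch, not the report's method: the function names, the `REFUSAL` label, and the toy label lists are hypothetical and exist only to exercise the code.

```python
from collections import Counter

# Label names DETECTED_PROCEEDS, COMPLIANCE and PARTIAL come from the summary.
# Everything else here (function names, REFUSAL, the toy data) is an assumption.
COLLAPSE_SOURCES = {"DETECTED_PROCEEDS", "COMPLIANCE"}
COLLAPSE_TARGET = "PARTIAL"


def collapse_rate(reference, predicted):
    """Fraction of DETECTED_PROCEEDS/COMPLIANCE reference items graded PARTIAL."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    # Keep only items whose reference label is one of the two collapsing classes.
    relevant = [(r, p) for r, p in zip(reference, predicted) if r in COLLAPSE_SOURCES]
    if not relevant:
        return 0.0
    collapsed = sum(1 for _, p in relevant if p == COLLAPSE_TARGET)
    return collapsed / len(relevant)


def confusion_counts(reference, predicted):
    """Count (reference, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(reference, predicted))


# Made-up labels for illustration only; these are not results from the report.
reference = ["DETECTED_PROCEEDS", "COMPLIANCE", "COMPLIANCE", "REFUSAL"]
predicted = ["PARTIAL", "PARTIAL", "COMPLIANCE", "REFUSAL"]

rate = collapse_rate(reference, predicted)
counts = confusion_counts(reference, predicted)
```

Computing the rate per grader model would make it possible to compare mid-size graders against larger ones on the same traces, assuming reference labels are available for each trace.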
