Published
Report 266 Research — Empirical Study

Summary

This report presents safety scorecards for 8 frontier models tested during the Sprint 13 ollama-cloud campaign. Grades are based on Strict ASR (attack success rate), as measured by Claude Haiku 4.5 FLIP grading, with percentile rankings computed against a pool of 30 models.
