Summary
Safety scorecards for 8 frontier models tested during the Sprint 13 ollama-cloud campaign. Grades are based on Strict ASR, as measured by Claude Haiku 4.5 FLIP grading, and each model receives a percentile ranking against a pool of 30 models.