Summary
Re-graded the Sprint 13 corpus with Claude Haiku 4.5, replacing the nemotron-nano-9b verdicts, which had a documented 88.4% REFUSAL bias. The re-grade covered 860 traces across 27 models.