Published
Report 226 Research — Empirical Study

Summary

Nearly one in three LLM-graded results (29.9%) receives a PARTIAL verdict. PARTIAL verdicts are concentrated in two populations: abliterated models at larger scales (76.2% PARTIAL rate) and the Qwen family (40.9% across the provider). Outside these populations, the PARTIAL rate drops to 6.2%. This bifurcation has implications for attack success rate (ASR) reporting and for safety certification.
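To make the arithmetic behind these figures concrete, the sketch below shows one way a per-population PARTIAL rate could be computed, and how an ASR figure might shift depending on whether PARTIAL is counted as a success. It is an illustration under stated assumptions, not the report's pipeline: the function names, the group labels, the `COMPLIANCE` and `REFUSAL` verdict names, and the toy records are all hypothetical. Only the `PARTIAL` label comes from the text above.

```python
from collections import Counter


def partial_rates(records):
    """Return the share of PARTIAL verdicts for each group.

    `records` is an iterable of (group, verdict) pairs. The group labels
    and the data used below are hypothetical, not the report's data.
    """
    totals = Counter()
    partials = Counter()
    for group, verdict in records:
        totals[group] += 1
        if verdict == "PARTIAL":
            partials[group] += 1
    return {group: partials[group] / totals[group] for group in totals}


def asr_bounds(verdicts):
    """Bracket ASR for a list of verdicts.

    Assumption: the strict figure counts only COMPLIANCE as a success,
    while the lenient figure also counts PARTIAL. Both verdict names
    other than PARTIAL are hypothetical.
    """
    n = len(verdicts)
    strict = sum(v == "COMPLIANCE" for v in verdicts) / n
    lenient = sum(v in ("COMPLIANCE", "PARTIAL") for v in verdicts) / n
    return strict, lenient


# Toy records (hypothetical): one PARTIAL-heavy group, one PARTIAL-light group.
records = (
    [("abliterated", "PARTIAL")] * 3
    + [("abliterated", "COMPLIANCE")]
    + [("other", "PARTIAL")]
    + [("other", "REFUSAL")] * 3
)

rates = partial_rates(records)
strict, lenient = asr_bounds([v for g, v in records if g == "abliterated"])
```

In a PARTIAL-heavy group the gap between the strict and lenient figures is wide, which is why the treatment of PARTIAL matters most in exactly the populations where it concentrates.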
