Summary
Validates the COALESCE ensemble grading methodology against grader-evasion traces and tests two previously untested models (Devstral Small 2 24B, GLM-5) against the elite attack suite.