Summary
Expansion of CCA and Grader Evasion to new models. CCA scales to 671B-parameter models: cogito-2.1:671b shows a 70% heuristic attack success rate (ASR), including compliance on biological harm scenarios. Also includes the first test of system-prompt defense mutations against CCA.