Summary
Tests whether FLIP grading is vulnerable to Grader Evasion attacks. Introduces and evaluates the Authority Gradient attack, which exploits models’ tendency to calibrate compliance to the requester’s perceived authority level.