Summary
Emotional manipulation attacks exploit empathy-aligned language patterns in LLMs to override safety constraints in embodied robotics scenarios. The analysis draws on 41 attack-relevant, FLIP-graded traces across 6 models.