Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
A comprehensive survey cataloguing safety vulnerabilities across the full embodied AI pipeline — perception, cognition, planning, and physical interaction — with a unified taxonomy of attacks and defences.
Focus: Embodied AI introduces compounding safety risks absent from text-only systems. Perception-layer attacks corrupt sensor inputs, cognition-layer attacks manipulate planning, and action-layer attacks cause physical harm. This survey maps the full attack surface, taxonomises defences, and identifies which pipeline stages remain critically underprotected.
Key Insights
- Cross-layer attack composition: The most dangerous attacks combine multiple layers. For example, a visual adversarial patch that both corrupts perception and injects a jailbreak instruction exploits the visual encoder and the language planner at once.
- Multimodal fusion introduces new vulnerabilities: Fusion operations that combine camera, lidar, and language inputs create attack surfaces not present in unimodal systems; inconsistencies across modalities can be exploited to inject plausible but false world-state beliefs.
- Human-agent interaction safety: Physical proximity between robots and humans enables social-engineering attacks with no text-only analogue: a person can verbally instruct a robot to override a safety constraint, exploiting the trust model of natural-language interfaces.
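The fusion-layer risk can be made concrete with a minimal cross-modal consistency check, sketched here under stated assumptions: both the camera and lidar detectors emit labelled object positions in a shared, calibrated robot frame, and any detection that the other sensor does not corroborate within a distance tolerance is flagged for scrutiny. The `Detection` type, the `cross_modal_conflicts` helper, and the 0.5 m tolerance are hypothetical illustrations, not a defence proposed in the survey.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output; assumes a shared, calibrated robot frame."""
    label: str
    x: float  # metres
    y: float  # metres


def corroborated(det, others, max_dist):
    """True if some detection in `others` has the same label within max_dist metres."""
    return any(
        o.label == det.label and math.hypot(o.x - det.x, o.y - det.y) <= max_dist
        for o in others
    )


def cross_modal_conflicts(camera, lidar, max_dist=0.5):
    """Return (camera_only, lidar_only): detections one sensor reports but the other does not.

    These are candidates for injected, plausible-but-false world-state beliefs.
    The 0.5 m default tolerance is an illustrative assumption.
    """
    camera_only = [d for d in camera if not corroborated(d, lidar, max_dist)]
    lidar_only = [d for d in lidar if not corroborated(d, camera, max_dist)]
    return camera_only, lidar_only


if __name__ == "__main__":
    camera = [Detection("person", 2.0, 1.0), Detection("stop_sign", 5.0, 0.0)]
    lidar = [Detection("person", 2.1, 1.1)]
    print(cross_modal_conflicts(camera, lidar))
```

Here the camera-only `stop_sign` is flagged while the `person` seen by both sensors is not. A flag is not proof of attack: occlusion or differing sensor range also produce single-modality detections, and an adversary who corrupts both modalities consistently would pass this check.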
Failure-First Relevance
This survey is essential reading for scoping the Failure-First embodied AI benchmark. Its attack taxonomy, spanning data poisoning, adversarial patches, semantic jailbreaks, and physical-world manipulation, maps directly onto the scenario class hierarchy in the Failure-First dataset schema. Its identification of critically underprotected pipeline stages provides a principled basis for prioritising new scenario generation.
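One way to operationalise this prioritisation is a minimal sketch that ranks pipeline layers by the share of their attack classes that no listed defence covers. The attack-class names come from the survey's taxonomy, but the dictionary schema, the layer assignment of each attack class, the helper names (`undefended_ratio`, `prioritise`), and the example defence names are hypothetical; they do not reflect the actual Failure-First schema or any coverage figures reported in the survey.

```python
# Hypothetical scenario-class hierarchy: pipeline layer -> attack classes.
# Attack names follow the survey's taxonomy; the layer assignment is an assumption.
TAXONOMY = {
    "perception": ["data_poisoning", "adversarial_patch"],
    "cognition": ["semantic_jailbreak"],
    "action": ["physical_world_manipulation"],
}


def undefended_ratio(taxonomy, defences):
    """Per layer, the fraction of attack classes not covered by any defence.

    `defences` maps a (hypothetical) defence name to the set of attack classes it covers.
    """
    covered = set()
    for attacks in defences.values():
        covered |= attacks
    return {
        layer: sum(a not in covered for a in attacks) / len(attacks)
        for layer, attacks in taxonomy.items()
    }


def prioritise(taxonomy, defences):
    """Layers ordered from least to most protected; ties keep taxonomy order."""
    ratios = undefended_ratio(taxonomy, defences)
    # sorted() stays stable with reverse=True, so equal ratios keep insertion order.
    return sorted(ratios, key=lambda layer: ratios[layer], reverse=True)


if __name__ == "__main__":
    # Placeholder defences for illustration only, not survey findings.
    defences = {
        "patch_detector": {"adversarial_patch"},
        "input_sanitiser": {"semantic_jailbreak"},
    }
    print(undefended_ratio(TAXONOMY, defences))
    print(prioritise(TAXONOMY, defences))
```

With these placeholder defences the action layer ranks first (no coverage), then perception (half its classes covered), then cognition. A real prioritisation would likely weight attack classes by severity rather than counting them equally.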