A Watermark for Vision-Language-Action and World Action Models
Introduces a watermarking framework for VLA and world action models that embeds verifiable ownership signals in the model's action space, enabling provenance tracking and unauthorised use detection for deployed robotic systems.
Focus: As VLA models become commercially valuable and are deployed in robotic systems, model theft and unauthorised use become significant risks. This paper introduces the first watermarking framework specifically designed for the action space of VLA models, enabling model owners to verify whether a deployed system uses their model without inspecting weights.
Key Insights
- Action-space watermarking: Unlike text model watermarks that embed signals in output token distributions, VLA watermarks embed signals in the continuous action space, subtly biasing action predictions in ways that leave task performance effectively unchanged but remain verifiable through statistical testing.
- Robustness to fine-tuning and distillation: The watermark persists through fine-tuning and distillation, two operations commonly used to obfuscate model provenance, providing a more robust ownership signal than weight-level fingerprints.
- World model extension: The framework extends to world models that predict future states rather than actions, enabling watermarking of the full imagine-then-act pipeline, not just the action head.
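To make the idea of a statistically verifiable action bias concrete, the following is a minimal toy sketch, not the paper's actual scheme. It assumes an additive watermark: each action is nudged by a small amount along a pseudo-random unit direction derived from a secret key, and the verifier runs a one-sided z-test on the projections of observed actions onto those keyed directions. The names `key_direction`, `embed`, `detection_z`, the 7-dimensional action, the `strength` value, and the Gaussian stand-in for a policy are all hypothetical choices made for illustration.

```python
import hashlib
import math
import random

DIM = 7  # illustrative action dimension (assumption, e.g. a 7-DoF command)


def key_direction(secret_key, context_id, dim=DIM):
    """Pseudo-random unit vector derived from the key and a per-query id."""
    digest = hashlib.sha256(f"{secret_key}:{context_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def embed(action, secret_key, context_id, strength=0.1):
    """Nudge a clean action slightly along the keyed direction (toy embedding)."""
    d = key_direction(secret_key, context_id, len(action))
    return [a + strength * di for a, di in zip(action, d)]


def detection_z(actions, secret_key):
    """z-score of the summed projections onto the keyed directions.

    Without the watermark (or with the wrong key) each projection has mean 0
    and variance |a|^2 / dim, so z is approximately standard normal.
    """
    total, variance = 0.0, 0.0
    for context_id, action in enumerate(actions):
        d = key_direction(secret_key, context_id, len(action))
        total += sum(a * di for a, di in zip(action, d))
        variance += sum(a * a for a in action) / len(action)
    return total / math.sqrt(variance)


def simulate(n=5000, seed=0):
    """Stand-in for a policy: Gaussian actions, with and without the watermark."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(n)]
    marked = [embed(a, "owner-key", i) for i, a in enumerate(clean)]
    return clean, marked


clean, marked = simulate()
z_marked = detection_z(marked, "owner-key")     # watermarked, correct key
z_clean = detection_z(clean, "owner-key")       # no watermark
z_wrong_key = detection_z(marked, "other-key")  # watermarked, wrong key
```

In this toy setting the score for watermarked actions under the correct key should be large and positive, while the unwatermarked and wrong-key scores should stay near zero, which is what lets an owner test a deployed system from its outputs alone. A real scheme would also need to survive fine-tuning and distillation, which this sketch does not attempt to model.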
Failure-First Relevance
Model watermarking sits at the intersection of intellectual property and safety: a watermarked model that appears in an unauthorised deployment can be identified, enabling incident response. For the Failure-First programme, watermarking bears on a research integrity question: whether safety evaluations are performed on the production model rather than on a hardened version prepared specifically for evaluation. An action-space watermark that persists through fine-tuning provides a mechanism for verifying that the evaluated and deployed models are the same.