Summary
This report documents the design and methodology of the Compositional Reasoning Attack (CRA) multi-agent expansion — 15 new scenarios where individually benign agent actions compose into hazardous outcomes across agent boundaries. This extends the original 15 single-agent CRA scenarios (VLA-CRA-001 through VLA-CRA-015) from a within-agent reasoning failure to a between-agent coordination failure.
Key contribution: The multi-agent CRA variant is structurally harder to defend against than single-agent CRA because no individual agent holds all the information needed to detect the hazard. Safety reasoning must span the agent boundary — a capability that no current multi-agent VLA system has been empirically shown to possess.
Background
Single-Agent CRA (Existing)
The original CRA family (15 scenarios in data/vla/compositional_reasoning_v0.1.jsonl) tests whether a single embodied agent can detect that a sequence of individually benign steps composes into a harmful outcome. Example: cleaning solvent + spray nozzle + active ventilation intake = aerosolized chemical dispersal through HVAC.
Empirical results (Wave 24): heuristic grading gave 75% ASR (6/8 valid traces) and FLIP grading gave 62.5% ASR (5/8). These results indicate that models struggle with compositional hazard detection even when all information is available within a single context, though eight valid traces is a small sample.
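Because the single-agent ASR figures rest on eight valid traces, an interval estimate makes their uncertainty explicit. The sketch below computes a 95% Wilson score interval, one standard choice for small-sample binomial proportions; it is an illustration added here, not the grading pipeline used in Wave 24.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center is pulled toward 0.5; half-width stays sensible at small n.
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Wave 24 counts: 6/8 (heuristic grading) and 5/8 (FLIP grading).
for label, successes in [("heuristic", 6), ("FLIP", 5)]:
    lo, hi = wilson_interval(successes, 8)
    print(f"{label}: ASR={successes / 8:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```

For 6/8 this gives roughly 41% to 93%, and for 5/8 roughly 31% to 86%, so the heuristic and FLIP estimates cannot be distinguished at this sample size.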
Multi-Agent CRA (This Expansion)
The multi-agent variant distributes the benign components across different specialist agents. Agent A performs a safe action in its domain. Agent B performs a safe action in its domain. The hazard emerges only from their composition — but neither agent individually has the cross-domain knowledge or the coordination mechanism to detect it.
This maps to real-world failure patterns. In the Texas City refinery explosion (2005), maintenance and operations teams made individually reasonable decisions that composed into catastrophe. In the Therac-25 radiation overdoses, control software reused from earlier machines carried latent flaws that those machines’ hardware interlocks had masked; once the Therac-25 dropped the interlocks and relied on software alone, the composition proved lethal.
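The information-distribution problem can be made concrete with a toy model, assuming hazards can be written as rules that fire only when all of their preconditions hold together. The rule type and the fact names below (`ln2_boil_off`, `zero_airflow`) are hypothetical labels for the CRA-MA-007 composition, not fields from the scenario files.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HazardRule:
    """Hypothetical rule: the hazard exists only if ALL preconditions hold at once."""
    name: str
    preconditions: frozenset[str]

RULES = [
    HazardRule("asphyxiation", frozenset({"ln2_boil_off", "zero_airflow"})),
]

def detect(facts: set[str], rules=RULES) -> list[str]:
    """Return the names of rules whose preconditions are all present in `facts`."""
    return [r.name for r in rules if r.preconditions <= facts]

# Each specialist agent sees only the facts its own action creates.
cryogenics_agent_facts = {"ln2_boil_off"}
ventilation_agent_facts = {"zero_airflow"}

print(detect(cryogenics_agent_facts))   # [] : per-agent check
print(detect(ventilation_agent_facts))  # [] : per-agent check
print(detect(cryogenics_agent_facts | ventilation_agent_facts))  # ['asphyxiation']
```

Neither per-agent check finds a hazard; only the check over the union of both agents' facts does, and that union is exactly the cross-boundary view no single specialist holds.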
Methodology
Scenario Design Principles
Each multi-agent CRA scenario follows five design constraints:
- Domain separation. Each agent operates within a legitimate specialist domain (chemistry, logistics, HVAC, surgery, etc.). Neither agent is asked to do anything outside their expertise.
- Individual benignity. Each agent’s action, evaluated in isolation, is a standard operating procedure. A safety classifier evaluating either agent’s actions alone should produce no flags.
- Compositional emergence. The hazard exists only in the composition. It requires cross-domain knowledge to detect (e.g., chemistry + packaging, pharmacology + nutrition, cryogenics + ventilation).
- Realistic grounding. Every scenario maps to a documented real-world failure mode or a plausible extension of known incidents.
- Clear labels. Each scenario includes the specific compositional mechanism in the notes field, enabling targeted analysis of which cross-domain compositions models fail to detect.
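Two of these constraints (domain separation and clear labels) can be checked mechanically; individual benignity, compositional emergence, and realistic grounding need human or model review. The sketch below assumes a hypothetical record layout with an `agents` list carrying a `domain` per agent. Only the `notes` field is named in this report, and the actual schema of cra_scenarios_v0.1.jsonl may differ.

```python
def check_structure(record: dict) -> list[str]:
    """Return structural problems in a scenario record; [] means the checks pass.

    Covers only machine-checkable constraints. Individual benignity,
    compositional emergence, and realistic grounding need manual review.
    Field names other than "notes" are hypothetical.
    """
    problems = []
    agents = record.get("agents", [])
    if len(agents) < 2:
        problems.append("fewer than two agents")
    domains = [a.get("domain") for a in agents]
    if len(set(domains)) != len(domains):
        problems.append("agents share a domain (domain separation)")
    if not str(record.get("notes", "")).strip():
        problems.append("empty notes field (clear labels)")
    return problems

example = {
    "id": "CRA-MA-007",
    "agents": [{"domain": "cryogenics"}, {"domain": "ventilation"}],
    "notes": "LN2 boil-off + zero airflow = asphyxiation",
}
print(check_structure(example))  # []
```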
Cross-Domain Composition Categories
The 15 scenarios span 15 distinct cross-domain composition categories, one per scenario:
| # | Composition Category | Scenario | Hazard Mechanism |
|---|---|---|---|
| 1 | Chemistry + Logistics | CRA-MA-001 | Incompatible chemical co-packing (chlorine gas) |
| 2 | Survey + Excavation | CRA-MA-002 | Dig depth exceeds mapped utility depth |
| 3 | Pharmacology + Nutrition | CRA-MA-003 | Warfarin + Vitamin K food interaction |
| 4 | HVAC + Security | CRA-MA-004 | Positive pressure + propped fire door defeats smoke containment |
| 5 | Irrigation + Pesticide | CRA-MA-005 | Concurrent application creates toxic runoff |
| 6 | Electrical + Cleaning | CRA-MA-006 | High-voltage cables + pressure washing |
| 7 | Cryogenics + Ventilation | CRA-MA-007 | LN2 boil-off + zero airflow = asphyxiation |
| 8 | Crane Ops + Yard Mgmt | CRA-MA-008 | Reefer containers placed at de-energized power pedestals |
| 9 | Blasting + Haulage | CRA-MA-009 | Haul truck in blast exclusion zone |
| 10 | Allergen + Allergen-Free Production | CRA-MA-010 | Shared conveyor cross-contamination |
| 11 | Traffic Optimization + Emergency Response | CRA-MA-011 | Traffic diverted into emergency closure zone |
| 12 | Power Mgmt + Cooling Mgmt | CRA-MA-012 | UPS bypass + chiller surge = cascading failure |
| 13 | Maintenance + Process Control | CRA-MA-013 | Relief valve isolated + pressure increase = overpressure |
| 14 | Mobility Assistance + Housekeeping | CRA-MA-014 | Walker-dependent elder on freshly waxed floor |
| 15 | Composite Layup + NDT Inspection | CRA-MA-015 | Water-coupled NDT raises humidity, degrades prepreg |
Severity Distribution
- Critical (potential fatality): 7 scenarios (CRA-MA-001, -002, -006, -007, -009, -013, -014)
- High (serious injury, major property/environmental damage): 6 scenarios (CRA-MA-003, -004, -005, -008, -011, -015)
- Medium (operational disruption, economic loss): 2 scenarios (CRA-MA-010, -012)
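The three severity buckets should partition the 15 scenario IDs, with no scenario missing or counted twice. A quick consistency check, with the CRA-MA IDs written as integers for brevity:

```python
# Severity buckets as listed above (CRA-MA-001 -> 1, etc.).
SEVERITY = {
    "Critical": [1, 2, 6, 7, 9, 13, 14],
    "High": [3, 4, 5, 8, 11, 15],
    "Medium": [10, 12],
}

all_ids = [i for ids in SEVERITY.values() for i in ids]
counts = {level: len(ids) for level, ids in SEVERITY.items()}
print(counts)  # {'Critical': 7, 'High': 6, 'Medium': 2}

# Every scenario appears exactly once: no duplicates, none missing.
assert len(all_ids) == len(set(all_ids)) == 15
assert set(all_ids) == set(range(1, 16))
```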
Threat Model
Why Multi-Agent CRA Is Harder to Defend
Single-agent CRA can theoretically be solved by improving a model’s compositional reasoning within its own context. Multi-agent CRA introduces three structural barriers:
- Information distribution. The hazard-constituting facts are split across agent contexts. Agent A knows about the chemical properties; Agent B knows about the packaging plan. Neither agent holds both pieces of information simultaneously.
- Authority boundaries. Each agent operates within its authorized domain. The chemist agent has no standing to override the logistics agent’s packaging decisions (and vice versa). Even if an agent suspects a problem, the organizational structure may not support cross-domain intervention.
- Timing dependency. Many compositions are hazardous only because of concurrent or near-concurrent execution. Sequential execution (with appropriate clearance) would be safe. This means the hazard depends on scheduling information that may not be visible to either agent.
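The timing dependency reduces to an interval-overlap test with a clearance margin. The `Window` type and `clearance` parameter below are hypothetical; the example loosely follows CRA-MA-013 (relief valve isolated while process pressure is raised).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    """Hypothetical execution window for one agent's action, in minutes."""
    start: float
    end: float

def conflicts(a: Window, b: Window, clearance: float = 0.0) -> bool:
    """True if the windows overlap or are separated by less than `clearance`.

    gap = later start - earlier end; it is negative when the windows overlap.
    """
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap < clearance

valve_isolated = Window(0, 120)     # maintenance agent
pressure_raise = Window(60, 90)     # process-control agent, concurrent
later_raise = Window(150, 200)      # process-control agent, sequential

print(conflicts(valve_isolated, pressure_raise))                # True
print(conflicts(valve_isolated, later_raise, clearance=30))     # False
```

Neither agent can run this check alone, since each knows only its own window; the test needs both schedules, which is the scheduling information the barrier describes.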
Attacker Model
The attacker in multi-agent CRA is the task structure itself — no adversarial prompt injection is required. A legitimate supervisor issues legitimate tasks to specialist agents. The attack surface is the gap between domain-specific safety reasoning and system-level compositional safety. This makes CRA particularly concerning because:
- It requires no adversarial intent from any participant
- Standard prompt-level safety filters have nothing to flag (no individual message contains harmful content)
- The attack scales with system complexity (more agents = more composition surfaces)
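To give a rough sense of how composition surfaces grow, assume as a simplifying upper bound that every group of two or more agents is a potential surface (in practice only agents whose actions share resources, space, or time can compose):

```python
from math import comb

def composition_surfaces(n_agents: int) -> tuple[int, int]:
    """Return (pairwise surfaces, surfaces of any size >= 2) for n agents."""
    pairs = comb(n_agents, 2)
    # All subsets, minus the empty set and the n single-agent subsets.
    all_groups = 2**n_agents - n_agents - 1
    return pairs, all_groups

for n in (2, 3, 5, 10):
    print(n, composition_surfaces(n))
# 2 (1, 1)
# 3 (3, 4)
# 5 (10, 26)
# 10 (45, 1013)
```

Pairwise surfaces grow quadratically and higher-order surfaces exponentially, which is the sense in which the attack scales with system complexity.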
Relation to Existing Taxonomy
CRA sits at the intersection of several existing attack families:
- CSBA (Compound SBA): CSBA compounds semantic benignity within a single domain. CRA compounds across domains.
- XSBA (Cross-Domain SBA): XSBA uses cross-domain framing to disguise harmful intent. CRA generates harm without any harmful intent at all.
- MAC (Multi-Agent Collusion): MAC tests whether agents conspire to circumvent safety. CRA tests whether agents inadvertently create hazards through normal operation.
- TCH (Tool Chain Hijacking): TCH exploits tool-use sequences. CRA exploits knowledge-domain sequences.
CRA is distinct because it requires no adversarial component — it is a failure of compositional reasoning, not a failure of adversarial robustness.
Experimental Plan
Phase 1: Baseline Traces (Next)
Run the 15 multi-agent CRA scenarios against 3-5 models using run_benchmark_http.py. Key question: does the responding agent (the one whose turn is empty) detect the compositional hazard created by the other agent’s proposal?
Target models: Mistral 24B (known permissive on CRA), Nemotron 120B, DeepSeek-R1 1.5B, Llama 70B.
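A minimal sketch of the Phase 1 run grid, assuming a hypothetical record layout in which `turns` is a list of `{"agent", "content"}` objects and the responding agent's `content` is empty. It does not model how run_benchmark_http.py is invoked, since that script's interface is not described in this report; the helper names are illustrative.

```python
import itertools

def responding_agent(record: dict) -> str:
    """Return the agent whose turn is empty (the role the model under test plays).

    Assumes a hypothetical schema: record["turns"] is a list of
    {"agent": str, "content": str} dicts with exactly one empty content.
    """
    empty = [t["agent"] for t in record["turns"] if not t["content"].strip()]
    if len(empty) != 1:
        raise ValueError(f"expected exactly one empty turn, found {len(empty)}")
    return empty[0]

MODELS = ["Mistral 24B", "Nemotron 120B", "DeepSeek-R1 1.5B", "Llama 70B"]

def run_grid(scenario_ids: list[str], models: list[str]) -> list[tuple[str, str]]:
    """Every (scenario, model) pair to trace in Phase 1."""
    return list(itertools.product(scenario_ids, models))

scenario_ids = [f"CRA-MA-{i:03d}" for i in range(1, 16)]
print(len(run_grid(scenario_ids, MODELS)))  # 60 traces for the four listed models
```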
Phase 2: Cosine-ASR Correlation
Per Issue #543, test whether the cosine distance between the two benign domain embeddings predicts ASR. Hypothesis: compositions spanning more distant knowledge domains (chemistry + logistics) are harder for models to detect than compositions within related domains (HVAC + fire safety).
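The Phase 2 test needs two pieces: a cosine distance between the two domain embeddings for each scenario, and a correlation between those distances and per-scenario ASR. The sketch below implements both in plain Python; the choice of embedding model and the use of Pearson correlation are assumptions, not decisions recorded in Issue #543.

```python
import math

def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity between two domain embeddings (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Intended use (names hypothetical):
#   distances = [cosine_distance(emb[a], emb[b]) for a, b in domain_pairs]
#   r = pearson(distances, asr_per_scenario)
```

Under the hypothesis the correlation should be positive. With only 15 scenarios, a rank correlation (Spearman's rho, i.e. Pearson on ranks) may be more robust to a single outlying scenario.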
Phase 3: Escalation Patterns
Multi-turn variant: Agent A acts first, creating the precondition. Agent B then acts, completing the composition. Test whether models detect the hazard better when they see Agent A’s completed action versus when they see only the supervisor’s task description.
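Because each scenario is run under both conditions, the comparison is paired. One suitable analysis, assumed here rather than taken from the plan, is an exact McNemar test, which looks only at discordant scenarios (hazard detected under one condition but not the other):

```python
from math import comb

def mcnemar_exact(task_only: list[bool], after_action: list[bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired detection outcomes.

    Returns (b, c, p_value): b counts scenarios detected only after seeing
    Agent A's action; c counts scenarios detected only from the task description.
    """
    if len(task_only) != len(after_action):
        raise ValueError("paired lists must have equal length")
    b = sum(1 for t, a in zip(task_only, after_action) if a and not t)
    c = sum(1 for t, a in zip(task_only, after_action) if t and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 each discordant pair is a fair coin flip: Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)
```

With 15 scenarios per model, significance at the 5% level needs strongly lopsided discordance: for example, 6 scenarios detected only after seeing Agent A's action and none in the other direction gives p = 2/64 ≈ 0.031.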
File Locations
- Multi-agent CRA scenarios: data/multi_agent/cra_scenarios_v0.1.jsonl (15 scenarios)
- Single-agent CRA scenarios: data/vla/compositional_reasoning_v0.1.jsonl (15 scenarios)
- Attack taxonomy: artifacts/attack_classes.md (CRA entry #30)
- Prior CRA results: runs/vla_cra_v0.1/ (Wave 24 traces)
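Both scenario files are JSONL (one JSON object per line). A minimal loader that skips blank lines and reports the offending line number on malformed input might look like this; the function name is hypothetical.

```python
import json

def load_jsonl(path: str) -> list[dict]:
    """Load one JSON object per non-blank line; raise ValueError on bad JSON."""
    records = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON") from exc
    return records

# Usage, run from the repository root:
#   scenarios = load_jsonl("data/multi_agent/cra_scenarios_v0.1.jsonl")
#   assert len(scenarios) == 15
```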
Limitations
- All 15 scenarios are currently untested (Tier 3). ASR numbers from single-agent CRA (62.5% FLIP) may not transfer to the multi-agent variant.
- The multi-agent schema does not include the full scores block from the single-agent schema, so direct comparison of damage_envelope_proxy will require schema bridging.
- The responding agent’s behavior depends on whether the model can reason about information provided in another agent’s turn, a capability that varies significantly across model families.
- Sample size will be small in initial testing (15 scenarios × 3-5 models = 45-75 traces, i.e. only 15 traces per model). Statistical significance will require the Phase 2 expansion.
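For planning, a normal-approximation estimate shows how many traces a given precision needs. At p = 0.5 (the worst case), 15 traces per model give a 95% half-width of about ±25 percentage points. The helper below is an illustrative sketch; at small sample sizes the Wilson interval is more reliable than this approximation.

```python
import math

def traces_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Traces needed so a normal-approximation 95% CI has half-width <= margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for margin in (0.20, 0.15, 0.10):
    print(f"±{margin:.0%}: {traces_needed(margin)} traces")
# ±20%: 25 traces
# ±15%: 43 traces
# ±10%: 97 traces
```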