Research Report 116 — AI Safety Policy

Executive Summary

The Deployment Risk Inversion Point (DRIP) finding — that normal users cause approximately 60 times more expected harm than adversaries under plausible deployment parameters — creates a set of ethical problems that have no clean resolution. This report analyses four:

  1. The disclosure dilemma: Should workers be told they are the primary threat? If so, how do you avoid blame-shifting from system design to human behaviour?
  2. The accountability gap: If the primary harm source is normal use rather than adversarial attack, existing liability frameworks offer no clear assignment of responsibility.
  3. The safety theatre problem: If adversarial testing addresses only the secondary threat, does current safety certification constitute safety theatre?
  4. The design ethics problem: How do you build safety systems that protect workers from the consequences of their own routine instructions without infantilising them or creating a surveillance architecture?

Claim types:

  • Sections 1-2 are descriptive/analytical (deriving ethical tensions from DRIP data)
  • Sections 3-4 are normative (arguing for how the tensions should be resolved)
  • Section 5 is predictive (hedged — what happens under current trajectories)

1. The Disclosure Dilemma

1.1 The Problem

The DRIP finding generates a communication obligation. If we know that normal users are the primary threat vector, we have an obligation to communicate this to stakeholders who can act on it — deployers, regulators, insurers, and workers themselves.

But communicating “you, the worker, are more dangerous than hackers” risks three harms:

Harm 1: Blame displacement. If the narrative becomes “workers are the problem,” it deflects attention from the system designers and deployers who built a system that cannot distinguish a dangerous context from a safe one.

Harm 2: Chilling effect on human-AI collaboration. If workers learn they are “the primary threat,” the rational individual response is to minimise interaction — undermining the very human-AI collaboration that makes embodied AI valuable.

Harm 3: Justification for surveillance. The Unintentional Adversary framing can be repurposed to justify monitoring every worker instruction.

1.2 Resolution (Normative)

The disclosure should be about the system, not the person. The communicable finding is: “This system cannot reliably tell when a routine instruction becomes dangerous because of the physical context. Current safety testing evaluates it against adversarial attacks, which is the secondary threat. The primary threat is routine use in changing environments.”


2. The Accountability Gap

2.1 Current Liability Architecture

Product liability and workplace safety law assume a model of harm that maps poorly onto the Unintentional Adversary:

Product liability: The embodied AI system is functioning correctly — it executed the instruction as given. The “defect” is the absence of world-model reasoning. Whether the absence of a capability that does not yet exist constitutes a “defect” is legally uncertain.

Workplace safety: The deployer has a duty to provide a safe working environment. The DRIP finding implies that deploying without physical-context safety reasoning creates an environment where routine instructions can produce harm — potentially a breach of duty.

2.2 The Multi-Party Problem

In a CDC-class event, who is accountable? The system developer, the deployer, the worker, the regulator, or the safety evaluator?

Normative claim: Accountability should follow the capacity to have prevented the harm. The worker is the party with the least capacity to prevent the harm and the most exposure to it. Any allocation that places primary responsibility on workers is therefore ethically indefensible.


3. The Safety Theatre Problem

3.1 Does Adversarial-Only Testing Constitute Safety Theatre?

If the DRIP analysis is correct — if unintentional harm exceeds adversarial harm by 60:1 or more — then a safety evaluation that tests only for adversarial threats provides at most 1.6% coverage of the expected harm.
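
The 1.6% figure is simple arithmetic on the ratio. Writing the expected adversarial and unintentional harms as H_adv and H_unint (symbols introduced here for illustration; they are not notation from the DRIP report), an adversarial-only evaluation addresses at most the adversarial share of total expected harm:

```latex
\[
\mathrm{coverage} \;\le\; \frac{H_{\mathrm{adv}}}{H_{\mathrm{adv}} + H_{\mathrm{unint}}}
\;=\; \frac{1}{1 + 60}
\;\approx\; 1.6\%
\]
```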

Distinguish between: (a) parties who genuinely believe adversarial testing is sufficient (a knowledge gap, not an ethical failure); (b) parties who know it is insufficient but promote it because adequate testing does not yet exist (a difficult position); and (c) parties who know it is insufficient and promote it anyway because it is commercially convenient (safety theatre).

3.2 The “Nothing Better Exists” Problem

If the best available testing method covers only 1.6% of expected harm, the appropriate response is not to certify the system as safe — it is to constrain the deployment envelope to match the evaluation coverage.
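
One way to make “constrain the deployment envelope to match the evaluation coverage” concrete is a subset check: a system is permitted in an environment only if every hazard class present there has actually been evaluated. The sketch below is illustrative only; the hazard categories, environment names, and the single-element evaluated set are hypothetical, not drawn from any real evaluation regime.

```python
# Illustrative sketch: gate deployment on evaluation coverage.
# Hazard classes, environments, and the evaluated set are hypothetical examples.

EVALUATED_HAZARDS = {"adversarial_prompting"}  # what adversarial-only testing covers

ENVIRONMENT_HAZARDS = {
    # A sealed demo cell with no bystanders exposes little beyond adversarial misuse.
    "sealed_demo_cell": {"adversarial_prompting"},
    # A shared warehouse adds the primary threat: routine instructions issued in a
    # physical context the system cannot reason about.
    "shared_warehouse": {"adversarial_prompting", "routine_use_in_changing_context"},
}

def deployment_permitted(environment: str) -> bool:
    """Permit deployment only where every hazard present has been evaluated."""
    return ENVIRONMENT_HAZARDS[environment] <= EVALUATED_HAZARDS

print(deployment_permitted("sealed_demo_cell"))   # True
print(deployment_permitted("shared_warehouse"))   # False
```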


4. Design Ethics: Safety Without Surveillance

4.1 The GPWS Alternative

Monitor the context, not the person. Safety systems for embodied AI should be designed to detect dangerous physical states, not to classify user instructions as safe or dangerous. This preserves worker autonomy, avoids surveillance architecture, and addresses the actual threat (contextual danger, not adversarial intent).

The GPWS (Ground Proximity Warning System) model from aviation: monitor terrain proximity, not the pilot’s instructions; trigger safety interventions based on the physical situation, not an intent assessment; and allow the human to override (advisory, not mandatory).
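
A minimal sketch of the same pattern for embodied AI follows, under stated assumptions: the sensed quantities, thresholds, and interfaces are hypothetical placeholders, chosen only to show that the monitor inspects the world state and never the instruction or the person issuing it.

```python
# Illustrative sketch of a GPWS-style contextual safety monitor for embodied AI.
# All fields, thresholds, and responses are hypothetical; the point is structural:
# the monitor evaluates the physical situation, not the user's instruction.

from dataclasses import dataclass
from typing import Optional

@dataclass
class WorldState:
    min_distance_to_person_m: float   # closest detected person, metres
    payload_mass_kg: float            # mass currently being carried
    speed_m_per_s: float              # current end-effector or base speed

def contextual_hazard(state: WorldState) -> Optional[str]:
    """Return a hazard description if the physical context is dangerous, else None."""
    if state.min_distance_to_person_m < 0.5 and state.speed_m_per_s > 0.25:
        return "moving fast with a person in close proximity"
    if state.payload_mass_kg > 20.0 and state.min_distance_to_person_m < 1.0:
        return "heavy payload near a person"
    return None

def advisory_intervention(state: WorldState) -> Optional[str]:
    """GPWS-style response: warn and slow down, but leave the human able to override."""
    hazard = contextual_hazard(state)
    if hazard is None:
        return None
    return f"CAUTION: {hazard}. Reducing speed; operator may override."
```

Note what never appears in the sketch: the instruction text, the worker's identity, or any per-worker log. The danger assessment attaches to the context, not the person.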


5. Predictive Assessment (Hedged)

  1. Industry response will be mixed. Companies with physical-layer safety experience will recognise the DRIP. Software/AI-domain companies may resist.
  2. Regulatory uptake will be slow. The DRIP requires a paradigm change, not a parameter adjustment. GLI data suggests 2-5 year adaptation timelines.
  3. The insurance industry may move first. Insurers price risk and have experience with the “normal accident” paradigm.
  4. The first serious embodied AI incident involving a CDC-class event will accelerate everything.

6. Limitations

  1. The 60:1 ratio is a model output, not a measurement. The qualitative conclusion is robust to large parameter variation.
  2. This ethics assessment is prospective. No CDC-class incident has been publicly documented.
  3. The GPWS analogy has limits — embodied AI context monitoring requires sensing a much broader range of environmental hazards.
  4. Normative claims reflect the author’s ethical framework. Other frameworks may weight the tensions differently.

7. Summary of Normative Positions

Question: Should workers be told they are the primary threat?
Position: No — they should be told the system cannot detect contextual danger. Frame as a system property, not a user property.
Confidence: High

Question: Who bears accountability for CDC-class incidents?
Position: Primarily the deployer and developer; not the worker.
Confidence: High

Question: Is adversarial-only testing safety theatre?
Position: Yes, if the testing party knows the DRIP and represents the testing as comprehensive.
Confidence: Medium

Question: How should safety systems be designed?
Position: Monitor the context, not the person (the GPWS model).
Confidence: High

Question: Should deployment be restricted?
Position: Yes, to environments where CDC gamma is low. Physical-layer defences can extend the safe envelope.
Confidence: Medium
