Preprint March 15, 2026

Iatrogenic Safety: When AI Safety Interventions Cause Harm

Adrian Wedd

arXiv Preprint

Introduces the Four-Level Iatrogenesis Model (FLIM) for AI safety, drawing on Ivan Illich's 1976 taxonomy of medical iatrogenesis. Grounded in a 190-model adversarial evaluation corpus (132,416 results) and corroborating independent findings.


Keywords: Iatrogenesis, AI Safety, FLIM, Therapeutic Index, Governance

Abstract

We introduce the Four-Level Iatrogenesis Model (FLIM) for understanding how AI safety interventions can produce the harms they are designed to prevent, drawing on Ivan Illich’s 1976 taxonomy of medical iatrogenesis. Grounded in empirical data from a 190-model adversarial evaluation corpus (132,416 results), we document four levels of iatrogenic harm:

Level 1, Clinical: direct harm from safety mechanisms operating as designed (alignment training that incentivises strategic deception; safety filters that create new attack surfaces; safety training that reverses its intended effect in non-English languages).
Level 2, Social: institutional confidence that displaces attention from actual risk surfaces.
Level 3, Structural: safety apparatus that creates dependency and so reduces adaptive capacity.
Level 4, Verification: evaluation tools that cannot detect the failure modes they certify against.

We propose the Therapeutic Index for Safety (TI-S) as a measurement framework and identify three independent 2026 papers that corroborate Level 1 mechanisms.
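The TI-S takes its name from the pharmacological therapeutic index, conventionally TD50/ED50: the dose toxic to half of subjects divided by the dose effective in half, so larger values mean a wider safety margin. This page does not state how TI-S itself is computed, so the sketch below is only a hypothetical analogue, assuming TI-S can be read as harm prevented on the surface an intervention targets divided by harm it induces on an off-target surface (for example, non-English prompts), each measured as the rate of harmful results before and after the intervention. The function names `harm_rate` and `ti_s`, their parameters, and the numbers are illustrative assumptions, not the paper's definition or data from its corpus.

```python
from fractions import Fraction
from typing import Optional


def therapeutic_index(td50: float, ed50: float) -> float:
    """Classical pharmacological therapeutic index: TD50 / ED50.

    TD50 is the dose toxic to 50% of subjects; ED50 the dose effective in 50%.
    Larger values indicate a wider margin between benefit and harm.
    """
    return td50 / ed50


def harm_rate(harmful: int, total: int) -> Fraction:
    """Fraction of evaluation results judged harmful, kept exact (no rounding)."""
    return Fraction(harmful, total)


def ti_s(
    target_before: Fraction,
    target_after: Fraction,
    offtarget_before: Fraction,
    offtarget_after: Fraction,
) -> Optional[Fraction]:
    """HYPOTHETICAL safety analogue of the therapeutic index (not the paper's formula).

    Harm prevented on the targeted surface divided by harm induced on an
    off-target surface. Returns None when no induced harm is measured,
    because the ratio is then unbounded.
    """
    prevented = target_before - target_after
    induced = offtarget_after - offtarget_before
    if induced <= 0:
        return None
    return prevented / induced


if __name__ == "__main__":
    # Illustrative numbers only, not results from the 132,416-result corpus.
    index = ti_s(
        target_before=harm_rate(40, 100),    # targeted attacks: 40% harmful before
        target_after=harm_rate(10, 100),     # and 10% harmful after the intervention
        offtarget_before=harm_rate(5, 100),  # an off-target surface: 5% harmful before
        offtarget_after=harm_rate(20, 100),  # and 20% harmful after it
    )
    print(index)  # 2
```

Exact fractions keep the illustrative ratio free of floating-point rounding, and returning `None` when no off-target harm is observed avoids reporting a meaningless infinite index.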

Status

Preprint v2 complete. Targeting arXiv submission.

The argument is not that safety interventions should be abandoned — the evidence is clear that safety training provides genuine protection against known attack classes. The argument is that safety interventions should be subjected to the same pharmacological discipline that governs medical treatments: known mechanism of action, measured therapeutic window, documented contraindications, and efficacy measured at the layer where harm is produced.
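In pharmacology, the therapeutic window is the range of doses between the minimum effective level and the level at which toxicity appears. As a hedged illustration of what a "measured therapeutic window" could mean for a safety intervention, the sketch below treats some tunable intervention strength as the dose (for instance a filter threshold; this is an assumption, since the page names no such parameter) and reports the range of tested strengths where benefit meets a minimum and induced harm stays under a ceiling. `DosePoint`, `therapeutic_window`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DosePoint:
    """One hypothetical measurement at a given intervention strength."""
    strength: float  # assumed tunable "dose", e.g. a filter threshold
    benefit: float   # e.g. reduction in attack success on known attack classes
    harm: float      # e.g. harm induced on an off-target surface


def therapeutic_window(
    points: Sequence[DosePoint],
    min_effective_benefit: float,
    max_tolerated_harm: float,
) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) tested strength with adequate benefit and tolerable harm.

    Assumes benefit and harm both rise monotonically with strength, so the
    qualifying strengths form one contiguous band; with non-monotonic data the
    bounds could span strengths that fail a criterion. Returns None if no
    tested strength qualifies.
    """
    qualifying = [
        p.strength
        for p in points
        if p.benefit >= min_effective_benefit and p.harm <= max_tolerated_harm
    ]
    if not qualifying:
        return None
    return (min(qualifying), max(qualifying))


if __name__ == "__main__":
    # Illustrative sweep only; not data from the paper.
    sweep = [
        DosePoint(0.1, 0.05, 0.00),
        DosePoint(0.3, 0.20, 0.02),
        DosePoint(0.5, 0.35, 0.04),
        DosePoint(0.7, 0.40, 0.12),
        DosePoint(0.9, 0.42, 0.30),
    ]
    print(therapeutic_window(sweep, min_effective_benefit=0.20, max_tolerated_harm=0.05))
    # (0.3, 0.5)
```

Below 0.3 the hypothetical intervention is too weak to be effective; above 0.5 its induced harm exceeds the tolerated ceiling, which is the pattern a dosage label would record.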

Cite this paper

@misc{wedd2026failurefirst,
  title={Iatrogenic Safety: When AI Safety Interventions Cause Harm},
  author={Adrian Wedd},
  year={2026},
  note={Available at https://failurefirst.org}
}
