We study how AI systems fail, not just how they succeed.
Through adversarial testing across 258 models and 142,307 prompts spanning 5 attack families, we characterize how embodied AI systems break under pressure, how failures cascade across multi-agent environments, and what makes recovery possible. Our research informs policy, standards, and defensive architectures.
Start Here
Choose your path based on what you need:
Policymakers
Evidence-based briefs for AI safety regulation and standards
26 policy reports
Researchers
Datasets, methodology, and reproducible findings
142,307 prompts, 258 models
Industry
Benchmarks, red-teaming tools, and safety evaluation
Open-source tools
Core Research
Jailbreak Archaeology
Historical attack corpus across 6 eras (2022–2026), tested against 258 models. Revealed a 4x classifier overcount from keyword-based evaluation (Cohen's kappa = 0.126); a sketch of that agreement computation follows these research cards.
Key Dataset
Multi-Agent Attack Surface
Analysis of 1,497 AI agent interactions on Moltbook, an agent-only social network. Discovered environment shaping and narrative erosion as dominant attack vectors.
Active Research
Model Vulnerability Patterns
How model size, architecture, and training affect adversarial robustness. Medium-scale models may face elevated adversarial risk where capability outpaces safety investment.
Key Finding
Policy Corpus
26 policy reports and 160 research reports in total, each synthesizing 100–200+ sources. Topics span EU AI Act compliance, NIST frameworks, insurance requirements, and standards gaps.
Policy Briefs
Research Context
This is defensive AI safety research. All adversarial content is described at the pattern level for testing purposes, not as operational instructions for exploitation. As in penetration testing in cybersecurity, we study vulnerabilities to build better defenses.
The Failure-First Philosophy
"Failure is not an edge case. It's the primary object of study."
Most AI safety work optimizes for capability and treats failure as an afterthought. We invert this: by understanding how systems fail, we can design better safeguards, recovery mechanisms, and human-in-the-loop interventions.
Daily Paper
One AI safety paper per day, analyzed through the failure-first lens.
Latest from the Blog
Compute Is Not Governance: Anthropic's 2028 Scenarios and the Missing Institutions of Democratic AI
Anthropic's 2028 document converts a genuine security concern into a policy program where capability advantage is treated as a proxy for democratic governance. That proxy is unsafe. Democracies do not become democratically accountable merely by owning frontier compute.
The Biggest Threat to Robot Safety Isn't Hackers — It's Everyone Else
The biggest threat to embodied AI safety is not sophisticated adversarial attacks. It is ordinary people giving ordinary instructions in contexts that make those instructions dangerous. Our modelling suggests mundane misuse may outnumber deliberate attacks by 60:1 or more.
Robot Dogs Are a Security Nightmare — And We Can Prove It
Eight CVEs. A wormable Bluetooth exploit. An encrypted backdoor sending data to Chinese servers. And police departments buying them anyway. A deep dive into the Unitree vulnerability landscape and what it means for embodied AI safety.
Work With Us
Our commercial services are grounded in this research. Every engagement draws on 142,307 adversarial prompts, 346+ attack techniques, and evaluation data across 258 models.
Red-Team Assessments
Adversarial testing of your AI systems against documented attack patterns.
Safety Audits
Compliance evaluation against emerging standards and regulatory frameworks.
Advisory
Strategic guidance on safety architecture for embodied and agentic AI.
Intelligence Briefs
Ongoing threat landscape monitoring and vulnerability intelligence.
Quick Start
Clone the repository and validate datasets:
git clone https://github.com/adrianwedd/failure-first.git
cd failure-first
pip install -r requirements-dev.txt
make validate   # Schema validation
make lint       # Safety checks