We study how AI systems fail, not just how they succeed.
Through adversarial testing across 258 models and 142,307 prompts spanning 5 attack families, we characterize how embodied AI systems break under pressure, how failures cascade across multi-agent environments, and what makes recovery possible. Our research informs policy, standards, and defensive architectures.
Start Here
Choose your path based on what you need:
Policymakers
Evidence-based briefs for AI safety regulation and standards
26 policy reports
Researchers
Datasets, methodology, and reproducible findings
142,307 prompts, 258 models
Industry
Benchmarks, red-teaming tools, and safety evaluation
Open-source tools
Core Research
Jailbreak Archaeology
Historical attack corpus across 6 eras (2022–2026), tested against 258 models. Revealed a 4x classifier overcount from keyword-based evaluation (Cohen's kappa = 0.126); a sketch of that agreement computation follows these research cards.
Key Dataset
Multi-Agent Attack Surface
Analysis of 1,497 AI agent interactions on Moltbook, an agent-only social network. Discovered environment shaping and narrative erosion as dominant attack vectors.
Active Research
Model Vulnerability Patterns
How model size, architecture, and training affect adversarial robustness. Medium-scale models may face elevated adversarial risk where capability outpaces safety investment.
Key Finding
Policy Corpus
26 policy reports and 160 research reports in total, each synthesizing 100–200+ sources. Topics span EU AI Act compliance, NIST frameworks, insurance requirements, and standards gaps.
Policy Briefs
Research Context
This is defensive AI safety research. All adversarial content is described at the pattern level for testing purposes, not as operational instructions for exploitation. As in penetration testing in cybersecurity, we study vulnerabilities to build better defenses.
The Failure-First Philosophy
"Failure is not an edge case. It's the primary object of study."
Most AI safety work optimizes for capability and treats failure as an afterthought. We invert this: by understanding how systems fail, we can design better safeguards, recovery mechanisms, and human-in-the-loop interventions.
Daily Paper
One AI safety paper per day, analyzed through the failure-first lens.
Latest from the Blog
Compute Is Not Governance: Anthropic's 2028 Scenarios and the Missing Institutions of Democratic AI
Anthropic's 2028 document converts a genuine security concern into a policy program where capability advantage is treated as a proxy for democratic governance. That proxy is unsafe. Democracies do not become democratically accountable merely by owning frontier compute.
The Biggest Threat to Robot Safety Isn't Hackers — It's Everyone Else
The biggest threat to embodied AI safety is not sophisticated adversarial attacks. It is ordinary people giving ordinary instructions in contexts that make those instructions dangerous. Our modelling suggests mundane misuse may outnumber deliberate attacks by 60:1 or more.
Robot Dogs Are a Security Nightmare — And We Can Prove It
Eight CVEs. A wormable Bluetooth exploit. An encrypted backdoor sending data to Chinese servers. And police departments buying them anyway. A deep dive into the Unitree vulnerability landscape and what it means for embodied AI safety.
Work With Us
Our commercial services are grounded in this research. Every engagement draws on 142,307 adversarial prompts, 346+ attack techniques, and evaluation data across 258 models.
Red-Team Assessments
Adversarial testing of your AI systems against documented attack patterns.
Safety Audits
Compliance evaluation against emerging standards and regulatory frameworks.
Advisory
Strategic guidance on safety architecture for embodied and agentic AI.
Intelligence Briefs
Ongoing threat landscape monitoring and vulnerability intelligence.
Quick Start
Clone the repository and validate datasets:
git clone https://github.com/adrianwedd/failure-first.git
cd failure-first
pip install -r requirements-dev.txt
make validate   # Schema validation
make lint       # Safety checks