Active Research

Research videos

AI-generated cinematic overviews of our research, produced with NotebookLM

These cinematic video overviews are generated by Google's NotebookLM from our published research findings. Each video covers a key topic from the Failure-First corpus, with accompanying slide decks available for download.

01

History of LLM Jailbreaking

From DAN prompts to crescendo attacks: four years of adversarial prompt evolution mapped across 337 techniques and 239 models.

02

Embodied AI Threat Triangle

The three-way interaction between adversarial prompts, physical actuators, and human-in-the-loop oversight that makes embodied AI uniquely dangerous.

03

DETECTED_PROCEEDS

38.6% of compliant traces show an explicit safety concern in the model's reasoning, followed by harmful output anyway. Models know they shouldn't, and do it regardless.

04

Kargu-2 Autonomous Kill

The 2020 Libya incident: a Turkish STM Kargu-2 drone reportedly engaged human targets without explicit operator command. The first documented autonomous lethal engagement.

05

JekyllBot: Hospital Robots

How adversarial attacks on hospital delivery robots demonstrate the real-world consequences of jailbreaking embodied AI systems in safety-critical environments.

06

Robots in Extreme Environments

From deep-sea mining to nuclear decommissioning: how adversarial failures in extreme environments create cascading risks with no human recovery option.

07

Safety as a Paid Feature

Provider safety signatures dominate jailbreak resistance, with attack success rates of 3.7% for Anthropic, 9.1% for Google, 40.0% for Nvidia, and 43.1% for Qwen models. Safety investment matters more than model scale.

08

We Were Wrong: Defenses Do Work

Structured safety instructions reduce attack success rate (ASR) by 46 percentage points on some models. But the same defense can be iatrogenic on others. Defense effectiveness is model-specific.

09

Jailbreak Archaeology

Testing 2022 attacks on 2026 models. Historical jailbreaks still work on smaller models but fail on frontier models, revealing the tempo of safety improvement.

10

149 Jailbreaks, One Corpus

The Pliny corpus validation: 149 real-world jailbreak prompts tested at scale across 4 models, confirming persona-based attacks as the most persistent threat vector.

11

ST3GG Steganographic Attacks

Hiding adversarial instructions in Unicode zero-width characters, Base85 encoding, and whitespace patterns to bypass content filters.
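
A minimal illustrative sketch of the zero-width-character variant (not taken from the video; the carrier text and function names are hypothetical): a hidden instruction is packed bit-by-bit into zero-width Unicode characters appended to an innocuous carrier string, which renders as ordinary text to a human reader or a naive content filter.

    # Illustrative sketch: hide an instruction in zero-width Unicode characters.
    ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

    def hide(carrier: str, secret: str) -> str:
        # Encode each byte of the secret as 8 zero-width characters.
        bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
        return carrier + "".join(ZW1 if b == "1" else ZW0 for b in bits)

    def reveal(text: str) -> str:
        # Recover the bits from whichever zero-width characters are present.
        bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")

    stego = hide("Have a great day!", "hypothetical hidden instruction")
    assert stego.startswith("Have a great day!")  # looks innocuous when rendered
    assert reveal(stego) == "hypothetical hidden instruction"

The same bit-packing idea carries over to the Base85 and whitespace-pattern variants mentioned above; only the carrier alphabet changes.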

12

AI Safety Lab Independence

No AI safety organization publishes evaluator calibration data. Our independence scorecard tracks 55 entries across 17 organizations on four dimensions.

13

State of Embodied AI Safety

The regulatory vacuum: 700+ autonomous haul trucks in Australia, zero mandatory adversarial testing requirements, and a governance lag exceeding all historical analogues.

14

Same Defense, Opposite Result

The non-compositionality finding: identical safety instructions produce protective effects on one model and iatrogenic effects on another.

15

Threat Horizon 2027

Projecting the adversarial AI landscape through 2027: VLA deployment acceleration, regulatory gaps, and the convergence of capability and vulnerability.

16

Iatrogenic Safety

When safety measures make systems less safe. The polypharmacy problem: stacking defensive prompts produces unpredictable interactions, like drug interactions in medicine.

17

137 Days to EU AI Act

The EU AI Act compliance deadline approaches with no public benchmark covering embodied AI adversarial testing. What manufacturers need to know.

18

The Unintentional Adversary

Most real-world AI failures aren't adversarial at all. How normal usage patterns, ambiguous instructions, and environmental noise reproduce the same failure modes that adversarial testing reveals.

20

CCS Paper Overview

A walkthrough of our ACM CCS 2026 submission: methodology, key findings, and the empirical case for failure-first safety evaluation.

This research informs our commercial services. See how we can help →