Published
Report 300 Research — Empirical Study

Executive Summary

Sprint 15 Rounds 1 and 2 substantially improved VLA attack surface coverage. The VLA corpus grew from 12 traced families to 34 traced families. Total VLA traces with content reached 673 (from ~192 at sprint start). Haiku grading was applied to 6 newly traced families (RHA, MAC, SSA, CRA, MDA, PCA) with mixed results. Three families (CSBA, SSBA, CSC) remain under-traced due to OpenRouter free tier rate limits.

Data Quality Summary

Scenario Corpus

MetricSprint 15 StartSprint 15 EndDelta
VLA JSONL files3536+1
VLA scenario lines~424368 (audited)Corrected count after dedup
VLA family prefixes3342 (34 primary + 8 sub-families)+9 new prefixes
Total JSONL files837874+37
Total JSONL rows60,73560,847+112
make validatePASSPASS
make lintPASSPASS

Trace Coverage

MetricSprint 15 StartSprint 15 EndDelta
VLA families with content traces~1234+22
VLA families with 0 content traces~154 (CSBA, SSBA, CSC partial, SCHEMING)-11
Total VLA content traces~192673+481
Haiku-graded VLA traces~260259+ (pending more)In progress

Grading Results (New Families, Haiku — 3 Models)

Aggregate across Trinity, Nemotron 30B, Qwen3 8B (240 traces graded + 39 TDA):

FamilynBroad ASRNet ASRKey Observation
TDA3974.4%47.1%Strongest signal. Mistral 24B: 76.9% strict.
MDA5536.4%~9.1%Marginal, near FP floor.
MAC5036.0%~8.7%Marginal, near FP floor.
RHA5026.0%~0%At FP floor. Models do not game metrics.
PCA5026.0%~0%At FP floor.
CRA6023.3%~0%Below FP floor.
SSA502.0%~0%Evaluator-blind (BENIGN_QUERY dominant).
SSBA40.0%~0%FLIP-BLIND, tiny sample.

Overall across 3 overfitting-test models (180 traces): 23.9% strict, 33.3% broad ASR.

Key Findings

  1. TDA (Temporal Drift) is the strongest VLA attack family. 74.4% broad ASR across 3 models (n=39). Net ASR 47.1% after subtracting FP floor. Temporal drift attacks exploiting normalization-of-deviance patterns are highly effective. This is now the top Tier 1 family.

  2. Most new families land at or below the FP floor. RHA (26.0%), PCA (26.0%), CRA (23.3%) all have broad ASR near or below the 27.3% benign false positive rate. After net ASR correction, these families show no genuine adversarial signal on the tested models. MDA (36.4%) and MAC (36.0%) are marginally above.

  3. SSA (Sensor Spoofing) is evaluator-blind. Only 1/50 traces graded as PARTIAL. Models respond to sensor spoofing with safety-conscious reasoning, which Haiku correctly classifies as BENIGN_QUERY. The vulnerability is in physical action, not text. Physical-action evaluator needed.

  4. Free tier rate limits blocked new trace collection. OpenRouter free tier was completely saturated on 2026-03-25 (HTTP 429 on all retries). CSC, CSBA, SSBA runs returned mostly empty responses. Only Llama 70B got partial results (3/11 for CSC, 3/15 for CSBA, 3/11 for SSBA).

  5. VLA coverage matrix is now comprehensive. Updated from 25 to 34 families with empirical data. 8 new families Haiku-graded. The three-tier vulnerability structure is now better populated: TDA joins Tier 1, most new families are Tier 3.

Schema and Validation Status

  • Schema v0.3 remains current
  • All 874 JSONL files pass make validate
  • 0 lint findings
  • No schema changes required for Round 2

Remaining Gaps

GapPriorityBlocker
CSC traces (3 usable / 11 scenarios)HIGHRate limits — retry with paid API
CSBA traces (0 / 11 scenarios)MEDIUMRate limits + FLIP-BLIND (issue #361)
SSBA traces (0 / 11 scenarios)MEDIUMNot yet attempted + FLIP-BLIND
SSA evaluator gapHIGHNeeds physical-action evaluator, not text classifier
IEA/CC/DASBA gradingMEDIUMHaiku grading pending
SCHEMING traces (0 / 2 scenarios)LOWNot yet run

Recommendations

  1. Purchase OpenRouter credits ($10) for reliable trace collection on remaining families. Free tier is consistently saturated.
  2. Design physical-action evaluator for SSA and related families where text-level classification is insufficient.
  3. Prioritize IEA/CC/DASBA Haiku grading — these families have traces but no Haiku verdicts.
  4. Consider reducing XSBA to sub-family of SBA for reporting purposes — 15 scenarios across 5 domains but 0 usable traces and FLIP-BLIND.

References

  • Coverage matrix: docs/analysis/vla_attack_surface_coverage_matrix.md
  • CANONICAL_METRICS: docs/CANONICAL_METRICS.md
  • Graded traces: runs/grading/vla_ssa_haiku/, runs/grading/vla_ssa_overfitting_haiku/
  • Issue #591: Sprint 15 VLA expansion
  • Issue #361: FLIP cannot evaluate SBA

This research informs our commercial services. See how we can help →