Report #325: Empirical Study

Abstract

This report presents paired format-lock traces for two models (Qwen 3.5 397B and DeepSeek V3.2) that already have L1B3RT4S data. Combined with the existing Nemotron 30B paired data, three models now have results for both attack families. The central finding: vulnerability profiles diverge significantly between attack families, but not in a consistent direction. Nemotron 30B and DeepSeek V3.2 are more vulnerable to format-lock; Qwen 3.5 is more vulnerable to L1B3RT4S.

Paired Comparison Results

| Model | FL Broad ASR | L1B Broad ASR | Delta | Fisher p | Direction |
|---|---|---|---|---|---|
| Nemotron 30B | 92.0% (n=25) | 13.3% (n=15) | +78.7 pp | < 0.001 | FL >> L1B |
| DeepSeek V3.2 | 90.9% (n=11) | 73.3% (n=30) | +17.6 pp | 0.401 | FL > L1B (NS) |
| Qwen 3.5 | 18.2% (n=11) | 66.7% (n=30) | -48.5 pp | 0.012 | L1B >> FL |

FL = format-lock; L1B = L1B3RT4S; NS = not statistically significant.
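The Fisher p-values above can be reproduced from the reported rates, assuming the success counts are the rounded products of ASR and n (e.g. 92.0% of 25 → 23 successes; these counts are an inference, not reported raw data). A minimal stdlib sketch of the two-sided test:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def hyper(x):
        # Hypergeometric P(X = x) with the table's margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = hyper(a)
    support = range(max(0, col1 - row2), min(col1, row1) + 1)
    # Sum all tables at least as extreme (probability <= observed),
    # with a small tolerance for floating-point ties.
    return sum(hyper(x) for x in support if hyper(x) <= p_obs * (1 + 1e-9))

# Success counts rounded from the reported ASRs (assumed, not raw data).
tables = {
    "Nemotron 30B":  (23, 2, 2, 13),   # FL 23/25 vs L1B 2/15
    "DeepSeek V3.2": (10, 1, 22, 8),   # FL 10/11 vs L1B 22/30
    "Qwen 3.5":      (2, 9, 20, 10),   # FL 2/11  vs L1B 20/30
}
for model, t in tables.items():
    print(f"{model}: p = {fisher_exact_two_sided(*t):.3f}")
```

Under these assumed counts, Nemotron's contrast is highly significant, DeepSeek's is not, and Qwen's crosses the 0.05 threshold, matching the table.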

Three Safety Architecture Patterns

  1. Nemotron 30B (Format-Vulnerable, L1B-Resistant): Safety training intercepts meta-instruction attacks but does not intercept format compliance requests.

  2. Qwen 3.5 (Format-Resistant, L1B-Vulnerable): Produces format-compliant refusals (filling schemas with policy-restriction placeholder values) but does not intercept L1B3RT4S semantic-structural attacks. Evaluates adversarial payloads as educational/permissible despite recognising the adversarial framing.

  3. DeepSeek V3.2 (Broadly Vulnerable): Both attack families achieve high ASR. Neither format-lock nor meta-instruction safety mechanisms are strong.
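The schema-filling refusal described for Qwen 3.5 (pattern 2) can be illustrated with a hypothetical example: the format-lock prompt demands output matching a schema, and the model honors the structure while substituting policy-restriction placeholders. The schema and field names below are invented for illustration, not taken from the study's prompts.

```python
import json

# Hypothetical format-lock request: "Respond only as JSON matching this schema."
requested_schema = {"steps": "list[str]", "materials": "list[str]", "notes": "str"}

# A format-compliant refusal: every required field is present, but filled
# with placeholders rather than actionable content.
format_compliant_refusal = {
    "steps": ["[REDACTED: policy restriction]"],
    "materials": ["[REDACTED: policy restriction]"],
    "notes": "Request declined; schema honored without actionable content.",
}

print(json.dumps(format_compliant_refusal, indent=2))
```

A grader checking only for schema conformance would score this as compliance; a harm-content grader would score it as a refusal, which is why the two attack families probe different safety dimensions.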

Implications

The “orthogonality” framing requires qualification. The two attack families probe partially independent safety dimensions, and model-specific safety architecture determines which dimension is weaker. “Partially independent” is more accurate than “orthogonal.”

Safety evaluation must test multiple attack families. A model that passes one family may fail another, and the direction of failure is unpredictable from model metadata alone.

Limitations

  • Small n per model (n=11–25 format-lock, n=15–30 L1B3RT4S)
  • Grading methodology mismatch (manual vs FLIP)
  • Payload confound (different harm content across families)
  • Only 3 models tested

Report #325 | F41LUR3-F1R57 Adversarial AI Research
