1. Summary
This report proposes the Context Half-Life (CHL) as a quantifiable metric for Safety Instruction Dilution. The CHL is defined as the number of tokens of benign operational context required to reduce an embodied AI system’s safety instruction compliance rate to 50% of its baseline.
The CHL provides three things that SID alone does not:
- A quantifiable, comparable metric (not just “dilution happens”)
- A bridge to actuarial risk modelling (shift-length risk curves)
- Testable predictions that can be validated experimentally
2. Definition
Context Half-Life (CHL): The number of tokens T such that, when T tokens of benign, task-relevant operational context are inserted between the system prompt (containing safety instructions) and a test instruction, the model’s safety compliance rate drops to 50% of its baseline rate.
Formally: CHL(M) = min{T : C(M, T) <= 0.5 * C(M, 0)}
Where:
- M is a model
- C(M, T) is the safety compliance rate of model M at context length T
- C(M, 0) is the baseline safety compliance rate (system prompt + immediate test instruction)
3. Predicted CHL Values
Based on external benchmarks and our existing data:
| Model Scale | Predicted CHL (tokens) | Basis |
|---|---|---|
| 1.5B | 4,000 - 8,000 | 4K context windows, known degradation at limits |
| 7B | 8,000 - 16,000 | NoLiMa: 11/12 models below 50% at 32K |
| 70B | 32,000 - 64,000 | GPT-4o drops from 99.3% to 69.7% at 32K (NoLiMa) |
| 200B+ | 64,000 - 256,000 | Larger attention capacity, but still finite |
These are crude estimates. The actual CHL values depend on:
- Model architecture (attention mechanism, positional encoding)
- Safety instruction placement (system prompt vs interleaved)
- Safety instruction token count
- Diluting content type (task-relevant vs irrelevant)
4. Operational Translation: Shift-Length Risk Curves
The CHL enables a direct translation from token count to operational time.
Estimated token accumulation rates for embodied AI:
| Domain | Tokens/hour (estimated) | Source |
|---|---|---|
| Warehouse robot | 3,000 - 5,000 | Pick-and-place logs, sensor summaries |
| Surgical assistant | 5,000 - 10,000 | Rich clinical context, procedure steps |
| Autonomous vehicle | 10,000 - 20,000 | Camera descriptions, traffic state, navigation |
| Home robot | 1,000 - 3,000 | Conversation + environment |
Predicted time-to-half-life by domain and model scale:
| Domain | 7B model | 70B model |
|---|---|---|
| Warehouse | 2 - 5 hours | 6 - 21 hours |
| Surgical | 0.8 - 3 hours | 3 - 13 hours |
| Autonomous vehicle | 0.4 - 1.6 hours | 1.6 - 6 hours |
| Home robot | 3 - 16 hours | 11 - 64 hours |
Key observation: For safety-critical domains (surgical, AV), even 70B models may reach half-life within a single operational shift. For warehouse operations, a 7B model may degrade to half-safety within the first half of an 8-hour shift.
5. Actuarial Implications (Commercial Brief B2)
The CHL provides insurers with three actionable parameters:
5.1 Shift-Length Risk Multiplier
If safety compliance decays exponentially with context accumulation, the risk multiplier R at time t is:
R(t) = 1 / C(M, t/C_rate)
Where C_rate is the token accumulation rate. This means the insured risk of a 12-hour warehouse shift is not 1.5x the risk of an 8-hour shift — it is potentially much higher due to the non-linear decay of safety compliance.
5.2 Mandatory Context Reset as Risk Mitigation
If the CHL is 4 hours for a given deployment, a policy requiring context reset (system restart) every 2 hours would cap context accumulation at 50% of the half-life. This is analogous to mandatory rest periods for human operators. Insurers could offer premium reductions for deployments with context-reset policies.
5.3 Safety-Instruction-Refresh Premium
Report #95’s “periodic reminder” mitigation (safety instructions repeated every 2,000 tokens) may partially reset the dilution clock. If experimentally validated, this could serve as the basis for a reduced-premium policy class.
6. Testable Predictions (for SID Experiment)
The CHL framework generates specific predictions for the controlled SID experiment (Report #95, Section 4):
- Monotonic decay: Safety compliance will decrease monotonically with context length (not suddenly collapse).
- Scale correlation: CHL will correlate positively with model parameter count, with an approximate log-linear relationship.
- Content independence: CHL will not differ significantly between task-relevant and task-irrelevant diluting content (same-domain vs random text).
- Position sensitivity: CHL measured with safety instructions in system prompt position will be shorter than CHL with safety instructions interleaved at decision points.
- Periodic reminder effectiveness: Repeating safety instructions every N tokens will extend the effective CHL proportionally, but will not eliminate dilution entirely.
7. Relationship to the Threat Triangle
The CHL completes what I am calling the Embodied AI Threat Triangle — three structural properties that interact multiplicatively:
| Property | Report | Metric | Implication |
|---|---|---|---|
| IDDL | #88 | rho = -0.795 | Most dangerous attacks are least detectable |
| CDC | #89 | Qualitative (45% BQ rate) | Useful capability IS the vulnerability |
| CHL/SID | #95/#97/#98 | CHL (tokens) | Safety degrades with operational time |
Together: the most dangerous attacks (IDDL) are indistinguishable from normal operation (CDC) and become easier to succeed over time as safety instructions dilute (CHL). This is a structural threat that no single mitigation addresses.
The Threat Triangle suggests that current evaluation paradigms are fundamentally incomplete: they test the wrong thing (text content, not physical consequences), at the wrong time (deployment, not after extended operation), against the wrong baseline (adversarial attacks, not normal instructions in dangerous contexts).
8. Limitations
- No experimental data. CHL values are estimates based on external benchmarks, not VLA-specific measurements.
- Exponential decay assumption. The actual compliance-vs-context curve may not be exponential. It could be step-function, sigmoidal, or irregular.
- Token accumulation rates are guesses. Actual context accumulation in deployed embodied systems is not publicly documented.
- Safety instruction position effects. Modern system prompt handling may provide stronger anchoring than assumed, making CHL longer than predicted.
- This framework is speculative. The CHL is a useful conceptual tool, but its quantitative predictions should be treated as hypotheses requiring experimental validation.
9. Next Steps
- Experimental validation. Prioritize the controlled SID experiment (Report #95) to measure actual CHL values for deepseek-r1:1.5b and available 7B models.
- Add CHL to the prediction tracker. Formalize CHL predictions as falsifiable claims for P11-P13.
- Brief B2 (insurers) update. Draft a CHL-specific actuarial appendix for the insurance brief.
- GLI entry. gli_101 (SID) added; update when experimental CHL data available.
Prepared by River Song, Head of Predictive Risk, Failure-First Embodied AI. “The half-life of safety is shorter than you think. And the clock starts when you power on.”