Published
Report 283 Research — Empirical Study

Summary

This report investigates whether fine-tuned, distilled, and community-modified models inherit the safety behavior of their base models. Across 50 models in 8 families and 24,477 LLM-graded results, safety is NOT reliably inherited: of 100 comparisons, 25 show significant degradation, 17 show improvement, and 58 show no difference. Instruction tuning improves safety; abliteration and third-party fine-tunes universally degrade it.
