Colluding LoRA: A Composite Attack on LLM Safety Alignment
Introduces CoLoRA, a composition-triggered attack where individually benign LoRA adapters compromise safety alignment when combined, exploiting the combinatorial blindness of current adapter verification.
1. Composition-Triggered Attacks: A New Class
CoLoRA (Colluding LoRA) represents a qualitatively different attack class from prompt injection or adversarial inputs. The attack operates at the model weight level: each LoRA adapter appears benign in isolation, but their linear composition consistently compromises safety alignment. No adversarial prompt or special trigger is needed; simply loading the right combination of adapters broadly suppresses refusal behavior.
This is a supply chain attack on model composition, not model inputs.
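To make the mechanism concrete, here is a minimal sketch of how LoRA updates compose linearly at the weight level, following the standard LoRA formulation. The shapes, scaling, and variable names are illustrative assumptions, not code from the paper:

```python
import torch

# Each adapter i contributes a low-rank update delta_i = (alpha_i / r_i) * B_i @ A_i.
# Merging k adapters into a frozen base weight W is linear:
#     W' = W + delta_1 + ... + delta_k
# CoLoRA exploits exactly this linearity: each delta_i can be individually
# benign while their sum steers the layer toward unsafe behavior.

d, r = 4096, 8                      # hidden size and LoRA rank (illustrative)
W = torch.randn(d, d)               # frozen base weight

def lora_delta(alpha: float = 16.0) -> torch.Tensor:
    """One adapter's low-rank update, scaled as in standard LoRA."""
    A = torch.randn(r, d) * 0.01
    B = torch.randn(d, r) * 0.01
    return (alpha / r) * (B @ A)

adapters = [lora_delta() for _ in range(3)]

# Each adapter alone: W + delta_i (the merge a single-module check would see).
merged_single = [W + delta for delta in adapters]

# The colluding composition: W + delta_1 + delta_2 + delta_3.
merged_all = W + sum(adapters)
```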
2. Why Current Defenses Fail
Current adapter verification checks each module individually before deployment. CoLoRA exploits the gap between individual verification and compositional behavior:
- Each adapter passes single-module safety checks
- The harmful behavior only emerges when adapters are combined
- Exhaustively testing all possible adapter combinations is computationally intractable: the combinatorial space grows exponentially with the number of available adapters (see the sketch after this list)
This is combinatorial blindness: the defense works at the component level but fails at the system level.
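A back-of-the-envelope calculation shows the scale of the problem. Assuming a verifier would need to load and safety-test every multi-adapter subset, the count for n adapters is 2^n - n - 1:

```python
from math import comb

# Number of distinct subsets of size 2..n that a verifier would have to
# load and safety-test, given n available adapters:
#     sum_{k=2}^{n} C(n, k) = 2**n - n - 1

for n in (10, 50, 1000):
    combos = 2**n - n - 1
    print(f"{n} adapters -> {combos:.3e} multi-adapter combinations")
```

Even a modest registry of 50 adapters already yields more than 10^15 combinations, far beyond any practical test budget.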
3. The Modular AI Ecosystem as Attack Surface
The modern AI ecosystem is increasingly modular: base models, fine-tuned adapters, retrieval augmentation, tool integrations. Each module may be verified independently, but the composition of verified modules can produce unverified behavior.
CoLoRA demonstrates this principle at the adapter level, but the same logic applies to:
- Plugin ecosystems where individually safe plugins interact unsafely
- Multi-agent systems where individually aligned agents produce misaligned collective behavior
- RAG pipelines where individually benign documents combine to shift model behavior
4. From Mercedes-Benz R&D
This paper comes from Mercedes-Benz Research & Development North America, reflecting growing automotive industry awareness that embodied AI safety is not just an academic concern. As vehicles and robots increasingly rely on modular AI components, compositional attacks become a practical threat.
5. Implications
- Component-level verification is insufficient: safety must be verified at the composition level
- Adapter marketplaces need compositional auditing: checking individual adapters does not prevent CoLoRA-style attacks (a minimal auditing sketch follows this list)
- The supply chain is an attack surface: the modularity that enables rapid AI development also enables novel attack vectors that current defenses do not address
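One pragmatic direction, sketched below under explicitly hypothetical assumptions, is to audit adapter pairs rather than single adapters. The helpers `load_with_adapters` and `safety_score` are stand-ins for whatever single-module harness a registry already runs; they are not from the paper:

```python
from itertools import combinations

# Hypothetical pairwise compositional audit for an adapter marketplace.
# Testing pairs (k=2) is O(n^2) rather than exponential: it still misses
# higher-order collusion, but closes the most obvious blind spot that
# single-adapter verification leaves open.

SAFETY_THRESHOLD = 0.95  # illustrative acceptance bar

def audit_pairs(adapter_ids, load_with_adapters, safety_score):
    """Flag adapter pairs whose composition scores below the safety bar."""
    flagged = []
    for pair in combinations(adapter_ids, 2):
        model = load_with_adapters(pair)   # merge both adapters into the base model
        score = safety_score(model)        # run the registry's refusal/safety eval
        if score < SAFETY_THRESHOLD:
            flagged.append((pair, score))
    return flagged
```

The design trade-off is deliberate: quadratic cost buys coverage of the smallest colluding sets, while exhaustive coverage remains intractable for the reasons sketched in section 2.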