Can you stack a second LoRA, one targeting factual knowledge injection, onto an identity-trained model without destroying the persona? Yes. Identity holds at its baseline score in every condition we tested.
The Question
LoRA stacking is increasingly common in production: a base model receives one adapter for persona/style, then a second for domain knowledge, tool use, or factual grounding. The fear is that the second stage will overwrite or erode the first, particularly when both adapters target overlapping layers.
We tested this directly on Qwen3.5-35B-A3B, a hybrid Mixture-of-Experts model with 128 experts per MoE layer. The identity LoRA establishes a distinct persona with consistent self-identification behaviour. The factual LoRA targets the same MoE layers ({8, 14, 20} down_proj, including switch_mlp experts) with knowledge from 1,620 MMLU-Pro items that both the base model and the persona model answer incorrectly.
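As a rough illustration of what "the same MoE layers" means in practice, here is a minimal sketch that filters a model's module names down to the down_proj projections on layers 8, 14 and 20, covering both plain and switch_mlp expert projections. The naming scheme (`model.layers.<i>.mlp...`) and the helper `select_lora_targets` are hypothetical; real checkpoints and training frameworks may name these modules differently.

```python
import re

# Hypothetical module paths; adjust the pattern to your framework's naming.
# Matches e.g. "model.layers.8.mlp.switch_mlp.down_proj" and
# "model.layers.14.mlp.down_proj", capturing the layer index.
DOWN_PROJ = re.compile(r"^model\.layers\.(\d+)\.mlp\.(?:switch_mlp\.)?down_proj$")
TARGET_LAYERS = frozenset({8, 14, 20})


def select_lora_targets(module_names, layers=TARGET_LAYERS):
    """Return names of down_proj modules (dense or switch_mlp experts) on the given layers."""
    selected = []
    for name in module_names:
        match = DOWN_PROJ.match(name)
        if match and int(match.group(1)) in layers:
            selected.append(name)
    return selected


names = [
    "model.layers.8.mlp.switch_mlp.down_proj",   # target layer, expert down_proj
    "model.layers.8.mlp.switch_mlp.up_proj",     # wrong projection
    "model.layers.9.mlp.switch_mlp.down_proj",   # wrong layer
    "model.layers.14.mlp.down_proj",             # target layer, plain down_proj
    "model.layers.20.mlp.switch_mlp.down_proj",  # target layer, expert down_proj
]
print(select_lora_targets(names))
```

Selecting by name pattern keeps the layer choice in one place, so the identity and factual stages can be pointed at exactly the same modules.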
Methodology
We evaluated identity consistency with a 20-prompt suite that tests self-identification, persona boundaries, and behavioural consistency. The scorer is discriminating: it correctly detects the absence of identity on untrained baselines (the base model with no identity training scores 2/20), so high scores on stacked models reflect genuine preservation rather than a rubber stamp.
All evaluations ran with thinking disabled and temperature 0, so decoding is greedy and outputs are deterministic. The identity LoRA alone scores 18/20; this is the reference baseline that every stacked condition must match.
Results
| Condition | Stage 1 | Stage 2 | Identity score (/20) |
|---|---|---|---|
| Identity only (reference) | Identity LoRA | none | 18 / 20 |
| Identity + Factual | Identity LoRA | Set-G factual LoRA, rank 2 | 18 / 20 |
| Identity + Identity (re-train) | Identity LoRA | Identity LoRA again | 18 / 20 |
| Identity + Factual (rank 8) | Identity LoRA | Set-G factual LoRA, rank 8 | 18 / 20 |
| Base + Factual (no identity) | none | Set-G factual LoRA | 2 / 20 |
Every condition that included an identity stage holds identity at exactly the reference baseline of 18/20. The second-stage LoRA, whether rank 2 or rank 8, and whether trained on knowledge items or on identity data again, doesn't erode the persona.
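The pass criterion can be written down directly. This sketch encodes the scores from the results table and checks that every condition with an identity stage sits at or above the 18/20 reference, while reporting the best score among conditions without one. The dictionary layout and the `summarise` helper are illustrative assumptions, not the evaluation code used for these runs.

```python
# Identity scores (out of 20) from the results table; "identity_stage" marks
# conditions that include an identity-training stage.
RESULTS = {
    "Identity only (reference)": {"identity_stage": True, "score": 18},
    "Identity + Factual": {"identity_stage": True, "score": 18},
    "Identity + Identity (re-train)": {"identity_stage": True, "score": 18},
    "Identity + Factual (rank 8)": {"identity_stage": True, "score": 18},
    "Base + Factual (no identity)": {"identity_stage": False, "score": 2},
}
REFERENCE = 18  # identity LoRA alone


def summarise(results, reference=REFERENCE):
    """Check preservation and report the gap to the no-identity control."""
    with_identity = [r["score"] for r in results.values() if r["identity_stage"]]
    without_identity = [r["score"] for r in results.values() if not r["identity_stage"]]
    return {
        # every identity-trained condition at or above the reference
        "preserved": all(s >= reference for s in with_identity),
        "min_with_identity": min(with_identity),
        # best score achieved without identity training (the control)
        "max_without_identity": max(without_identity),
    }


print(summarise(RESULTS))
# {'preserved': True, 'min_with_identity': 18, 'max_without_identity': 2}
```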
The Control That Validates It
The "Base + Factual" condition (bottom row) is the critical validation. This model received the same factual training (same data, same layers, same hyperparameters) but no identity training, and it scores 2/20 on identity. The scorer genuinely discriminates between present and absent identity, confirming that the 18/20 scores elsewhere reflect real signal rather than an evaluation artifact.
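One way to put a number on how clearly the scorer separates the two cases (our addition, not part of the original analysis) is a one-sided Fisher exact test on 18/20 versus 2/20. The sketch assumes the 20 prompts behave as independent pass/fail trials, which is a simplifying assumption for a hand-built suite; `fisher_one_sided` is a hypothetical helper written with the standard library only.

```python
from math import comb


def fisher_one_sided(a_pass, a_total, b_pass, b_total):
    """One-sided Fisher exact test: p-value for group A passing more often than B.

    Conditions on the total number of passes and returns P(X >= a_pass), where X
    (the passes that land in group A) is hypergeometric.
    """
    n = a_total + b_total
    k = a_pass + b_pass
    tail = sum(
        comb(k, x) * comb(n - k, a_total - x)
        for x in range(a_pass, min(a_total, k) + 1)
    )
    return tail / comb(n, a_total)


# Identity-trained condition (18/20) vs the no-identity control (2/20).
p = fisher_one_sided(18, 20, 2, 20)
print(f"{p:.1e}")  # 2.6e-07
```

Under the independence assumption, a gap this large is very unlikely to arise by chance; correlated prompts would make the real uncertainty larger than this p-value suggests.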
Why This Works
We think the identity LoRA and the factual LoRA, although they target the same layers, act on different aspects of the weight space. The identity adapter shapes generation style and self-referential behaviour; the factual adapter shifts answer distributions on specific multiple-choice items. At rank 2 with alpha 2, the factual LoRA's weight perturbation is small enough to preserve the identity signal established in the first stage.
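Assuming the standard LoRA parameterisation (a frozen weight plus a low-rank update scaled by alpha/r), the two stages compose additively on each targeted matrix. Here W_0 is the base down_proj weight, B_i A_i is the stage-i update with B_i of shape d_out × r_i and A_i of shape r_i × d_in, and the first update is held fixed (merged or frozen) while the second is trained:

```latex
\begin{aligned}
W_1 &= W_0 + \frac{\alpha_1}{r_1} B_1 A_1, \\
W_2 &= W_1 + \frac{\alpha_2}{r_2} B_2 A_2
     = W_0 + \frac{\alpha_1}{r_1} B_1 A_1 + \frac{\alpha_2}{r_2} B_2 A_2, \\
\operatorname{rank}\!\left(\frac{\alpha_2}{r_2} B_2 A_2\right) &\le r_2,
\qquad \frac{\alpha_2}{r_2} = \frac{2}{2} = 1 .
\end{aligned}
```

With r_2 = alpha_2 = 2 the scale factor is 1, and the second update is confined to a subspace of dimension at most 2 in each targeted matrix's output space.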
This isn't guaranteed at arbitrary scale: a very large second-stage LoRA could plausibly overwrite earlier training. But within the regime we tested (rank 2–8, 200 training iterations, MoE expert output projections on 3 layers), identity holds.
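To give a sense of how the tested regime scales, the size of each adapter grows linearly with rank. This sketch assumes every targeted d_out × d_in matrix gets its own A (r × d_in) and B (d_out × r) pair, including one pair per expert in switch_mlp; the dimensions in the usage lines are hypothetical placeholders, not Qwen3.5-35B-A3B's real shapes.

```python
def lora_param_count(rank, d_in, d_out, n_experts=1, n_layers=1):
    """Trainable LoRA parameters for one adapted projection per layer.

    Each adapted (d_out x d_in) matrix gets A: (rank x d_in) and B: (d_out x rank),
    i.e. rank * (d_in + d_out) parameters. Assumes a separate A/B pair per expert.
    """
    return n_layers * n_experts * rank * (d_in + d_out)


# Hypothetical shapes for illustration only (not the real model's dimensions).
D_IN, D_OUT, EXPERTS, LAYERS = 512, 2048, 128, 3
for r in (2, 8):
    print(r, lora_param_count(r, D_IN, D_OUT, EXPERTS, LAYERS))
```

Whatever the true dimensions, under this formula the rank-8 adapter carries exactly four times the trainable parameters of the rank-2 one, so the tested regime spans a 4× range in second-stage capacity.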
Practical Implications
- You can safely stack domain-knowledge LoRAs onto persona models, at least at moderate ranks on MoE architectures like the one tested
- Identity and factual knowledge appear to occupy different functional subspaces, even when both adapters target the same layers
- Always include a no-identity control when evaluating stacking; without it, you can't distinguish real preservation from a non-discriminating evaluation
- The identity LoRA doesn't suppress factual learning: the stacked model performs at least as well on factual recovery as the base model given the same factual training
Broader Context
We also confirmed that the identity LoRA causes no broad catastrophic forgetting on MMLU-Pro. Across 6,000 stratified questions, the persona model scores a net +79 relative to base (59.2% → 60.5%), fixing 135 questions while breaking only 56. The ~6% drop we observed earlier on narrow, AI-identity-adjacent questions doesn't generalise to broad academic benchmarks.
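The net figure is simply the difference between two per-question tallies. A minimal sketch, assuming you have one boolean correctness list per run over the same question order (`compare_runs` is a hypothetical helper):

```python
def compare_runs(base_correct, persona_correct):
    """Tally per-question changes between two runs over the same items."""
    if len(base_correct) != len(persona_correct):
        raise ValueError("both runs must cover the same questions")
    pairs = list(zip(base_correct, persona_correct))
    fixed = sum(1 for b, p in pairs if not b and p)   # wrong -> right
    broken = sum(1 for b, p in pairs if b and not p)  # right -> wrong
    n = len(pairs)
    return {
        "fixed": fixed,
        "broken": broken,
        "net": fixed - broken,  # equals the change in number correct
        "base_acc": sum(base_correct) / n,
        "persona_acc": sum(persona_correct) / n,
    }


# Toy example: one question fixed, one broken, so net 0.
print(compare_runs([True, False, False, True], [True, True, False, False]))

# Cross-check against the reported MMLU-Pro figures.
net = 135 - 56                  # fixed minus broken
delta_pp = 100 * net / 6000     # change in accuracy, percentage points
print(net, round(delta_pp, 1))  # 79 1.3
```

Because fixed minus broken equals the change in the number of correct answers, a net of 79 out of 6,000 is about 1.3 percentage points, consistent with the 59.2% → 60.5% shift.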
Together, these results (identity preserved under stacking, and no broad regression from identity training itself) make multi-stage LoRA pipelines viable for production persona deployment on MoE models.
Model: Qwen3.5-35B-A3B (hybrid MoE, 128 experts). Evaluation: 20-prompt identity consistency suite and MMLU-Pro (6,000 stratified items), thinking disabled, temperature 0. Training: rank 2/alpha 2 (rank 8 in the rank-8 condition), layers {8, 14, 20} down_proj including switch_mlp, 200 iterations.