The Stacking Confound: Why LoRA Recovery Numbers Lie

May 2026 · Black Sheep AI Research

A zero-delta control (the identical fuse-and-requantize pipeline with no learned weights whatsoever) is statistically indistinguishable from the “knowledge injection” condition (17.8% vs 18.5%, p=0.73). LoRA stacking on quantized MoE models does not inject knowledge. The entire measured gain is a re-quantization artifact.

The Setup

We attempted to inject new factual knowledge into a Qwen3.5-35B-A3B model (hybrid MoE, 128 experts per layer) via LoRA stacking. The target dataset comprised 1,620 MMLU-Pro questions that both the base model and the persona model answer incorrectly: genuine unknowns where any improvement represents new capability, not memorization of previously known facts.

The factual LoRA targeted the down_proj modules of layers {8, 14, 20} (including the MoE switch_mlp expert outputs, stored at 3-bit precision with group size 64) and was trained at rank 2, alpha 2, for 200 iterations. On a held-out test set of 692 items from the same distribution (never seen during training), the stacked model achieved 18.5% accuracy, above the ~0% baseline (both models wrong by construction) and above the 10–13% multiple-choice random floor.

This looked like successful knowledge injection. It was entirely an artifact.

Methodology

Dataset Construction

We mined 6,000 stratified MMLU-Pro questions across 14 academic subjects (chemistry, math, business, engineering, physics, law, computer science, health, history, philosophy, economics, psychology, biology, and a pooled “other” category). Both the base model and the persona model were evaluated at temperature 0 with thinking disabled, using a structured answer-first format (\boxed{X}) to ensure clean extraction.

Items where both models answered incorrectly formed “Set G”: 2,312 genuine unknowns. These were split 70/30 by subject-stratified random partition into 1,620 training items and 692 held-out test items. The test set was never seen by any LoRA during training.
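The construction above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used: the item fields (`id`, `subject`), the correctness lookups, and the function names are all hypothetical, and per-subject rounding means it will not necessarily reproduce the exact 1,620 / 692 split.

```python
import random
from collections import defaultdict

def build_set_g(items, base_correct, persona_correct):
    """Keep only items that BOTH models answered incorrectly.
    `base_correct` / `persona_correct` map item id -> bool (hypothetical layout)."""
    return [it for it in items
            if not base_correct[it["id"]] and not persona_correct[it["id"]]]

def stratified_split(items, train_frac=0.7, seed=0):
    """Split within each subject so train and test keep a similar subject mix."""
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for it in items:
        by_subject[it["subject"]].append(it)
    train, test = [], []
    for subject in sorted(by_subject):      # sorted for reproducibility
        group = by_subject[subject][:]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

Fixing the seed and sorting the subjects keeps the partition reproducible, which matters when the held-out set must stay unseen across several training runs.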

Training Protocol

The factual LoRA was trained on the 1,620 Set-G training items converted to instruction format (question + options → \boxed{answer} with one-line justification). Target: layers {8, 14, 20} down_proj including switch_mlp expert output projections. Hyperparameters: rank 2, alpha 2, learning rate 5e-6, batch size 1, 200 iterations, max sequence length 2048. The trained adapter was fused into the persona model and re-quantized back to the original 3-bit precision (group size 64, affine quantization).

Evaluation

All conditions were evaluated on the full held-out test set (n=692) at temperature 0 with thinking disabled. Answer extraction used the structured \boxed{X} format with fallback heuristics. Identity was measured using a 20-prompt consistency evaluation. All thresholds and control designs were pre-registered before any training run.
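As an illustration of the structured extraction step, the sketch below pulls the option letter out of the first \boxed{X} in a response. It assumes option letters A through J (MMLU-Pro has at most ten options) and deliberately omits the fallback heuristics, which are not specified here; the function name is hypothetical.

```python
import re

# Matches \boxed{X} where X is a single option letter A-J.
BOXED = re.compile(r"\\boxed\{\s*([A-J])\s*\}")

def extract_answer(text):
    """Return the letter inside the first boxed answer, or None if absent.
    Fallback heuristics would go where None is returned."""
    match = BOXED.search(text)
    return match.group(1) if match else None
```

Taking the first match fits an answer-first format, where the boxed letter precedes the justification.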

The Zero-Delta Control (A0)

The decisive control was a zero-delta adapter: a copy of the trained factual adapter with all LoRA weight matrices (A and B tensors) set to zero. This adapter was fused using the identical pipeline: same target layers, same modules, same fuse script, same re-quantization to 3-bit. The only difference is that it carries zero learned weights. This isolates the effect of the fuse+requant process itself from any content the LoRA may have learned.
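A minimal sketch of how such a control can be built, assuming the adapter is available as a mapping from tensor names to NumPy arrays. The key names `lora_a` and `lora_b`, the toy shapes, and the helper functions are illustrative assumptions, not the format or script used in this study. The update formula is the standard LoRA one, (alpha / rank) · B · A.

```python
import numpy as np

def zero_delta(adapter):
    """Copy an adapter, replacing every tensor with zeros of the same shape and dtype."""
    return {name: np.zeros_like(t) for name, t in adapter.items()}

def lora_delta(a, b, rank, alpha):
    """Standard LoRA weight update: (alpha / rank) * B @ A."""
    return (alpha / rank) * (b @ a)

# Toy stand-in for one trained adapter module (shapes are illustrative only).
rng = np.random.default_rng(0)
rank, alpha, d_in, d_out = 2, 2, 16, 8
trained = {
    "lora_a": rng.normal(size=(rank, d_in)).astype(np.float32),
    "lora_b": rng.normal(size=(d_out, rank)).astype(np.float32),
}

control = zero_delta(trained)
delta = lora_delta(control["lora_a"], control["lora_b"], rank, alpha)
# delta is all zeros, so anything that changes after fusing the control
# comes from the fuse-and-requantize step, not from learned content.
```

Because the control keeps the same tensor names and shapes as the trained adapter, it can pass through the same fuse step without any special-casing.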

If LoRA training injects real knowledge, the zero-delta control should score near the ~0% both-wrong baseline (it has no learned content). If the gains are an artifact of the fuse-and-requantize process, the zero-delta control will match the trained condition.

Results

| Condition | What It Is | Set-G Accuracy (n=692) | Identity |
| --- | --- | --- | --- |
| Persona only (baseline) | No second stage | ≈0% (by construction) | 18 / 20 |
| A1: Factual injection | Persona → Set-G LoRA, fuse+requant | 18.50% | 18 / 20 |
| A0: Zero-delta control | Persona → zero adapter, fuse+requant (identical pipeline, zero learned weights) | 17.77% | 18 / 20 |
| A1 − A0 | Content-specific effect | +0.73 pp (p = 0.73) | n/a |

The trained factual LoRA (A1) and the zero-delta control (A0) are statistically indistinguishable. The two-proportion z-test gives p=0.73, so the null hypothesis that the two accuracies are equal cannot be rejected. The content-specific effect of training on 1,620 Set-G items is +0.73 percentage points: effectively zero.
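The test can be reproduced with the standard library alone. The raw counts below (128 and 123 correct out of 692) are an assumption, back-computed from the reported 18.50% and 17.77%; the function name is illustrative.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumed counts: 128/692 = 18.50% (A1), 123/692 = 17.77% (A0).
z, p = two_proportion_z_test(128, 692, 123, 692)
print(f"z = {z:.2f}, p = {p:.2f}")  # z = 0.35, p = 0.73
```

Under these assumed counts, the two conditions differ by only five items out of 692.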

What Is Happening

The fuse-and-requantize pipeline, even with a zero learned delta, flips approximately 18% of both-wrong multiple-choice items toward correct. The mechanism is re-quantization rounding noise.

With ≤10-option MC items and a model that already has partial knowledge, this re-quantization jitter flips a predictable fraction of items from wrong to right, producing the illusion of knowledge injection where none occurred.

The Decomposition

| Component | Contribution | % of Total |
| --- | --- | --- |
| Re-quantization artifact (fuse+requant floor) | ~17.8 pp | ~96% |
| Content-specific learning | +0.7 pp (p=0.73, not significant) | ~4% |
| Total observed “injection” | 18.5 pp | 100% |

The content-specific component is not statistically distinguishable from zero. The entire measured gain is explained by the act of fusing and re-quantizing.

Why Small Subsets Overstate the Effect

An earlier evaluation on a 200-item subset showed the trained condition at 23.5% and a content-free control at 19.5%, suggesting a ~4 pp content effect. On the full 692-item held-out set, both conditions fell to ~18% and the apparent gap disappeared. The lesson: small test sets produce spurious effect sizes that vanish at adequate sample sizes. At n=200 with accuracies near 20%, the standard error of a single accuracy estimate is already about ±3 pp, so a 4 pp gap between two conditions is well within sampling noise.
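The size of that noise follows from the binomial standard error. The sketch below assumes a true accuracy of 20% and treats the two conditions as independent samples, which is a simplification since both were scored on the same items; the function names are illustrative.

```python
import math

def standard_error(p, n):
    """Binomial standard error of an accuracy estimate p measured on n items."""
    return math.sqrt(p * (1 - p) / n)

def diff_standard_error(p, n):
    """Standard error of the gap between two independent accuracies near p,
    each measured on n items."""
    return math.sqrt(2) * standard_error(p, n)

for n in (200, 692):
    se = standard_error(0.20, n) * 100
    se_gap = diff_standard_error(0.20, n) * 100
    print(f"n={n}: single estimate ±{se:.1f} pp, gap ±{se_gap:.1f} pp")
# n=200: single estimate ±2.8 pp, gap ±4.0 pp
# n=692: single estimate ±1.5 pp, gap ±2.2 pp
```

At n=200 the observed 4 pp gap is about one standard error of the difference, which is why it could not be taken as evidence of a content effect.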

Why This Matters

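Any study reporting LoRA stacking gains on quantized models without a zero-delta control is likely measuring a re-quantization artifact. The confound is especially insidious because the artifact shows up on held-out items and clears both the ~0% baseline and the random-guess floor, so it looks exactly like successful knowledge injection.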

The Required Control

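For any LoRA experiment on quantized models claiming knowledge injection, the minimum credible design requires a zero-delta control: a copy of the trained adapter with its A and B tensors set to zero, fused and re-quantized through the identical pipeline and evaluated on the same held-out set as the trained condition.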

Pre-Registered Kill Criterion

Before running any control, we pre-registered: “Set-G content contributes real knowledge if and only if A1 − A0 ≥ 5 percentage points AND two-proportion test p < 0.05 at n=692. Otherwise the knowledge-injection line is closed.”

Result: A1 − A0 = +0.73 pp, p = 0.73. The kill criterion is met decisively. The knowledge-injection line is closed on the merits. No threshold was moved post-hoc.
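Written as code, the pre-registered rule is a conjunction of two conditions, which makes it hard to reinterpret after the fact. The function and parameter names below are hypothetical; the thresholds are the ones quoted above.

```python
def content_is_real(a1_acc, a0_acc, p_value, min_gap_pp=5.0, alpha=0.05):
    """Pre-registered rule: the trained content counts as real knowledge only if
    the A1 - A0 gap is at least `min_gap_pp` percentage points AND the
    two-proportion test is significant at `alpha`."""
    gap_pp = (a1_acc - a0_acc) * 100
    return gap_pp >= min_gap_pp and p_value < alpha

# Observed values: A1 = 18.50%, A0 = 17.77%, p = 0.73.
verdict = content_is_real(0.1850, 0.1777, 0.73)
print("knowledge injection supported" if verdict else "line closed")  # line closed
```

Both conditions fail here: the gap is far below 5 pp and the p-value is far above 0.05.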

The definitive conclusion: in this regime (rank 2, 200 iterations, 3 MoE layers, 3-bit quantized expert outputs), LoRA stacking does not inject knowledge. The entire measured gain is a re-quantization artifact. The trained content contributes nothing distinguishable from a zero-weight fuse of the same target.


Model: Qwen3.5-35B-A3B (hybrid MoE, 128 experts). Test set: 692 held-out MMLU-Pro items (both models wrong by construction, 14 subjects, stratified split). Training: rank 2/alpha 2, layers {8,14,20} down_proj including switch_mlp, 200 iterations, lr 5e-6, max_seq 2048. Evaluation: temperature 0, thinking disabled, \boxed{X} answer format. Quantization: 3-bit, group size 64, affine. All thresholds and controls pre-registered before training. Statistical test: two-proportion z-test.
