The Stacking Confound: Why LoRA Recovery Numbers Lie

A zero-delta control, meaning the identical fuse-and-requantize pipeline run with no learned weights at all, matches the “knowledge injection” condition to within noise (17.8% vs 18.5%, p=0.73). In the setup studied here, LoRA stacking on a quantized MoE model does not inject knowledge: the measured gain is a re-quantization artifact.

The Setup

We attempted to inject new factual knowledge into a Qwen3.5-35B-A3B model (hybrid MoE, 128 experts per layer) via LoRA stacking, which means training a second, factual LoRA on top of an existing persona model. The training data comprised 1,620 MMLU-Pro questions that both the base model and the persona model answer incorrectly. These are genuine unknowns, so any improvement on held-out items of the same kind would indicate new capability rather than recall of facts the models already knew.

The factual LoRA targeted the down_proj modules of layers {8, 14, 20}, including the MoE switch_mlp expert output projections (stored at 3-bit precision with group size 64), with rank 2, alpha 2, and 200 iterations. On a held-out test set of 692 items from the same distribution, never seen during training, the stacked model reached 18.5% accuracy. That is well above the ~0% baseline (both models wrong by construction) and above the 10–13% random-guessing floor for these multiple-choice items.

This looked like successful knowledge injection. It was entirely an artifact.

Methodology

Dataset Construction

We mined 6,000 stratified MMLU-Pro questions across 14 academic subjects (chemistry, math, business, engineering, physics, law, computer science, health, history, philosophy, economics, psychology, biology, and a pooled “other” category). Both the base model and the persona model were evaluated at temperature 0 with thinking disabled, using a structured answer-first format (\boxed{X}) to ensure clean extraction.

Items that both models answered incorrectly formed “Set G”: 2,312 genuine unknowns. These were split 70/30 by a subject-stratified random partition into 1,620 training items and 692 held-out test items. The test set was never seen by any LoRA during training.
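The Set-G filter and the subject-stratified 70/30 split can be sketched in plain Python. The record schema (`subject`, `base_correct`, `persona_correct` keys) and the seed are hypothetical, because the article does not describe its splitting code. Per-subject rounding in this sketch will also not reproduce the 1,620/692 counts exactly.

```python
import random
from collections import defaultdict


def build_set_g(records):
    """Keep only items that BOTH models answered incorrectly (the "genuine unknowns")."""
    return [r for r in records if not r["base_correct"] and not r["persona_correct"]]


def stratified_split(items, train_frac=0.7, seed=0):
    """Split ~70/30 within each subject so train and test share the same subject mix."""
    by_subject = defaultdict(list)
    for item in items:
        by_subject[item["subject"]].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for subject in sorted(by_subject):       # fixed order keeps the split reproducible
        group = list(by_subject[subject])
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy records with the assumed schema; real data would come from the two eval runs.
records = [
    {"id": i, "subject": ["math", "law", "physics"][i % 3],
     "base_correct": i % 5 == 0, "persona_correct": i % 7 == 0}
    for i in range(1000)
]
set_g = build_set_g(records)
train, test = stratified_split(set_g)
print(len(set_g), len(train), len(test))
```

Splitting inside each subject, rather than over the pooled list, keeps a subject with few both-wrong items from landing entirely in one split.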

Training Protocol

The factual LoRA was trained on the 1,620 Set-G training items converted to instruction format (question + options → \boxed{answer} with one-line justification). Target: layers {8, 14, 20} down_proj including switch_mlp expert output projections. Hyperparameters: rank 2, alpha 2, learning rate 5e-6, batch size 1, 200 iterations, max sequence length 2048. The trained adapter was fused into the persona model and re-quantized back to the original 3-bit precision (group size 64, affine quantization).

Evaluation

All conditions were evaluated on the full held-out test set (n=692) at temperature 0 with thinking disabled. Answer extraction used the structured \boxed{X} format with fallback heuristics. Persona identity was measured with a 20-prompt consistency evaluation, reported as a score out of 20. All thresholds and control designs were pre-registered before any training run.
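Below is a minimal sketch of answer extraction, assuming MMLU-Pro's option letters A–J. The \boxed{X} pattern follows the article. The two fallback patterns are hypothetical stand-ins, because the actual heuristics are not described.

```python
import re

# Primary format from the article: \boxed{X}, tolerating inner whitespace.
BOXED = re.compile(r"\\boxed\{\s*([A-J])\s*\}")

# Hypothetical fallbacks, tried in order, only if no \boxed{} answer is found.
FALLBACKS = [
    re.compile(r"[Aa]nswer\s*(?:is)?\s*[:\-]?\s*\(?([A-J])\)?\b"),  # "answer is (D)"
    re.compile(r"^\s*\(?([A-J])\)?[.)]?\s*$", re.MULTILINE),        # a line that is just "B."
]


def extract_choice(text):
    """Return the chosen option letter (A-J), or None if nothing parses."""
    match = BOXED.search(text)
    if match:
        return match.group(1)
    for pattern in FALLBACKS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


print(extract_choice("so the result is \\boxed{C}"))
```

Returning None instead of guessing a letter keeps unparseable outputs visible. They can then be counted as wrong explicitly rather than silently scored as a random option.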

The Zero-Delta Control (A0)

The decisive control was a zero-delta adapter: a copy of the trained factual adapter with all LoRA weight matrices (A and B tensors) set to zero. This adapter was fused through the identical pipeline, with the same target layers, the same modules, the same fuse script, and the same re-quantization to 3-bit. The only difference is that it carries zero learned weights. This isolates the effect of the fuse+requant process itself from any content the LoRA may have learned.

If LoRA training injects real knowledge, the zero-delta control has no learned content and should stay near the ~0% both-wrong baseline, while the trained condition rises above it. If the gains are an artifact of the fuse-and-requantize process, the zero-delta control will match the trained condition.
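The control can be stated precisely with some notation introduced here for illustration only. $W_q$ is the stored 3-bit tensor, $D$ is dequantization, and $Q$ is re-quantization (3-bit, group size 64, affine). $B$ and $A$ are the LoRA factors, and $s$ is the LoRA scaling factor, which is $\alpha/r$ under the common PEFT convention and equals 1 for rank 2, alpha 2. Both conditions pass through the same $Q$, so only $\Delta_{\text{content}}$ depends on what was learned:

```latex
\begin{aligned}
\text{A1:}\quad W^{(1)} &= Q\big(D(W_q) + s\,BA\big), \qquad s = \alpha / r \\
\text{A0:}\quad W^{(0)} &= Q\big(D(W_q) + s \cdot 0\big) = Q\big(D(W_q)\big) \\
\text{requant artifact:}\quad \Delta_{\text{rq}} &= D\big(W^{(0)}\big) - D(W_q) \\
\text{content effect:}\quad \Delta_{\text{content}} &= D\big(W^{(1)}\big) - D\big(W^{(0)}\big)
\end{aligned}
```

A comparison of A1 against the unfused model measures $\Delta_{\text{rq}} + \Delta_{\text{content}}$ together. Only the A1 − A0 comparison isolates $\Delta_{\text{content}}$.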

Results

Condition	What It Is	Set-G Accuracy (n=692)	Identity
Persona only (baseline)	No second stage	≈0% (by construction)	18 / 20
A1: Factual injection	Persona → Set-G LoRA, fuse+requant	18.50%	18 / 20
A0: Zero-delta control	Persona → zero adapter, fuse+requant (identical pipeline, zero learned weights)	17.77%	18 / 20
A1 − A0	Content-specific effect	+0.73 pp (p = 0.73)	n/a

The trained factual LoRA (A1) and the zero-delta control (A0) are statistically indistinguishable. A two-proportion z-test gives p=0.73, so the null hypothesis of equal accuracy cannot be rejected. The content-specific effect of training on 1,620 Set-G items is +0.73 percentage points, which is effectively zero.
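The reported p-value can be reproduced with a pooled two-proportion z-test using only the standard library. The inputs are counts implied by the percentages: 128/692 = 18.50% and 123/692 = 17.77%. These counts are inferred, not quoted from the source. The test gives z ≈ 0.35 and p ≈ 0.73. The exact difference is 5/692 ≈ 0.72 pp, and the table's +0.73 pp is the difference of the rounded percentages. A1 and A0 are scored on the same 692 items, so a paired test such as McNemar's would be a stricter alternative, but it needs per-item agreement counts that are not reported here.

```python
from math import erfc, sqrt


def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Counts implied by the reported accuracies (A1 = 18.50%, A0 = 17.77% of 692).
z, p = two_proportion_z_test(128, 692, 123, 692)
print(f"A1 - A0 = {100 * (128 - 123) / 692:.2f} pp, z = {z:.2f}, p = {p:.2f}")
```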

What Is Happening

The fuse-and-requantize pipeline, even with a zero learned delta, flips approximately 18% of both-wrong multiple-choice items to correct. The most likely mechanism is re-quantization rounding noise:

The target tensors (switch_mlp.down_proj on layers {8,14,20}) are stored at 3-bit precision with group size 64
The fuse operation dequantizes these tensors to full precision, adds the LoRA delta (zero in the control case), then re-quantizes back to 3-bit
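This dequant→requant round trip is not guaranteed to be lossless: wherever the recomputed per-group scale and bias differ from the stored ones, weights snap to different 3-bit levels, adding rounding noise throughout the tensor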
On multiple-choice items where the model is “almost right” (partial knowledge pointing toward the correct answer), even tiny weight perturbations from re-quantization push near-threshold predictions across the decision boundary
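Here is a toy illustration of why a zero-delta round trip can still move weights. It assumes a plain min/max affine scheme: scale = (max − min)/7, bias = min, and 3-bit codes 0..7. This is not MLX's actual quantize routine, and the code values are made up. Real checkpoints add further mismatch by storing scales and biases in half precision. When a group's stored codes span the full 0..7 range, re-quantization reproduces the grid. When they do not, the recomputed grid differs and weights re-snap, here by up to 0.2/7 ≈ 0.029, nearly a third of the original 0.1 step.

```python
BITS = 3
LEVELS = 2 ** BITS - 1   # 3-bit codes run 0..7
GROUP = 64               # group size from the article


def quantize(ws):
    """Min/max affine quantization of one group (assumed scheme, for illustration)."""
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    return [round((w - lo) / scale) for w in ws], scale, lo


def dequantize(codes, scale, bias):
    return [c * scale + bias for c in codes]


def zero_delta_round_trip(codes, scale, bias):
    """Dequantize, add a zero LoRA delta, re-quantize, then dequantize again."""
    ws = [w + 0.0 for w in dequantize(codes, scale, bias)]  # the delta is exactly zero
    return ws, dequantize(*quantize(ws))


def max_change(codes, scale=0.1, bias=-0.35):
    before, after = zero_delta_round_trip(codes, scale, bias)
    return max(abs(a - b) for a, b in zip(before, after))


# Stored codes span the full 0..7 range: the re-quantized grid matches the stored one.
full = [i % 8 for i in range(GROUP)]
# Stored codes only use 1..6 (e.g. the checkpoint came from a different quantizer):
# min/max re-quantization picks a different grid and most weights move.
partial = [1 + i % 6 for i in range(GROUP)]

print(f"full-range group:    max |dW| = {max_change(full):.2e}")
print(f"partial-range group: max |dW| = {max_change(partial):.4f}")
```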

With multiple-choice items of up to 10 options and a model that already has partial knowledge, this re-quantization jitter flips a substantial and consistent fraction of items from wrong to right. The result is the illusion of knowledge injection where none occurred.
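The boundary-crossing argument can be made concrete with a deliberately crude model. Each wrong item has the correct option at logit 0 and the model's wrong pick at a small positive margin. Independent Gaussian noise on every logit stands in for requant jitter. The logit values, noise levels, and independence assumption are all hypothetical. The point is only that items wrong by a hair flip often, while confidently wrong items essentially never do.

```python
import random


def flip_rate(margins, noise_sd, n_options=10, seed=0):
    """Toy model: fraction of wrong items whose argmax becomes correct under logit noise.

    Each item has the correct option at logit 0.0, the model's current (wrong)
    pick at +margin, and the remaining options far below at -5.0. Independent
    Gaussian noise on each logit is a hypothetical stand-in for requant jitter.
    """
    rng = random.Random(seed)
    flips = 0
    for m in margins:
        logits = [0.0, m] + [-5.0] * (n_options - 2)  # index 0 = correct answer
        noisy = [x + rng.gauss(0.0, noise_sd) for x in logits]
        if max(range(n_options), key=noisy.__getitem__) == 0:
            flips += 1
    return flips / len(margins)


near = [0.05] * 2000  # items the model gets wrong by a hair
far = [3.0] * 2000    # items the model gets confidently wrong
for sd in (0.0, 0.1, 0.5):
    print(f"noise sd={sd}: near-threshold flip rate={flip_rate(near, sd):.3f}, "
          f"confident-wrong flip rate={flip_rate(far, sd):.3f}")
```

In this toy model, the share of flipped items depends almost entirely on how many items sit near the boundary. That matches the article's point that a filtered both-wrong set with partial knowledge is especially exposed.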

The Decomposition

Component	Contribution	% of Total
Re-quantization artifact (fuse+requant floor)	~17.8 pp	~96%
Content-specific learning	+0.7 pp (p=0.73, not significant)	~4%
Total observed “injection”	18.5 pp	100%

The content-specific component is not statistically distinguishable from zero. The entire measured gain is explained by the act of fusing and re-quantizing.

Why Small Subsets Overstate the Effect

An earlier evaluation on a 200-item subset showed the trained condition at 23.5% and a content-free control at 19.5%, suggesting a ~4 pp content effect. On the full 692-item held-out set, both conditions converged to ~18% and the apparent gap disappeared. The lesson is that small test sets produce spurious effect sizes that vanish at adequate sample sizes. At n=200 and accuracy near 20%, each condition's accuracy has a standard error of about 2.8 pp, and the difference between two conditions has a standard error of about 4 pp. The observed 4 pp gap was therefore roughly one standard error (two-proportion p ≈ 0.33), easily produced by noise alone.
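The sample-size arithmetic behind these figures uses the normal approximation at roughly 20% accuracy, which is an assumption. The counts 47 and 39 follow from 23.5% and 19.5% of 200. The second loop gives the smallest gap that would reach p < 0.05 at each test-set size. This is a significance threshold, not a power calculation, and the gap needed for 80% power is larger still.

```python
from math import erfc, sqrt


def se_diff(p, n):
    """Standard error of the difference of two independent accuracies, each ~p on n items."""
    return sqrt(2 * p * (1 - p) / n)


def p_value(k1, k2, n):
    """Two-sided pooled two-proportion z-test with equal group sizes n."""
    pooled = (k1 + k2) / (2 * n)
    z = (k1 - k2) / n / sqrt(pooled * (1 - pooled) * 2 / n)
    return erfc(abs(z) / sqrt(2))


# The n=200 subset: 23.5% vs 19.5% -> 47 vs 39 correct.
print(f"n=200: SE of one accuracy = {100 * sqrt(0.2 * 0.8 / 200):.1f} pp, "
      f"SE of the gap = {100 * se_diff(0.2, 200):.1f} pp, p = {p_value(47, 39, 200):.2f}")

# Smallest gap reaching p < 0.05 (|z| > 1.96) at ~20% accuracy, by test-set size.
for n in (200, 500, 692, 2000):
    print(f"n={n}: gap must exceed ~{100 * 1.96 * se_diff(0.2, n):.1f} pp")
```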

Why This Matters

Any study reporting LoRA stacking gains on quantized models without a zero-delta control is likely measuring a re-quantization artifact. The confound is especially insidious because:

It produces large, consistent numbers: 18% is clearly above the 10–13% random floor and looks like a real effect
It depends on how close the test items sit to the decision boundary: items where the model is nearly right (but wrong) are the most susceptible to boundary-crossing from requant noise
It requires no training at all: a zero-delta fuse+requant produces the same recovery as 200 iterations of actual training on the target domain
Standard baselines miss it: comparing “stacked model vs. unstacked model” cannot distinguish content learning from requant noise; only a zero-delta control on the same fuse path isolates the effect
Lower bit-widths plausibly amplify it: coarser grids make each re-quantization step larger, so the effect should be largest on low-precision (3–4 bit) MoE expert layers

The Required Control

For any LoRA experiment on quantized models claiming knowledge injection, the minimum credible design requires:

Zero-delta control: Fuse a structurally identical adapter with all learned weights set to zero through the same pipeline (same modules, same fuse code, same re-quantization). This isolates the requant floor
Content-specific effect: Report the delta between your trained condition and the zero-delta control, not the raw gain over the unfused baseline
Statistical test: Two-proportion test at the full test-set size. At accuracies near 20%, a gap must exceed roughly 5 pp at n=500 (about 4 pp at n=692) to reach p < 0.05; smaller effects are indistinguishable from noise
Full held-out set: Never report results from subsets without confirming on the full test split. Small subsets produce spurious separations that vanish at scale
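The reporting step in the list above can be sketched as a single function. It takes correct-answer counts for the unfused model, the trained adapter (A1), and the zero-delta adapter (A0) on the same n items. It then separates the requant floor from the content-specific effect and applies a threshold. The function name and defaults are illustrative, with the defaults mirroring this study's pre-registered criterion (at least 5 pp and p < 0.05). The example counts are those implied by the reported 0%, 18.50%, and 17.77%.

```python
from math import erfc, sqrt


def content_effect_report(k_base, k_trained, k_zero, n, min_effect_pp=5.0, alpha=0.05):
    """Split a stacked-LoRA gain into the requant floor and the content-specific effect.

    k_base: correct answers for the unfused model; k_trained: trained adapter after
    fuse+requant (A1); k_zero: zero-delta adapter through the same pipeline (A0).
    All three are assumed to be scored on the same n held-out items.
    """
    def pct(k):
        return 100 * k / n

    floor_pp = pct(k_zero) - pct(k_base)        # what fuse+requant alone produces
    content_pp = pct(k_trained) - pct(k_zero)   # what training adds on top of that
    pooled = (k_trained + k_zero) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (k_trained - k_zero) / n / se if se > 0 else 0.0
    p = erfc(abs(z) / sqrt(2))                  # two-sided, unpaired approximation
    return {
        "requant_floor_pp": round(floor_pp, 2),
        "content_effect_pp": round(content_pp, 2),
        "p_value": round(p, 3),
        "content_is_real": content_pp >= min_effect_pp and p < alpha,
    }


print(content_effect_report(k_base=0, k_trained=128, k_zero=123, n=692))
```

Reporting the floor and the content effect side by side makes the headline number impossible to read as injection unless the second term survives the test on its own.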

Pre-Registered Kill Criterion

Before running any control, we pre-registered: “Set-G content contributes real knowledge if and only if A1 − A0 ≥ 5 percentage points AND two-proportion test p < 0.05 at n=692. Otherwise the knowledge-injection line is closed.”

Result: A1 − A0 = +0.73 pp, p = 0.73. The kill criterion is met decisively. The knowledge-injection line is closed on the merits. No threshold was moved post-hoc.

The definitive conclusion: in this regime (rank 2, 200 iterations, 3 MoE layers, 3-bit quantized expert outputs), LoRA stacking does not inject knowledge. The entire measured gain is a re-quantization artifact. The trained content contributes nothing distinguishable from a zero-weight fuse of the same target.

Model: Qwen3.5-35B-A3B (hybrid MoE, 128 experts). Test set: 692 held-out MMLU-Pro items (both models wrong by construction, 14 subjects, stratified split). Training: rank 2/alpha 2, layers {8,14,20} down_proj including switch_mlp, 200 iterations, lr 5e-6, max_seq 2048. Evaluation: temperature 0, thinking disabled, \boxed{X} answer format. Quantization: 3-bit, group size 64, affine. All thresholds and controls pre-registered before training. Statistical test: two-proportion z-test.