
SmoothQuant Breaks on Signed RMSNorm: A Negative Result Worth Knowing

March 2026 · Black Sheep AI Research

We tried data-free channel rescaling on Qwen3.5-35B. It produced NaN for every fractional exponent. The reason: RMSNorm γ has negative values.

SmoothQuant is one of the most cited techniques in LLM quantization. The core idea: migrate quantization difficulty from activations to weights by rescaling channels using learned normalisation parameters. The rescaling is mathematically exact — no information is lost — and the rescaled weights have more uniform magnitudes, making them friendlier to quantization.
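The rescaling identity can be sketched in a few lines. This is a minimal illustration of the channel-rescaling idea, not SmoothQuant's actual implementation; the data-free scale choice `s = γ^α` and the function name are our own for this sketch:

```python
import numpy as np

def smoothquant_rescale(W, gamma, alpha=0.5):
    """Sketch of SmoothQuant-style channel rescaling (data-free variant).

    Per-input-channel scales s_j are folded into the weight rows, and the
    activations are divided by s at runtime. Because
        (X / s) @ (diag(s) @ W) == X @ W,
    the transform is mathematically exact; only the per-channel magnitudes
    of W change, which is what makes the result quantization-friendlier.
    """
    s = np.power(gamma, alpha)   # hypothetical data-free scale from RMSNorm gamma
    W_scaled = s[:, None] * W    # fold the scales into the weight matrix
    return W_scaled, s           # divide X by s before the matmul
```

With strictly positive γ this round-trips exactly: `(X / s) @ W_scaled` reproduces `X @ W` to floating-point precision.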

We tested a data-free variant of this technique. If it worked, it would be a free improvement to any proprietary compression pipeline. It did not work.

The result

We tested on a recent MoE model with signed RMSNorm weights. The technique produced NaN for all fractional rescaling exponents. At integer exponents, every tensor got worse, with quantization error increasing significantly. No tensor improved at any setting.

The root cause: newer model architectures allow their RMSNorm learned scale parameters to contain negative values. The rescaling formula requires computing fractional powers of these values — which is undefined for negative numbers in real arithmetic. This is a hard mathematical failure, not a tuning issue.
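The failure is easy to reproduce in isolation. In real (IEEE-754) arithmetic, a negative base raised to a fractional power has no real value, so NumPy returns NaN, and the NaN then propagates through every downstream weight in that channel. The γ values below are made up for illustration:

```python
import numpy as np

# RMSNorm gamma with a negative entry, as allowed by newer architectures
gamma = np.array([0.9, -0.3, 1.2])

with np.errstate(invalid="ignore"):
    s = np.power(gamma, 0.5)  # fractional power of a negative float -> nan

# The negative channel's scale is NaN; every weight it multiplies becomes NaN.
```

Integer exponents (e.g. α = 1.0) avoid the NaN but flip the sign of the rescaled channel's weights, so they are well-defined yet still harmful, which matches the integer-exponent results above.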

What this means for the field

SmoothQuant’s data-free variant has an architecture-dependent failure mode. Any method that computes γ^α for fractional α will fail on models with signed RMSNorm weights. This includes the original SmoothQuant (which uses calibration data to choose α but still computes the rescaling) and recent derivatives like OS+ and AWQ’s per-channel scaling.

The broader trend in model architecture is toward fewer constraints on normalisation parameters. Models are moving from LayerNorm (which has explicit bias terms) to RMSNorm (which doesn’t), and from positive-initialised γ to unconstrained γ. As this trend continues, methods that assume positive normalisation weights will encounter this failure mode more frequently.

For RAM, this result reinforces the approach we’ve taken: rather than modifying weight distributions before quantization, RAM optimises the allocation of bit widths across the existing weights. The weight values are never changed. This makes RAM robust to any weight distribution, including the signed-γ pattern that breaks rescaling methods.
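To make the contrast concrete, here is a toy greedy bit-width allocator. This is not RAM's algorithm; it is only a sketch of the allocation-only idea, with hypothetical helper names. Note that the weight arrays are read but never written, so signed γ (or any other weight distribution) cannot break it:

```python
import numpy as np

def quant_error(w, bits):
    """MSE of symmetric uniform quantization at a given bit width."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    q = np.round(w / scale) * scale
    return float(np.mean((w - q) ** 2))

def allocate_bits(tensors, budget, choices=(2, 4, 8)):
    """Greedy per-tensor bit allocation under a total-bit budget (toy sketch).

    Starts every tensor at the lowest width, then repeatedly upgrades the
    tensor with the best error reduction per extra bit. Weights are only
    read to score candidate widths -- their values are never modified.
    """
    alloc = {n: choices[0] for n in tensors}
    spent = sum(alloc[n] * tensors[n].size for n in tensors)
    while True:
        best = None
        for n, w in tensors.items():
            i = choices.index(alloc[n])
            if i + 1 == len(choices):
                continue
            nb = choices[i + 1]
            extra = (nb - alloc[n]) * w.size
            if spent + extra > budget:
                continue
            gain = quant_error(w, alloc[n]) - quant_error(w, nb)
            if best is None or gain / extra > best[0]:
                best = (gain / extra, n, nb, extra)
        if best is None:
            return alloc
        _, n, nb, extra = best
        alloc[n] = nb
        spent += extra
```

The per-tensor error metric, the candidate widths, and the greedy rule are all free parameters here; the point is only that optimising *where the bits go* sidesteps the weight-transformation step entirely.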

Conclusion

Data-free channel rescaling is not viable for models with signed RMSNorm γ. On Qwen3.5-35B-A3B, the technique produces NaN for all fractional α values and worsens quantization error at α = 1.0. This is a hard failure, not a marginal one.

We publish this negative result because (a) it saves other researchers from testing the same idea, (b) it identifies a concrete architectural assumption that SmoothQuant-family methods depend on, and (c) it validates RAM’s allocation-only approach as more robust than weight-modification approaches. The experiment code is available in the RAM repository.


Code: github.com/baa-ai/RAM — Pre-quantized models: huggingface.co/baa-ai

Read the Full Paper

The complete RAM paper, including formal derivations, benchmark results across 7 model families and 40,000+ questions, and the full optimal allocation framework, is available on our HuggingFace:

RAM: Compute-Optimal Proprietary Compression for LLMs — Full Paper

huggingface.co/spaces/baa-ai/RAM

Licensed under CC BY-NC-ND 4.0

Need quantization that works on any architecture?

RAM’s allocation-only approach never modifies weights, making it robust to any model architecture. No calibration data, no architectural assumptions.

Talk to Our Team