Research.

Original research in model compression, capability governance, and sovereign AI deployment.

Gemma 4 at Half the Size, But Full Performance
Latest · April 2026
RAM compression matches BF16 quality at 50% of the original model size. Our MMLU-Pro evaluation of Gemma 4 demonstrates that compute-optimal quantization preserves the capabilities that matter.

March 2026

46 · What 100 Prompts Reveal About Expert Routing in 256-Expert MoE Models
44 · Why You Can't Prune MoE Experts, Even the Ones Nobody Uses
43 · SmoothQuant Breaks on Signed RMSNorm: A Negative Result
42 · Why a Mostly-6-Bit Model Runs Faster Than a Mixed 4/8-Bit Model
40 · When Better Is Worse: What We Learned by Trying to Improve Our Quantizer
39 · Five Things Our Benchmarks Reveal That Nobody Expected
38 · RAM Benchmark Results: 7 Models, 40,000+ Questions, One Winner
35 · Beyond Perplexity: Downstream Benchmarks Confirm RAM Beats All Quantization Strategies
34 · Does Quantization Actually Regularize? We Tested It.
33 · Mean Perplexity Is Lying to You
32 · MLX Quantization on Apple Silicon: How RAM Turns a Mac into a Compression Lab
27 · When Data-Free Beats the Gold Standard
23 · The GPU Hours Nobody Needed to Spend
22 · The Quantization Bottleneck Is About to Break
21 · What RAM Actually Delivers: Evidence from Four Models
20 · RAM Evaluation Results: Four Models, Three Architectures
19 · Why RAM Matters: The Future of Model Deployment
18 · When Quantization Beats Full Precision: Anatomy of a Perplexity Anomaly

February 2026

15 · AI Without Permission: Privacy, Sovereignty, and Local Inference
13 · The End of Calibration Data
12 · AI Sovereignty on Commodity Hardware
10 · RAM and the Humanoid Intelligence Problem
09 · RAM for Enterprise: Deploying Without the GPU Bill
08 · RAM on Apple Silicon: 400B Parameters on a Single Mac
06 · Why Collapse Tests Are Insufficient
04 · MLX Quantization Pitfalls and Workarounds
03 · Expert Pruning: When Dead Experts Aren't Dead
02 · Per-Expert Mixed-Bit Quantization
01 · Profiling Expert Activation Patterns in 512-Expert MoE Models