PEFT `target_modules` Has Three Modes, Most Teams Use the Wrong One
LoRA Engineering

May 2026 · Black Sheep AI Research

PEFT's LoraConfig(target_modules=...) accepts three different argument shapes (a list of leaf names, a single regex string, or a list of full module paths), and they behave subtly differently. Teams reach for the wrong one and end up with adapters on tensors they didn't mean to target. Here's how to pick correctly and validate without GPUs.

The three shapes

As of PEFT 0.19, LoraConfig accepts target_modules as:

Shape A: list of leaf-name suffixes.

LoraConfig(target_modules=["q_proj", "v_proj"])

PEFT walks the model and inserts LoRA on every module whose name equals one of the entries or ends with it. So model.layers.0.self_attn.q_proj, model.layers.0.cross_attn.q_proj, and vision_encoder.q_proj all get adapters. You get coverage of every layer's q_proj and v_proj, but through target_modules alone you cannot exclude a specific layer or a specific architectural section.

Shape B: single regex string (matched with re.fullmatch).

LoraConfig(target_modules=r"model\.layers\.(0|1|2)\.self_attn\.q_proj")

PEFT treats the string as a regex and matches it against full module names. Only modules whose full path matches the pattern get adapters. You can be arbitrarily selective: match specific layer indices, specific architectural sections, or specific named modules.

Shape C: list of full module paths.

LoraConfig(target_modules=["model.layers.0.self_attn.q_proj", "model.layers.5.mlp.up_proj"])

PEFT applies the same suffix matching to each entry: an entry matches any module whose name equals it or ends with a dot followed by it. Often this works as you'd expect: if model.layers.0.self_attn.q_proj is unique in the model, only that one module gets the adapter. But if the same path also appears nested under another prefix (e.g. multimodal models with vision_model.layers.0.self_attn.q_proj AND text_model.layers.0.self_attn.q_proj, where an entry such as layers.0.self_attn.q_proj ends both paths), the suffix match silently catches both.
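The three shapes can be compared side by side with a few lines of plain Python. This is a sketch of the matching rule as described above (full match for a string, suffix match for a list), not PEFT's actual code; the helper name `is_targeted` and the module names are made up for illustration.

```python
import re

def is_targeted(name, target_modules):
    """Approximate the target_modules rule described above (not PEFT's code)."""
    if isinstance(target_modules, str):
        # Shape B: the whole module path must match the regex.
        return re.fullmatch(target_modules, name) is not None
    # Shapes A and C: exact name, or a suffix that starts at a dot boundary.
    return any(name == t or name.endswith("." + t) for t in target_modules)

# Hypothetical module names from a two-tower model.
names = [
    "text_model.layers.0.self_attn.q_proj",
    "text_model.layers.1.self_attn.q_proj",
    "vision_model.layers.0.self_attn.q_proj",
]

shape_a = [n for n in names if is_targeted(n, ["q_proj"])]
shape_b = [n for n in names if is_targeted(n, r"text_model\.layers\.0\.self_attn\.q_proj")]
shape_c = [n for n in names if is_targeted(n, ["layers.0.self_attn.q_proj"])]

# Shape A selects all three, Shape B exactly one, Shape C both layer-0 modules.
print(len(shape_a), len(shape_b), len(shape_c))
```

The list entry in `shape_c` looks specific, yet it still selects the vision module because both paths end in the same dotted suffix.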

The trap

Shape C looks like the most explicit option ("just give it the exact paths!") and most teams reach for it first. Then they discover one of three failures:

Failure 1: Silent over-coverage. You list 50 module paths. The model has 100 modules ending in those exact suffixes. PEFT inserts 100 adapters, not 50. Your trainable param count is 2× what you computed.

Failure 2: Architectural duplication. Multimodal models (Qwen3.6-27B, Gemma-4 vision variants, Llama-4 with vision tower) have identically-named modules in both the language and vision sub-models. Shape C with text-model paths can still grab the vision modules whenever a vision module's path ends in the same dotted suffix.

Failure 3: Silent zero-coverage. To our knowledge, PEFT raises a ValueError only when none of your entries match any module in the model; it does not check entries one by one. A typo in one path is therefore skipped without complaint as long as other entries match, and the adapter you meant to add is simply missing. That only shows up when you check model.print_trainable_parameters(), or list the wrapped modules, after get_peft_model().
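All three failures come down to how many modules each entry selects, which can be checked on a plain list of module names before PEFT is involved. The sketch below assumes the suffix rule described earlier; `coverage_report` and `find_problems` are hypothetical helpers, and the module names and the deliberate typo are invented.

```python
def coverage_report(entries, module_names):
    """Map each target entry to the module names it would select by suffix match."""
    report = {}
    for entry in entries:
        report[entry] = [
            n for n in module_names if n == entry or n.endswith("." + entry)
        ]
    return report

def find_problems(report):
    """Return entries that match nothing and entries that match more than once."""
    unmatched = [e for e, hits in report.items() if not hits]
    duplicated = [e for e, hits in report.items() if len(hits) > 1]
    return unmatched, duplicated

# Hypothetical names; in practice: [n for n, _ in model.named_modules()]
module_names = [
    "language_model.layers.0.self_attn.q_proj",
    "vision_tower.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.up_proj",
]
entries = [
    "layers.0.self_attn.q_proj",  # matches both towers
    "layers.0.mlp.up_proj",       # matches exactly once
    "layers.0.mlp.up_porj",       # typo, matches nothing
]
unmatched, duplicated = find_problems(coverage_report(entries, module_names))
print("unmatched:", unmatched)
print("duplicated:", duplicated)
```

An empty `unmatched` and an empty `duplicated` list mean a Shape C list selects exactly the modules it names.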

When to use which

A practical rule of thumb based on what you're trying to express:

| Goal | Shape | Why |
| --- | --- | --- |
| LoRA on every q_proj / v_proj across the whole model | A (leaf names) | The intent is "all of these"; suffix match is exactly right |
| LoRA on a specific subset of layers | B (regex) | Regex is the only shape that lets you express "layer index 0–9 only" cleanly |
| LoRA on the language part of a multimodal model only | B (regex model\.language_model\..*) | Suffix matching can't exclude vision; regex can |
| LoRA on a hand-picked tensor list from a manifest | B (regex built from escaped paths) | Each path escaped via re.escape and joined with the regex alternation bar; no suffix-matching surprises |
| Quick experiments where you don't care about precision | A | Fastest to write; OK if your model has flat naming |
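As an illustration of the "subset of layers" and "language part only" rows, here is a minimal sketch of one pattern that does both. The module names are hypothetical and follow the model.language_model prefix used in the table; check the pattern against your own model's names.

```python
import re

# Layers 0-9 of the language model only, q_proj and v_proj.
pattern = r"model\.language_model\.layers\.[0-9]\.self_attn\.(q_proj|v_proj)"

candidates = [
    "model.language_model.layers.3.self_attn.q_proj",   # in range
    "model.language_model.layers.9.self_attn.v_proj",   # in range
    "model.language_model.layers.10.self_attn.q_proj",  # layer index too high
    "model.language_model.layers.3.self_attn.k_proj",   # wrong projection
    "model.vision_tower.layers.3.self_attn.q_proj",     # vision side
]
selected = [n for n in candidates if re.fullmatch(pattern, n)]
print(selected)
```

Because the whole path has to match, layer 10 is rejected even though its name contains "layers.1". A wider range needs an explicit alternation, for example `(\d|1\d)` for layers 0 to 19.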

The escaped-regex pattern for arbitrary explicit lists:

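import re
from peft import LoraConfig

# Exact module paths to adapt (e.g. loaded from a manifest).
explicit_paths = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.5.mlp.gate_proj",
    "model.layers.12.self_attn.o_proj",
]
# Escape each path so "." is literal, then join the paths with "|".
target_modules_regex = "|".join(re.escape(p) for p in explicit_paths)
# r and lora_alpha are example values; set them for your run.
config = LoraConfig(target_modules=target_modules_regex, r=16, lora_alpha=16)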

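This gives you exact module-by-module control with no suffix-matching ambiguity and no architectural duplication. A regex branch that matches nothing is not an error in itself, so pair this with the validation step in the next section to catch a path that doesn't exist.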
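A regex alternation will not complain about a branch that matches nothing, so one cheap addition is to check every path against the model's module names before building the pattern. The sketch below assumes you already have those names as a list; `build_exact_regex` is a hypothetical helper and the names are invented.

```python
import re

def build_exact_regex(paths, module_names):
    """Join escaped paths into one alternation, refusing paths that do not exist."""
    known = set(module_names)
    missing = [p for p in paths if p not in known]
    if missing:
        raise ValueError(f"paths not found in model: {missing}")
    return "|".join(re.escape(p) for p in paths)

# Hypothetical names; in practice: [n for n, _ in model.named_modules()]
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.5.mlp.gate_proj",
    "vision.model.layers.0.self_attn.q_proj",
]
regex = build_exact_regex(
    ["model.layers.0.self_attn.q_proj", "model.layers.5.mlp.gate_proj"],
    module_names,
)
# The nested vision copy is not selected because the full path must match.
matched = [n for n in module_names if re.fullmatch(regex, n)]
print(matched)
```

The resulting string can be passed as target_modules exactly like target_modules_regex above.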

Validate without a GPU

You can verify your target_modules does what you expect on a CPU, in seconds, without any GPU memory. Use accelerate.init_empty_weights() to materialize the model architecture (the module hierarchy) without loading any weights:

import torch
from accelerate import init_empty_weights
from peft import LoraConfig, get_peft_model
from transformers import AutoConfig, AutoModelForImageTextToText

# model_path, your_target_modules and your_pre_computed_total are placeholders you supply.
cfg = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
with init_empty_weights():
    model = AutoModelForImageTextToText.from_config(cfg, trust_remote_code=True, dtype=torch.bfloat16)

# Now apply your LoraConfig
lc = LoraConfig(target_modules=your_target_modules, r=16, lora_alpha=16, bias="none")
pmodel = get_peft_model(model, lc)

# Count actual LoRA params
trainable_lora = sum(p.numel() for n, p in pmodel.named_parameters() if "lora_" in n)
expected = your_pre_computed_total
print(f"actual: {trainable_lora/1e6:.2f}M  expected: {expected/1e6:.2f}M  ratio: {trainable_lora/expected:.4f}")

Memory peak: <500 MB for a 27B model. Wall time: under 30 seconds. Run this before any GPU training, and you'll catch all three failure modes above before they cost you a 6-hour training run.
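The expected figure in that comparison has to come from somewhere. For plain LoRA on a linear layer, rank r adds an r × in_features matrix and an out_features × r matrix, so the count is r · (in_features + out_features) per targeted layer. A minimal sketch, with hypothetical layer shapes and assuming no extra trainable parameters (bias="none", no variants that add per-layer terms):

```python
def expected_lora_params(linear_shapes, r):
    """Sum r * (in_features + out_features) over the targeted linear layers."""
    return sum(r * (in_f + out_f) for in_f, out_f in linear_shapes)

# Hypothetical: q_proj (4096 -> 4096) and v_proj (4096 -> 1024) in 4 layers.
shapes = [(4096, 4096), (4096, 1024)] * 4
total = expected_lora_params(shapes, r=16)
print(f"{total / 1e6:.2f}M")
```

If the measured count comes out at a clean multiple of this number, such as 2×, duplicated module names are the first thing to check.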

Two more things that bite people

The Auto class you call inside init_empty_weights() decides whether you get the inner model or the wrapper. AutoModel.from_config() returns the inner <X>Model class, with module names like language_model.layers.0.q_proj (no model. prefix). AutoModelForCausalLM.from_config() or AutoModelForImageTextToText.from_config() returns the wrapper class, with model.language_model.layers.0.q_proj. Validate with the same class your training code uses and write your target paths against its names; otherwise the trainable-param check will mislead.
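A prefix mismatch of this kind is easy to detect mechanically. The following sketch assumes the only difference between the two name sets is a leading "model." and uses a hypothetical helper, `diagnose_prefix`; real models may differ in other ways.

```python
def diagnose_prefix(paths, module_names, prefix="model."):
    """Report whether paths match as-is, need the wrapper prefix, or need it stripped."""
    known = set(module_names)
    if all(p in known for p in paths):
        return "ok"
    if all(prefix + p in known for p in paths):
        return "add prefix"
    if all(p.startswith(prefix) and p[len(prefix):] in known for p in paths):
        return "strip prefix"
    return "mismatch"

# Names as in the paragraph above: inner class vs. wrapper class.
inner_names = ["language_model.layers.0.q_proj"]
wrapper_names = ["model.language_model.layers.0.q_proj"]
paths = ["language_model.layers.0.q_proj"]

print(diagnose_prefix(paths, inner_names))    # paths were written for the inner class
print(diagnose_prefix(paths, wrapper_names))  # the wrapper needs the "model." prefix
```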

MTP (multi-token prediction) modules are sometimes in the safetensors but not in the architecture. Some recent models (Qwen3, DeepSeek-V4 family) ship saved weights that include MTP layers, but the architecture class instantiates the model without them. If you derive your target list from a safetensors file, filter out keys matching mtp. and multi_token_pred., or your list will name modules that don't exist in the instantiated model and you'll walk straight into Failure 3.
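A minimal sketch of that filter, assuming the checkpoint keys are already available as a list of strings; the helper name `target_paths_from_weight_keys` and the keys are hypothetical. It only strips the tensor suffix and drops MTP entries, so deciding which of the remaining modules are linear layers worth adapting is still up to you.

```python
def target_paths_from_weight_keys(keys, drop_markers=("mtp.", "multi_token_pred.")):
    """Turn checkpoint tensor keys into module paths, skipping MTP entries."""
    paths = []
    for key in keys:
        if any(marker in key for marker in drop_markers):
            continue  # saved in the checkpoint, absent from the instantiated model
        if key.endswith(".weight"):
            path = key[: -len(".weight")]
            if path not in paths:
                paths.append(path)
    return paths

# Hypothetical checkpoint keys.
keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.q_proj.bias",
    "mtp.layers.0.self_attn.q_proj.weight",
]
print(target_paths_from_weight_keys(keys))
```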

The shorter rule

Default to Shape B (regex). It's slightly more verbose to write but unambiguous to read, supports every selection pattern you'd want, and never silently grabs unintended modules. For one-off experiments where you really do mean "every q_proj in the model", Shape A is fine. Avoid Shape C unless you've manually verified the model has no architectural duplication of the names you're listing.


Source: PEFT 0.19 documented behaviour; trap surfaced during multimodal LoRA training in 2026. Validation recipe is init_empty_weights() from accelerate.