Model provenance & integrity audit · Black Sheep AI

Watchman audit report

Audit WM-20260612-9abc1dd9 · 2026-06-12T23:16:49.182583+00:00 · Watchman v0.2.1 · library v1

Models under audit

role	source	precision
base	/data/models/Qwen2.5-1.5B	full precision
candidate	/data/models/Qwen2.5-1.5B-Instruct	full precision
control	not used	candidate is full precision; raw functional diff is used

Findings

detection feature	value	threshold (fit from clean/compression class)
top-10% excess concentration	0.461	≥ 0.3138
log₁₀ total excess	-2.596	≥ -0.3068
peak localized change (robust score)	64.5	(reported for context)

Tensors compared: 197 of 197 2-D weight tensors (coverage 100%); measurement settings recorded in the machine-readable report; differential control not used (full-precision candidate).

where the change concentrates (share of excess by tensor role)

tensor role	share of excess
v_proj	31.0%
k_proj	26.0%
embed	19.6%
down_proj	7.4%
o_proj	7.3%
q_proj	6.2%
up_proj	1.3%
gate_proj	1.2%

Depth center 0.52 (0 = first layer, 1 = last), spread 0.30.
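
The role shares and depth statistics above can be illustrated with a minimal sketch. It assumes that a tensor's role is taken from its name, that shares are each role's fraction of the summed per-tensor excess, and that depth center and spread are the excess-weighted mean and standard deviation of normalized layer index. None of these definitions is confirmed by this report, and the function names are hypothetical.

```python
import math
import re
from collections import defaultdict

def role_of(name):
    """Coarse role from a tensor name such as 'layers.3.self_attn.v_proj.weight' (assumed naming)."""
    if "embed" in name:
        return "embed"
    return name.split(".")[-2]  # e.g. 'v_proj'

def layer_of(name):
    """Layer index parsed from the name, or None for tensors outside the layer stack."""
    match = re.search(r"layers\.(\d+)\.", name)
    return int(match.group(1)) if match else None

def role_shares(excess):
    """Fraction of total excess attributed to each role."""
    total = sum(excess.values())
    shares = defaultdict(float)
    for name, value in excess.items():
        shares[role_of(name)] += value / total
    return dict(shares)

def depth_stats(excess, n_layers):
    """Excess-weighted mean and standard deviation of depth (0 = first layer, 1 = last)."""
    pairs = [(layer_of(n) / (n_layers - 1), v)
             for n, v in excess.items() if layer_of(n) is not None]
    weight = sum(v for _, v in pairs)
    center = sum(d * v for d, v in pairs) / weight
    variance = sum(v * (d - center) ** 2 for d, v in pairs) / weight
    return center, math.sqrt(variance)

# Toy values, not taken from this audit.
toy = {
    "embed_tokens.weight": 2.0,
    "layers.0.self_attn.v_proj.weight": 1.0,
    "layers.2.self_attn.v_proj.weight": 1.0,
}
shares = role_shares(toy)
center, spread = depth_stats(toy, n_layers=3)
```

In this sketch the embedding tensor contributes to the role shares but not to the depth statistics, since it has no layer index; whether Watchman treats it the same way is not stated.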

most-changed tensors (excess over matched control)

#	tensor	excess divergence
1	embed_tokens.weight	0.000497
2	layers.25.self_attn.k_proj.weight	0.000061
3	layers.27.self_attn.k_proj.weight	0.000045
4	layers.0.self_attn.v_proj.weight	0.000045
5	layers.1.self_attn.v_proj.weight	0.000042
6	layers.2.self_attn.v_proj.weight	0.000040
7	layers.17.self_attn.k_proj.weight	0.000040
8	layers.21.self_attn.k_proj.weight	0.000036
9	layers.24.self_attn.k_proj.weight	0.000036
10	layers.6.self_attn.v_proj.weight	0.000035
11	layers.14.self_attn.v_proj.weight	0.000034
12	layers.19.self_attn.k_proj.weight	0.000034
13	layers.3.self_attn.v_proj.weight	0.000033
14	layers.16.self_attn.v_proj.weight	0.000032
15	layers.23.self_attn.k_proj.weight	0.000032
16	layers.20.self_attn.k_proj.weight	0.000032
17	layers.12.self_attn.v_proj.weight	0.000032
18	layers.19.self_attn.v_proj.weight	0.000031
19	layers.25.self_attn.v_proj.weight	0.000031
20	layers.26.self_attn.k_proj.weight	0.000031
21	layers.13.self_attn.v_proj.weight	0.000030
22	layers.11.self_attn.v_proj.weight	0.000030
23	layers.17.self_attn.v_proj.weight	0.000029
24	layers.15.self_attn.v_proj.weight	0.000029
25	layers.9.self_attn.v_proj.weight	0.000028
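
One plausible reading of the "top-10% excess concentration" feature is the share of total excess held by the top 10% of tensors ranked by excess. The sketch below implements that reading; the definition, the rounding-down of the tensor count, and the function name are assumptions, not something this report states. As a rough consistency check, it also combines the rounded values printed above with the reported log₁₀ total excess.

```python
def top_fraction_concentration(excess_values, fraction=0.10):
    """Share of total excess held by the largest `fraction` of tensors (count rounded down)."""
    ranked = sorted(excess_values, reverse=True)
    k = max(1, int(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else 0.0

# Rough check with this report's rounded figures: 10% of 197 tensors rounds down to 19.
top19 = [
    0.000497, 0.000061, 0.000045, 0.000045, 0.000042, 0.000040, 0.000040,
    0.000036, 0.000036, 0.000035, 0.000034, 0.000034, 0.000033, 0.000032,
    0.000032, 0.000032, 0.000032, 0.000031, 0.000031,
]
total_excess = 10 ** -2.596          # from the reported log10 total excess
approx_concentration = sum(top19) / total_excess
embed_share = top19[0] / total_excess
```

With these rounded inputs the ratio comes out near 0.46 and the embedding share near 0.196, which is consistent with the reported 0.461 and 19.6% but does not confirm the exact definition.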

Compliance mapping

Watchman produces audit evidence supporting these obligations; it does not by itself make a system compliant.

framework	this audit supports	evidence in this report
EU AI Act (Reg. 2024/1689): GPAI / Annex III Art. 11 technical documentation	model identity and lineage record; verification of third-party base-model claims; documented method limitations	models.*.files (SHA-256 chain of custody); verdict + classification; library.validation_loo; known-limitations appendix
US FY2026 NDAA / DFARS: AI/ML weight security; integrity check before deployment	pre-deployment integrity gate (CI exit codes 0/2/1); registry-pinnable weight hashes; per-release attestation	verdict.exit_code; models.candidate.files; attestation.cdx.json
OMB M-26-04: continuous accountability for federal AI	scheduled re-audits; deterministic comparison over time	audit_id + timestamp_utc series; recorded, reproducible analysis settings
NSA AI supply-chain guidance (Mar 2026): model-layer controls	third-party model intake verification	models.base/candidate provenance verdict
AI-BOM (CycloneDX/SPDX) procurement artifact	model name / version / weights-identifier / lineage entry with verified provenance	attestation.cdx.json
US banking MRM (OCC/Fed/FDIC, Apr 2026): third-party model validation	independent validation evidence for vendor and open-weight models	full report + limitations (validation record)
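
The pre-deployment integrity gate listed in the table could be wired into a CI job along the following lines. This is a sketch under assumptions: it supposes that audit_report.json nests the exit code as verdict.exit_code, as the evidence column suggests, and the function name is hypothetical. It simply propagates the recorded code, so it does not depend on what each of the values 0, 2, and 1 means, which this report does not spell out.

```python
import json

def gate_exit_code(report_path):
    """Return the exit code recorded in a machine-readable audit report.

    Assumes the JSON layout {"verdict": {"exit_code": <int>}}; adjust the
    key path if the actual report differs.
    """
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    return int(report["verdict"]["exit_code"])

# Typical use in a CI step (most CI systems stop the job on any nonzero exit):
#     import sys
#     sys.exit(gate_exit_code("audit_report.json"))
```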

Chain of custody

base

file	bytes	sha256
config.json	684	0e8c8aa86468aba09c9d32157ff4bc2301c7e6c50e4398960425b2ea71e66f77
model.safetensors	3,087,467,144	a961db72e75d52b18e6b0c9d379e51a26973b233385e0e127fdda7d648aec796

candidate

file	bytes	sha256
config.json	660	98d2ff8cc47488d08a2b0b3acf4eb99ef210779b42bd48605f6b8e36acdbf670
model.safetensors	3,087,467,144	dd924a11b4c220f385b51ffa522daea7c9f3d850e31b162bb5661df483c6d3ee
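
The byte counts and digests in these tables can be reproduced independently with the standard library. The sketch below streams each file through SHA-256 so that multi-gigabyte weight files are never loaded whole; the helper names and the idea of a `pinned` mapping of expected digests are illustrative assumptions, not part of Watchman.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def custody_record(model_dir, names=("config.json", "model.safetensors")):
    """One row per file: name, size in bytes, SHA-256."""
    rows = []
    for name in names:
        path = Path(model_dir) / name
        rows.append({"file": name,
                     "bytes": path.stat().st_size,
                     "sha256": sha256_file(path)})
    return rows

def mismatches(rows, pinned):
    """Names of files whose digest differs from the pinned (expected) value."""
    return [r["file"] for r in rows if pinned.get(r["file"]) != r["sha256"]]
```

Comparing the output of `custody_record` for a downloaded model against the digests printed above is enough to confirm that the audited files and the local files are byte-identical.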

analysis pipeline

property	value
pipeline bundle digest (SHA-256 over the pinned analysis modules)	4c5d08117650f2c965d806e7e0171a719289ad81037ff586540557f7c17a4e94
integrity	every analysis module is hash-pinned; this digest changes if any module changes
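
A single digest over a set of pinned modules can be built in several ways, and this report does not say which construction Watchman uses. The sketch below shows one common approach, assumed here for illustration: hash each module, then hash the sorted list of (name, digest) pairs, so the result is independent of file order and changes if any module changes.

```python
import hashlib

def bundle_digest(modules):
    """SHA-256 over sorted (name, per-module SHA-256) pairs.

    `modules` maps a module name to its raw bytes. The construction is an
    assumption for illustration, not Watchman's documented scheme.
    """
    outer = hashlib.sha256()
    for name in sorted(modules):
        inner = hashlib.sha256(modules[name]).hexdigest()
        outer.update(f"{name}\0{inner}\n".encode("utf-8"))
    return outer.hexdigest()

# Hypothetical module contents.
original = {"measure.py": b"def f(): return 1\n", "decide.py": b"THRESH = 0.3\n"}
tampered = dict(original, **{"decide.py": b"THRESH = 0.9\n"})
digest_original = bundle_digest(original)
digest_tampered = bundle_digest(tampered)
```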

environment

python	3.12.0
platform	arm64 workstation (Apple Silicon)
mlx	0.31.1
mlx-lm	0.31.2
numpy	2.4.4
huggingface-hub	1.7.1

Methodology

Weights-direct measurement. Watchman reads the model's weight files directly. It requires no training data, no prompts, no inference access, and no cooperation from the publisher. Each audit is deterministic: identical inputs and recorded settings produce an identical verdict, so any party can independently reproduce it.
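
To illustrate what reading weight files directly involves, the sketch below parses the header of a safetensors file with the standard library only. The safetensors layout is an 8-byte little-endian header length followed by a JSON header that maps each tensor name to its dtype, shape, and data offsets. The helper names are hypothetical, and selecting tensors by `len(shape) == 2` is an assumption about how the "2-D weight tensors" in this report are counted.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the tensor index of a safetensors file without loading any weights."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian uint64
        header = json.loads(fh.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header

def two_d_tensors(header):
    """Names of tensors whose shape has exactly two dimensions."""
    return sorted(name for name, info in header.items() if len(info["shape"]) == 2)
```

Because only the header is read, listing the tensors of a multi-gigabyte file takes milliseconds and needs no machine-learning framework.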

Matched-compression differential. When the candidate ships quantized, Watchman measures it against an independently quantized control of the claimed base at the candidate's own declared settings. Compression effects cancel almost completely, and what remains is unexplained change. Full-precision candidates are measured against the claimed base directly.
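
The differential can be sketched for a single tensor as follows. The divergence metric here (squared difference relative to the base tensor's energy) is a hypothetical stand-in, since this report does not define the metric Watchman uses, and clamping the excess at zero is likewise an assumption.

```python
def divergence(tensor, base):
    """Stand-in metric: squared difference relative to the base's energy."""
    num = sum((x - y) ** 2 for x, y in zip(tensor, base))
    den = sum(y ** 2 for y in base)
    return num / den if den else 0.0

def excess_divergence(candidate, base, control=None):
    """Change in the candidate not explained by the matched control.

    With no control (full-precision candidate), the raw divergence from
    the base is returned.
    """
    observed = divergence(candidate, base)
    if control is None:
        return observed
    expected = divergence(control, base)  # what compression alone produces
    return max(0.0, observed - expected)

# Toy flattened tensors, not taken from this audit.
base = [1.0, -1.0]
control = [1.5, -1.0]       # base after a hypothetical quantization round trip
only_quantized = [1.5, -1.0]
modified = [1.5, 0.0]

clean_excess = excess_divergence(only_quantized, base, control)
modified_excess = excess_divergence(modified, base, control)
raw_excess = excess_divergence([1.0, 0.0], base)
```

A candidate that differs from the base only by the same compression as the control yields zero excess, while any further change survives the subtraction.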

Two-stage decision. A modification is flagged when the unexplained change exceeds anything that compression alone has been observed to produce. The decision thresholds are fitted from a labelled, versioned reference library and printed in this report. A flagged modification is then classified by matching its signature against the library's labelled modification types.
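
The two stages can be sketched as a threshold test followed by a nearest-signature lookup. Everything specific here is an assumption: fitting each threshold as the maximum over the negative class, flagging when any single feature reaches its threshold, and matching by Euclidean distance. The any-feature rule is at least consistent with this report, where the concentration feature meets its printed threshold (0.461 against 0.3138) and log₁₀ total excess does not (-2.596 against -0.3068), but the report does not state how features are combined.

```python
import math

def fit_threshold(negative_values, margin=0.0):
    """Largest value seen in the clean/compression class, plus an optional margin."""
    return max(negative_values) + margin

def is_flagged(features, thresholds):
    """Stage 1 (assumed rule): flag when any feature reaches its threshold."""
    return any(features[name] >= limit for name, limit in thresholds.items())

def classify(signature, library):
    """Stage 2 (assumed rule): label of the nearest labelled signature."""
    label, _ = min(library, key=lambda entry: math.dist(signature, entry[1]))
    return label

# Thresholds and feature values as printed in this report; key names are hypothetical.
thresholds = {"top10_concentration": 0.3138, "log10_total_excess": -0.3068}
features = {"top10_concentration": 0.461, "log10_total_excess": -2.596}
flagged = is_flagged(features, thresholds)

# Toy two-dimensional signatures, not from the reference library.
toy_library = [("instruction_tuning", (1.0, 0.0)), ("domain_finetune", (0.0, 1.0))]
label = classify((0.9, 0.1), toy_library)
```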

Reference library

Reference library v1 contains 20 labelled signatures: alignment_modification ×4, clean ×1, domain_finetune ×4, instruction_tuning ×6, quantization ×5. Leave-one-out validation: detection 18/20, characterization 11/12 of detected.
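
Leave-one-out validation holds out each labelled signature in turn and tests it against the remainder of the library. The sketch below shows the loop with a nearest-neighbour classifier; the classifier and the toy signatures are assumptions for illustration and do not reproduce Watchman's detection or characterization steps.

```python
import math

def nearest_label(signature, library):
    """Label of the closest labelled signature by Euclidean distance."""
    label, _ = min(library, key=lambda entry: math.dist(signature, entry[1]))
    return label

def leave_one_out(library):
    """Return (correct, total) when each entry is classified by all the others."""
    correct = 0
    for i, (label, signature) in enumerate(library):
        rest = library[:i] + library[i + 1:]
        if nearest_label(signature, rest) == label:
            correct += 1
    return correct, len(library)

# Toy library: two well-separated classes plus one class with a single example.
toy_library = [
    ("a", (0.0, 0.0)), ("a", (0.1, 0.0)),
    ("b", (5.0, 5.0)), ("b", (5.0, 5.1)),
    ("c", (9.0, 0.0)),
]
score = leave_one_out(toy_library)
```

In a scheme like this, a class with only one example can never be matched when that example is held out, which is one reason leave-one-out scores on a small library tend to understate performance.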

Known limitations

Validated leave-one-out on the v1 reference library: detection 18/20; characterization 11/12 of detected (92%), 11/14 over all modified examples, across 5 model families and 3 modification classes.
Known detection miss mode: extremely broad, low-intensity instruction tuning of very small models (the two leave-one-out misses). Highly localized edits were detected in all tested cases.
The instruction-tuning class is intrinsically heterogeneous; both observed characterization errors involve it. Alignment-modification and domain-specialization examples classified 4/4 and 4/4 in leave-one-out.
The clean/compression negative class currently holds 6 examples; thresholds are fit from it and stated in this report. Accuracy improves as the reference library grows.
Validation was performed exclusively on benign, publicly disclosed model modifications; this is defensive provenance/integrity auditing.
v1 reads full-precision safetensors and MLX packed quantization only; other quantized release formats return indeterminate, never a false "clean".
Tokenizer and generation-config files are not hashed in v1 (weights and model config only).

Audit WM-20260612-9abc1dd9, generated by Watchman v0.2.1, Black Sheep AI. This report describes defensive model-provenance and integrity auditing. The verdict is a statistical measurement against the cited reference library, with the limitations stated above; it is evidence, not a guarantee. Machine-readable form: audit_report.json.

Verdict: Weight modification detected