Model provenance & integrity audit · Black Sheep AI

Watchman audit report

Audit WM-20260612-b2ce0ec1 · 2026-06-12T23:16:55.030507+00:00 · Watchman v0.2.1 · library v1

Models under audit

role | source | precision
base | /data/models/Qwen2.5-1.5B | full precision
candidate | /data/models/Qwen2.5-1.5B-q4 | q4 g64
control | /data | q4 g64

Findings

detection feature | value | threshold (fit from clean/compression class)
top-10% excess concentration | 0.0 | ≥ 0.3138
log10 total excess | -12.0 | ≥ -0.3068
peak localized change (robust score) | 0.0 | (reported for context)

Tensors compared: 197 of 197 2-D weight tensors (coverage 100%); measurement settings recorded in the machine-readable report; differential control used.
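
As a rough illustration of how the two detection features could be derived from per-tensor excess values, the sketch below assumes that "top-10% excess concentration" is the share of total excess held by the top 10% of tensors, and that the logarithm is floored with a small epsilon (which would be consistent with a reported -12.0 when every excess is zero). Both definitions, and the names `excess_features` and `eps`, are assumptions made for this example, not Watchman's documented implementation; the authoritative settings are those recorded in the machine-readable report.

```python
import math

def excess_features(excess, eps=1e-12):
    """Return (top-10% concentration, log10 total excess) for per-tensor excess values.

    Assumed definitions, for illustration only.
    """
    total = sum(excess)
    k = max(1, math.ceil(0.10 * len(excess)))          # tensors in the top 10%
    top = sum(sorted(excess, reverse=True)[:k])        # excess held by those tensors
    concentration = top / total if total > 0 else 0.0  # no excess at all gives 0.0
    return concentration, math.log10(total + eps)      # eps keeps log10 finite at zero

# 197 tensors with zero excess, as in the table of most-changed tensors.
print(excess_features([0.0] * 197))
```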

most-changed tensors (excess over matched control)

# | tensor | excess divergence
1 | layers.14.self_attn.k_proj.weight | 0.000000
2 | layers.19.self_attn.k_proj.weight | 0.000000
3 | layers.24.self_attn.v_proj.weight | 0.000000
4 | layers.12.self_attn.k_proj.weight | 0.000000
5 | layers.23.self_attn.v_proj.weight | 0.000000
6 | layers.5.self_attn.k_proj.weight | 0.000000
7 | layers.27.self_attn.v_proj.weight | 0.000000
8 | layers.25.self_attn.v_proj.weight | 0.000000
9 | layers.16.self_attn.k_proj.weight | 0.000000
10 | layers.22.self_attn.v_proj.weight | 0.000000
11 | layers.19.self_attn.v_proj.weight | 0.000000
12 | layers.20.self_attn.k_proj.weight | 0.000000
13 | layers.6.self_attn.k_proj.weight | 0.000000
14 | layers.8.self_attn.k_proj.weight | 0.000000
15 | layers.23.self_attn.k_proj.weight | 0.000000
16 | layers.9.self_attn.k_proj.weight | 0.000000
17 | layers.2.self_attn.k_proj.weight | 0.000000
18 | layers.1.self_attn.k_proj.weight | 0.000000
19 | layers.20.self_attn.v_proj.weight | 0.000000
20 | layers.25.self_attn.k_proj.weight | 0.000000
21 | layers.21.self_attn.k_proj.weight | 0.000000
22 | layers.15.self_attn.k_proj.weight | 0.000000
23 | layers.26.self_attn.k_proj.weight | 0.000000
24 | layers.10.self_attn.k_proj.weight | 0.000000
25 | layers.27.self_attn.k_proj.weight | 0.000000

Compliance mapping

Watchman produces audit evidence supporting these obligations; it does not by itself make a system compliant.

framework | this audit supports | evidence in this report
EU AI Act (Reg. 2024/1689): GPAI / Annex III Art. 11 technical documentation | model identity and lineage record; verification of third-party base-model claims; documented method limitations | models.*.files (SHA-256 chain of custody); verdict + classification; library.validation_loo; known-limitations appendix
US FY2026 NDAA / DFARS: AI/ML weight security; integrity check before deployment | pre-deployment integrity gate (CI exit codes 0/2/1); registry-pinnable weight hashes; per-release attestation | verdict.exit_code; models.candidate.files; attestation.cdx.json
OMB M-26-04: continuous accountability for federal AI | scheduled re-audits; deterministic comparison over time | audit_id + timestamp_utc series; recorded, reproducible analysis settings
NSA AI supply-chain guidance (Mar 2026): model-layer controls | third-party model intake verification | models.base/candidate provenance; verdict
AI-BOM (CycloneDX/SPDX) procurement artifact | model name / version / weights-identifier / lineage entry with verified provenance | attestation.cdx.json
US banking MRM (OCC/Fed/FDIC, Apr 2026): third-party model validation | independent validation evidence for vendor and open-weight models | full report + limitations (validation record)

Chain of custody

base

file | bytes | sha256
config.json | 684 | 0e8c8aa86468aba09c9d32157ff4bc2301c7e6c50e4398960425b2ea71e66f77
model.safetensors | 3,087,467,144 | a961db72e75d52b18e6b0c9d379e51a26973b233385e0e127fdda7d648aec796

candidate

file | bytes | sha256
config.json | 942 | 785d6afe34942460abb53fd137fd23838546dbd37568fefbca795a2e7b717029
model.safetensors | 868,629,082 | ae2082d9cdebdbe9f77e636ce12125ecac8fea6f100ea3fd4b131366a0c73bd0
model.safetensors.index.json | 51,609 | 19e664257b50911ac12ff231c42e952c210d48d1861f7fa304eb640dd10dccb8

control

file | bytes | sha256
config.json | 942 | 785d6afe34942460abb53fd137fd23838546dbd37568fefbca795a2e7b717029
model.safetensors | 868,629,082 | ae2082d9cdebdbe9f77e636ce12125ecac8fea6f100ea3fd4b131366a0c73bd0
model.safetensors.index.json | 51,609 | 19e664257b50911ac12ff231c42e952c210d48d1861f7fa304eb640dd10dccb8
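
A recipient can re-check the file sizes and digests listed above with the Python standard library alone. The following is a minimal sketch, not part of Watchman; the helper names `sha256_file` and `verify_files` are made up for this example, and the expected values are copied from the base table in this report.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so multi-gigabyte weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_files(model_dir, expected):
    """Return {filename: bool}; True only if the file exists and size and digest match."""
    results = {}
    for name, (size, sha) in expected.items():
        path = Path(model_dir) / name
        results[name] = (
            path.is_file()
            and path.stat().st_size == size
            and sha256_file(path) == sha
        )
    return results

# Sizes and digests as recorded for the base model in this report.
BASE_EXPECTED = {
    "config.json": (
        684,
        "0e8c8aa86468aba09c9d32157ff4bc2301c7e6c50e4398960425b2ea71e66f77",
    ),
    "model.safetensors": (
        3087467144,
        "a961db72e75d52b18e6b0c9d379e51a26973b233385e0e127fdda7d648aec796",
    ),
}
```

Calling `verify_files("/data/models/Qwen2.5-1.5B", BASE_EXPECTED)` on the machine that holds the weights returns one boolean per file; any `False` means the local copy differs from the one that was audited.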

analysis pipeline

property | value
pipeline bundle digest (SHA-256 over the pinned analysis modules) | 4c5d08117650f2c965d806e7e0171a719289ad81037ff586540557f7c17a4e94
integrity | every analysis module is hash-pinned; this digest changes if any module changes
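
The report does not say how the bundle digest is constructed. One common way to obtain a single digest that changes whenever any module changes is to hash the per-file digests in a fixed order; the sketch below shows that construction under that assumption. The function name `bundle_digest` and the `name:digest` line format are hypothetical, so this will not reproduce the value printed above.

```python
import hashlib
from pathlib import Path

def bundle_digest(module_paths):
    """SHA-256 over the per-module SHA-256 digests, taken in sorted path order."""
    outer = hashlib.sha256()
    for path in sorted(Path(p) for p in module_paths):
        inner = hashlib.sha256(path.read_bytes()).hexdigest()
        # Bind each digest to its file name so a rename also changes the result.
        outer.update(f"{path.name}:{inner}\n".encode("utf-8"))
    return outer.hexdigest()
```

Sorting the paths makes the result independent of the order in which modules are listed, which is what allows two parties to compare digests.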

environment

python | 3.12.0
platform | arm64 workstation (Apple Silicon)
mlx | 0.31.1
mlx-lm | 0.31.2
numpy | 2.4.4
huggingface-hub | 1.7.1

Methodology

Weights-direct measurement. Watchman reads the model's weight files directly. It needs no training data, no prompts, no inference access, and no cooperation from the publisher. Each audit is deterministic: identical inputs and recorded settings produce an identical verdict, so any party can independently reproduce it.
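
To illustrate what reading weight files directly can look like, the sketch below parses the header of a `.safetensors` file using only the standard library. The format stores an 8-byte little-endian header length followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets. This is not Watchman's loader; the function names are hypothetical, and selecting tensors by a two-element shape is an assumption based on the report's mention of 2-D weight tensors.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian uint64
        return json.loads(fh.read(header_len).decode("utf-8"))

def two_d_tensor_names(header):
    """Names of tensors with exactly two dimensions, sorted for a deterministic order."""
    return sorted(
        name
        for name, info in header.items()
        if name != "__metadata__" and len(info["shape"]) == 2
    )
```

Because only the header is read, listing tensor names and shapes takes no inference and touches none of the tensor bytes.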

Matched-compression differential. When the candidate ships quantized, Watchman measures it against an independently quantized control of the claimed base at the candidate's own declared settings. Compression effects cancel almost completely, and what remains is unexplained change. Full-precision candidates are measured against the claimed base directly.
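
The idea of a matched control can be shown with a toy example. The sketch below quantizes a made-up base tensor with a simple group-wise 4-bit scheme, then reports how much further a candidate sits from the base than the control does. The quantizer, the mean-squared-difference measure, and all function names are assumptions chosen for illustration; they do not reproduce MLX's q4 g64 format or Watchman's divergence measure.

```python
def quantize_dequantize(weights, bits=4, group_size=64):
    """Toy group-wise affine quantizer: snap each group to 2**bits levels and map back."""
    levels = (1 << bits) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # constant group: avoid dividing by zero
        out.extend(lo + round((w - lo) / scale) * scale for w in group)
    return out

def divergence(a, b):
    """Hypothetical divergence: mean squared difference between two flat tensors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def excess_divergence(base, candidate, bits=4, group_size=64):
    """Candidate-vs-base divergence minus that of a control quantized the same way."""
    control = quantize_dequantize(base, bits, group_size)
    return divergence(candidate, base) - divergence(control, base)

base = [((i * 37) % 101) / 101 - 0.5 for i in range(256)]  # made-up weights
honest = quantize_dequantize(base)   # a plain quantization of the base
tampered = list(honest)
tampered[10] += 0.5                  # one weight changed after quantization

print(excess_divergence(base, honest))        # 0.0 (the toy control is identical)
print(excess_divergence(base, tampered) > 0)  # True
```

In this toy the compression effect cancels exactly because the candidate and control come from the same deterministic quantizer. With independently produced quantizations, small residual differences would be expected, which is why the decision relies on fitted thresholds rather than a test for exact zero.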

Two-stage decision. A modification is flagged when the unexplained change exceeds what compression alone is ever observed to produce. The decision thresholds are fitted from a labelled, versioned reference library and printed in this report. A flagged modification is then classified by matching its signature against the library's labelled modification types.
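
The two stages can be sketched as follows. This is a simplified reading, not Watchman's implementation: it assumes a threshold sits at the largest value observed in the clean/compression class, that exceeding any one threshold is enough to flag, and that classification is a nearest-neighbour match on the feature vector. The library entries below are invented for the example; only the two thresholds are taken from this report.

```python
import math

def fit_threshold(clean_values, margin=0.0):
    """Place the threshold at the largest value seen in the clean/compression class."""
    return max(clean_values) + margin

def classify(signature, library):
    """Label of the nearest library signature by Euclidean distance."""
    return min(library, key=lambda entry: math.dist(signature, entry[1]))[0]

def decide(signature, thresholds, library):
    """Stage 1: flag if any feature reaches its threshold. Stage 2: classify if flagged."""
    flagged = any(value >= limit for value, limit in zip(signature, thresholds))
    return ("modified", classify(signature, library)) if flagged else ("clean", None)

# Invented signatures: (top-10% excess concentration, log10 total excess).
toy_library = [
    ("instruction_tuning", (0.9, 1.5)),
    ("domain_finetune", (0.5, 0.2)),
]
thresholds = (0.3138, -0.3068)  # as printed in the findings of this report

print(fit_threshold([0.12, 0.31, 0.27]))             # 0.31
print(decide((0.0, -12.0), thresholds, toy_library))  # ('clean', None)
print(decide((0.8, 1.2), thresholds, toy_library))    # ('modified', 'instruction_tuning')
```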

Reference library

Reference library v1 contains 20 labelled signatures: alignment_modification ×4, clean ×1, domain_finetune ×4, instruction_tuning ×6, quantization ×5. Leave-one-out validation: detection 18/20, characterization 11/12 of detected.
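
Leave-one-out validation holds out each library entry in turn, evaluates it against the remaining entries, and counts how often the held-out label is recovered. The sketch below shows the procedure with a nearest-neighbour classifier on invented two-dimensional signatures; the classifier, labels, and data are assumptions for illustration and do not correspond to the library's actual contents or scores.

```python
import math

def nearest_label(signature, library):
    """Label of the closest remaining signature by Euclidean distance."""
    return min(library, key=lambda entry: math.dist(signature, entry[1]))[0]

def leave_one_out(library):
    """Hold out each entry, classify it against the rest, and return (hits, total)."""
    hits = 0
    for i, (label, signature) in enumerate(library):
        rest = library[:i] + library[i + 1:]
        hits += nearest_label(signature, rest) == label
    return hits, len(library)

# Invented entries: two tight clusters plus one "type_b" outlier near "type_a".
toy = [
    ("type_a", (0, 0)),
    ("type_a", (0, 1)),
    ("type_b", (5, 5)),
    ("type_b", (5, 6)),
    ("type_b", (0, 3)),
]
print(leave_one_out(toy))  # (4, 5): the outlier is matched to "type_a"
```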

Known limitations

Audit WM-20260612-b2ce0ec1, generated by Watchman v0.2.1, Black Sheep AI. This report describes defensive model-provenance and integrity auditing. The verdict is a statistical measurement against the cited reference library, with the limitations stated above; it is evidence, not a guarantee. Machine-readable form: audit_report.json.