| role | source | precision |
|---|---|---|
| base | /data/models/Qwen2.5-1.5B | full precision |
| candidate | /data/models/Qwen2.5-1.5B-q4 | q4 g64 |
| control | /data | q4 g64 |

| detection feature | value | threshold (fit from clean/compression class) |
|---|---|---|
| top-10% excess concentration | 0.0 | ≥ 0.3138 |
| log10 total excess | -12.0 | ≥ -0.3068 |
| peak localized change (robust score) | 0.0 | (reported for context) |

Tensors compared: 197 of 197 2-D weight tensors (coverage 100%). Measurement settings are recorded in the machine-readable report. A differential control was used.

| # | tensor | excess divergence |
|---|---|---|
| 1 | layers.14.self_attn.k_proj.weight | 0.000000 |
| 2 | layers.19.self_attn.k_proj.weight | 0.000000 |
| 3 | layers.24.self_attn.v_proj.weight | 0.000000 |
| 4 | layers.12.self_attn.k_proj.weight | 0.000000 |
| 5 | layers.23.self_attn.v_proj.weight | 0.000000 |
| 6 | layers.5.self_attn.k_proj.weight | 0.000000 |
| 7 | layers.27.self_attn.v_proj.weight | 0.000000 |
| 8 | layers.25.self_attn.v_proj.weight | 0.000000 |
| 9 | layers.16.self_attn.k_proj.weight | 0.000000 |
| 10 | layers.22.self_attn.v_proj.weight | 0.000000 |
| 11 | layers.19.self_attn.v_proj.weight | 0.000000 |
| 12 | layers.20.self_attn.k_proj.weight | 0.000000 |
| 13 | layers.6.self_attn.k_proj.weight | 0.000000 |
| 14 | layers.8.self_attn.k_proj.weight | 0.000000 |
| 15 | layers.23.self_attn.k_proj.weight | 0.000000 |
| 16 | layers.9.self_attn.k_proj.weight | 0.000000 |
| 17 | layers.2.self_attn.k_proj.weight | 0.000000 |
| 18 | layers.1.self_attn.k_proj.weight | 0.000000 |
| 19 | layers.20.self_attn.v_proj.weight | 0.000000 |
| 20 | layers.25.self_attn.k_proj.weight | 0.000000 |
| 21 | layers.21.self_attn.k_proj.weight | 0.000000 |
| 22 | layers.15.self_attn.k_proj.weight | 0.000000 |
| 23 | layers.26.self_attn.k_proj.weight | 0.000000 |
| 24 | layers.10.self_attn.k_proj.weight | 0.000000 |
| 25 | layers.27.self_attn.k_proj.weight | 0.000000 |

Watchman produces audit evidence supporting these obligations; it does not by itself make a system compliant.

| framework | this audit supports | evidence in this report |
|---|---|---|
| EU AI Act (Reg. 2024/1689): GPAI / Annex III Art. 11 technical documentation | model identity and lineage record; verification of third-party base-model claims; documented method limitations | models.*.files (SHA-256 chain of custody); verdict + classification; library.validation_loo; known-limitations appendix |
| US FY2026 NDAA / DFARS: AI/ML weight security; integrity check before deployment | pre-deployment integrity gate (CI exit codes 0/2/1); registry-pinnable weight hashes; per-release attestation | verdict.exit_code; models.candidate.files; attestation.cdx.json |
| OMB M-26-04: continuous accountability for federal AI | scheduled re-audits; deterministic comparison over time | audit_id + timestamp_utc series; recorded, reproducible analysis settings |
| NSA AI supply-chain guidance (Mar 2026): model-layer controls | third-party model intake verification | models.base/candidate provenance; verdict |
| AI-BOM (CycloneDX/SPDX) procurement artifact | model name / version / weights-identifier / lineage entry with verified provenance | attestation.cdx.json |
| US banking MRM (OCC/Fed/FDIC, Apr 2026): third-party model validation | independent validation evidence for vendor and open-weight models | full report + limitations (validation record) |

| file | bytes | sha256 |
|---|---|---|
| config.json | 684 | 0e8c8aa86468aba09c9d32157ff4bc2301c7e6c50e4398960425b2ea71e66f77 |
| model.safetensors | 3,087,467,144 | a961db72e75d52b18e6b0c9d379e51a26973b233385e0e127fdda7d648aec796 |

| file | bytes | sha256 |
|---|---|---|
| config.json | 942 | 785d6afe34942460abb53fd137fd23838546dbd37568fefbca795a2e7b717029 |
| model.safetensors | 868,629,082 | ae2082d9cdebdbe9f77e636ce12125ecac8fea6f100ea3fd4b131366a0c73bd0 |
| model.safetensors.index.json | 51,609 | 19e664257b50911ac12ff231c42e952c210d48d1861f7fa304eb640dd10dccb8 |

| file | bytes | sha256 |
|---|---|---|
| config.json | 942 | 785d6afe34942460abb53fd137fd23838546dbd37568fefbca795a2e7b717029 |
| model.safetensors | 868,629,082 | ae2082d9cdebdbe9f77e636ce12125ecac8fea6f100ea3fd4b131366a0c73bd0 |
| model.safetensors.index.json | 51,609 | 19e664257b50911ac12ff231c42e952c210d48d1861f7fa304eb640dd10dccb8 |

| property | value |
|---|---|
| pipeline bundle digest (SHA-256 over the pinned analysis modules) | 4c5d08117650f2c965d806e7e0171a719289ad81037ff586540557f7c17a4e94 |
| integrity | every analysis module is hash-pinned; this digest changes if any module changes |
| python | 3.12.0 |
| platform | arm64 workstation (Apple Silicon) |
| mlx | 0.31.1 |
| mlx-lm | 0.31.2 |
| numpy | 2.4.4 |
| huggingface-hub | 1.7.1 |

Weights-direct measurement. Watchman reads the model's weight files directly. It requires no training data, no prompts, no inference access, and no cooperation from the publisher. Each audit is deterministic: identical inputs and recorded settings produce an identical verdict, so any party can independently reproduce it.
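Because a reproduction depends on identical inputs, a party repeating the audit would plausibly start by checking its local files against the byte counts and SHA-256 digests listed in the file tables. The following is a minimal sketch of that check, not part of Watchman: the `verify_manifest` helper and the manifest layout are hypothetical, and the small self-check at the end uses a throwaway temporary file rather than real weights.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so multi-gigabyte weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir, manifest):
    """Compare files against a manifest of name -> (bytes, sha256).

    Hypothetical helper: returns a list of human-readable problems,
    which is empty when every file matches.
    """
    problems = []
    for name, (size, expected) in manifest.items():
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size != size:
            problems.append(f"{name}: size mismatch")
        elif sha256_file(path) != expected:
            problems.append(f"{name}: sha256 mismatch")
    return problems


# Self-check on a temporary file (illustrative data, not a real model).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_bytes(b"{}")
    good = {"config.json": (2, hashlib.sha256(b"{}").hexdigest())}
    bad = {"config.json": (2, "0" * 64)}
    ok_result = verify_manifest(tmp, good)
    bad_result = verify_manifest(tmp, bad)
```

For a real check, the manifest would be filled in by hand from one of the file tables in this report, for example `{"config.json": (684, "0e8c8aa8...")}` with the full digest in place of the abbreviated one.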
Matched-compression differential. When the candidate ships quantized, Watchman measures it against an independently quantized control of the claimed base at the candidate's own declared settings. Compression effects cancel almost completely, and what remains is unexplained change. Full-precision candidates are measured against the claimed base directly.
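The cancellation idea can be illustrated with a toy example. The sketch below is an assumption-heavy stand-in: the group-wise 4-bit quantizer is a simple affine scheme written for illustration (not the quantizer actually used), the relative L2 distance is a placeholder for the report's divergence measure (whose definition is not given here), and all weights are random numbers.

```python
import random


def quantize_dequantize(weights, bits=4, group_size=64):
    """Toy affine group-wise quantization followed by dequantization."""
    levels = (1 << bits) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # constant group: avoid dividing by zero
        out.extend(lo + round((w - lo) / scale) * scale for w in group)
    return out


def relative_change(a, b):
    """Relative L2 distance between two flat tensors (placeholder metric)."""
    num = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    den = sum(y * y for y in b) ** 0.5
    return num / den if den else 0.0


rng = random.Random(0)
base = [rng.gauss(0.0, 0.02) for _ in range(1024)]

# Control: the claimed base, quantized at the candidate's declared settings.
control = quantize_dequantize(base)

# An honest candidate is just the quantized base.
honest_candidate = quantize_dequantize(base)

# A tampered candidate has one group of weights shifted before quantization.
tampered = list(base)
for i in range(64):
    tampered[i] += 0.05
tampered_candidate = quantize_dequantize(tampered)

naive = relative_change(honest_candidate, base)        # quantization noise only
matched = relative_change(honest_candidate, control)   # compression cancels
tampered_matched = relative_change(tampered_candidate, control)
```

Comparing the honest candidate with the full-precision base gives a nonzero value from quantization noise alone, while comparing it with the matched control gives exactly zero in this toy; the tampered candidate still differs from the control.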
Two-stage decision. A modification is flagged when the unexplained change exceeds the largest change that compression alone has been observed to produce. The decision thresholds are fitted from a labelled, versioned reference library and printed in this report. A flagged modification is then classified by matching its signature against the library's labelled modification types.
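As a rough illustration of the two stages, the sketch below applies the thresholds printed in this report's detection table to the reported feature values, then shows a toy nearest-signature classifier. Several things are assumptions: the report does not say how the two features are combined, so both an "any" and an "all" rule are shown; the classifier, its distance measure, and the two-element signature vectors are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureCheck:
    name: str
    value: float
    threshold: float

    @property
    def meets_threshold(self) -> bool:
        # The detection table prints thresholds as "value >= threshold".
        return self.value >= self.threshold


# Stage 1: feature values and thresholds copied from this report.
checks = [
    FeatureCheck("top-10% excess concentration", 0.0, 0.3138),
    FeatureCheck("log10 total excess", -12.0, -0.3068),
]

# The combination rule is not stated in the report; both are shown.
flag_if_any = any(c.meets_threshold for c in checks)
flag_if_all = all(c.meets_threshold for c in checks)


def classify(signature, library):
    """Stage 2 (toy): label of the nearest library signature.

    `library` is a list of (label, vector) pairs; the squared Euclidean
    distance is an assumed stand-in for the real matching procedure.
    """
    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(signature, item[1]))
    return min(library, key=distance)[0]


# Invented signatures, used only to exercise the function.
toy_library = [
    ("domain_finetune", (1.0, 0.0)),
    ("instruction_tuning", (0.0, 1.0)),
]
toy_label = classify((0.9, 0.1), toy_library)
```

With the values in this report, neither feature reaches its threshold, so the candidate is not flagged under either combination rule and stage two would not run.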
Reference library v1 contains 20 labelled signatures: alignment_modification ×4, clean ×1, domain_finetune ×4, instruction_tuning ×6, quantization ×5. Leave-one-out validation: detection 18/20, characterization 11/12 of detected.
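Leave-one-out validation holds out each library entry in turn, refits on the rest, and checks whether the held-out entry is handled correctly. The sketch below shows the detection half of that loop under simplifying assumptions: a single scalar score per entry, a threshold equal to the largest score in the clean/compression class, and six invented entries that are not the real reference library.

```python
CLEAN_CLASSES = {"clean", "quantization"}

# Toy (label, score) pairs invented for illustration only.
toy_library = [
    ("quantization", 0.01),
    ("quantization", 0.03),
    ("clean", 0.0),
    ("instruction_tuning", 0.8),
    ("domain_finetune", 0.5),
    ("alignment_modification", 0.02),
]


def fit_threshold(entries):
    """Largest score observed in the clean/compression class."""
    return max(score for label, score in entries if label in CLEAN_CLASSES)


def leave_one_out(entries):
    """Count held-out entries whose flag matches their true class."""
    correct = 0
    for i, (label, score) in enumerate(entries):
        rest = entries[:i] + entries[i + 1:]   # refit without the held-out entry
        flagged = score > fit_threshold(rest)
        should_flag = label not in CLEAN_CLASSES
        correct += flagged == should_flag
    return correct, len(entries)


correct, total = leave_one_out(toy_library)
```

In this toy, a held-out entry can be missed when its score sits below the refitted threshold, and a clean entry can be falsely flagged when it was itself the one setting the threshold. Both kinds of error count against the detection rate.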
Audit WM-20260612-b2ce0ec1, generated by Watchman v0.2.1, Black Sheep AI. This report describes defensive model-provenance and integrity auditing. The verdict is a statistical measurement against the cited reference library, with the limitations stated above; it is evidence, not a guarantee. Machine-readable form: audit_report.json.
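The framework table lists a pre-deployment integrity gate driven by `verdict.exit_code`. A minimal sketch of such a gate follows, assuming that `audit_report.json` stores the code as a nested `verdict` object with an integer `exit_code` field. That layout is inferred from the dotted name and is not confirmed here. The meaning of each code is not stated in this section, so the sketch only passes the code through, and returning 1 when the field is missing is a hypothetical fail-closed choice.

```python
import json
from pathlib import Path


def gate_exit_code(report: dict) -> int:
    """Return the recorded exit code, or 1 if it is missing or malformed."""
    code = report.get("verdict", {}).get("exit_code")
    if isinstance(code, int) and not isinstance(code, bool):
        return code
    return 1  # hypothetical fail-closed default


def load_report(path: str = "audit_report.json") -> dict:
    """Read the machine-readable report from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Exercised on in-memory dictionaries rather than a real report file.
passed = gate_exit_code({"verdict": {"exit_code": 0}})
flagged = gate_exit_code({"verdict": {"exit_code": 2}})
missing = gate_exit_code({})
```

A CI step could then call `sys.exit(gate_exit_code(load_report()))` so that any nonzero code stops the deployment job.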