Know exactly what model you're deploying.
Watchman verifies that an AI model is what it claims to be. It detects weight modifications hidden inside third-party and compressed releases, classifies what kind of change was made, and produces evidence-grade audit reports and AI-BOM attestations your security team, your compliance team, and your regulator can hold.
Vendor checkpoints. Open-weight bases. Community fine-tunes. Quantized repacks. Every one of them is tens of gigabytes of opaque numbers, and regulators on three continents now expect you to prove what's inside before it goes into production.
A file hash only tells you the bytes changed, and the bytes always change. Watchman tells you whether the weights were actually modified beyond their declared compression, how much, where in the network, and what kind of change it was.
Determines whether a model's weights were modified beyond their declared quantization, read directly from the weight files. No training data, no prompts, no inference access, no cooperation from the publisher. Minutes per model on commodity hardware.
Classifies a detected modification by type (instruction tuning, alignment modification, or domain specialization) and localizes where in the network the change concentrates, by component and by depth. Each modification type leaves a characteristic signature.
Every audit produces an evidence-grade report (SHA-256 chain of custody for every model file, hash-pinned analysis pipeline, audit ID, decision thresholds, stated limitations), plus a machine-readable CycloneDX-style AI-BOM attestation ready to merge into your model bill of materials.
Deterministic and CI-native: exit 0 clean, exit 2 modification detected, exit 1 indeterminate. Gate your model registry on it, run it in your deployment pipeline, and schedule re-audits to build the continuous-accountability record federal frameworks now expect.
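The exit-code contract above, wired into a fail-closed registry gate, as an added example that is not Watchman's own code. `audit_gate`, `gate_decision` and the log format are hypothetical; the audit command is whatever the caller passes in, and `sha256sum` from GNU coreutils is assumed.

```shell
#!/bin/sh
# Added example: gate a model registry on an audit command's exit status.
# Contract from the page: 0 = clean, 2 = modification detected,
# 1 = indeterminate. No CLI name or flag is assumed; the caller supplies
# the audit command and its arguments.

# gate_decision STATUS -> prints admit | block | hold
gate_decision() {
    case "$1" in
        0) echo admit ;;  # clean
        2) echo block ;;  # modification detected
        *) echo hold ;;   # 1 = indeterminate; also crashes, OOM kills, 127
    esac
}

# audit_gate MODEL_DIR LOG CMD [ARGS...]
# Pins the audited bytes with a digest over the per-file SHA-256 list,
# runs CMD, appends one record to LOG, and returns 0 only on "admit",
# so `audit_gate ... && promote_model` never promotes on an unknown status.
audit_gate() {
    model_dir=$1
    log=$2
    shift 2

    # Hash relative paths so the digest does not depend on where the
    # model directory is mounted.
    digest=$(cd "$model_dir" &&
        find . -type f -exec sha256sum {} + |
        LC_ALL=C sort -k 2 | sha256sum | cut -d ' ' -f 1)

    # Capture the status without tripping `set -e` in the calling script.
    status=0
    "$@" >/dev/null 2>&1 || status=$?

    decision=$(gate_decision "$status")
    printf '%s model=%s files_sha256=%s exit=%s decision=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$model_dir" "$digest" \
        "$status" "$decision" >>"$log"

    [ "$decision" = admit ]
}
```

Any status other than 0 or 2 is held for review, so a killed or missing auditor is never read as "clean".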
Reads the weight files themselves. No training data, no benchmark selection to argue about, no behavioral test a clever release can study for.
Watchman audits the quantized releases people actually distribute. A modification hidden under compression stands out, and ordinary compression is never blamed for one.
Same inputs, same settings, same verdict. Every measurement setting is recorded in the report. Two parties can independently reproduce an audit and compare.
An unsupported format or a mismatched claimed base returns indeterminate, never a false "clean". The verdict you act on is one the tool could actually defend.
A full audit of a multi-gigabyte model completes in minutes on a single workstation, or runs as a containerized service in your CI. Nothing ever leaves your environment.
The audit applies its own discipline to itself: the analysis pipeline is hash-pinned in every report, the reference library is versioned with its validation record, and limitations are stated, not hidden.
In our validation, every tested weight modification on models 3B and larger was detected. Watchman is most reliable on the 7B-and-up open-weight models enterprises actually deploy. Borderline sub-2B cases route to a review band, never a false "clean."
held-out detection of real modifications (leave-one-out validation)
of detected modifications correctly classified. Alignment-modification and domain-specialization classes were perfect
model families in the validation library; detection generalizes across families, validated held-out by family
Every Watchman report states its limitations alongside its verdict, including the one miss mode we know about (very broad, low-intensity tuning of very small models) and the exact decision thresholds in force. The reference library grows with every labelled audit, and accuracy compounds with it. We publish what the tool can and cannot tell you, because an audit you can't interrogate is not evidence.
Before a third-party model, whether full precision or quantized, enters your registry, Watchman verifies it against its claimed base and attaches the attestation. Models without adequate provenance never reach production.
When you compress a model for deployment, Watchman certifies the release is your base plus declared quantization and nothing else: a verifiable claim you can hand to customers and partners. Pairs with Shepherd, which builds the optimized models Watchman certifies.
A self-contained container or workstation install. Your weights, your audits, your evidence stay put. Nothing leaves your environment, ever. Built for classified and regulated networks.
Submit audits over a simple REST API or run the CLI directly in CI. Reports come back as human-readable HTML, machine-readable JSON, and a CycloneDX-style attestation fragment.
Immutable versioned images, health-checked deploys, automatic rollback. The platform that audits your supply chain doesn't get to have an unaccountable one of its own.
Governments are converging on the same demand: prove what your model is, where it came from, and that nobody changed it on the way in. Watchman produces exactly that evidence.
EU AI Act enforcement with fines begins 2 August 2026. Technical documentation for GPAI and high-risk systems must account for model identity and lineage. Watchman reports slot directly into the file.
The FY2026 NDAA puts model weights inside the defense security perimeter: verified registries and integrity checks before deployment. OMB M-26-04 demands continuous accountability for federal AI. Watchman is that gate.
Banking model-risk guidance already expects third-party validation; AI-specific rules are in the pipeline. Watchman builds the evidence pack you'll want on file the day they land.
Watch a real audit run end-to-end, read the actual reports it produced, then talk to us about putting Watchman in front of your model registry.