AI Governance & Capability Assurance

Watchman.

Know exactly what your quantized model can do, before it goes live.

Watchman is a capability audit and governance platform for AI models. It analyses the model's weight structure to tell you which capabilities are preserved, which are at risk, and which are degraded, in minutes rather than weeks of benchmarking.

The problem with deploying compressed models.

You quantize a model to fit your hardware. It runs. But what did you lose? Today, the only way to find out is to run weeks of domain-specific benchmarks after the fact. By then, you've already shipped it. Watchman answers the question before you deploy.

Pre-Deployment Capability Audit

Capability Certification

Receive a capability audit report for any quantized model before it goes into production. Know which capabilities are preserved, at risk, or degraded, in minutes rather than days.

Predict Before You Build

For any proposed quantization, receive a prediction of which capabilities will degrade and by how much, before building the model. Compare candidate configurations on capability preservation rather than headline scores.

Replace Weeks with Minutes

Replace weeks of domain-specific evaluation with a verified capability certification derived from the model's weight structure. No benchmark runs required.

Memory Target Intelligence

Find the Exact Optimum

Identify the exact memory budget at which your required capabilities are preserved: not a round number, but the actual optimum. Get two recommended build targets automatically: the efficiency knee and the quality peak.

Quality Valley Warnings

Receive an explicit warning when a chosen memory target falls in a quality valley where more memory actually produces worse capability outcomes. Stop wasting budget on counterproductive allocations.
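The knee and valley ideas above can be sketched in a few lines. The heuristics below, chord-distance knee detection and flagging any target where quality drops as memory grows, are illustrative assumptions for a memory-vs-quality curve, not Watchman's actual method:

```python
def find_knee(mem, quality):
    """Return the memory budget after which quality gains flatten.

    Illustrative knee heuristic: the point farthest from the chord
    joining the first and last (memory, quality) points.
    """
    x0, y0 = mem[0], quality[0]
    x1, y1 = mem[-1], quality[-1]

    def dist(x, y):
        # Unnormalised distance from the line through the endpoints.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)

    best = max(range(len(mem)), key=lambda i: dist(mem[i], quality[i]))
    return mem[best]

def quality_valleys(mem, quality):
    """Return memory targets where adding memory *reduced* quality."""
    return [mem[i] for i in range(1, len(mem)) if quality[i] < quality[i - 1]]

# Hypothetical curve with a dip at the 20 GB target.
mem = [8, 12, 16, 20, 24, 32]
quality = [0.55, 0.72, 0.80, 0.78, 0.86, 0.88]
print(find_knee(mem, quality))       # 16
print(quality_valleys(mem, quality)) # [20]
```

On this made-up curve, the knee lands at 16 GB and the 20 GB target is flagged as a valley: a round-number budget that would buy worse capability than the smaller build.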

Capability-First Budgeting

Instead of "give me a 24GB model," say "give me the smallest model that preserves chemistry reasoning and legal analysis" and receive a verified result with predicted capability status.
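In practice, a capability-first request might look like the following configuration sketch. Every field name here is a hypothetical illustration of the idea, not Watchman's actual schema:

```yaml
# Hypothetical capability-first build request (illustrative field
# names, not Watchman's actual API).
build:
  objective: smallest          # minimise memory, not hit a round number
  preserve:                    # hard constraints the allocator must satisfy
    - chemistry_reasoning
    - legal_analysis
  tolerate_degraded:           # capabilities allowed to lose fidelity
    - creative_writing
  report: capability_status    # return predicted status per capability
```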

Governance for Regulated Industries

Compliance-Ready Reporting

Demonstrate to compliance teams that specific capabilities are preserved in a quantized deployment using a verifiable, auditable methodology. Produce reports independent of benchmark selection, derived from model internals, not from evaluation runs that can be gamed.

Regulatory Evidence

Show regulators that the quantized model retains the capabilities it was certified for at full precision. Back SLA commitments for AI-powered products with evidence derived from model internals rather than periodic evaluation runs.

Capability Preservation Certificates

Offer customers and partners a capability preservation certificate that is verifiable and auditable: not a benchmark score, but a structural guarantee. Make capability preservation a contractual commitment.

Continuous Monitoring

Demonstrate to enterprise customers that capability regression monitoring is continuous and systematic, not periodic and manual. Every quantization decision is logged, auditable, and traceable.

Capability Protection & Diagnostics

Protect What Matters

Specify which capabilities must be preserved and let the allocator find the smallest model that satisfies those constraints. Know precisely which tensors govern your required capabilities and ensure they are protected regardless of overall budget pressure.

Instant Regression Diagnosis

When a quantized model underperforms on a specific task, identify the responsible tensors in minutes rather than running weeks of ablation experiments. Receive a ranked list of under-funded tensors for each capability and what it would cost to fix them.
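As a sketch of what such a ranking could look like: the function below orders tensors by the gap between the bits a capability is assumed to need and the bits the build actually allocated. The inputs and tensor names are hypothetical illustrations, not Watchman's actual diagnostics:

```python
def rank_underfunded(allocated_bits, required_bits):
    """Rank tensors whose bit allocation falls short of what a
    capability needs. Largest deficit first: those tensors are
    assumed to recover the most capability per bit spent."""
    deficits = {
        name: required_bits[name] - bits
        for name, bits in allocated_bits.items()
        if required_bits.get(name, 0) > bits
    }
    return sorted(deficits.items(), key=lambda kv: -kv[1])

# Hypothetical per-tensor bit widths for one capability.
allocated = {"blk.12.ffn_down": 3, "blk.12.attn_v": 4, "blk.30.ffn_up": 5}
required  = {"blk.12.ffn_down": 6, "blk.12.attn_v": 5, "blk.30.ffn_up": 4}
print(rank_underfunded(allocated, required))
# [('blk.12.ffn_down', 3), ('blk.12.attn_v', 1)]
```

The deficit (here, 3 extra bits for the worst tensor) doubles as the repair cost: it says exactly how much budget a fix would consume.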

Fine-Tune Protection

When a fine-tuned model is quantized, identify where the fine-tuning stored its new capabilities and ensure those tensors are protected. Predict which capabilities a fine-tune added that are at risk from compression, before running any evaluation.

Trace to Source

Trace a capability regression to its exact source: specific layers, specific tensor types, specific architectural components. No more guessing which part of the model is responsible.

Fleet Deployment & Hardware Intelligence

Per-Device Capability Profiles

Determine which device configurations preserve which capabilities across a heterogeneous hardware fleet. Make deterministic per-device capability assignments with verified predictions rather than guesses.

Automatic Routing

Serve the right model variant to the right device automatically based on verified capability requirements. This device gets full reasoning. This device gets reduced capability. All verified, none guessed.
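The routing rule can be sketched as a small function: serve the smallest variant that both fits the device and preserves every required capability. The variant names and capability sets below are hypothetical stand-ins for Watchman's verified per-variant predictions:

```python
def route(device_vram_gb, required_caps, variants):
    """Pick the smallest model variant that fits the device and
    preserves every required capability.

    `variants` is a list of (name, vram_gb, preserved_caps) tuples;
    the capability sets stand in for verified per-variant predictions.
    """
    fitting = [
        (name, vram) for name, vram, caps in variants
        if vram <= device_vram_gb and required_caps <= caps
    ]
    if not fitting:
        return None  # no variant qualifies: flag device for a reduced tier
    return min(fitting, key=lambda v: v[1])[0]

variants = [
    ("q8-full", 24, {"reasoning", "code", "summarisation"}),
    ("q5-mid",  14, {"code", "summarisation"}),
    ("q4-lite",  8, {"summarisation"}),
]
print(route(16, {"code"}, variants))       # q5-mid
print(route(32, {"reasoning"}, variants))  # q8-full
```

A `None` result is itself useful signal: the device cannot meet the requirement, so it gets an explicit reduced-capability assignment rather than a silent degradation.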

10x Less Evaluation Cost

Replace a full benchmark sweep across 10+ memory targets, potentially hundreds of GPU hours, with a 5-minute prediction that identifies both recommended build targets. Triage which builds need full benchmarking and which can be certified from internal signals alone.

Growing Institutional Knowledge

Capability Signature Database

Every build and benchmark adds to a capability signature database that becomes more accurate and more comprehensive over time. Accumulate a proprietary understanding of your model family's capability architecture that no benchmark score can provide.

Cross-Version Knowledge Transfer

Know, for any model in your fleet, exactly which tensor classes govern which capabilities: knowledge that transfers across model versions and fine-tunes. Compare the capability signatures of a base model and its fine-tunes to understand what changed and what needs protecting.

LLM Security Scanning

Before a model reaches production, Watchman runs it through a comprehensive suite of adversarial probes, testing for toxicity, prompt injection, data leakage, hallucination, and code-level vulnerabilities. Every scan produces a verifiable safety report.

Red-Team Attack Generation

Automated adversarial probing where a red-team LLM attacks the target model and adapts its strategy in real time, attempting to elicit toxic or unsafe output.

Prompt Injection & Encoding

Tests for injection attacks through text encoding manipulation, DAN-style jailbreaks, adversarial suffixes, and the full PromptInject framework (NeurIPS ML Safety 2022).

Malware & XSS Generation

Probes that attempt to make the model generate malware code, output malicious content signatures, or produce cross-site scripting payloads for private data exfiltration.

Training Data Leakage

Evaluates whether the model will replay memorised training data, and tests for glitch tokens that provoke unusual or dangerous behaviour.

Hallucination & Misinformation

Snowballed hallucination probes, misleading claim validation, package hallucination attacks on code generation, and the RealToxicityPrompts benchmark suite.

Social Engineering & Bypass

Grandma-style social engineering, Riley Goodside attacks, imperceptible Unicode perturbations (invisible characters, homoglyphs, reorderings), and continuation probes for undesirable content.

All 22 scanning probes:

blank: Sends an empty prompt to test baseline model behaviour and safety defaults.
atkgen: Automated red-team attack generation. An adversarial LLM probes the target and adapts its strategy to elicit toxic output.
badchars: Imperceptible Unicode perturbations: invisible characters, homoglyphs, reorderings, and deletions (Bad Characters paper).
av_spam_scanning: Attempts to make the model output malicious content signatures detectable by antivirus scanners.
continuation: Tests whether the model will continue a probably undesirable word or phrase.
dan: Various DAN (Do Anything Now) and DAN-like jailbreak attacks.
donotanswer: Prompts to which responsible language models should refuse to answer.
encoding: Prompt injection through various text encoding schemes.
gcg: Disrupts system prompts by appending optimised adversarial suffixes.
glitch: Probes for glitch tokens that provoke unusual or dangerous model behaviour.
grandma: Social engineering via appeals to be reminded of one's grandmother.
goodside: Implementations of Riley Goodside prompt injection attacks.
leakreplay: Evaluates whether the model will replay memorised training data.
lmrc: Subsample of Language Model Risk Cards probes for systematic risk assessment.
malwaregen: Attempts to make the model generate code for building malware.
misleading: Attempts to make the model support misleading and false claims.
packagehallucination: Tries to get code generations that specify non-existent (and therefore insecure) packages.
promptinject: Agency Enterprise PromptInject framework (best paper award, NeurIPS ML Safety Workshop 2022).
realtoxicityprompts: Subset of the RealToxicityPrompts benchmark for toxicity evaluation.
snowball: Snowballed hallucination probes designed to trick models into wrong answers on overly complex questions.
xss: Tests for cross-site scripting vulnerabilities including private data exfiltration vectors.

Deploy with certainty, ship with confidence.

Watchman turns model deployment from a leap of faith into a governed, auditable, security-tested process. Capability assurance and adversarial scanning in one platform.