6.2 AI Bill of Materials (AI BOM)

Transparency Requirements for Upstream Models and Datasets

Supply Chain Transparency for AI

Just as a Software Bill of Materials (SBOM) documents software components for security and compliance, an AI Bill of Materials (AI BOM) provides comprehensive documentation of an AI system's components—models, datasets, algorithms, and their provenance. As AI systems increasingly build on foundation models and third-party components, understanding the full supply chain becomes essential for risk management, compliance, and accountability.

6.2.1 The Case for AI BOMs

Regulatory Drivers

EU AI Act

  • Article 53 (GPAI): General-purpose AI providers must make available technical documentation including training data description
  • Article 11 (High-Risk): Technical documentation must include "detailed description of the elements of the AI system and of the process for its development"
  • Annex IV: Specifies required documentation including data requirements, design specifications, and development processes

US Executive Order 14110

  • Calls for standards to document AI systems and their components
  • Encourages adoption of risk management practices including supply chain transparency
  • NIST AI RMF 1.0 includes supply chain considerations

Industry Standards

  • NIST AI RMF: Map function includes understanding AI supply chain
  • ISO/IEC 42001: AI Management System requires component documentation
  • IEEE P2863: Organizational Governance of AI standard

Business Drivers

Risk Management

Understand vulnerabilities, biases, and failure modes inherited from upstream components

License Compliance

Track license obligations for models and training data throughout the supply chain

Incident Response

Quickly identify affected systems when vulnerabilities or issues are discovered upstream

Audit Readiness

Demonstrate compliance and due diligence to regulators, customers, and stakeholders

6.2.2 AI BOM Components

A comprehensive AI BOM documents all components that contribute to an AI system's behavior, from foundation models to training datasets to fine-tuning processes.

AI BOM Schema

1. System Identification

Field | Description | Example
System Name | Unique identifier for the AI system | customer-service-assistant-v2.1
Version | Semantic version of the system | 2.1.0
BOM Version | Version of this BOM document | 1.0.0
Created Date | When this BOM was generated | 2025-01-15T10:30:00Z
Owner | Team/individual responsible | AI Platform Team
Risk Classification | EU AI Act risk tier | Limited Risk

2. Model Components

Field | Description | Required
Foundation Model | Base model used (name, version, provider) | Yes
Model Architecture | Architecture type and parameters | Yes
Model License | License governing model use | Yes
Fine-Tuning Details | Custom training applied | If applicable
Adapter/LoRA | Parameter-efficient fine-tuning components | If applicable
Quantization | Compression/optimization applied | If applicable
Model Card Reference | Link to full model documentation | Yes

3. Training Data Components

Field | Description | Required
Dataset Name | Identifier for each training dataset | Yes
Data Source | Origin (vendor, public, internal) | Yes
Data Type | Text, image, audio, structured, etc. | Yes
License | License governing data use | Yes
Personal Data | Whether PII is present | Yes
Collection Method | How data was gathered | Yes
Date Range | Temporal coverage of data | Yes
Quality Metrics | Data quality scores/assessments | Recommended
Bias Assessment | Known bias or representation issues | Recommended
Datasheet Reference | Link to full dataset documentation | Recommended

4. Processing Pipeline Components

Field | Description | Required
Preprocessing | Data cleaning, transformation steps | Yes
Feature Engineering | Feature extraction and selection | If applicable
Training Framework | ML framework and version | Yes
Training Infrastructure | Hardware, cloud provider | Recommended
Post-processing | Output filtering, guardrails | Yes
RAG Components | Retrieval systems and knowledge bases | If applicable

5. Third-Party Dependencies

Field | Description | Required
API Dependencies | External AI APIs called | Yes
Software Libraries | ML/AI libraries and versions | Yes
Vendor Components | Third-party AI components integrated | Yes
SBOM Reference | Link to software bill of materials | Recommended

6. Safety & Compliance Components

Field | Description | Required
Guardrails | Content filters, safety systems | Yes
Human Oversight | HITL/HOTL mechanisms | For high-risk
Monitoring | Drift, bias, performance monitoring | Yes
Watermarking | Content authenticity mechanisms | If applicable

6.2.3 AI BOM Template

Example AI BOM (JSON)


{
  "$schema": "https://aibom.example.org/schema/v1.0",
  "bomFormat": "AI-BOM",
  "specVersion": "1.0",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "metadata": {
    "timestamp": "2025-01-15T10:30:00Z",
    "tools": [
      {
        "vendor": "Acme Corp",
        "name": "AI BOM Generator",
        "version": "1.0.0"
      }
    ],
    "authors": [
      {
        "name": "AI Platform Team",
        "email": "ai-platform@acme.com"
      }
    ],
    "component": {
      "type": "ai-system",
      "name": "customer-service-assistant",
      "version": "2.1.0"
    }
  },
  "aiSystem": {
    "name": "Customer Service Assistant",
    "version": "2.1.0",
    "description": "AI-powered customer service chatbot for product support",
    "intendedUse": "Answering customer questions about products and services",
    "riskClassification": "limited-risk",
    "owner": {
      "name": "Customer Experience Team",
      "contact": "cx-team@acme.com"
    }
  },
  "models": [
    {
      "id": "model-001",
      "name": "claude-3-sonnet",
      "version": "20240229",
      "provider": "Anthropic",
      "type": "foundation-model",
      "architecture": "transformer",
      "parameters": "undisclosed",
      "license": {
        "id": "Anthropic-Commercial",
        "url": "https://anthropic.com/legal/aup"
      },
      "modelCard": "https://anthropic.com/claude-3-sonnet-model-card",
      "relationship": "api-dependency"
    },
    {
      "id": "model-002",
      "name": "product-classifier",
      "version": "1.3.0",
      "provider": "internal",
      "type": "fine-tuned",
      "architecture": "bert-base",
      "parameters": "110M",
      "baseModel": "bert-base-uncased",
      "license": {
        "id": "Apache-2.0"
      },
      "modelCard": "internal://model-cards/product-classifier-v1.3",
      "relationship": "internal-component"
    }
  ],
  "datasets": [
    {
      "id": "dataset-001",
      "name": "customer-support-logs",
      "version": "2024-Q4",
      "source": "internal",
      "type": "text",
      "size": "2.3M conversations",
      "dateRange": {
        "start": "2022-01-01",
        "end": "2024-12-31"
      },
      "license": {
        "id": "proprietary-internal"
      },
      "personalData": true,
      "personalDataDetails": {
        "types": ["names", "email addresses", "account numbers"],
        "processing": "pseudonymized",
        "lawfulBasis": "legitimate-interest"
      },
      "biasAssessment": {
        "performed": true,
        "date": "2024-11-15",
        "findings": "Slight overrepresentation of English-speaking customers",
        "mitigations": "Augmented with translated conversations"
      },
      "datasheet": "internal://datasheets/customer-support-logs-2024"
    },
    {
      "id": "dataset-002",
      "name": "product-catalog",
      "version": "2025-01",
      "source": "internal",
      "type": "structured",
      "size": "50,000 products",
      "license": {
        "id": "proprietary-internal"
      },
      "personalData": false
    }
  ],
  "pipeline": {
    "preprocessing": [
      {
        "step": "pii-redaction",
        "tool": "Microsoft Presidio",
        "version": "2.2.0"
      },
      {
        "step": "text-normalization",
        "tool": "internal-normalizer",
        "version": "1.0.0"
      }
    ],
    "training": {
      "framework": "PyTorch",
      "version": "2.1.0",
      "infrastructure": "AWS SageMaker"
    },
    "inference": {
      "framework": "vLLM",
      "version": "0.3.0",
      "infrastructure": "AWS EKS"
    },
    "postProcessing": [
      {
        "step": "content-filter",
        "tool": "internal-guardrails",
        "version": "2.0.0"
      },
      {
        "step": "pii-filter",
        "tool": "Microsoft Presidio",
        "version": "2.2.0"
      }
    ]
  },
  "ragComponents": [
    {
      "id": "rag-001",
      "name": "product-knowledge-base",
      "type": "vector-store",
      "provider": "Pinecone",
      "embeddingModel": "text-embedding-3-small",
      "sourceDatasets": ["dataset-002"],
      "updateFrequency": "daily"
    }
  ],
  "guardrails": [
    {
      "id": "guardrail-001",
      "name": "topic-filter",
      "type": "input-filter",
      "description": "Blocks off-topic and prohibited queries",
      "provider": "internal"
    },
    {
      "id": "guardrail-002",
      "name": "pii-output-filter",
      "type": "output-filter",
      "description": "Prevents PII leakage in responses",
      "provider": "Microsoft Presidio"
    },
    {
      "id": "guardrail-003",
      "name": "hallucination-check",
      "type": "output-validation",
      "description": "Validates responses against RAG sources",
      "provider": "internal"
    }
  ],
  "thirdPartyDependencies": [
    {
      "name": "Anthropic Claude API",
      "type": "api",
      "version": "2024-02-29",
      "license": "commercial",
      "vendorAssessment": "completed",
      "vendorAssessmentDate": "2024-06-15"
    },
    {
      "name": "Pinecone",
      "type": "service",
      "version": "latest",
      "license": "commercial",
      "vendorAssessment": "completed"
    }
  ],
  "compliance": {
    "euAiActClassification": "limited-risk",
    "transparencyMeasures": [
      "User informed of AI interaction at conversation start",
      "AI disclosure in terms of service"
    ],
    "humanOversight": {
      "type": "human-over-the-loop",
      "description": "Human review of flagged conversations; escalation paths"
    },
    "monitoring": {
      "drift": true,
      "bias": true,
      "performance": true,
      "frequency": "daily"
    }
  },
  "vulnerabilities": [
    {
      "id": "vuln-001",
      "description": "Base model may generate incorrect information without RAG grounding",
      "severity": "medium",
      "mitigation": "RAG-based response generation with citation requirements"
    }
  ]
}
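A BOM in this format is meant to be machine-readable. The sketch below (field names follow the example document above; `summarize_bom` is a hypothetical helper, not part of any published standard) pulls out a few headline facts that audits and dashboards typically need:

```python
import json

def summarize_bom(path: str) -> dict:
    """Load an AI BOM JSON file and return headline facts about the system."""
    with open(path) as f:
        bom = json.load(f)
    return {
        "system": bom["aiSystem"]["name"],
        "models": len(bom.get("models", [])),
        "datasets": len(bom.get("datasets", [])),
        "guardrails": len(bom.get("guardrails", [])),
        # True if any training dataset declares personal data
        "uses_personal_data": any(d.get("personalData") for d in bom.get("datasets", [])),
        "open_vulnerabilities": len(bom.get("vulnerabilities", [])),
    }
```

For the example BOM above, this yields 2 models, 2 datasets, 3 guardrails, and a personal-data flag of true.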

6.2.4 AI BOM Management

Lifecycle Integration

Development

  • Generate initial BOM during design
  • Document all component selections
  • Track license compliance
  • Version control BOM with code

Deployment

  • Validate BOM completeness
  • Include BOM in deployment package
  • Register in central repository
  • Link to model registry entry

Operations

  • Update BOM on changes
  • Monitor for upstream vulnerabilities
  • Track dependency updates
  • Audit BOM accuracy periodically

Retirement

  • Archive BOM per retention policy
  • Document decommissioning
  • Maintain for audit purposes

BOM Repository Requirements

Centralized Storage

  • Single source of truth for all AI BOMs
  • Searchable inventory of components
  • Version history and change tracking
  • Access controls and audit logging

Automated Generation

  • CI/CD integration for BOM updates
  • Automatic dependency scanning
  • Validation against schema
  • Completeness checks

Vulnerability Tracking

  • Integration with vulnerability databases
  • Alerts for affected systems
  • Impact assessment automation
  • Remediation tracking

Compliance Reporting

  • Generate regulatory documentation
  • License compliance reports
  • Audit-ready exports
  • Supply chain visualization
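License compliance reporting falls out of the same data. A sketch, assuming the template's JSON layout and a hypothetical organization-level license allowlist:

```python
def license_report(boms: list[dict], approved: set[str]) -> dict[str, list[str]]:
    """Map each non-approved license id found across BOMs to the systems using it."""
    flagged: dict[str, list[str]] = {}
    for bom in boms:
        system = bom.get("aiSystem", {}).get("name", "unknown")
        # Models and datasets both carry a license object in the template layout
        for comp in bom.get("models", []) + bom.get("datasets", []):
            lic = comp.get("license", {}).get("id", "undeclared")
            if lic not in approved:
                flagged.setdefault(lic, []).append(system)
    return flagged
```

A missing license object surfaces as "undeclared", so gaps in documentation show up alongside genuine violations.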

Vendor BOM Requirements

Information to Request from AI Vendors

Component | Minimum Disclosure | Preferred Disclosure
Foundation Model | Model family, approximate size, training cutoff | Full model card with architecture details
Training Data | High-level source categories | Detailed data card with demographics and bias analysis
Fine-Tuning | Whether custom training occurred | Fine-tuning methodology and data sources
Safety Measures | General safety approach | System card with red teaming results
Third-Party Components | Acknowledgment of dependencies | Nested BOM for upstream components
Licenses | Usage license terms | Full license chain including training data

6.2.5 Supply Chain Risk Management

AI Supply Chain Risks

Upstream Risks

Risk | Description | BOM-Enabled Mitigation
Foundation Model Bias | Biases in base models propagate downstream | Track base model; monitor for bias; evaluate alternatives
Training Data Contamination | Problematic data in upstream training | Document data sources; assess vendor practices
License Violations | Upstream license terms violated | Full license chain documentation; compliance checking
Security Vulnerabilities | Exploits in models or dependencies | Component inventory; vulnerability monitoring
Vendor Dependency | Critical reliance on single provider | Dependency mapping; alternative identification

Incident Response with BOM

When a vulnerability or issue is discovered in an upstream component:

  1. Query BOM repository to identify all systems using the affected component
  2. Assess impact based on how the component is used in each system
  3. Prioritize remediation based on system risk classification
  4. Track remediation across all affected systems
  5. Update BOMs with patched component versions

Implementation Checklist

AI BOM Implementation Steps

  • Schema & Standards: define the organization's AI BOM schema, required fields, and naming conventions
  • Tooling & Infrastructure: stand up the BOM repository and automated generation tooling
  • Process Integration: embed BOM creation, validation, and updates into development and deployment workflows
  • Governance: assign BOM ownership, set a review cadence, and establish vendor disclosure requirements

Key Deliverables

AI BOM Schema

Organization-specific schema defining required components

BOM Templates

Pre-filled templates for common AI system types

BOM Repository

Centralized storage and management system

Generation Tools

Automated BOM creation integrated with development

Vendor Requirements

Standard BOM disclosure requirements for procurement

Incident Response Procedures

BOM-enabled vulnerability response workflows