6.2 AI Bill of Materials (AI BOM)
Transparency Requirements for Upstream Models and Datasets
Just as a Software Bill of Materials (SBOM) documents software components for security and compliance, an AI Bill of Materials (AI BOM) documents the components of an AI system (models, datasets, and algorithms) together with their provenance. As AI systems increasingly build on foundation models and third-party components, understanding the full supply chain becomes essential for risk management, compliance, and accountability.
6.2.1 The Case for AI BOMs
Regulatory Drivers
EU AI Act
- Article 53 (GPAI): Providers of general-purpose AI models must maintain technical documentation, supply information to downstream providers, and publish a summary of the content used for training
- Article 11 (High-Risk): Technical documentation must contain at least the elements set out in Annex IV, including a "detailed description of the elements of the AI system and of the process for its development"
- Annex IV: Specifies required documentation including data requirements, design specifications, and development processes
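US Executive Order 14110 (issued October 2023; rescinded January 2025)
- Called for standards to document AI systems and their components
- Encouraged adoption of risk management practices, including supply chain transparency
- NIST AI RMF 1.0, which the order built on, includes supply chain considerations and remains available as voluntary guidance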
Industry Standards
- NIST AI RMF: Map function includes understanding AI supply chain
- ISO/IEC 42001: AI Management System requires component documentation
- IEEE P2863: Organizational Governance of AI standard
Business Drivers
- Risk Management: Understand vulnerabilities, biases, and failure modes inherited from upstream components
- License Compliance: Track license obligations for models and training data throughout the supply chain
- Incident Response: Quickly identify affected systems when vulnerabilities or issues are discovered upstream
- Audit Readiness: Demonstrate compliance and due diligence to regulators, customers, and stakeholders
6.2.2 AI BOM Components
A comprehensive AI BOM documents all components that contribute to an AI system's behavior, from foundation models to training datasets to fine-tuning processes.
AI BOM Schema
1. System Identification
| Field | Description | Example |
|---|---|---|
| System Name | Unique identifier for the AI system | customer-service-assistant-v2.1 |
| Version | Semantic version of the system | 2.1.0 |
| BOM Version | Version of this BOM document | 1.0.0 |
| Created Date | When this BOM was generated | 2025-01-15T10:30:00Z |
| Owner | Team/individual responsible | AI Platform Team |
| Risk Classification | EU AI Act risk tier | Limited Risk |
2. Model Components
| Field | Description | Required |
|---|---|---|
| Foundation Model | Base model used (name, version, provider) | Yes |
| Model Architecture | Architecture type and parameters | Yes |
| Model License | License governing model use | Yes |
| Fine-Tuning Details | Custom training applied | If applicable |
| Adapter/LoRA | Parameter-efficient fine-tuning components | If applicable |
| Quantization | Compression/optimization applied | If applicable |
| Model Card Reference | Link to full model documentation | Yes |
3. Training Data Components
| Field | Description | Required |
|---|---|---|
| Dataset Name | Identifier for each training dataset | Yes |
| Data Source | Origin (vendor, public, internal) | Yes |
| Data Type | Text, image, audio, structured, etc. | Yes |
| License | License governing data use | Yes |
| Personal Data | Whether PII is present | Yes |
| Collection Method | How data was gathered | Yes |
| Date Range | Temporal coverage of data | Yes |
| Quality Metrics | Data quality scores/assessments | Recommended |
| Bias Assessment | Known bias or representation issues | Recommended |
| Datasheet Reference | Link to full dataset documentation | Recommended |
4. Processing Pipeline Components
| Field | Description | Required |
|---|---|---|
| Preprocessing | Data cleaning, transformation steps | Yes |
| Feature Engineering | Feature extraction and selection | If applicable |
| Training Framework | ML framework and version | Yes |
| Training Infrastructure | Hardware, cloud provider | Recommended |
| Post-processing | Output filtering, guardrails | Yes |
| RAG Components | Retrieval systems and knowledge bases | If applicable |
5. Third-Party Dependencies
| Field | Description | Required |
|---|---|---|
| API Dependencies | External AI APIs called | Yes |
| Software Libraries | ML/AI libraries and versions | Yes |
| Vendor Components | Third-party AI components integrated | Yes |
| SBOM Reference | Link to software bill of materials | Recommended |
6. Safety & Compliance Components
| Field | Description | Required |
|---|---|---|
| Guardrails | Content filters, safety systems | Yes |
| Human Oversight | HITL/HOTL mechanisms | For high-risk |
| Monitoring | Drift, bias, performance monitoring | Yes |
| Watermarking | Content authenticity mechanisms | If applicable |
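The "Required" columns above lend themselves to an automated completeness check. The following is a minimal sketch, assuming a BOM has already been loaded into a Python dictionary; the camelCase key names (such as `collectionMethod`) and the `missing_required_fields` helper are illustrative assumptions, not part of any published AI BOM standard.

```python
from typing import Any

# Required fields per section, transcribed from the "Required: Yes" rows above.
# The key names are assumptions chosen for illustration.
REQUIRED_FIELDS: dict[str, list[str]] = {
    "models": ["name", "version", "provider", "architecture", "license", "modelCard"],
    "datasets": ["name", "source", "type", "license", "personalData",
                 "collectionMethod", "dateRange"],
}


def missing_required_fields(bom: dict[str, Any]) -> list[str]:
    """Return one message per required field that is absent or empty."""
    problems: list[str] = []
    for section, fields in REQUIRED_FIELDS.items():
        for index, entry in enumerate(bom.get(section, [])):
            for field in fields:
                # None and "" count as missing; False is a legitimate value
                # (for example personalData: false).
                if entry.get(field) in (None, ""):
                    problems.append(f"{section}[{index}].{field} is missing")
    return problems


# Hypothetical BOM fragment with the model card link left out.
example = {
    "models": [{
        "name": "product-classifier",
        "version": "1.3.0",
        "provider": "internal",
        "architecture": "bert-base",
        "license": {"id": "Apache-2.0"},
    }],
    "datasets": [],
}
problems = missing_required_fields(example)
print(problems)  # ['models[0].modelCard is missing']
```

A check like this only confirms that fields are present; whether the recorded values are accurate still needs human review.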
6.2.3 AI BOM Template
Example AI BOM (JSON)
{
  "$schema": "https://aibom.example.org/schema/v1.0",
  "bomFormat": "AI-BOM",
  "specVersion": "1.0",
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "version": 1,
  "metadata": {
    "timestamp": "2025-01-15T10:30:00Z",
    "tools": [
      {
        "vendor": "Acme Corp",
        "name": "AI BOM Generator",
        "version": "1.0.0"
      }
    ],
    "authors": [
      {
        "name": "AI Platform Team",
        "email": "ai-platform@acme.com"
      }
    ],
    "component": {
      "type": "ai-system",
      "name": "customer-service-assistant",
      "version": "2.1.0"
    }
  },
  "aiSystem": {
    "name": "Customer Service Assistant",
    "version": "2.1.0",
    "description": "AI-powered customer service chatbot for product support",
    "intendedUse": "Answering customer questions about products and services",
    "riskClassification": "limited-risk",
    "owner": {
      "name": "Customer Experience Team",
      "contact": "cx-team@acme.com"
    }
  },
  "models": [
    {
      "id": "model-001",
      "name": "claude-3-sonnet",
      "version": "20240229",
      "provider": "Anthropic",
      "type": "foundation-model",
      "architecture": "transformer",
      "parameters": "undisclosed",
      "license": {
        "id": "Anthropic-Commercial",
        "url": "https://anthropic.com/legal/aup"
      },
      "modelCard": "https://anthropic.com/claude-3-sonnet-model-card",
      "relationship": "api-dependency"
    },
    {
      "id": "model-002",
      "name": "product-classifier",
      "version": "1.3.0",
      "provider": "internal",
      "type": "fine-tuned",
      "architecture": "bert-base",
      "parameters": "110M",
      "baseModel": "bert-base-uncased",
      "license": {
        "id": "Apache-2.0"
      },
      "modelCard": "internal://model-cards/product-classifier-v1.3",
      "relationship": "internal-component"
    }
  ],
  "datasets": [
    {
      "id": "dataset-001",
      "name": "customer-support-logs",
      "version": "2024-Q4",
      "source": "internal",
      "type": "text",
      "size": "2.3M conversations",
      "dateRange": {
        "start": "2022-01-01",
        "end": "2024-12-31"
      },
      "license": {
        "id": "proprietary-internal"
      },
      "personalData": true,
      "personalDataDetails": {
        "types": ["names", "email addresses", "account numbers"],
        "processing": "pseudonymized",
        "lawfulBasis": "legitimate-interest"
      },
      "biasAssessment": {
        "performed": true,
        "date": "2024-11-15",
        "findings": "Slight overrepresentation of English-speaking customers",
        "mitigations": "Augmented with translated conversations"
      },
      "datasheet": "internal://datasheets/customer-support-logs-2024"
    },
    {
      "id": "dataset-002",
      "name": "product-catalog",
      "version": "2025-01",
      "source": "internal",
      "type": "structured",
      "size": "50,000 products",
      "license": {
        "id": "proprietary-internal"
      },
      "personalData": false
    }
  ],
  "pipeline": {
    "preprocessing": [
      {
        "step": "pii-redaction",
        "tool": "Microsoft Presidio",
        "version": "2.2.0"
      },
      {
        "step": "text-normalization",
        "tool": "internal-normalizer",
        "version": "1.0.0"
      }
    ],
    "training": {
      "framework": "PyTorch",
      "version": "2.1.0",
      "infrastructure": "AWS SageMaker"
    },
    "inference": {
      "framework": "vLLM",
      "version": "0.3.0",
      "infrastructure": "AWS EKS"
    },
    "postProcessing": [
      {
        "step": "content-filter",
        "tool": "internal-guardrails",
        "version": "2.0.0"
      },
      {
        "step": "pii-filter",
        "tool": "Microsoft Presidio",
        "version": "2.2.0"
      }
    ]
  },
  "ragComponents": [
    {
      "id": "rag-001",
      "name": "product-knowledge-base",
      "type": "vector-store",
      "provider": "Pinecone",
      "embeddingModel": "text-embedding-3-small",
      "sourceDatasets": ["dataset-002"],
      "updateFrequency": "daily"
    }
  ],
  "guardrails": [
    {
      "id": "guardrail-001",
      "name": "topic-filter",
      "type": "input-filter",
      "description": "Blocks off-topic and prohibited queries",
      "provider": "internal"
    },
    {
      "id": "guardrail-002",
      "name": "pii-output-filter",
      "type": "output-filter",
      "description": "Prevents PII leakage in responses",
      "provider": "Microsoft Presidio"
    },
    {
      "id": "guardrail-003",
      "name": "hallucination-check",
      "type": "output-validation",
      "description": "Validates responses against RAG sources",
      "provider": "internal"
    }
  ],
  "thirdPartyDependencies": [
    {
      "name": "Anthropic Claude API",
      "type": "api",
      "version": "2024-02-29",
      "license": "commercial",
      "vendorAssessment": "completed",
      "vendorAssessmentDate": "2024-06-15"
    },
    {
      "name": "Pinecone",
      "type": "service",
      "version": "latest",
      "license": "commercial",
      "vendorAssessment": "completed"
    }
  ],
  "compliance": {
    "euAiActClassification": "limited-risk",
    "transparencyMeasures": [
      "User informed of AI interaction at conversation start",
      "AI disclosure in terms of service"
    ],
    "humanOversight": {
      "type": "human-over-the-loop",
      "description": "Human review of flagged conversations; escalation paths"
    },
    "monitoring": {
      "drift": true,
      "bias": true,
      "performance": true,
      "frequency": "daily"
    }
  },
  "vulnerabilities": [
    {
      "id": "vuln-001",
      "description": "Base model may generate incorrect information without RAG grounding",
      "severity": "medium",
      "mitigation": "RAG-based response generation with citation requirements"
    }
  ]
}
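Because the example links components by identifier (for instance `sourceDatasets` in `ragComponents` pointing at entries in `datasets`), a BOM can be checked for dangling references before it is published. The sketch below is one possible check, assuming the field names used in the example above; the `dangling_dataset_refs` helper and the deliberately broken `dataset-003` reference are hypothetical and exist only to show a failing case.

```python
import json

# Trimmed-down BOM using the field names from the example above.
# "dataset-003" is intentionally undefined to demonstrate a failure.
BOM_JSON = """
{
  "datasets": [
    {"id": "dataset-001", "name": "customer-support-logs"},
    {"id": "dataset-002", "name": "product-catalog"}
  ],
  "ragComponents": [
    {"id": "rag-001", "sourceDatasets": ["dataset-002", "dataset-003"]}
  ]
}
"""


def dangling_dataset_refs(bom: dict) -> list[tuple[str, str]]:
    """Return (ragComponent id, dataset id) pairs that point at no dataset."""
    known_ids = {dataset["id"] for dataset in bom.get("datasets", [])}
    return [
        (rag["id"], ref)
        for rag in bom.get("ragComponents", [])
        for ref in rag.get("sourceDatasets", [])
        if ref not in known_ids
    ]


bom = json.loads(BOM_JSON)
dangling = dangling_dataset_refs(bom)
print(dangling)  # [('rag-001', 'dataset-003')]
```

The same pattern could be extended to any other identifier-based link an organization adds to its schema.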
6.2.4 AI BOM Management
Lifecycle Integration
Development
- Generate initial BOM during design
- Document all component selections
- Track license compliance
- Version control BOM with code
Deployment
- Validate BOM completeness
- Include BOM in deployment package
- Register in central repository
- Link to model registry entry
Operations
- Update BOM on changes
- Monitor for upstream vulnerabilities
- Track dependency updates
- Audit BOM accuracy periodically
Retirement
- Archive BOM per retention policy
- Document decommissioning
- Maintain for audit purposes
BOM Repository Requirements
Centralized Storage
- Single source of truth for all AI BOMs
- Searchable inventory of components
- Version history and change tracking
- Access controls and audit logging
Automated Generation
- CI/CD integration for BOM updates
- Automatic dependency scanning
- Validation against schema
- Completeness checks
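As one illustration of automatic dependency scanning, the software-library portion of a BOM can be populated from the Python environment itself. This sketch uses the standard library's `importlib.metadata`; the `installed_libraries` helper and its output shape are assumptions made for illustration, and a real pipeline would more likely reuse an existing SBOM generator.

```python
from importlib import metadata
from typing import Optional


def installed_libraries(names_of_interest: Optional[set[str]] = None) -> list[dict[str, str]]:
    """List installed distributions as {"name", "version"} records.

    If names_of_interest is given (lowercase names), only those are returned.
    """
    libraries = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        if names_of_interest is not None and name.lower() not in names_of_interest:
            continue
        version = dist.metadata.get("Version") or "unknown"
        libraries.append({"name": name, "version": version})
    return sorted(libraries, key=lambda lib: lib["name"].lower())


# Restrict the scan to ML libraries relevant to the BOM (illustrative names).
ml_libraries = installed_libraries({"torch", "transformers", "vllm"})
for lib in ml_libraries:
    print(f'{lib["name"]}=={lib["version"]}')
```

Running this inside the CI job that builds the serving image keeps the recorded versions tied to what is actually deployed rather than to what a requirements file requests.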
Vulnerability Tracking
- Integration with vulnerability databases
- Alerts for affected systems
- Impact assessment automation
- Remediation tracking
Compliance Reporting
- Generate regulatory documentation
- License compliance reports
- Audit-ready exports
- Supply chain visualization
Vendor BOM Requirements
Information to Request from AI Vendors
| Component | Minimum Disclosure | Preferred Disclosure |
|---|---|---|
| Foundation Model | Model family, approximate size, training cutoff | Full model card with architecture details |
| Training Data | High-level source categories | Detailed data card with demographics and bias analysis |
| Fine-Tuning | Whether custom training occurred | Fine-tuning methodology and data sources |
| Safety Measures | General safety approach | System card with red teaming results |
| Third-Party Components | Acknowledgment of dependencies | Nested BOM for upstream components |
| Licenses | Usage license terms | Full license chain including training data |
6.2.5 Supply Chain Risk Management
AI Supply Chain Risks
Upstream Risks
| Risk | Description | BOM-Enabled Mitigation |
|---|---|---|
| Foundation Model Bias | Biases in base models propagate downstream | Track base model; monitor for bias; evaluate alternatives |
| Training Data Contamination | Problematic data in upstream training | Document data sources; assess vendor practices |
| License Violations | Upstream license terms violated | Full license chain documentation; compliance checking |
| Security Vulnerabilities | Exploits in models or dependencies | Component inventory; vulnerability monitoring |
| Vendor Dependency | Critical reliance on single provider | Dependency mapping; alternative identification |
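The license-related mitigation in the table can be partly automated once license identifiers are recorded per component. Below is a minimal sketch, assuming models and datasets store their license as an object with an `id` key as in the template in 6.2.3; the `APPROVED_LICENSES` set stands in for an organization's own policy and is entirely hypothetical, as are the sample components.

```python
# Hypothetical allowlist; each organization would define its own policy.
APPROVED_LICENSES = {"Apache-2.0", "MIT", "proprietary-internal", "Anthropic-Commercial"}


def license_findings(bom: dict) -> list[tuple[str, str]]:
    """Return (component id, finding) for missing or unapproved licenses."""
    findings = []
    for section in ("models", "datasets"):
        for component in bom.get(section, []):
            license_id = (component.get("license") or {}).get("id")
            if license_id is None:
                findings.append((component["id"], "no license recorded"))
            elif license_id not in APPROVED_LICENSES:
                findings.append(
                    (component["id"], f"license {license_id} not on approved list")
                )
    return findings


# Illustrative BOM fragment: one compliant model, two problematic datasets.
sample_bom = {
    "models": [{"id": "model-002", "license": {"id": "Apache-2.0"}}],
    "datasets": [
        {"id": "dataset-003", "license": {"id": "CC-BY-NC-4.0"}},
        {"id": "dataset-004"},
    ],
}
findings = license_findings(sample_bom)
for component_id, finding in findings:
    print(f"{component_id}: {finding}")
```

An allowlist check flags candidates for review only; deciding whether a given license actually permits the intended use remains a legal judgment.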
Incident Response with BOM
When a vulnerability or issue is discovered in an upstream component:
- Query BOM repository to identify all systems using the affected component
- Assess impact based on how the component is used in each system
- Prioritize remediation based on system risk classification
- Track remediation across all affected systems
- Update BOMs with patched component versions
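The first and third steps above (identify affected systems, then prioritize by risk classification) can be sketched as a simple query. This example assumes the repository is available as an in-memory list of BOM dictionaries shaped like the template in 6.2.3; the `affected_systems` helper, the `RISK_PRIORITY` ordering, and the sample systems are hypothetical.

```python
# Lower number = remediate first. The ordering is an assumed policy.
RISK_PRIORITY = {"high-risk": 0, "limited-risk": 1, "minimal-risk": 2}


def affected_systems(boms: list[dict], component_name: str,
                     affected_versions: set[str]) -> list[dict]:
    """Find systems using an affected model version, highest risk first."""
    hits = []
    for bom in boms:
        for model in bom.get("models", []):
            if model["name"] == component_name and model["version"] in affected_versions:
                system = bom["aiSystem"]
                hits.append({
                    "system": system["name"],
                    "risk": system["riskClassification"],
                    "component": f'{model["name"]}@{model["version"]}',
                })
    # Unknown classifications sort first so they get reviewed, not ignored.
    return sorted(hits, key=lambda hit: RISK_PRIORITY.get(hit["risk"], -1))


# Illustrative repository contents.
repository = [
    {"aiSystem": {"name": "Customer Service Assistant", "riskClassification": "limited-risk"},
     "models": [{"name": "product-classifier", "version": "1.3.0"}]},
    {"aiSystem": {"name": "Loan Triage", "riskClassification": "high-risk"},
     "models": [{"name": "product-classifier", "version": "1.3.0"}]},
    {"aiSystem": {"name": "Catalog Search", "riskClassification": "minimal-risk"},
     "models": [{"name": "product-classifier", "version": "1.4.0"}]},
]
hits = affected_systems(repository, "product-classifier", {"1.3.0"})
for hit in hits:
    print(f'{hit["risk"]:>13}  {hit["system"]}  ({hit["component"]})')
```

In practice the same query would run against whatever store backs the BOM repository; the value comes from every system recording component names and versions consistently.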
Implementation Checklist
AI BOM Implementation Steps
Schema & Standards
Tooling & Infrastructure
Process Integration
Governance
Key Deliverables
- AI BOM Schema: Organization-specific schema defining required components
- BOM Templates: Pre-filled templates for common AI system types
- BOM Repository: Centralized storage and management system
- Generation Tools: Automated BOM creation integrated with development
- Vendor Requirements: Standard BOM disclosure requirements for procurement
- Incident Response Procedures: BOM-enabled vulnerability response workflows