6.1 Vendor Due Diligence
Third-Party AI Procurement Assessment Framework
Organizations increasingly rely on third-party AI systems—from foundation model APIs to embedded AI features in enterprise software. Under the EU AI Act, deployers of high-risk AI systems share regulatory responsibility with providers. Comprehensive vendor due diligence is not optional; it's a legal and operational necessity that determines whether third-party AI can be safely and compliantly integrated into your operations.
6.1.1 Assessing Vendor Model Training Data
Training data represents the foundation of any AI system's behavior, biases, and capabilities. Vendors must provide sufficient transparency about training data to enable deployers to assess fitness for purpose and regulatory compliance.
EU AI Act Requirements for Deployers
- Article 26(1): Deployers must use high-risk AI systems in accordance with the provider's instructions for use
- Article 26(4): Deployers must ensure that input data is relevant and sufficiently representative for the system's intended purpose
- Article 27: Certain deployers (notably public bodies and providers of essential private services) must conduct a fundamental rights impact assessment before deployment
- Implication: Deployers must understand the vendor's training data to fulfill these obligations
Training Data Assessment Framework
1. Data Source Transparency
| Question | Why It Matters | Red Flags |
|---|---|---|
| What data sources were used for training? | Determines potential biases, legal exposure, domain suitability | Vague answers; "proprietary" without detail; known problematic sources |
| Was web-scraped data included? | Copyright risk; quality concerns; potential harmful content | Large-scale web scraping without curation; Common Crawl without filtering |
| Are synthetic data sources used? | May introduce model collapse; quality verification needed | Heavy reliance on AI-generated training data |
| What human-labeled data was used? | Labeler demographics affect model perspectives; labor practices | Unknown labeling workforce; no quality controls described |
2. Legal & Compliance Status
| Question | Why It Matters | Red Flags |
|---|---|---|
| What licenses cover the training data? | Downstream liability for unlicensed use; copyright claims | Unable to provide license documentation; reliance on "fair use" alone for commercial training |
| How are opt-out requests handled? | Regulatory compliance (EU AI Act, GDPR); lawsuit exposure | No opt-out mechanism; non-compliance with robots.txt |
| Is personal data included in training? | GDPR compliance; privacy risks; potential data extraction | PII in training without consent; no anonymization process |
| What copyright clearance process exists? | Infringement risk; ongoing litigation exposure | Active lawsuits against vendor; no clearance process |
3. Data Quality & Representativeness
| Question | Why It Matters | Red Flags |
|---|---|---|
| What is the demographic distribution of training data? | Bias risks; performance disparities across populations | No demographic analysis; heavy skew toward specific groups |
| How were harmful/toxic samples handled? | Output quality; safety; brand risk | No filtering; reliance on post-hoc safety training only |
| What data quality controls were applied? | Model reliability; garbage in, garbage out | No quality metrics; automated-only filtering |
| Is training data current or dated? | Knowledge recency; factual accuracy | Old cutoff dates; no update mechanism |
Training Data Documentation Request
Standard Information Request to Vendors
Include the following in the RFP or vendor questionnaire (a machine-readable sketch follows the list):
- Data Sources Summary: Categories of data used (web, books, licensed, proprietary, synthetic) with approximate percentages
- Data Card/Datasheet: Formal documentation following the Datasheets for Datasets standard
- Geographic & Linguistic Coverage: Regions and languages represented
- Demographic Representation: Analysis of human subjects in training data
- Temporal Range: Date range of training data; knowledge cutoff
- Licensing Summary: Overview of licenses governing training data
- Copyright Compliance: Process for respecting opt-outs and handling disputes
- Personal Data Handling: Presence of PII; anonymization measures; GDPR compliance
- Quality Control: Filtering, deduplication, and curation processes applied
- Known Limitations: Documented gaps, biases, or quality issues
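To make these requests comparable across vendors and auditable over time, the questionnaire can be captured as a structured record. Below is a minimal Python sketch; the class and field names are illustrative assumptions that mirror the list above, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingDataDisclosure:
    """Vendor response to the training data documentation request.

    Field names mirror the request list above; they are illustrative,
    not an industry-standard schema.
    """
    vendor: str
    model_name: str
    data_source_percentages: dict[str, float] = field(default_factory=dict)  # e.g. {"web": 60.0, "licensed": 25.0}
    has_datasheet: bool = False          # formal Datasheets for Datasets documentation provided
    languages_covered: list[str] = field(default_factory=list)
    knowledge_cutoff: str = ""           # e.g. "2024-04"
    license_summary: str = ""
    honors_opt_outs: bool = False        # respects robots.txt / opt-out signals
    pii_in_training: bool | None = None  # None = vendor did not answer
    quality_controls: list[str] = field(default_factory=list)  # filtering, dedup, curation
    known_limitations: str = ""

    def unanswered(self) -> list[str]:
        """Flag fields the vendor left empty or unaddressed."""
        gaps = []
        if not self.data_source_percentages:
            gaps.append("data_source_percentages")
        if not self.has_datasheet:
            gaps.append("has_datasheet")
        if self.pii_in_training is None:
            gaps.append("pii_in_training")
        if not self.quality_controls:
            gaps.append("quality_controls")
        return gaps
```

A response with many unanswered fields maps directly onto the red flags in the assessment tables above.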
6.1.2 Vendor Liability & Indemnification Clauses
AI procurement requires careful attention to contractual risk allocation. Unlike traditional software, AI systems can cause harms that are difficult to predict, attribute, and quantify. Contract terms must address these unique characteristics.
Key Contractual Considerations
1. Indemnification Scope
IP Indemnification
Standard: Vendor indemnifies against third-party IP claims arising from use of the AI system
Key Issues:
- Does it cover training data infringement claims?
- Are AI-generated outputs covered?
- What about derivative works?
- Carve-outs for customer inputs/prompts?
Negotiate for: Explicit coverage of training data IP; output indemnification
Regulatory Indemnification
Standard: Often absent or heavily limited
Key Issues:
- Who bears EU AI Act compliance risk?
- Coverage for regulatory fines?
- Data protection violation coverage?
Negotiate for: Vendor warranty of compliance; clear allocation of shared responsibilities
Harm/Safety Indemnification
Standard: Typically excluded or heavily capped
Key Issues:
- Coverage for AI-caused harm to end users?
- Bias/discrimination claims?
- Reputational damage?
Negotiate for: Reasonable coverage for foreseeable harms; clear liability allocation
2. Warranty Provisions
| Warranty Type | Typical Vendor Position | Deployer Should Seek |
|---|---|---|
| Performance Warranties | "As-is"; no accuracy guarantees | Minimum accuracy thresholds; SLAs for critical metrics |
| Compliance Warranties | Compliance with "applicable law" | Specific EU AI Act compliance; GDPR compliance; sector-specific regulations |
| Security Warranties | Industry-standard security | Specific certifications (SOC 2, ISO 27001); penetration testing; vulnerability disclosure |
| Training Data Warranties | Often absent | No known infringement; lawful data collection; opt-out compliance |
| Bias/Fairness Warranties | Usually excluded | Testing documentation; fairness metrics; remediation commitments |
3. Liability Caps & Exclusions
Typical Vendor Limitations
- Liability capped at fees paid (often limited to the preceding 12 months' fees)
- Exclusion of consequential/indirect damages
- Exclusion of lost profits, data loss
- Carve-outs for customer misuse
Deployer Negotiation Targets
- Higher caps for IP and regulatory indemnity
- Uncapped liability for gross negligence/willful misconduct
- Data breach notification and remediation obligations
- Clear definition of "misuse" carve-outs
Sample Contract Language
IP Indemnification (Deployer-Favorable)
"Provider shall defend, indemnify, and hold harmless Customer from any third-party claim alleging that (a) the AI System, (b) the training data used to develop the AI System, or (c) outputs generated by the AI System when used in accordance with the Documentation, infringe any intellectual property right. This indemnification extends to claims arising from Provider's use of web-scraped, licensed, or other third-party data in training."
Compliance Warranty (Deployer-Favorable)
"Provider warrants that the AI System, including its development, training, and operation, complies with (a) the EU Artificial Intelligence Act, including transparency and documentation requirements for [high-risk/general-purpose] AI systems, (b) the General Data Protection Regulation with respect to any personal data processed in training or inference, and (c) all applicable data protection and AI-specific regulations in jurisdictions where Customer is authorized to deploy the System."
Audit Rights
"Upon reasonable notice, Customer may audit, or engage a qualified third party to audit, Provider's compliance with this Agreement, including (a) security controls and certifications, (b) AI system documentation and model cards, (c) training data compliance and sourcing practices, and (d) bias testing and fairness assessment results. Provider shall cooperate with such audits and provide access to relevant personnel and documentation."
Incident Notification
"Provider shall notify Customer within 24 hours of discovering any (a) security incident affecting the AI System or Customer data, (b) material performance degradation or bias issue, (c) regulatory inquiry or enforcement action relating to the AI System, or (d) intellectual property claim relating to training data or outputs. Provider shall cooperate with Customer's incident response procedures."
6.1.3 Comprehensive Vendor Due Diligence Checklist
The following checklist provides a structured approach to evaluating AI vendors across all critical dimensions. Adapt the depth of assessment to the risk level of the intended use case.
A. Company & Governance Assessment
B. Technical & Model Assessment
C. Training Data Assessment
D. Security Assessment
E. Privacy & Data Protection Assessment
F. Regulatory Compliance Assessment
G. Operational Assessment
H. Contractual Assessment
Vendor Risk Scoring Matrix
| Assessment Area | Weight | Score (1-5) | Weighted Score |
|---|---|---|---|
| Company & Governance | 10% | [____] | [____] |
| Technical & Model Quality | 15% | [____] | [____] |
| Training Data Compliance | 15% | [____] | [____] |
| Security | 20% | [____] | [____] |
| Privacy & Data Protection | 15% | [____] | [____] |
| Regulatory Compliance | 15% | [____] | [____] |
| Contractual Terms | 10% | [____] | [____] |
| TOTAL SCORE | 100% | | [____] / 5.0 |
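The arithmetic behind the matrix is a weighted sum: each area's 1-5 score is multiplied by its weight, and the products are summed to a total out of 5.0. A short Python sketch using the weights from the table (the example vendor's scores are hypothetical):

```python
# Weights from the scoring matrix above (must sum to 1.0).
WEIGHTS = {
    "Company & Governance": 0.10,
    "Technical & Model Quality": 0.15,
    "Training Data Compliance": 0.15,
    "Security": 0.20,
    "Privacy & Data Protection": 0.15,
    "Regulatory Compliance": 0.15,
    "Contractual Terms": 0.10,
}

def weighted_total(scores: dict[str, float]) -> float:
    """Compute one vendor's total weighted score (out of 5.0).

    `scores` maps each assessment area to a 1-5 score.
    """
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored areas: {sorted(missing)}")
    return sum(WEIGHTS[area] * scores[area] for area in WEIGHTS)

# Example: a vendor strong on security but weak on training data transparency.
example = {
    "Company & Governance": 4,
    "Technical & Model Quality": 4,
    "Training Data Compliance": 2,
    "Security": 5,
    "Privacy & Data Protection": 4,
    "Regulatory Compliance": 3,
    "Contractual Terms": 3,
}
print(f"{weighted_total(example):.2f} / 5.0")  # 3.65 / 5.0
```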
Score Interpretation
Key Deliverables
- Vendor Assessment Template: Standardized questionnaire for AI vendor evaluation
- Risk Scoring Matrix: Weighted scoring system for vendor comparison
- Contract Checklist: Required clauses for AI procurement agreements
- Training Data Request: Standard information request for training data transparency
- Sample Contract Language: Deployer-favorable clause templates
- Ongoing Monitoring Plan: Periodic reassessment requirements for approved vendors (see the scheduling sketch below)
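As a sketch of how the ongoing monitoring plan might be operationalized, reassessment cadence can be keyed to the vendor's total weighted score from the matrix above. The score bands and review intervals below are illustrative assumptions, not requirements of this framework:

```python
from datetime import date, timedelta

# Illustrative reassessment cadences keyed to the total weighted score.
# These bands are assumptions for the sketch; set your own in the
# ongoing monitoring plan.
REVIEW_INTERVALS = [
    (4.0, timedelta(days=365)),  # strong vendors: annual review
    (3.0, timedelta(days=180)),  # acceptable with gaps: semi-annual review
    (0.0, timedelta(days=90)),   # weak vendors: quarterly review or exit plan
]

def next_review(last_review: date, total_score: float) -> date:
    """Return the next reassessment date for an approved vendor."""
    for threshold, interval in REVIEW_INTERVALS:
        if total_score >= threshold:
            return last_review + interval
    raise ValueError("score below all bands")

print(next_review(date(2025, 1, 15), 3.65))  # 2025-07-14
```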