6.1 Vendor Due Diligence

Third-Party AI Procurement Assessment Framework

The Procurement Imperative

Organizations increasingly rely on third-party AI systems—from foundation model APIs to embedded AI features in enterprise software. Under the EU AI Act, deployers of high-risk AI systems share regulatory responsibility with providers. Comprehensive vendor due diligence is not optional; it's a legal and operational necessity that determines whether third-party AI can be safely and compliantly integrated into your operations.

6.1.1 Assessing Vendor Model Training Data

Training data represents the foundation of any AI system's behavior, biases, and capabilities. Vendors must provide sufficient transparency about training data to enable deployers to assess fitness for purpose and regulatory compliance.

EU AI Act Requirements for Deployers

  • Article 26(1): Deployers must use high-risk AI systems in accordance with the provider's instructions for use
  • Article 26(4): To the extent they control the input data, deployers must ensure it is relevant and sufficiently representative for the system's intended purpose
  • Article 27: Certain deployers (notably bodies governed by public law and private entities providing public services) must conduct a fundamental rights impact assessment before deployment
  • Implication: Deployers must understand the vendor's training data to fulfill these obligations

Training Data Assessment Framework

1. Data Source Transparency

| Question | Why It Matters | Red Flags |
|---|---|---|
| What data sources were used for training? | Determines potential biases, legal exposure, domain suitability | Vague answers; "proprietary" without detail; known problematic sources |
| Was web-scraped data included? | Copyright risk; quality concerns; potential harmful content | Large-scale web scraping without curation; Common Crawl without filtering |
| Are synthetic data sources used? | May introduce model collapse; quality verification needed | Heavy reliance on AI-generated training data |
| What human-labeled data was used? | Labeler demographics affect model perspectives; labor practices | Unknown labeling workforce; no quality controls described |

2. Legal & Compliance Status

| Question | Why It Matters | Red Flags |
|---|---|---|
| What licenses cover the training data? | Downstream liability for unlicensed use; copyright claims | Unable to provide license documentation; "fair use" as sole basis for commercial use |
| How are opt-out requests handled? | Regulatory compliance (EU AI Act, GDPR); lawsuit exposure | No opt-out mechanism; non-compliance with robots.txt |
| Is personal data included in training? | GDPR compliance; privacy risks; potential data extraction | PII in training without consent; no anonymization process |
| What copyright clearance process exists? | Infringement risk; ongoing litigation exposure | Active lawsuits against vendor; no clearance process |

3. Data Quality & Representativeness

| Question | Why It Matters | Red Flags |
|---|---|---|
| What is the demographic distribution of training data? | Bias risks; performance disparities across populations | No demographic analysis; heavy skew toward specific groups |
| How were harmful/toxic samples handled? | Output quality; safety; brand risk | No filtering; reliance on post-hoc safety training only |
| What data quality controls were applied? | Model reliability; garbage in, garbage out | No quality metrics; automated-only filtering |
| Is training data current or dated? | Knowledge recency; factual accuracy | Old cutoff dates; no update mechanism |

Training Data Documentation Request

Standard Information Request to Vendors

Include in RFP/vendor questionnaire:

  1. Data Sources Summary: Categories of data used (web, books, licensed, proprietary, synthetic) with approximate percentages
  2. Data Card/Datasheet: Formal documentation following the Datasheets for Datasets standard (Gebru et al.)
  3. Geographic & Linguistic Coverage: Regions and languages represented
  4. Demographic Representation: Analysis of human subjects in training data
  5. Temporal Range: Date range of training data; knowledge cutoff
  6. Licensing Summary: Overview of licenses governing training data
  7. Copyright Compliance: Process for respecting opt-outs and handling disputes
  8. Personal Data Handling: Presence of PII; anonymization measures; GDPR compliance
  9. Quality Control: Filtering, deduplication, and curation processes applied
  10. Known Limitations: Documented gaps, biases, or quality issues
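The ten items above can be encoded as a machine-checkable checklist so that incomplete vendor responses are flagged automatically during RFP review. This is an illustrative sketch only; the field names and the `missing_items` helper are hypothetical, not part of any procurement tool.

```python
# The ten documentation items from the standard information request,
# encoded as a checklist (names are hypothetical shorthand).
REQUIRED_ITEMS = [
    "data_sources_summary",
    "data_card",
    "geographic_linguistic_coverage",
    "demographic_representation",
    "temporal_range",
    "licensing_summary",
    "copyright_compliance",
    "personal_data_handling",
    "quality_control",
    "known_limitations",
]

def missing_items(vendor_response: dict) -> list[str]:
    """Return checklist items the vendor response omits or leaves blank."""
    return [item for item in REQUIRED_ITEMS
            if not vendor_response.get(item)]

# Example: a partial response flags the remaining gaps for follow-up.
response = {
    "data_sources_summary": "60% web, 25% licensed, 15% synthetic",
    "temporal_range": "2010-2023, knowledge cutoff 2023-09",
}
gaps = missing_items(response)  # the eight unanswered items
```

In practice the same structure can drive a vendor questionnaire form, with each unanswered item blocking sign-off until the vendor supplies it or the gap is formally risk-accepted.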

6.1.2 Vendor Liability & Indemnification Clauses

AI procurement requires careful attention to contractual risk allocation. Unlike traditional software, AI systems can cause harms that are difficult to predict, attribute, and quantify. Contract terms must address these unique characteristics.

Key Contractual Considerations

1. Indemnification Scope

IP Indemnification

Standard: Vendor indemnifies against third-party IP claims arising from use of the AI system

Key Issues:

  • Does it cover training data infringement claims?
  • Are AI-generated outputs covered?
  • What about derivative works?
  • Carve-outs for customer inputs/prompts?

Negotiate for: Explicit coverage of training data IP; output indemnification

Regulatory Indemnification

Standard: Often absent or heavily limited

Key Issues:

  • Who bears EU AI Act compliance risk?
  • Coverage for regulatory fines?
  • Data protection violation coverage?

Negotiate for: Vendor warranty of compliance; shared responsibility clarity

Harm/Safety Indemnification

Standard: Typically excluded or heavily capped

Key Issues:

  • Coverage for AI-caused harm to end users?
  • Bias/discrimination claims?
  • Reputational damage?

Negotiate for: Reasonable coverage for foreseeable harms; clear liability allocation

2. Warranty Provisions

| Warranty Type | Typical Vendor Position | Deployer Should Seek |
|---|---|---|
| Performance Warranties | "As-is"; no accuracy guarantees | Minimum accuracy thresholds; SLAs for critical metrics |
| Compliance Warranties | Compliance with "applicable law" | Specific EU AI Act compliance; GDPR compliance; sector-specific regulations |
| Security Warranties | Industry-standard security | Specific certifications (SOC 2, ISO 27001); penetration testing; vulnerability disclosure |
| Training Data Warranties | Often absent | No known infringement; lawful data collection; opt-out compliance |
| Bias/Fairness Warranties | Usually excluded | Testing documentation; fairness metrics; remediation commitments |

3. Liability Caps & Exclusions

Typical Vendor Limitations

  • Liability capped at fees paid (often 12 months)
  • Exclusion of consequential/indirect damages
  • Exclusion of lost profits, data loss
  • Carve-outs for customer misuse

Deployer Negotiation Targets

  • Higher caps for IP and regulatory indemnity
  • Uncapped liability for gross negligence/willful misconduct
  • Data breach notification and remediation obligations
  • Clear definition of "misuse" carve-outs

Sample Contract Language

IP Indemnification (Deployer-Favorable)

"Provider shall defend, indemnify, and hold harmless Customer from any third-party claim alleging that (a) the AI System, (b) the training data used to develop the AI System, or (c) outputs generated by the AI System when used in accordance with the Documentation, infringe any intellectual property right. This indemnification extends to claims arising from Provider's use of web-scraped, licensed, or other third-party data in training."

Compliance Warranty (Deployer-Favorable)

"Provider warrants that the AI System, including its development, training, and operation, complies with (a) the EU Artificial Intelligence Act, including transparency and documentation requirements for [high-risk/general-purpose] AI systems, (b) the General Data Protection Regulation with respect to any personal data processed in training or inference, and (c) all applicable data protection and AI-specific regulations in jurisdictions where Customer is authorized to deploy the System."

Audit Rights

"Upon reasonable notice, Customer may audit, or engage a qualified third party to audit, Provider's compliance with this Agreement, including (a) security controls and certifications, (b) AI system documentation and model cards, (c) training data compliance and sourcing practices, and (d) bias testing and fairness assessment results. Provider shall cooperate with such audits and provide access to relevant personnel and documentation."

Incident Notification

"Provider shall notify Customer within 24 hours of discovering any (a) security incident affecting the AI System or Customer data, (b) material performance degradation or bias issue, (c) regulatory inquiry or enforcement action relating to the AI System, or (d) intellectual property claim relating to training data or outputs. Provider shall cooperate with Customer's incident response procedures."

6.1.3 Comprehensive Vendor Due Diligence Checklist

The following checklist provides a structured approach to evaluating AI vendors across all critical dimensions. Adapt the depth of assessment to the risk level of the intended use case.

A. Company & Governance Assessment

B. Technical & Model Assessment

C. Training Data Assessment

D. Security Assessment

E. Privacy & Data Protection Assessment

F. Regulatory Compliance Assessment

G. Operational Assessment

H. Contractual Assessment

Vendor Risk Scoring Matrix

| Assessment Area | Weight | Score (1-5) | Weighted Score |
|---|---|---|---|
| Company & Governance | 10% | [____] | [____] |
| Technical & Model Quality | 15% | [____] | [____] |
| Training Data Compliance | 15% | [____] | [____] |
| Security | 20% | [____] | [____] |
| Privacy & Data Protection | 15% | [____] | [____] |
| Regulatory Compliance | 15% | [____] | [____] |
| Contractual Terms | 10% | [____] | [____] |
| TOTAL SCORE | 100% | | [____] / 5.0 |

Score Interpretation

  • 4.0 - 5.0: Low Risk - Proceed with standard monitoring
  • 3.0 - 3.9: Moderate Risk - Proceed with enhanced controls
  • 2.0 - 2.9: High Risk - Proceed only with risk acceptance and mitigations
  • Below 2.0: Unacceptable Risk - Do not proceed
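The scoring matrix and thresholds above translate directly into a small calculator, which keeps vendor comparisons consistent across assessors. A minimal sketch using the weights from the matrix; the function names and example scores are illustrative.

```python
# Weights from the vendor risk scoring matrix above (sum to 1.0).
WEIGHTS = {
    "Company & Governance": 0.10,
    "Technical & Model Quality": 0.15,
    "Training Data Compliance": 0.15,
    "Security": 0.20,
    "Privacy & Data Protection": 0.15,
    "Regulatory Compliance": 0.15,
    "Contractual Terms": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Combine per-area scores (1-5) into a single weighted total."""
    assert set(scores) == set(WEIGHTS), "score every assessment area"
    return sum(WEIGHTS[area] * s for area, s in scores.items())

def risk_tier(total: float) -> str:
    """Map a weighted total to the interpretation bands above."""
    if total >= 4.0:
        return "Low Risk - standard monitoring"
    if total >= 3.0:
        return "Moderate Risk - enhanced controls"
    if total >= 2.0:
        return "High Risk - risk acceptance and mitigations"
    return "Unacceptable Risk - do not proceed"

# Hypothetical vendor assessment:
scores = {
    "Company & Governance": 4, "Technical & Model Quality": 4,
    "Training Data Compliance": 3, "Security": 5,
    "Privacy & Data Protection": 4, "Regulatory Compliance": 3,
    "Contractual Terms": 4,
}
total = weighted_score(scores)  # ≈ 3.90 → Moderate Risk
```

Note that the heaviest-weighted area (Security, 20%) cannot single-handedly rescue a vendor: even a 5 there leaves the total below 4.0 unless the remaining areas average well above 3.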

Key Deliverables

Vendor Assessment Template

Standardized questionnaire for AI vendor evaluation

Risk Scoring Matrix

Weighted scoring system for vendor comparison

Contract Checklist

Required clauses for AI procurement agreements

Training Data Request

Standard information request for training data transparency

Sample Contract Language

Deployer-favorable clause templates

Ongoing Monitoring Plan

Periodic reassessment requirements for approved vendors