4.3 Phase 3: Model Development & Training
Documentation standards, sustainability, and reproducibility requirements for responsible model development and training.
Key Takeaways
- EU AI Act Article 11 requires technical documentation for high-risk AI systems; Model Cards are a common way to structure it
- Training a single large language model can emit over 300 tons of CO2 equivalent
- Reproducibility documentation is essential for audit trails and compliance
- System Cards extend Model Cards to document complete AI system behavior
4.3.1 Model Cards & System Cards (Documentation Standards)
Model Cards, introduced by Google researchers in 2018, are a widely adopted standard for AI model documentation. The EU AI Act Article 11 requires comprehensive technical documentation for high-risk AI systems. The Act does not name Model Cards, but they are a practical format for capturing much of what it asks for.
Model Card vs. System Card
| Aspect | Model Card | System Card |
|---|---|---|
| Scope | Single ML model | Complete AI system (models + infrastructure + guardrails) |
| Focus | Model behavior and limitations | End-to-end system behavior and safety measures |
| When Used | Individual model releases | Product/application deployment |
| Primary Audience | ML engineers, researchers | Product teams, regulators, users |
Model Card Template
1. Model Details
2. Intended Use
3. Training Data
4. Evaluation Data
5. Performance Metrics
6. Fairness Considerations
7. Ethical Considerations
8. Caveats & Recommendations
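One way to keep the eight sections consistent across releases is to hold them in a small data structure and render the card from it. The sketch below is an illustration only: the `ModelCard` class, its field names, and the example values are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container mirroring the eight template sections."""
    model_details: str
    intended_use: str
    training_data: str
    evaluation_data: str
    performance_metrics: str
    fairness_considerations: str
    ethical_considerations: str
    caveats_and_recommendations: str

    def missing_sections(self) -> list:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_text(self) -> str:
        """Render the card as numbered plain-text sections."""
        lines = []
        for number, f in enumerate(fields(self), start=1):
            title = f.name.replace("_", " ").title()
            lines.append(f"{number}. {title}")
            lines.append(getattr(self, f.name).strip() or "(not yet documented)")
        return "\n".join(lines)


# Placeholder values for illustration; two sections are left empty on purpose.
card = ModelCard(
    model_details="Example classifier, version 0.1.0",
    intended_use="Illustration only",
    training_data="",
    evaluation_data="Held-out split of the example dataset",
    performance_metrics="To be filled in after evaluation",
    fairness_considerations="",
    ethical_considerations="None identified for this toy example",
    caveats_and_recommendations="Not for production use",
)
print(card.missing_sections())
print(card.to_text())
```

A check like `missing_sections()` can run in a release pipeline so that a model is not published while any section is blank.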
System Card Components
System Cards extend Model Cards to document complete AI systems, including safety measures and deployment context:
System Architecture
- Component models and their interactions
- Data flows and dependencies
- Integration with other systems
- Deployment infrastructure
Safety Measures
- Input/output filters and guardrails
- Content moderation approaches
- Rate limiting and abuse prevention
- Fallback mechanisms
Human Oversight
- Human-in-the-loop decision points
- Escalation procedures
- Override capabilities
- Monitoring dashboards
Testing & Red Teaming
- Safety testing methodologies
- Red team findings and mitigations
- Adversarial testing results
- Edge case handling
EU AI Act Documentation Requirements
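Article 11 requires providers of high-risk AI systems to draw up technical documentation before the system is placed on the market and to keep it up to date. The required contents, set out in Annex IV, include: a general description of the system, design specifications, a description of the development process, risk management measures, changes made throughout the lifecycle, performance metrics, and interaction with other systems.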
4.3.2 Energy Consumption & Sustainability Reporting (Green AI)
Training a large model can consume as much electricity as hundreds of homes use in a year, so its environmental impact is a real concern. Organizations must measure, report, and minimize the carbon footprint of their AI development activities.
Environmental Impact Context
Carbon Footprint Measurement
| Component | Measurement Approach | Tools |
|---|---|---|
| Training Compute | GPU-hours × power consumption × carbon intensity | CodeCarbon, ML CO2 Impact, Carbontracker |
| Data Center Energy | PUE × IT equipment energy × grid carbon intensity | Cloud provider sustainability dashboards |
| Data Transfer | Data volume × energy per byte transferred | Network monitoring tools |
| Inference | Queries × energy per inference × carbon intensity | APM tools with energy monitoring |
| Hardware Embodied | Manufacturing emissions amortized over hardware life | Hardware lifecycle assessments |
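The first two rows of the table combine into a single estimate for a training run: GPU energy, scaled by the data center's PUE, multiplied by the grid's carbon intensity. The function below is a minimal sketch of that arithmetic. Its name, its parameters, and every number in the example are illustrative assumptions, not measurements, and it ignores CPU, memory, and networking energy.

```python
def training_emissions_kg(gpu_hours: float,
                          avg_gpu_power_watts: float,
                          pue: float,
                          grid_intensity_g_per_kwh: float) -> float:
    """Estimate training emissions in kg CO2e.

    gpu_hours: total GPU-hours used (number of GPUs x wall-clock hours)
    avg_gpu_power_watts: average power draw per GPU
    pue: power usage effectiveness of the facility (>= 1.0)
    grid_intensity_g_per_kwh: grid carbon intensity in g CO2e per kWh
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_energy_kwh = gpu_hours * avg_gpu_power_watts / 1000.0   # W*h -> kWh
    facility_energy_kwh = it_energy_kwh * pue                  # add cooling and overhead
    return facility_energy_kwh * grid_intensity_g_per_kwh / 1000.0  # g -> kg


# Assumed inputs: 1,000 GPU-hours at 300 W, PUE 1.2, grid at 400 g CO2e/kWh.
estimate = training_emissions_kg(1000, 300, 1.2, 400)
print(f"Estimated emissions: {estimate:.1f} kg CO2e")
```

With these assumed inputs the GPUs draw 300 kWh, the facility draws 360 kWh, and the estimate comes to about 144 kg CO2e. Measured energy from a tool such as CodeCarbon should replace the power assumption wherever it is available.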
Sustainability Strategies
Efficient Architecture Selection
- Choose smaller models when accuracy permits
- Use distillation to compress large models
- Apply pruning and quantization techniques
- Consider sparse architectures
Transfer Learning & Fine-tuning
- Start from pre-trained models
- Fine-tune only necessary layers
- Use parameter-efficient fine-tuning (LoRA, adapters)
- Leverage foundation models where appropriate
Geographic & Temporal Optimization
- Train in regions with cleaner energy grids
- Schedule training during low-carbon periods
- Use cloud providers with renewable commitments
- Consider on-premise renewable-powered facilities
Efficient Experimentation
- Use smaller proxy datasets for hyperparameter search
- Implement early stopping for unpromising runs
- Share and reuse experiment results
- Document negative results to prevent duplication
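Early stopping for unpromising runs can be expressed as a patience rule on the validation loss: stop once the loss has failed to improve for a set number of consecutive epochs. The sketch below shows that rule in isolation. The function name, the `patience` and `min_delta` parameters, and the loss values are assumptions for illustration; training frameworks ship their own callbacks for this.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops at the first epoch where the validation loss has not
    improved on the best value seen so far by more than `min_delta` for
    `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: improvement stalls after epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]
stop_at = early_stopping_epoch(losses, patience=3)
print(stop_at)
```

Every epoch skipped this way is compute and energy that is not spent, which is why the rule belongs in a sustainability checklist as well as a tuning one.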
Sustainability Reporting Template
Model Sustainability Report
Training Phase
Efficiency Measures
EU AI Act Sustainability Requirements
The EU AI Act requires providers of general-purpose AI models to document the computational resources used for training and the known or estimated energy consumption of the model (Annex XI). Organizations should begin tracking these metrics proactively.
4.3.3 Hyperparameter Tuning & Reproducibility Logs
Reproducibility supports scientific validity, regulatory compliance, and operational reliability. Complete documentation of the training process enables audit, debugging, and continuous improvement.
Reproducibility Requirements
Code Reproducibility
- Complete source code under version control
- Dependency specifications (exact versions)
- Container definitions (Docker/Singularity)
- Build and execution scripts
Data Reproducibility
- Data versioning and snapshot identification
- Preprocessing pipeline code
- Train/validation/test split specifications
- Data augmentation procedures
Training Reproducibility
- Random seeds and initialization
- All hyperparameter values
- Hardware specifications
- Training checkpoints
Environment Reproducibility
- Operating system and version
- CUDA/cuDNN versions
- Hardware driver versions
- Cloud instance specifications
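Two of the items above, random seeds and environment details, can be captured with the standard library alone. The following is a minimal sketch under that restriction. The helper names `set_seed` and `capture_environment` are assumptions, and the snapshot deliberately omits CUDA/cuDNN and driver versions, which have to be read through the ML framework or vendor tools in use.

```python
import json
import platform
import random


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG.

    NumPy, PyTorch, TensorFlow and similar libraries keep their own RNG
    state and need their own seeding calls in addition to this one.
    """
    random.seed(seed)


def capture_environment() -> dict:
    """Collect basic interpreter and OS details for the run log."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


set_seed(42)
first_draw = random.random()
set_seed(42)
second_draw = random.random()   # same seed, so the same value is drawn again

print(first_draw == second_draw)
print(json.dumps(capture_environment(), indent=2))
```

Storing the seed and the environment snapshot next to the training checkpoint makes it possible to tell later whether a difference between two runs comes from the code or from the machine.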
Experiment Tracking Schema
Every training run should log the following metadata:
| Category | Fields | Format |
|---|---|---|
| Identification | | UUID, ISO 8601 |
| Code Reference | | SHA-256, URL |
| Data Reference | | Hash, JSON config |
| Hyperparameters | | JSON/YAML config |
| Environment | | System info capture |
| Metrics | | Time series, final values |
| Artifacts | | File references with hashes |
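A run record covering these seven categories might look like the sketch below. The field names (`run_id`, `code_commit`, `data_hash`, and so on) are illustrative assumptions rather than the schema's own, as is the choice to hash a canonical JSON encoding of the hyperparameters so that two runs with the same configuration can be matched.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def config_hash(config: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RunRecord:
    """Hypothetical per-run metadata record."""
    code_commit: str                       # code reference: commit identifier
    code_url: str                          # code reference: repository URL
    data_hash: str                         # data reference: dataset snapshot hash
    hyperparameters: dict                  # full configuration used for the run
    environment: dict = field(default_factory=dict)   # system info capture
    metrics: dict = field(default_factory=dict)       # final values or series
    artifacts: dict = field(default_factory=dict)     # file path -> content hash
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Placeholder values throughout; none of these refer to a real project.
record = RunRecord(
    code_commit="<commit-id>",
    code_url="https://example.com/repo.git",
    data_hash=hashlib.sha256(b"example dataset bytes").hexdigest(),
    hyperparameters={"learning_rate": 0.001, "batch_size": 32},
)
record.metrics["val_accuracy"] = 0.0   # filled in by the training loop
print(json.dumps(asdict(record), indent=2))
print(config_hash(record.hyperparameters))
```

Experiment trackers such as MLflow or Weights & Biases record most of these fields automatically; a hand-rolled record like this one is mainly useful for checking that nothing in the schema has been left out.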
Hyperparameter Documentation
Hyperparameter Registry
For each hyperparameter, document:
- Name: Parameter identifier
- Value: Chosen value for production model
- Search Range: Range explored during tuning
- Selection Method: How value was chosen (grid search, Bayesian optimization, manual)
- Sensitivity: How much performance varies with this parameter
- Rationale: Why this value was selected
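The six attributes map directly onto a small record type. The sketch below assumes numeric hyperparameters with a closed search range; the class name `HyperparameterEntry` and the example values are placeholders, not recommendations.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class HyperparameterEntry:
    """Hypothetical registry entry for one numeric hyperparameter."""
    name: str
    value: float
    search_range: Tuple[float, float]   # (low, high) explored during tuning
    selection_method: str               # e.g. "grid search", "manual"
    sensitivity: str                    # qualitative note on performance impact
    rationale: str

    def __post_init__(self):
        low, high = self.search_range
        if low > high:
            raise ValueError(f"{self.name}: search range is reversed")
        if not low <= self.value <= high:
            # A production value outside the explored range is a red flag:
            # it was never actually evaluated during tuning.
            raise ValueError(f"{self.name}: value {self.value} outside search range")


# Placeholder entry for illustration.
entry = HyperparameterEntry(
    name="learning_rate",
    value=3e-4,
    search_range=(1e-5, 1e-2),
    selection_method="Bayesian optimization",
    sensitivity="high",
    rationale="Best validation loss among explored values",
)
print(asdict(entry))
```

Making the entry frozen means a documented value cannot be changed silently after the fact; a new value requires a new entry.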
Version Control Best Practices
Model Versioning
Use semantic versioning for model releases:
- MAJOR: Breaking changes to input/output format
- MINOR: Performance improvements, new capabilities
- PATCH: Bug fixes, minor adjustments
model-name-v2.3.1
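Tags in this form can be parsed and bumped mechanically. The helpers below are a minimal sketch that assumes the `name-vMAJOR.MINOR.PATCH` layout shown above; the function names are hypothetical and pre-release or build suffixes are not handled.

```python
import re

# Assumed tag layout: <name>-v<MAJOR>.<MINOR>.<PATCH>
_TAG_PATTERN = re.compile(
    r"^(?P<name>.+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_model_version(tag: str):
    """Split a tag into (name, (major, minor, patch))."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a valid model version tag: {tag!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["name"], version


def bump(tag: str, part: str) -> str:
    """Return the next tag after a major, minor, or patch change."""
    name, (major, minor, patch) = parse_model_version(tag)
    if part == "major":       # breaking change to input/output format
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":     # performance improvement or new capability
        minor, patch = minor + 1, 0
    elif part == "patch":     # bug fix or minor adjustment
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{name}-v{major}.{minor}.{patch}"


print(parse_model_version("model-name-v2.3.1"))
print(bump("model-name-v2.3.1", "minor"))
```

Here `bump("model-name-v2.3.1", "minor")` yields `model-name-v2.4.0`, and a major bump yields `model-name-v3.0.0`, resetting the lower components each time.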
Immutable Artifacts
Treat trained models as immutable artifacts:
- Never modify a released model in place
- Store models with content-addressable hashes
- Maintain complete lineage from data to deployment
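Content-addressable storage means the file name is the hash of the file's bytes, so any modification changes the address and existing files are never overwritten. The sketch below illustrates the idea on the local filesystem with SHA-256. The function names and directory layout are assumptions, and a real registry would add access control and remote storage.

```python
import hashlib
import tempfile
from pathlib import Path


def store_artifact(data: bytes, store_dir: Path) -> Path:
    """Write `data` under its SHA-256 digest and return the path.

    If an artifact with the same content already exists, it is left
    untouched, so released artifacts are never modified in place.
    """
    digest = hashlib.sha256(data).hexdigest()
    target = store_dir / digest
    if not target.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target


def verify_artifact(path: Path) -> bool:
    """Check that a stored file still matches the hash in its name."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == path.name


# Demonstration in a temporary directory with stand-in bytes for model weights.
with tempfile.TemporaryDirectory() as tmp:
    stored = store_artifact(b"stand-in model weights", Path(tmp) / "models")
    print(stored.name)
    print(verify_artifact(stored))
```

Because the address is derived from the content, the same digest can be recorded in the experiment log and the deployment manifest to tie the two together.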
Audit Trail
Maintain a complete audit trail:
- Who trained the model
- When it was trained
- What data was used
- What changes were made between versions
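The four items above can be recorded as entries in an append-only log. The sketch below adds one design choice that goes beyond the list and is an assumption of this example: each entry stores the hash of the previous one, so later edits to the history become detectable. Function and field names are hypothetical, and the values are placeholders.

```python
import hashlib
import json


def _hash_body(body: dict) -> str:
    """SHA-256 over a key-sorted JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_entry(log: list, who: str, when: str,
                       data_version: str, changes: str) -> dict:
    """Append an entry that is chained to the previous entry's hash."""
    body = {
        "who": who,                    # who trained the model
        "when": when,                  # when it was trained (ISO 8601 string)
        "data_version": data_version,  # what data was used
        "changes": changes,            # what changed since the last version
        "prev_hash": log[-1]["entry_hash"] if log else "",
    }
    entry = {**body, "entry_hash": _hash_body(body)}
    log.append(entry)
    return entry


def verify_audit_log(log: list) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != _hash_body(body):
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log = []
append_audit_entry(audit_log, "alice", "2024-01-01T09:00:00+00:00",
                   "dataset-snapshot-1", "initial release")
append_audit_entry(audit_log, "bob", "2024-02-01T09:00:00+00:00",
                   "dataset-snapshot-2", "retrained on updated data")
print(verify_audit_log(audit_log))
```

Chaining does not prevent tampering by someone who can rewrite the whole log, so the log itself still needs to live in write-protected storage.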
Implementation Guide
Model Development Phase Deliverables
MLOps Tooling Stack
| Capability | Open Source | Commercial |
|---|---|---|
| Experiment Tracking | MLflow, Aim, Sacred | Weights & Biases, Neptune, Comet |
| Model Registry | MLflow Model Registry, BentoML | AWS SageMaker, Azure ML, Databricks |
| Data Versioning | DVC, lakeFS, Pachyderm, Delta Lake | Databricks Unity Catalog |
| Carbon Tracking | CodeCarbon, Carbontracker | Cloud provider sustainability tools |
| Pipeline Orchestration | Kubeflow, Airflow, Prefect | AWS Step Functions, Azure ML Pipelines |