4.3 Phase 3: Model Development & Training
Documentation standards, sustainability considerations, and reproducibility requirements for responsible model development and training processes.
Key Takeaways
- EU AI Act Article 11 requires technical documentation for high-risk AI systems; Model Cards are the industry-standard way to provide much of it
- Training a single large language model can emit over 300 tons of CO2 equivalent
- Reproducibility documentation is essential for audit trails and compliance
- System Cards extend Model Cards to document complete AI system behavior
4.3.1 Model Cards & System Cards (Documentation Standards)
Model Cards, introduced by Google researchers in 2018, have become the industry standard for AI model documentation. The EU AI Act Article 11 requires comprehensive technical documentation for high-risk AI systems, making structured documentation such as Model Cards a practical foundation for regulatory compliance.
Model Card vs. System Card
| Aspect | Model Card | System Card |
|---|---|---|
| Scope | Single ML model | Complete AI system (models + infrastructure + guardrails) |
| Focus | Model behavior and limitations | End-to-end system behavior and safety measures |
| When Used | Individual model releases | Product/application deployment |
| Primary Audience | ML engineers, researchers | Product teams, regulators, users |
Model Card Template
1. Model Details
2. Intended Use
3. Training Data
4. Evaluation Data
5. Performance Metrics
6. Fairness Considerations
7. Ethical Considerations
8. Caveats & Recommendations
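In practice, teams often capture these sections as structured metadata stored alongside the model artifact so the card can be validated and rendered automatically. A minimal sketch of that idea in Python; the field names and example values are illustrative, not a mandated schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Structured model card mirroring the eight template sections above.

    Field names are illustrative; adapt them to your registry's schema.
    """
    model_details: dict        # name, version, owners, license, architecture
    intended_use: dict         # primary use cases, out-of-scope uses
    training_data: dict        # sources, collection dates, preprocessing
    evaluation_data: dict      # benchmark datasets, splits
    performance_metrics: dict  # metrics, disaggregated by relevant subgroups
    fairness_considerations: dict
    ethical_considerations: dict
    caveats_and_recommendations: list = field(default_factory=list)

# Example instance (all values are placeholders).
card = ModelCard(
    model_details={"name": "credit-risk-scorer", "version": "2.3.1"},
    intended_use={"primary": "applicant pre-screening", "out_of_scope": ["final lending decisions"]},
    training_data={"source": "internal loan book, 2019-2023"},
    evaluation_data={"holdout": "2024-Q1 applications"},
    performance_metrics={"auc": 0.87},
    fairness_considerations={"largest_subgroup_auc_gap": 0.03},
    ethical_considerations={"human_review": "required for all declines"},
    caveats_and_recommendations=["Not validated outside the original market"],
)
```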
System Card Components
System Cards extend Model Cards to document complete AI systems, including safety measures and deployment context:
System Architecture
- Component models and their interactions
- Data flows and dependencies
- Integration with other systems
- Deployment infrastructure
Safety Measures
- Input/output filters and guardrails
- Content moderation approaches
- Rate limiting and abuse prevention
- Fallback mechanisms
Human Oversight
- Human-in-the-loop decision points
- Escalation procedures
- Override capabilities
- Monitoring dashboards
Testing & Red Teaming
- Safety testing methodologies
- Red team findings and mitigations
- Adversarial testing results
- Edge case handling
EU AI Act Documentation Requirements
Article 11 requires high-risk AI systems to maintain technical documentation including: general description, design specifications, development process description, risk management measures, changes made throughout lifecycle, performance metrics, and interaction with other systems.
4.3.2 Energy Consumption & Sustainability Reporting (Green AI)
AI training's environmental impact has become a significant concern, with a single large training run consuming electricity comparable to the annual usage of hundreds of homes. Organizations must measure, report, and minimize the carbon footprint of their AI development activities.
Carbon Footprint Measurement
| Component | Measurement Approach | Tools |
|---|---|---|
| Training Compute | GPU-hours × power consumption × carbon intensity | CodeCarbon, ML CO2 Impact, Carbontracker |
| Data Center Energy | PUE × IT equipment energy × grid carbon intensity | Cloud provider sustainability dashboards |
| Data Transfer | Data volume × energy per byte transferred | Network monitoring tools |
| Inference | Queries × energy per inference × carbon intensity | APM tools with energy monitoring |
| Hardware Embodied | Manufacturing emissions amortized over hardware life | Hardware lifecycle assessments |
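As a back-of-the-envelope check, the training-compute row of this table can be computed directly; tools such as CodeCarbon automate the same accounting by sampling hardware power draw during the run. A minimal sketch, where all numeric inputs are placeholders rather than measurements:

```python
def training_emissions_kg(gpu_hours: float,
                          avg_gpu_power_kw: float,
                          pue: float,
                          grid_intensity_kg_per_kwh: float) -> float:
    """Estimate training emissions as GPU-hours x power x PUE x grid carbon intensity."""
    energy_kwh = gpu_hours * avg_gpu_power_kw * pue
    return energy_kwh * grid_intensity_kg_per_kwh

# Example: 10,000 GPU-hours at 0.4 kW average draw, data-center PUE 1.2,
# on a grid emitting 0.35 kg CO2e per kWh (all values illustrative).
print(training_emissions_kg(10_000, 0.4, 1.2, 0.35))  # ~1,680 kg CO2e
```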
Sustainability Strategies
Efficient Architecture Selection
- Choose smaller models when accuracy permits
- Use distillation to compress large models
- Apply pruning and quantization techniques
- Consider sparse architectures
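Quantization is often the lowest-effort of these techniques to try. A minimal sketch using PyTorch's dynamic int8 quantization; the model here is a stand-in for your trained network, and support varies by hardware and layer type:

```python
import torch
import torch.nn as nn

# Placeholder model; substitute the trained network you actually serve.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 2))

# Dynamic int8 quantization of the linear layers roughly quarters their
# weight storage and reduces inference cost on supported CPUs.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```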
Transfer Learning & Fine-tuning
- Start from pre-trained models
- Fine-tune only necessary layers
- Use parameter-efficient fine-tuning (LoRA, adapters)
- Leverage foundation models where appropriate
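For illustration, parameter-efficient fine-tuning with LoRA typically updates well under one percent of a model's weights, which translates directly into less compute and energy per experiment. A minimal sketch using the Hugging Face peft library; the base model, target modules, and hyperparameter values are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; substitute the foundation model you actually use.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection layer name for GPT-2
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```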
Geographic & Temporal Optimization
- Train in regions with cleaner energy grids
- Schedule training during low-carbon periods
- Use cloud providers with renewable commitments
- Consider on-premise renewable-powered facilities
Efficient Experimentation
- Use smaller proxy datasets for hyperparameter search
- Implement early stopping for unpromising runs
- Share and reuse experiment results
- Document negative results to prevent duplication
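Early stopping is the simplest of these measures to automate: end a run once the validation metric has not improved for a set number of evaluations instead of exhausting the full compute budget. A minimal sketch, with illustrative patience and threshold values:

```python
class EarlyStopper:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-4):
        self.patience = patience    # evaluations to wait after the last improvement
        self.min_delta = min_delta  # minimum change that counts as an improvement
        self.best = float("inf")
        self.bad_rounds = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience

# Usage inside a training loop (evaluate() is a hypothetical helper):
# stopper = EarlyStopper(patience=3)
# for epoch in range(max_epochs):
#     val_loss = evaluate(model)
#     if stopper.should_stop(val_loss):
#         break
```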
Sustainability Reporting Template
A model sustainability report should cover at least two areas: the training phase (energy consumed, estimated emissions, hardware used, region and grid carbon intensity) and the efficiency measures applied to reduce that footprint (architecture choices, fine-tuning strategy, scheduling). See the sketch below.
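A minimal sketch of such a report as structured data, suitable for attaching to the model card; every field name and value is illustrative and should be replaced with measured figures:

```python
sustainability_report = {
    "model": "credit-risk-scorer-v2.3.1",  # illustrative identifier
    "training_phase": {
        "gpu_hours": 10_000,
        "energy_kwh": 4_800,
        "emissions_kg_co2e": 1_680,
        "hardware": "8x A100 80GB",
        "region": "eu-north-1",
        "grid_intensity_kg_per_kwh": 0.35,
        "measurement_tool": "CodeCarbon",
    },
    "efficiency_measures": [
        "LoRA fine-tuning of a pre-trained base model",
        "early stopping on validation loss",
        "training scheduled in a low-carbon region",
    ],
}
```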
EU AI Act Sustainability Requirements
The EU AI Act requires providers of general-purpose AI models to document energy consumption during training, known or estimated energy consumption during use, and other resource usage. Organizations should begin tracking these metrics proactively.
4.3.3 Hyperparameter Tuning & Reproducibility Logs
Reproducibility is foundational to scientific validity, regulatory compliance, and operational reliability. Complete documentation of the training process enables audit, debugging, and continuous improvement.
Reproducibility Requirements
Code Reproducibility
- Complete source code under version control
- Dependency specifications (exact versions)
- Container definitions (Docker/Singularity)
- Build and execution scripts
Data Reproducibility
- Data versioning and snapshot identification
- Preprocessing pipeline code
- Train/validation/test split specifications
- Data augmentation procedures
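Split specifications are easiest to reproduce when membership is a deterministic function of the record itself rather than of run order. A minimal sketch that assigns records to splits by hashing their IDs; the percentages are illustrative:

```python
import hashlib

def split_bucket(record_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Assign a record to train/validation/test deterministically from its ID."""
    bucket = int(hashlib.sha256(record_id.encode()).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"

# The same record always lands in the same split, so the split specification
# reduces to documenting the hash function and the percentage thresholds.
```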
Training Reproducibility
- Random seeds and initialization
- All hyperparameter values
- Hardware specifications
- Training checkpoints
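Random seeds are easy to log but easy to miss: each framework keeps its own generator, so every one must be seeded and the value recorded. A minimal sketch for a PyTorch-based stack; adapt it to the frameworks you actually use:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> int:
    """Seed Python, NumPy, and PyTorch RNGs; return the seed so it can be logged."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # harmless if CUDA is unavailable
    # Bit-for-bit reproducibility on GPU also requires deterministic kernels:
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return seed
```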
Environment Reproducibility
- Operating system and version
- CUDA/cuDNN versions
- Hardware driver versions
- Cloud instance specifications
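Environment details can be captured programmatically at the start of each run rather than recorded by hand. A minimal sketch; the torch lookups apply only to PyTorch stacks and should be dropped or replaced for other frameworks:

```python
import json
import platform
import sys
import torch

def capture_environment() -> dict:
    """Snapshot OS, Python, and framework/driver versions for the run log."""
    return {
        "os": platform.platform(),
        "python": sys.version,
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is absent
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    }

print(json.dumps(capture_environment(), indent=2))
```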
Experiment Tracking Schema
Every training run should log the following metadata:
| Category | Fields | Format |
|---|---|---|
| Identification | Run ID, timestamp | UUID, ISO 8601 |
| Code Reference | Commit hash, repository URL | SHA-256, URL |
| Data Reference | Dataset version hash, preprocessing config | Hash, JSON config |
| Hyperparameters | All tuned and default parameter values | JSON/YAML config |
| Environment | OS, library, driver, and hardware details | System info capture |
| Metrics | Training/validation curves, final evaluation scores | Time series, final values |
| Artifacts | Model weights, checkpoints, logs | File references with hashes |
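Most of this schema maps directly onto an experiment tracker such as MLflow (listed in the tooling stack at the end of this section). A minimal sketch, assuming an MLflow tracking server is configured; the run name, tags, parameters, and file paths are illustrative:

```python
import mlflow

with mlflow.start_run(run_name="credit-risk-scorer-training"):
    # Code and data references
    mlflow.set_tag("git_commit", "abc123")            # placeholder hash
    mlflow.set_tag("dataset_hash", "sha256:def456")   # placeholder hash

    # Hyperparameters
    mlflow.log_params({"learning_rate": 3e-4, "batch_size": 64, "seed": 42})

    # Metrics over time and final values
    for step, loss in enumerate([0.9, 0.6, 0.45]):    # stand-in for a training loop
        mlflow.log_metric("val_loss", loss, step=step)

    # Artifacts: model card, config files, sustainability report
    mlflow.log_artifact("model_card.json")            # hypothetical local file
```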
Hyperparameter Documentation
Hyperparameter Registry
For each hyperparameter, document:
- Name: Parameter identifier
- Value: Chosen value for production model
- Search Range: Range explored during tuning
- Selection Method: How value was chosen (grid search, Bayesian optimization, manual)
- Sensitivity: How much performance varies with this parameter
- Rationale: Why this value was selected
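One lightweight way to keep this registry is as version-controlled structured data next to the training code. A minimal sketch of a single entry; all values are illustrative:

```python
hyperparameter_registry = {
    "learning_rate": {
        "value": 3e-4,
        "search_range": [1e-5, 1e-2],
        "selection_method": "Bayesian optimization, 50 trials",
        "sensitivity": "high: roughly +/-0.02 validation AUC across the range",
        "rationale": "Best trade-off between convergence speed and stability",
    },
}
```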
Version Control Best Practices
Model Versioning
Use semantic versioning for model releases:
- MAJOR: Breaking changes to input/output format
- MINOR: Performance improvements, new capabilities
- PATCH: Bug fixes, minor adjustments
Example: model-name-v2.3.1
Immutable Artifacts
Treat trained models as immutable artifacts:
- Never modify a released model in place
- Store models with content-addressable hashes
- Maintain complete lineage from data to deployment
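Content-addressable storage makes immutability checkable: the artifact's identifier is the hash of its bytes, so any modification changes the identifier. A minimal sketch; the file path is a placeholder:

```python
import hashlib
from pathlib import Path

def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a model artifact, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example: register the artifact under its own hash (placeholder path).
# artifact_id = content_hash("models/credit-risk-scorer-v2.3.1.pt")
```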
Audit Trail
Maintain complete audit trail:
- Who trained the model
- When it was trained
- What data was used
- What changes were made between versions
Implementation Guide
Model Development Phase Deliverables
- Completed Model Card (and System Card where the model ships inside a larger system)
- Sustainability report covering training energy, emissions, and efficiency measures
- Experiment tracking records for every training run
- Hyperparameter registry with search ranges and selection rationale
- Versioned, immutable model artifacts with a complete audit trail
MLOps Tooling Stack
| Capability | Open Source | Commercial |
|---|---|---|
| Experiment Tracking | MLflow, Aim, Sacred | Weights & Biases, Neptune, Comet |
| Model Registry | MLflow Model Registry, BentoML | AWS SageMaker, Azure ML, Databricks |
| Data Versioning | DVC, lakeFS, Pachyderm | Delta Lake, Databricks Unity Catalog |
| Carbon Tracking | CodeCarbon, Carbontracker | Cloud provider sustainability tools |
| Pipeline Orchestration | Kubeflow, Airflow, Prefect | AWS Step Functions, Azure ML Pipelines |