6.3 Shared Services & Platform Teams

As the number of pods grows, certain capabilities become more efficient to centralize: MLOps infrastructure, model serving, governance tooling, and specialized expertise. Platform teams provide these shared services, enabling product pods to focus on their unique value while building on common foundations. The challenge is delivering these services without creating the bottlenecks that shared services typically introduce.

The Platform Philosophy

Platform teams exist to accelerate product pods, not to control them. A well-designed platform is self-service: pods can use it without waiting for platform team help. The platform team's success is measured by pod velocity, not by how busy the platform team is.

The Platform Team Model

Platform vs. Product Teams

Platform teams operate differently from product pods:

Dimension | Product Pod | Platform Team
Customer | External users, business stakeholders | Other pods (internal customers)
Success Metric | Business outcomes (revenue, efficiency) | Pod adoption, velocity, satisfaction
Ownership Model | Single AI product cradle-to-grave | Shared capabilities used by many
Autonomy | High autonomy within guardrails | Must maintain compatibility, standards
Prioritization | STO decides based on product needs | Balances needs across all pods

Self-Service Principle

Effective platforms minimize the need for human intervention:

  • Level 1 (Documentation): Comprehensive docs that answer most questions without asking anyone
  • Level 2 (Automation): Self-service tools that let pods provision what they need
  • Level 3 (Guardrails): Automated checks that enable safe self-service
  • Level 4 (Support): Human help only for edge cases and escalations
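
As a rough illustration of how Levels 2 through 4 can fit together, the sketch below auto-approves a provisioning request that falls inside automated limits and escalates anything else to a human. The limits, the `ProvisionRequest` type, and the `handle_request` function are all hypothetical names chosen for this example, not part of any specific platform.

```python
from dataclasses import dataclass

# Hypothetical guardrail limits; real values would come from platform policy.
MAX_GPUS_PER_REQUEST = 8
MAX_HOURS_PER_REQUEST = 72


@dataclass
class ProvisionRequest:
    pod: str
    gpus: int
    hours: int


def handle_request(req: ProvisionRequest) -> str:
    """Return 'auto-approved' when the request is inside the guardrails,
    otherwise 'escalate' so a platform engineer reviews it (Level 4)."""
    if req.gpus <= 0 or req.hours <= 0:
        raise ValueError("gpus and hours must be positive")
    within_limits = (
        req.gpus <= MAX_GPUS_PER_REQUEST
        and req.hours <= MAX_HOURS_PER_REQUEST
    )
    return "auto-approved" if within_limits else "escalate"


if __name__ == "__main__":
    print(handle_request(ProvisionRequest("search-pod", gpus=4, hours=24)))
    print(handle_request(ProvisionRequest("search-pod", gpus=64, hours=24)))
```

The point of the design is that the common case never touches a human queue; only requests outside the guardrails consume platform team time.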

Common AI Platform Services

MLOps Platform

Infrastructure for building, training, and deploying models:

Capability | What It Provides | Self-Service Level
Experiment Tracking | MLflow or similar for tracking experiments | Full self-service
Training Infrastructure | GPU clusters, distributed training | Self-service provisioning
Model Registry | Versioned model storage and metadata | Full self-service
Model Serving | Scalable inference infrastructure | Self-service deployment
Feature Store | Shared feature definitions and serving | Self-service with review for new features
Monitoring | Model performance and drift monitoring | Pre-configured dashboards

Data Platform

Infrastructure for data access and management.

Evaluation Platform

Tools for assessing model quality and fairness.

Governance Services

Centralized Governance Capabilities

Some governance functions are more efficient when centralized:

Ethics Liaison Pool

Central management of Ethics Liaisons, ensuring consistent training and shared practices across pods.

Model Card System

Standardized tooling for creating, maintaining, and reviewing Model Cards across all pods.

Risk Assessment Tools

Consistent frameworks and tools for risk tier classification and assessment.

Audit Support

Central coordination of internal and external audits, shared evidence collection.

Governance Guardrail Implementation

Central teams implement automated guardrails used by all pods:

Guardrail | Implementation | Ownership
Fairness testing | Shared library integrated into CI/CD | Evaluation Platform team
PII detection | Automated scanning in data pipelines | Data Platform team
Model Card validation | Pre-deploy completeness check | Governance Services
Security scanning | Dependency and code scanning | Security team
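
To make the Model Card validation guardrail concrete, here is a minimal sketch of a pre-deploy completeness check. The required field names are assumptions made for illustration; the actual Model Card schema would be defined by Governance Services.

```python
# Hypothetical required fields; the real schema is set by Governance Services.
REQUIRED_FIELDS = (
    "model_name",
    "owner",
    "intended_use",
    "risk_tier",
    "evaluation_results",
)


def missing_model_card_fields(card: dict) -> list[str]:
    """Return required fields that are absent or empty, in schema order."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = card.get(field)
        if value is None:
            missing.append(field)
        elif isinstance(value, str) and not value.strip():
            missing.append(field)
        elif isinstance(value, (list, dict)) and not value:
            missing.append(field)
    return missing


def can_deploy(card: dict) -> bool:
    """A deployment passes the guardrail only when nothing is missing."""
    return not missing_model_card_fields(card)


if __name__ == "__main__":
    card = {
        "model_name": "churn-v3",
        "owner": "growth-pod",
        "intended_use": "",
        "risk_tier": "medium",
    }
    print(missing_model_card_fields(card))
```

A check like this could run as a CI/CD step, so an incomplete card blocks the deploy with a list of what to fix rather than waiting on a manual review.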

Operating Model for Platform Teams

Service Level Agreements

Platform teams define SLAs that pods can rely on:

Sample Platform SLAs
  • Availability: 99.9% uptime for production model serving
  • Provisioning: New training environment available within 4 hours via self-service
  • Support Response: P1 issues acknowledged within 15 minutes
  • Feature Requests: Initial triage within 1 week
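
An availability target is easier to reason about as a downtime budget. The sketch below converts a percentage into allowed minutes of downtime; the 30-day period and the function names are assumptions for illustration, since the sample SLAs do not state a measurement window.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed in the period at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def meets_availability(
    observed_downtime_min: float,
    availability_pct: float,
    period_days: int = 30,
) -> bool:
    """True when observed downtime stays within the budget."""
    budget = downtime_budget_minutes(availability_pct, period_days)
    return observed_downtime_min <= budget


if __name__ == "__main__":
    # 99.9% over an assumed 30-day window allows roughly 43.2 minutes.
    print(round(downtime_budget_minutes(99.9), 1))
    print(meets_availability(30, 99.9))
```

Under these assumptions, 99.9% uptime leaves about 43 minutes of downtime per 30 days, which helps pods judge whether the target fits their own product commitments.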

Prioritization Framework

Platform teams must balance competing pod requests:

1. Collect Demand: Regular intake process for pod requests. Track patterns to identify common needs.

2. Assess Impact: How many pods benefit? How much time saved? What's the strategic importance?

3. Balance Portfolio: Mix of quick wins (immediate pod help), investments (long-term capability), and maintenance.

4. Transparent Communication: Share roadmap with pods so they can plan. Be honest about what won't make the cut.
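
The impact questions in step 2 can be turned into a simple score for ranking requests. The following is one possible sketch, not a prescribed formula: the multiplicative scoring, the `strategic_weight` multiplier, and all names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PodRequest:
    title: str
    pods_benefiting: int
    hours_saved_per_pod: float      # estimated hours saved per pod per month
    strategic_weight: float = 1.0   # hypothetical multiplier; 1.0 is neutral


def impact_score(req: PodRequest) -> float:
    """Combine reach, time saved, and strategic importance into one number."""
    return req.pods_benefiting * req.hours_saved_per_pod * req.strategic_weight


def rank_requests(requests: list[PodRequest]) -> list[PodRequest]:
    """Highest-impact requests first."""
    return sorted(requests, key=impact_score, reverse=True)


if __name__ == "__main__":
    backlog = [
        PodRequest("Custom scheduler", 1, 10.0, strategic_weight=1.5),
        PodRequest("GPU quota API", 6, 4.0),
    ]
    for req in rank_requests(backlog):
        print(req.title, impact_score(req))
```

A score like this is a starting point for discussion rather than a decision rule; the portfolio balance in step 3 still requires judgment about quick wins versus long-term investments.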

Avoiding Platform Traps

Common Platform Anti-Patterns
  • The Bottleneck: Platform team becomes gatekeeper that pods must wait for
  • The Ivory Tower: Platform builds what's technically elegant, not what pods need
  • The One-Size-Fits-All: Forces all pods into identical patterns regardless of needs
  • The Overbuilder: Builds comprehensive platforms before demand exists
  • The Understaffed: Platform can't keep up, forcing pods to work around it

Success Metrics for Platform Teams

  • Self-Service Resolution: 90%
  • Pod Satisfaction Score: 4.5/5
  • Pod Velocity Improvement: 2x
  • Avg Provisioning Time: < 4 hours
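
Two of these metrics can be computed directly from request records. The sketch below assumes a hypothetical `PlatformRequest` record with a flag for self-service resolution and an optional provisioning duration; a real platform would pull equivalent fields from its ticketing or provisioning logs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class PlatformRequest:
    resolved_self_service: bool
    # Hours from request to ready environment; None for non-provisioning requests.
    provisioning_hours: Optional[float] = None


def self_service_rate(requests: list[PlatformRequest]) -> float:
    """Fraction of requests resolved without platform team involvement."""
    if not requests:
        raise ValueError("no requests to measure")
    resolved = sum(1 for r in requests if r.resolved_self_service)
    return resolved / len(requests)


def avg_provisioning_hours(requests: list[PlatformRequest]) -> float:
    """Mean provisioning time across requests that involved provisioning."""
    durations = [
        r.provisioning_hours for r in requests if r.provisioning_hours is not None
    ]
    if not durations:
        raise ValueError("no provisioning requests to measure")
    return mean(durations)


if __name__ == "__main__":
    log = [
        PlatformRequest(True, 2.0),
        PlatformRequest(True, 3.0),
        PlatformRequest(False),
        PlatformRequest(True),
    ]
    print(self_service_rate(log), avg_provisioning_hours(log))
```

Satisfaction and velocity improvement need survey data and a pre-platform baseline, so they are left out of this sketch.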

Platform as Product

The most successful platform teams treat their platforms as products, with pods as customers. They conduct user research, measure satisfaction, iterate based on feedback, and compete (in a sense) with the alternative of pods building their own solutions. This product mindset prevents platforms from becoming bureaucratic obstacles and keeps them focused on enabling pod success.