6.3 Shared Services & Platform Teams
As the number of pods grows, certain capabilities become more efficient to centralize: MLOps infrastructure, model serving, governance tooling, and specialized expertise. Platform teams provide these shared services, enabling product pods to focus on their unique value while building on common foundations. The challenge is delivering these services without creating the bottlenecks that shared services typically introduce.
Platform teams exist to accelerate product pods, not to control them. A well-designed platform is self-service: pods can use it without waiting for platform team help. The platform team's success is measured by pod velocity, not by how busy the platform team is.
The Platform Team Model
Platform vs. Product Teams
Platform teams operate differently from product pods:
| Dimension | Product Pod | Platform Team |
|---|---|---|
| Customer | External users, business stakeholders | Other pods (internal customers) |
| Success Metric | Business outcomes (revenue, efficiency) | Pod adoption, velocity, satisfaction |
| Ownership Model | Single AI product cradle-to-grave | Shared capabilities used by many |
| Autonomy | High autonomy within guardrails | Must maintain compatibility, standards |
| Prioritization | STO decides based on product needs | Balances needs across all pods |
Self-Service Principle
Effective platforms minimize the need for human intervention:
Level 1: Documentation
Comprehensive docs that answer most questions without asking anyone
Level 2: Automation
Self-service tools that let pods provision what they need
Level 3: Guardrails
Automated checks that enable safe self-service (see the sketch after this list)
Level 4: Support
Human help only for edge cases and escalations
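To make Levels 2 and 3 concrete, here is a minimal sketch of a self-service provisioning flow with automated guardrails. The request schema, the quota limits, and the escalation behavior are illustrative assumptions, not a reference to any particular platform.

```python
from dataclasses import dataclass

# Illustrative guardrail limits; real values would come from platform policy.
MAX_GPUS_WITHOUT_REVIEW = 8
APPROVED_DATA_TIERS = {"public", "internal"}  # "restricted" data needs human review


@dataclass
class EnvRequest:
    """A pod's self-service request for a training environment (hypothetical schema)."""
    pod_name: str
    gpu_count: int
    data_tier: str


def check_guardrails(request: EnvRequest) -> list[str]:
    """Level 3: automated checks that decide whether the request can proceed unattended."""
    violations = []
    if request.gpu_count > MAX_GPUS_WITHOUT_REVIEW:
        violations.append(f"{request.gpu_count} GPUs exceeds the self-service limit of {MAX_GPUS_WITHOUT_REVIEW}")
    if request.data_tier not in APPROVED_DATA_TIERS:
        violations.append(f"data tier '{request.data_tier}' requires human approval")
    return violations


def provision(request: EnvRequest) -> str:
    """Level 2: provision automatically when guardrails pass; escalate (Level 4) otherwise."""
    violations = check_guardrails(request)
    if violations:
        return "escalated to platform team: " + "; ".join(violations)
    return f"environment provisioned for pod '{request.pod_name}'"


print(provision(EnvRequest("fraud-detection", gpu_count=4, data_tier="internal")))
```

The same pattern applies to any platform resource: the guardrails, not a human reviewer, decide whether a routine request proceeds.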
Common AI Platform Services
MLOps Platform
Infrastructure for building, training, and deploying models:
| Capability | What It Provides | Self-Service Level |
|---|---|---|
| Experiment Tracking | MLflow or similar for tracking experiments (example below) | Full self-service |
| Training Infrastructure | GPU clusters, distributed training | Self-service provisioning |
| Model Registry | Versioned model storage and metadata | Full self-service |
| Model Serving | Scalable inference infrastructure | Self-service deployment |
| Feature Store | Shared feature definitions and serving | Self-service with review for new features |
| Monitoring | Model performance and drift monitoring | Pre-configured dashboards |
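For the experiment-tracking capability, a pod-side sketch of what full self-service can look like with MLflow follows; the tracking-server URL and the experiment and parameter names are illustrative placeholders.

```python
import mlflow

# Point at the shared tracking server run by the platform team (URL is a placeholder).
mlflow.set_tracking_uri("http://mlflow.platform.internal")
mlflow.set_experiment("churn-model")

with mlflow.start_run(run_name="baseline"):
    # Pods log their own parameters and metrics; no platform-team ticket required.
    mlflow.log_params({"model_type": "gradient_boosting", "max_depth": 6})
    mlflow.log_metric("validation_auc", 0.87)
```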
Data Platform
Infrastructure for data access and management:
- Data Catalog: Discoverable inventory of available data
- Data Access: Governed access to training and inference data
- Data Quality: Automated quality monitoring and alerting
- Privacy Controls: PII detection, anonymization, consent management (sketched below)
- Data Pipelines: Reusable ETL/ELT frameworks
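The privacy controls are a natural place for automated guardrails. Below is a deliberately simplified sketch of a PII scan a data pipeline might run before publishing a dataset to the catalog; the regex patterns and record schema are illustrative, and a production scanner would use far richer detection.

```python
import re

# Illustrative patterns only; a production scanner would use far richer detection.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_pii(records: list[dict]) -> dict[str, list[str]]:
    """Report which PII types appear in which columns, the kind of check a
    pipeline guardrail might run before a dataset is published to the catalog."""
    findings: dict[str, list[str]] = {}
    for record in records:
        for column, value in record.items():
            for pii_type, pattern in PII_PATTERNS.items():
                if isinstance(value, str) and pattern.search(value):
                    if pii_type not in findings.setdefault(column, []):
                        findings[column].append(pii_type)
    return findings


sample = [{"name": "A. Lovelace", "contact": "ada@example.com", "note": "call 555-123-4567"}]
print(scan_for_pii(sample))  # {'contact': ['email'], 'note': ['phone']}
```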
Evaluation Platform
Tools for assessing model quality and fairness:
- Fairness Testing: Automated bias evaluation against multiple metrics (sketched below)
- Performance Benchmarking: Standardized evaluation suites
- A/B Testing: Infrastructure for production experiments
- Human Evaluation: Tools for collecting human judgments
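As an example of what automated fairness testing can look like, a minimal demographic-parity check follows; the metric choice, the group labels, and the 0.40 threshold are illustrative assumptions, and real evaluation suites compute several metrics across many data slices.

```python
def demographic_parity_gap(outcomes: list[tuple[str, int]]) -> float:
    """Difference in positive-outcome rates between groups, one of several
    fairness metrics an evaluation platform might compute automatically.

    `outcomes` is a list of (group_label, prediction) pairs, prediction in {0, 1}.
    """
    by_group: dict[str, list[int]] = {}
    for group, prediction in outcomes:
        by_group.setdefault(group, []).append(prediction)
    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


# Illustrative gate: fail the pipeline if the gap exceeds an agreed threshold.
results = [("group_a", 1), ("group_a", 1), ("group_a", 0),
           ("group_b", 1), ("group_b", 0), ("group_b", 0)]
gap = demographic_parity_gap(results)
assert gap <= 0.40, f"demographic parity gap {gap:.2f} exceeds threshold"
```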
Governance Services
Centralized Governance Capabilities
Some governance functions are more efficient when centralized:
Ethics Liaison Pool
Central management of Ethics Liaisons, ensuring consistent training and shared practices across pods.
Model Card System
Standardized tooling for creating, maintaining, and reviewing Model Cards across all pods.
Risk Assessment Tools
Consistent frameworks and tools for risk tier classification and assessment.
Audit Support
Central coordination of internal and external audits, shared evidence collection.
Governance Guardrail Implementation
Central teams implement automated guardrails used by all pods:
| Guardrail | Implementation | Ownership |
|---|---|---|
| Fairness testing | Shared library integrated into CI/CD | Evaluation Platform team |
| PII detection | Automated scanning in data pipelines | Data Platform team |
| Model Card validation | Pre-deploy completeness check (example below) | Governance Services |
| Security scanning | Dependency and code scanning | Security team |
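A sketch of the Model Card validation guardrail, assuming cards are represented as simple dictionaries; the required fields listed here are an assumption for illustration, since each organization defines its own template.

```python
# Required fields are an illustrative assumption; each organization defines its own template.
REQUIRED_MODEL_CARD_FIELDS = [
    "intended_use",
    "training_data",
    "evaluation_results",
    "fairness_assessment",
    "limitations",
    "risk_tier",
]


def validate_model_card(card: dict) -> list[str]:
    """Return missing or empty fields; an empty result means the pre-deploy check passes."""
    return [field for field in REQUIRED_MODEL_CARD_FIELDS if not card.get(field)]


card = {"intended_use": "Fraud scoring for card transactions", "risk_tier": "high"}
missing = validate_model_card(card)
if missing:
    raise SystemExit(f"Model Card incomplete, blocking deploy: {missing}")
```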
Operating Model for Platform Teams
Service Level Agreements
Platform teams define SLAs that pods can rely on:
- Availability: 99.9% uptime for production model serving (a measurement sketch follows this list)
- Provisioning: New training environments available via self-service within 4 hours
- Support Response: P1 issues acknowledged within 15 minutes
- Feature Requests: Initial triage within 1 week
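SLAs are only meaningful if they are measured. Here is a minimal sketch of how the availability target might be checked over a reporting window; how downtime is attributed and aggregated in practice is left out, and the measurement approach is an assumption.

```python
from datetime import timedelta

# Targets from the SLA list above; the other targets would be checked the same way.
AVAILABILITY_TARGET = 0.999                 # production model serving uptime
PROVISIONING_TARGET = timedelta(hours=4)    # self-service training environments
P1_ACK_TARGET = timedelta(minutes=15)       # support response for P1 issues


def availability(window: timedelta, downtime: timedelta) -> float:
    """Fraction of the reporting window during which model serving was up."""
    return 1 - downtime / window


month = timedelta(days=30)
measured = availability(month, downtime=timedelta(minutes=40))
print(f"availability {measured:.4%}; SLA met: {measured >= AVAILABILITY_TARGET}")
```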
Prioritization Framework
Platform teams must balance competing pod requests:
Collect Demand
Regular intake process for pod requests. Track patterns to identify common needs.
Assess Impact
How many pods benefit? How much time is saved? What's the strategic importance? (A scoring sketch follows this framework.)
Balance Portfolio
Mix of quick wins (immediate pod help), investments (long-term capability), and maintenance.
Transparent Communication
Share roadmap with pods so they can plan. Be honest about what won't make the cut.
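To illustrate the impact-assessment step, a toy scoring heuristic is sketched below; the formula, the weights, and the example requests are assumptions for illustration, not a prescribed method.

```python
def impact_score(pods_benefiting: int, hours_saved_per_pod_month: float,
                 strategic_weight: float = 1.0) -> float:
    """Toy triage heuristic: total pod-hours saved per month, scaled by strategic weight."""
    return pods_benefiting * hours_saved_per_pod_month * strategic_weight


# Hypothetical requests; the scores only illustrate how a multi-pod win outranks a single-pod ask.
requests = {
    "shared feature-store connector": impact_score(6, 10),
    "one pod's custom serving runtime": impact_score(1, 20),
    "automated fairness report": impact_score(4, 5, strategic_weight=1.5),
}
for name, score in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:6.1f}  {name}")
```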
Avoiding Platform Traps
- The Bottleneck: Platform team becomes gatekeeper that pods must wait for
- The Ivory Tower: Platform builds what's technically elegant, not what pods need
- The One-Size-Fits-All: Forces all pods into identical patterns regardless of needs
- The Overbuilder: Builds comprehensive platforms before demand exists
- The Understaffed: Platform can't keep up, forcing pods to work around it
Success Metrics for Platform Teams
The metrics that matter mirror the success column in the comparison table above: pod adoption, pod velocity, pod satisfaction, and the share of platform usage that is fully self-service.
Platform as Product
The most successful platform teams treat their platforms as products, with pods as customers. They conduct user research, measure satisfaction, iterate based on feedback, and compete (in a sense) with the alternative of pods building their own solutions. This product mindset prevents platforms from becoming bureaucratic obstacles and keeps them focused on enabling pod success.