3.2 Agile for AI: Sprints, Epics, and Experiments

Standard Agile methodologies were designed for deterministic software development—where writing code produces predictable outputs. AI development is fundamentally different: experiments may fail, models may not converge, and "done" is often a moving target. AI Innovations adapt Agile practices to accommodate the inherent uncertainty of AI work while maintaining the discipline of iterative delivery.

The Core Adaptation

Traditional Agile assumes that committed work can be completed with reasonable confidence. AI work includes experiments where the outcome is genuinely uncertain. The AI Innovation approach treats experiments as first-class work items with their own planning and completion criteria, separate from deterministic engineering tasks.

Why AI Development Is Different

The Uncertainty Dimension

AI work contains inherent uncertainty that traditional software development does not:

Traditional Software                    | AI Development
----------------------------------------|----------------------------------------------
Requirements can be fully specified     | Performance targets may be aspirational
Implementation is largely deterministic | Model training has stochastic elements
"Done" is clearly defined               | "Good enough" requires judgment
Bugs are fixable                        | Model limitations may be inherent
Effort can be estimated reliably        | Experiments may succeed or fail unpredictably

Types of AI Work

AI pods handle three distinct types of work, each with different planning characteristics:

Engineering Work

Deterministic tasks with predictable outcomes: building pipelines, creating APIs, implementing monitoring, writing tests.

Planning approach: Standard Agile estimation

Research Experiments

Exploratory work with uncertain outcomes: trying new architectures, testing hypotheses, exploring data patterns.

Planning approach: Time-boxed, outcome uncertain

Model Iteration

Incremental improvement work: tuning hyperparameters, adding features, addressing bias, fixing edge cases.

Planning approach: Hybrid—effort known, improvement uncertain
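
Making this distinction explicit helps planning discussions and tooling. The three work types and their planning treatments can be captured in a small taxonomy; a minimal sketch, where the enum and labels are illustrative rather than taken from any tracking tool:

```python
from enum import Enum

class AIWorkType(Enum):
    """The three kinds of AI pod work described above (names illustrative)."""
    ENGINEERING = "engineering"         # deterministic, predictable outcome
    RESEARCH_EXPERIMENT = "experiment"  # exploratory, uncertain outcome
    MODEL_ITERATION = "iteration"       # effort known, improvement uncertain

# Planning treatment for each work type, per the descriptions above.
PLANNING_APPROACH = {
    AIWorkType.ENGINEERING: "standard Agile estimation (story points)",
    AIWorkType.RESEARCH_EXPERIMENT: "time-boxed; success measured by learning",
    AIWorkType.MODEL_ITERATION: "hybrid: points for effort, hedged outcome target",
}
```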

Adapted Sprint Structure

The Two-Track Sprint

AI Innovations run sprints with two parallel tracks that are planned and reviewed differently:

Track 1: Committed Work

Engineering tasks and well-understood model work that the pod commits to completing. These are planned with standard story points and included in velocity calculations.

  • Infrastructure and pipeline development
  • API and integration work
  • Known model improvements
  • Documentation and compliance tasks
  • Bug fixes and technical debt

Track 2: Experimental Work

Research and exploratory tasks where the outcome is uncertain. These are time-boxed (not point-estimated) and success is measured by learning, not delivery.

  • Architecture experiments
  • Data exploration
  • Hypothesis testing
  • Performance improvement attempts
  • Novel technique evaluation
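
One way to keep the two tracks distinct in practice is to tag each backlog item with its track and give it the matching planning field, so experiments can never silently enter velocity. A minimal sketch, with a hypothetical schema and item names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SprintItem:
    """One backlog item, tagged by track (field names are illustrative)."""
    title: str
    track: str                             # "committed" or "experimental"
    story_points: Optional[int] = None     # committed items only; feeds velocity
    time_box_days: Optional[float] = None  # experimental items only

backlog = [
    SprintItem("Add drift monitoring to scoring pipeline", "committed", story_points=5),
    SprintItem("Evaluate embedding features on sparse users", "experimental", time_box_days=3),
]

# Only committed items count toward velocity; experiments are bounded by their time box.
velocity = sum(item.story_points or 0 for item in backlog if item.track == "committed")
print(velocity)  # -> 5
```

The asymmetry is deliberate: an experimental item carries no story points, so it has nothing to contribute to the velocity sum.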

Sprint Allocation

The balance between tracks varies by product maturity:

Product Stage     | Committed Work | Experimental Work
-------------------|----------------|------------------
Exploration        | 20-30%         | 70-80%
Development        | 50-60%         | 40-50%
Production         | 70-80%         | 20-30%
Mature Operations  | 80-90%         | 10-20%

Note: Even mature products should maintain some experimental capacity for continuous improvement and innovation.
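
To sanity-check a sprint plan against these ranges, the split can be computed from total pod capacity. A minimal sketch using the midpoint of each range above (the stage keys and helper name are hypothetical):

```python
# Midpoint committed-work share per product stage, from the table above.
COMMITTED_SHARE = {
    "exploration": 0.25,
    "development": 0.55,
    "production": 0.75,
    "mature_operations": 0.85,
}

def capacity_split(total_hours: float, stage: str) -> tuple[float, float]:
    """Return (committed_hours, experimental_hours) for one sprint."""
    committed = total_hours * COMMITTED_SHARE[stage]
    return committed, total_hours - committed

# Example: a pod with 300 productive hours in a development-stage sprint.
committed, experimental = capacity_split(300, "development")
print(f"Committed: {committed:.0f}h, Experimental: {experimental:.0f}h")
# -> Committed: 165h, Experimental: 135h
```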

Experiments as First-Class Work

Experiment Design

Every experiment should be structured with:

1. Hypothesis
A clear, testable statement of what we believe and why: "We believe [approach X] will improve [metric Y] by [amount Z] because [reasoning]."

2. Time Box
A strict limit on how long the experiment runs before evaluation. Experiments that drag on without conclusion waste resources and create uncertainty.

3. Success Criteria
Specific, measurable outcomes that would validate the hypothesis. What results would make us proceed? What results would make us stop?

4. Minimum Viable Experiment
The smallest version of the experiment that could validate the hypothesis. Don't build production infrastructure for an experiment.

5. Learning Documentation
Required output regardless of outcome. What did we learn? How does this inform next steps? This prevents "failed" experiments from being wasted effort.
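
These five elements map naturally onto a structured record that travels with the experiment from planning through review. A minimal sketch, assuming a hypothetical schema (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    """The five required experiment elements as one record (illustrative)."""
    hypothesis: str             # "We believe X will improve Y by Z because ..."
    time_box_days: float        # strict limit before mandatory evaluation
    proceed_criteria: str       # results that would validate the hypothesis
    stop_criteria: str          # results that would invalidate it
    minimum_viable_setup: str   # smallest setup that could test the hypothesis
    learnings: list[str] = field(default_factory=list)  # required, win or lose

# Hypothetical example record:
exp = ExperimentRecord(
    hypothesis=("We believe replacing TF-IDF features with pretrained embeddings "
                "will improve recall@10 by 5 points because they capture synonymy."),
    time_box_days=4,
    proceed_criteria="recall@10 improves by >= 5 points on the holdout set",
    stop_criteria="improvement < 2 points, or training cost exceeds budget",
    minimum_viable_setup="offline evaluation on a 10% sample; no serving changes",
)
```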

Experiment Outcomes

Experiments have three possible outcomes, all of which are valuable:

Validated

Hypothesis confirmed. Results meet success criteria. Proceed to committed engineering work to productionize.

Next step: Create engineering stories

Invalidated

Hypothesis disproven. Results do not meet criteria. Document learnings and move on.

Next step: Archive and inform future decisions

Inconclusive

Results unclear. May need more data, different approach, or refined hypothesis.

Next step: Decide whether to extend, pivot, or stop
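
The outcomes and their next steps form a small decision table, which a pod can encode so that every closed experiment gets routed explicitly rather than fading out. A minimal sketch (the enum and routing text are illustrative):

```python
from enum import Enum

class ExperimentOutcome(Enum):
    VALIDATED = "validated"        # success criteria met
    INVALIDATED = "invalidated"    # success criteria clearly not met
    INCONCLUSIVE = "inconclusive"  # results unclear; needs a decision

# Next step per outcome, as described above.
NEXT_STEP = {
    ExperimentOutcome.VALIDATED: "create committed engineering stories to productionize",
    ExperimentOutcome.INVALIDATED: "archive learnings; inform future decisions",
    ExperimentOutcome.INCONCLUSIVE: "decide whether to extend the time box, pivot, or stop",
}
```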

Celebrating "Failed" Experiments

An invalidated hypothesis is not a failure—it's valuable learning. Teams should celebrate experiments that quickly disprove bad ideas, saving the effort of building something that wouldn't work. The only true experiment failure is one that produces no actionable learning.

Pod Ceremonies

Daily Standup (15 minutes)

Brief synchronization focused on blockers and coordination:

Sprint Planning (2-4 hours)

Plan both committed work and experimental work for the sprint:

Activity                              | Time      | Participants
---------------------------------------|-----------|---------------------
Review goals and priorities           | 30 min    | STO leads, full pod
Committed work selection & estimation | 60-90 min | Full pod
Experiment design & time-boxing       | 30-60 min | Full pod, ML focus
Governance/ethics considerations      | 15-30 min | Ethics Liaison leads
Capacity check & commitment           | 15 min    | Full pod

Sprint Review (1-2 hours)

Demo completed work and share experiment learnings.

Sprint Retrospective (1 hour)

Continuous improvement focused on both delivery and AI-specific challenges:

AI-Specific Retro Questions
  • Did our experiments produce useful learning?
  • Were our time boxes appropriate?
  • Did we catch governance issues early enough?
  • Are we maintaining appropriate committed/experimental balance?
  • What technical debt is accumulating that we need to address?

Experiment Review (Weekly, 30 minutes)

AI-specific ceremony to manage experimental work.

Model Card Review (Monthly or on significant change)

Governance ceremony to keep the Model Card current.

Ceremony Calendar

A typical two-week sprint might look like:

Week 1                    | Week 2
---------------------------|-------------------------
Monday: Sprint Planning   | Daily: Standup
Daily: Standup            | Thursday: Sprint Review
Friday: Experiment Review | Friday: Retrospective