Document: Build Playbook
Classification: Internal
Version: v1.0 · 2026.04
Owner: Umbra Studio
Phase 3 · Build · Five Weeks

The Build Playbook.

Weekly-cadence operational playbook for Weeks 5–9 of a Lighthouse Sprint. Covers the Monday-planning / Tuesday–Thursday-build / Friday-demo rhythm, per-week theme progression, governance build-out, incident response drill, and the Gate 3 production readiness review — the phase where the Redesign Blueprint becomes a running system.

Paired with the Agent Spec Sheet, Redesign Blueprint, Risk Register, and the Governance Runbook templates. Read after the Redesign Playbook; keep open Weeks 5 through 9.

§01

How to use this playbook.

Audience · reading path · relationship to Redesign and Handoff

Build is the construction phase of a Lighthouse Sprint. Discovery produced evidence; Redesign produced a contract; Build produces a running system. Five working weeks — twenty-five working days — organised around a repeating weekly rhythm, one theme per week, and one public demonstration every Friday at 15:00 client-local time.

The playbook is written for the Sprint Lead, who now stands behind the Agent Engineer and the Governance Architect rather than in front of them. Most of the work in Build is authored by those two roles. The Sprint Lead’s job is to protect the cadence, run the demos, and enforce the gates — not to write agents. If the Sprint Lead is writing code after Day 25, something has gone wrong upstream.

If the Gate 2 Redesign Blueprint has any unresolved Conditional Go items, those must be closed in Week 5 before net-new agent construction begins. Build on an unstable contract and every week after will be spent re-scoping instead of shipping.

Inputs required on Day 21

| Artifact from Redesign | Used for |
| --- | --- |
| Redesign Blueprint v1.0 | Canonical architecture: every agent, every data path, every governance component. |
| Agent Spec Sheets (one per agent) | Contract-level specification each agent is built against. Build Lead refuses to start construction if a spec is missing fields. |
| Governance Wrapper design | Tells the Governance Architect which components to instantiate, in what order, for which agents. |
| Risk Register v1.0 | Build-week risks become live; treatment plans activate as construction begins. |
| Integration map | Identifies the client systems that must be read from or written to, and thus which credentials / access grants are needed in Week 5 Day 1. |
| Gate 2 decision memo | Including any Conditional Go items that must be closed by Week 5 Friday. |
| Pattern candidates (from Discovery / Redesign) | Tagged USP-XXX in agent specs; confirmed or abandoned during Build. |
§ Authorship shift
In Redesign, documents were the deliverable. In Build, the system is the deliverable. Documents become secondary artifacts — updated for governance and handoff, but no longer the point. If the team is still writing specs in Week 7, the sprint is in trouble.

Reading order

Read §02–§05 before Day 21. §06 is read by the Governance Architect on Day 21 and revisited at every Friday demo. §07 (Incident Response Drill) must be read by the full Studio team in Week 8 — it is the prerequisite for Gate 3. §08 is Gate 3 itself; §09 is the Indietheka worked example; §10 is the weekly checklist the Sprint Lead runs against every Monday morning.

§02

Build at a glance.

Five-week arc · per-week theme · primary deliverables

Five weeks, twenty-five working days. Each week has one dominant theme — infrastructure, first agents, scale-out, governance hardening, readiness — and one Friday demo that makes that week’s progress visible to the Workflow Owner and the Executive Sponsor.

Themes are not rigid sequences. Week 6 and Week 7 overlap; agent construction continues through Week 8. But the centre of gravity shifts each week, and the Friday demo is scoped to the theme, so the sprint has a legible shape for stakeholders who are not sitting in standup.

The five-week arc

| Week | Theme | Demo focus | Exit state |
| --- | --- | --- | --- |
| Week 5 · Days 21–25 | Infrastructure & skeleton | Credentials work; first agent stub runs end-to-end on a trivial input | Environment stood up, spreadsheet-as-state-machine live, one agent executing in supervised mode |
| Week 6 · Days 26–30 | First production agents | Two agents on real client data; baseline comparison vs. human output | Priority-1 bottleneck addressed; Risk Register items for those agents discharged or documented |
| Week 7 · Days 31–35 | Scale-out & integration | Full agent roster running together; integration surface tested | 6–10 agents operating in supervised mode; data paths between agents verified |
| Week 8 · Days 36–40 | Governance hardening | Alerts firing to the right humans; rollback demonstrated live; audit log reviewed | All 6 governance components instrumented; incident response drill completed |
| Week 9 · Days 41–45 | Production readiness | Gate 3 review: the system runs for 72 hours with zero Sprint Lead interventions | Gate 3 passed; system enters Handoff under Workflow Owner authority |

Deliverables produced during Build

  1. Working agent roster (6–10 named agents, each matching its Agent Spec), running on real client data by end of Week 7. Demonstrated at the Friday demos of Weeks 5–9.
  2. Governance Wrapper v1.0 — all 6 components instrumented and verified through drill. Wired in Week 5, hardened Week 8.
  3. State spreadsheet (USP-009 pattern) — the system-of-record the agents read from and write to; editable by the Workflow Owner. Stood up Day 22, populated with real rows by Week 6.
  4. Risk Register v2.0 — treatment outcomes recorded; new risks surfaced by construction are added and scored. Updated each Friday demo.
  5. Incident Response Drill report — transcript and remediation actions from the Week 8 drill. Required for Gate 3.
  6. Pattern Library candidates — patterns tagged during Build, confirmed or abandoned by Gate 3. Tracked in the Pattern Extraction Tracker.
  7. Gate 3 Production Readiness deck (lighthouse-production-readiness.docx) — the deliverable that carries the sprint into Handoff. Authored Week 9, presented Day 45.
§ Scope discipline
If Gate 2 contracted 10 agents, Build delivers 10 agents or fewer — never more. New agent ideas that emerge in Week 7 are captured in the Pattern Library or the post-sprint backlog, not added to this sprint’s roster. Scope creep in Build is the single most common reason a sprint fails Gate 3.
§03

The weekly cadence.

Monday planning · Tuesday–Thursday build · Friday demo

Every week in Build runs the same rhythm. The predictability is a feature — it lets the Workflow Owner plan their own calendar around the sprint, and it tells the Agent Engineer exactly when to start, when to go heads-down, and when to present.

Do not skip demos. Do not skip Monday planning. If a Friday demo would be embarrassing because the week didn’t produce enough, hold it anyway and name the miss. A missed demo that goes on the record is more valuable than a skipped demo that quietly rolls forward into next week.

Monday

Plan the week.

One meeting, 60 minutes, full Studio team plus Workflow Owner. Review last Friday’s demo, confirm the week’s theme, split the agent work into ownership slots, surface any governance items. Output: a named list of deliverables with owners, posted in the shared channel by 11:00.

60 min · 09:30–10:30 client-local
Tue–Thu

Build heads down.

Deep-work days. No meetings longer than 30 minutes. Agent Engineer writes and tests; Governance Architect instruments components; Sprint Lead clears blockers and keeps the stakeholder loop asynchronous. One 20-minute standup at 09:30; otherwise, leave the team alone.

72 hrs · uninterrupted construction
Friday

Demonstrate what shipped.

One meeting, 45 minutes, full Studio team plus Workflow Owner plus Executive Sponsor (non-mandatory but invited every week). Live demo, no slides. Decisions logged, Risk Register updated, Pattern Library reviewed, Monday’s plan foreshadowed.

45 min · 15:00–15:45 client-local

Monday planning · detailed agenda

| Block | Time slot | Content |
| --- | --- | --- |
| Retrospective | 0–10 min | What shipped Friday? What went wrong? What surprised the team? No blame, just observed facts. Two to three items maximum; extended retro happens at Gate 3. |
| Week theme recap | 10–15 min | Sprint Lead names this week’s theme (see §04) and the Friday-demo target. Workflow Owner confirms business-side availability. |
| Agent ownership split | 15–35 min | Agent Engineer reads out the week’s agent work; Governance Architect reads out the week’s wrapper work. Each item gets an owner and an estimated Thursday exit state. |
| Risk & blocker scan | 35–50 min | Walk the Risk Register for live items. Any blockers that require client-side action are escalated to the Workflow Owner by 11:00 with a date. |
| Demo target & close | 50–60 min | Agree what Friday’s demo will show. Sprint Lead posts the week-plan note in the shared channel before end of meeting. |

Tuesday–Thursday standup · detailed agenda

Twenty minutes, 09:30, full Studio team only. Workflow Owner is welcome but not obligated. Each participant answers three questions in under 3 minutes:

| Question | What a good answer sounds like |
| --- | --- |
| What did you ship yesterday? | Concrete and demonstrable: “I wired the RSS poller to write into the state spreadsheet and I can show a row”, not “I worked on the poller.” |
| What will you ship today? | Named output, estimated duration, explicit definition of done: “I’ll complete the review-queue pagination; done means the queue renders for 200 rows without time-out.” |
| What are you blocked on? | Specific blocker with a specific owner. If the owner is client-side, Sprint Lead takes the item and escalates by 11:00. |
§ Calendar defence
Tuesday–Thursday are deep-work days for the Agent Engineer and the Governance Architect. The Sprint Lead’s job is to refuse every meeting that would cut into their morning. Stakeholder calls, internal reviews, and client check-ins are scheduled Monday afternoon or Friday morning — not Tuesday through Thursday before 14:00.
§04

Week-by-week themes.

What each week focuses on · what the Friday demo proves

The five Build weeks are sequenced so that each week’s Friday demo proves a specific capability and unlocks the next week’s work. Week 5 proves that plumbing works; Week 6 proves that one real agent beats the baseline; Week 9 proves the system runs without a Studio driver in the seat. Read each week’s theme before the Monday planning meeting.

Week 5 · Days 21–25 · Infrastructure & skeleton. Theme 01 · Plumbing

Week 5 is about plumbing, not agents. The team stands up the environment, instantiates the spreadsheet-as-state-machine (USP-009), wires the first credentials, and writes the thinnest possible first agent — one that does almost nothing but proves the pipeline is alive. If Week 5 tries to ship a real production agent, it will skip the infrastructure work, and the sprint will pay for it in Week 7.

Monday planning focus
  • Credentials inventory — every API key, service account, OAuth flow needed for Week 6. Technical Counterpart ownership. Must be resolved by Friday.
  • State schema lock — the spreadsheet columns that will persist through the whole sprint. Once locked, schema changes require a Sprint Lead sign-off.
  • First-agent choice — pick the agent with the smallest useful surface area, not the highest business value. Usually a read-only poller or fetcher.
Thursday exit state
  • Development environment live; team can pull, run, and test locally.
  • State spreadsheet exists with locked schema; at least 5 columns implemented.
  • First agent stub executes end-to-end on a trivial input and writes at least one row to the state spreadsheet.
  • Logging pipeline wired; every agent invocation writes a log record readable by the Governance Architect.
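The exit state above can be pictured as a stub agent that takes a trivial input, writes one row, and records its invocation. This is a minimal sketch under stated assumptions: an in-memory CSV stands in for the state spreadsheet, and the column names, the `stub_agent` function, and the sample input are all hypothetical.

```python
import csv
import io

# Hypothetical locked schema: five columns, the Week 5 minimum.
COLUMNS = ["item_id", "source", "status", "updated_at", "notes"]

def stub_agent(trivial_input, sheet, log_records):
    """Write one row for the input and record the invocation; nothing else, by design."""
    row = {
        "item_id": trivial_input["id"],
        "source": "stub",
        "status": "seen",
        "updated_at": trivial_input["at"],
        "notes": "",
    }
    csv.DictWriter(sheet, fieldnames=COLUMNS).writerow(row)
    log_records.append({"agent": "stub", "item_id": row["item_id"], "outcome": "ok"})
    return row

# An in-memory CSV stands in for the state spreadsheet (assumption).
sheet = io.StringIO()
csv.DictWriter(sheet, fieldnames=COLUMNS).writeheader()
log_records = []
stub_agent({"id": "demo-1", "at": "2026-04-13T09:30:00Z"}, sheet, log_records)

# Read the sheet back the way a reviewer would.
rows = list(csv.DictReader(io.StringIO(sheet.getvalue())))
```

The stub does almost nothing on purpose: the row and the log record are the whole demo.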
Friday demo target

“Watch this agent run, write a row, and log its invocation.” The demo does not need to be impressive — it needs to prove the plumbing exists. Workflow Owner watches; Executive Sponsor is not required for this demo.

Common Week-5 pitfalls
  • Building two agents in parallel before infrastructure is proven. Resist this — Week 6 will go faster for it.
  • Skipping the state spreadsheet in favour of a database. USP-009 exists because the Workflow Owner has to be able to read and edit the state; databases without a UI layer break that property.
  • Leaving credentials to Week 6. Clients take 2–5 business days to provision access; request on Day 21.
Week 6 · Days 26–30 · First production agents. Theme 02 · Real data

Week 6 is where the sprint stops being infrastructure and starts being a product. Two agents come online against real client data — typically the agent addressing the highest-priority bottleneck from Discovery, plus a supporting agent. The Friday demo shows a side-by-side comparison of agent output against the Discovery-era human baseline, and the Workflow Owner renders an opinion.

Monday planning focus
  • Agent selection — typically Priority-1 bottleneck (from Redesign Blueprint) plus one supporting agent it depends on. Must both be A-category or B-category allocations.
  • Output schema — what exactly does a “successful” agent run produce? Define before construction, not after.
  • Baseline comparison protocol — how will the Workflow Owner judge agent output against the Discovery baseline? Usually a small hand-scored sample (10–20 runs).
Thursday exit state
  • Two agents matching their Agent Specs; both running against real client data.
  • At least 20 agent runs executed end-to-end; output reviewed by Agent Engineer for quality.
  • Governance Wrapper v0.5 — basic logging and audit trail instrumented for both agents.
  • Baseline comparison material prepared for Friday demo.
Friday demo target

“Here is agent output alongside the Discovery baseline. Workflow Owner, what do you see?” This is the first demo where the Workflow Owner’s judgment is the headline. Expected outcome: green-light both agents, or flag quality gaps for Week 7.

Common Week-6 pitfalls
  • Over-polishing the first agent. Ship at 80% quality; Week 7 will tune it.
  • Demoing without a baseline. The demo needs the Discovery numbers on screen; otherwise it’s “look at the output” theatre with no decision surface.
  • Not running enough samples. 20 runs is the floor for a credible quality read; fewer and the Workflow Owner can’t reason about it.
Week 7 · Days 31–35 · Scale-out & integration. Theme 03 · Roster live

Week 7 is the widest week: the remaining agents come online, the integration surface between agents is exercised, and the system runs as a roster for the first time rather than as a set of isolated pieces. This is where the sprint earns or loses its 7× throughput claim.

Monday planning focus
  • Remaining agent sequence — order matters. Dependencies first, dependents second. The Redesign Blueprint’s integration map is the reference.
  • Integration checkpoints — at least one mid-week integration run where all live agents execute against the state spreadsheet simultaneously.
  • Quality tune-up — any Week-6 agent that needed a tune-up gets a named owner and a Thursday done-by.
Thursday exit state
  • Full agent roster constructed — typically 6–10 agents, all matching their Agent Specs.
  • Full-roster integration run completed; data paths verified end-to-end.
  • State spreadsheet handling real-world data volumes.
  • Throughput measurement taken: invocations per hour per agent, end-to-end pipeline latency.
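The throughput measurement can be derived from invocation records. The sketch below is illustrative only: it assumes each record carries an agent name, a start time, and a duration, and the record layout, function names, and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def invocations_per_hour(records, window_start, window_end):
    """Return {agent: invocations per hour} for records that start inside the window."""
    hours = (window_end - window_start).total_seconds() / 3600
    counts = Counter(
        r["agent"] for r in records
        if window_start <= r["started"] < window_end
    )
    return {agent: n / hours for agent, n in counts.items()}

def pipeline_latency(records):
    """End-to-end latency of one cycle: earliest start to latest finish."""
    first_start = min(r["started"] for r in records)
    last_finish = max(r["started"] + r["duration"] for r in records)
    return last_finish - first_start

# Hypothetical records for one end-to-end cycle.
t0 = datetime(2026, 4, 16, 9, 0)
cycle = [
    {"agent": "poller", "started": t0, "duration": timedelta(seconds=30)},
    {"agent": "poller", "started": t0 + timedelta(minutes=30), "duration": timedelta(seconds=30)},
    {"agent": "writer", "started": t0 + timedelta(minutes=31), "duration": timedelta(minutes=3)},
]
rates = invocations_per_hour(cycle, t0, t0 + timedelta(hours=2))
latency = pipeline_latency(cycle)
```

Both numbers only mean something next to the Discovery baseline, so state the window and the sample size alongside them.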
Friday demo target

“Here is the full roster running together. Here is what an end-to-end cycle looks like. Here is the throughput versus Discovery baseline.” The Executive Sponsor should attend this demo — it is the first time the system is visible as a coherent whole, and the throughput number is the first read on whether the sprint will hit its outcome target.

Common Week-7 pitfalls
  • Agent Engineer tries to add an eleventh agent. Decline. New ideas go to Pattern Library or post-sprint backlog.
  • Integration skipped in favour of more construction. A roster that has never run together is an architectural fiction; force the integration run even if it reveals bugs.
  • Workflow Owner overwhelmed. If the roster is too big to review in one session, split the demo into two — but still hold both on Friday.
Week 8 · Days 36–40 · Governance hardening. Theme 04 · Trust earned

Week 8 is when the system stops being demonstrated and starts being trusted. The six governance components — Monitoring, Alerting, Audit Trail, Escalation, Override, Rollback — are all instrumented, tested, and exercised in a live incident response drill. Without this week, the system is a prototype, not a production system.

Monday planning focus
  • Governance gap scan — Governance Architect walks each of the 6 components; names which agents each one covers and which ones it does not.
  • Alert routing confirmation — who gets paged on a SEV-1? Who sees a SEV-2? Workflow Owner confirms.
  • Drill scenario — which incident type the Thursday drill will exercise (see §07).
Thursday exit state
  • All 6 governance components instrumented across every agent.
  • Alerts tested end-to-end against every SEV level.
  • Rollback procedure demonstrated live — revert a production state to a known-good snapshot in under 10 minutes.
  • Audit log reviewable by Workflow Owner without Studio assistance.
  • Incident response drill completed; remediation actions recorded in Risk Register v2.
Friday demo target

“We deliberately broke the system on Thursday. Here is what the alerts looked like. Here is who got paged. Here is how fast we rolled back. Here is what we changed as a result.” This is the demo that earns Gate 3 eligibility — the one that shows the system fails gracefully and recovers quickly.

Common Week-8 pitfalls
  • Deferring the drill to Week 9. It must happen in Week 8; Week 9 is for remediation and Gate 3, not for drill execution.
  • Alerting routed to the wrong humans. Very common — fix it in Week 8, not in production.
  • “Rollback works” claimed without demonstration. If it has not been demonstrated live with the Workflow Owner watching, it does not count toward Gate 3.
Week 9 · Days 41–45 · Production readiness. Theme 05 · Gate 3

Week 9 is the readiness week. No new agents. No new governance components. Week 9 exists to prove the system runs without a Sprint Lead in the seat — a 72-hour autonomy window between Monday afternoon and Thursday afternoon where Studio touches nothing except in response to a fired alert. If that window completes clean, Gate 3 passes Friday.
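One way to check the window after the fact is to replay the event log and flag any Studio action that was not a response to a fired alert. This is a sketch, not a prescribed tool: the event fields (`at`, `actor`, `kind`, `alert_id`), the function name, and the sample dates are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)

def window_is_clean(start, events):
    """Return (clean, violations). A violation is a Studio intervention inside
    the window that does not reference an alert fired inside the window."""
    end = start + WINDOW
    in_window = [e for e in events if start <= e["at"] < end]
    fired = {e["alert_id"] for e in in_window if e["kind"] == "alert_fired"}
    violations = [
        e for e in in_window
        if e["actor"] == "studio"
        and e["kind"] == "intervention"
        and e.get("alert_id") not in fired
    ]
    return not violations, violations

# Hypothetical window starting on a Monday at 14:00.
start = datetime(2026, 5, 4, 14, 0)
events = [
    {"at": start + timedelta(hours=10), "actor": "system", "kind": "alert_fired", "alert_id": "A1"},
    {"at": start + timedelta(hours=10, minutes=15), "actor": "studio", "kind": "intervention", "alert_id": "A1"},
    {"at": start + timedelta(hours=30), "actor": "studio", "kind": "intervention", "alert_id": None},
]
clean, violations = window_is_clean(start, events)
```

In this sample the response to alert A1 is permitted; the unprompted intervention at hour 30 is what invalidates the window.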

Monday planning focus
  • Conditional Go items from Gate 2 — all must be closed before the 72-hour autonomy window starts. Sprint Lead confirms.
  • Autonomy window scoping — exact start and end times; list of permitted interventions (“respond to fired alerts only”); list of forbidden interventions (“do not touch state spreadsheet, do not push code”).
  • Gate 3 deck authoring ownership — Sprint Lead drafts the Production Readiness deck across Mon–Wed.
Thursday exit state
  • 72-hour autonomy window completed.
  • Gate 3 deck drafted: SLOs met, alerts fired and resolved, outcome metrics captured, Pattern Library finalised, Risk Register v2.1 closed or accepted.
  • Handoff Playbook pre-read circulated to Workflow Owner.
Friday demo target

“Here is the system, running on its own for 72 hours. Here is what happened. Here is the outcome delta vs. baseline. Gate 3: yes / no / conditional.” Executive Sponsor mandatory. This is the decisive moment of the sprint.

Common Week-9 pitfalls
  • Sprint Lead intervenes during the autonomy window. Even well-meaning fixes invalidate the window. If a SEV-2 fires, respond to it and note the intervention; do not quietly patch the code.
  • Outcome numbers unpolished. The Discovery baseline is the measuring stick; every headline metric must be stated as delta vs. baseline.
  • Handoff not foreshadowed. The Workflow Owner should already know what Handoff will ask of them before Gate 3 Friday.
§05

The Friday demo.

Run-of-show · roles · what counts as evidence

The Friday demo is the most load-bearing ritual of the Build phase. It is not a status update. It is a demonstration — a live run of the system — in front of the humans who will own it after Gate 4. Everything else about the week orbits this 45-minute window.

Attendees: Sprint Lead, Agent Engineer, Governance Architect, Workflow Owner. Executive Sponsor attends Week 7 and Week 9 at minimum; optional for Weeks 5, 6, 8. No additional client stakeholders without Workflow Owner pre-approval — the demo is not a marketing event.

45-minute run-of-show

15:00 00:00–02:00 Frame.

Sprint Lead names the week’s theme, recaps the Monday plan, states what this demo will show and what question it is asking the Workflow Owner to answer.

15:02 02:00–22:00 Demonstrate.

Live run. Real data. No slides except one — the Discovery baseline reference card, kept visible on the right half of the screen. Agent Engineer drives; Governance Architect narrates the wrapper behaviour in parallel. The screen is a split view: agent UI / state spreadsheet so the Workflow Owner can see cause and effect.

15:22 22:00–32:00 Question.

Workflow Owner answers the question framed in minute 00. “Does this output beat the baseline?” / “Does this alert route to the right person?” / “Would you trust this to run overnight?” Sprint Lead records the answer verbatim in the decision log.

15:32 32:00–40:00 Risk & patterns.

Governance Architect reads the Risk Register delta for the week — new risks, discharged risks, re-scored risks. Agent Engineer reads the Pattern Library delta — patterns confirmed, patterns abandoned, pattern candidates raised.

15:40 40:00–45:00 Next.

Sprint Lead names next Monday’s theme and the Week-N+1 demo target. No commitments past next Friday. Post decision log and updated registers in shared channel within 30 minutes of demo close.

What counts as a valid demo artifact

Valid ▸ evidence

Live execution.

The agent runs in front of everyone, on real data, hitting real systems. Any failure is visible. This is the baseline demo artifact.

Valid ▸ evidence

Recorded run.

A pre-recorded run shown in the demo — acceptable only when the agent’s natural cadence is longer than 45 minutes (e.g. a nightly process). Recording must include timestamps, raw input, raw output.

Valid ▸ evidence

State snapshot.

A before / after view of the state spreadsheet with the rows the agent wrote or updated. Acceptable for governance demos where the event was an alert that already fired during the week.

§ Not a demo
Slides about what the team did. A code walk-through. A verbal summary of test results. A chart from a spreadsheet with no provenance. None of these earn credit toward Gate 3. If it cannot be watched happening, it does not count.
§06

Governance build-out.

The six components · instrumentation order · what “done” looks like

In Redesign, the six governance components were designed. In Build, they are instantiated, wired, and tested. The Governance Architect owns this work; the Sprint Lead reviews it every Friday. Do not defer governance to Week 9 — by then, the system has too much surface area to instrument cleanly.

Instrumentation order (Weeks 5–8)

| Week | Component | Why this week | “Done” definition |
| --- | --- | --- | --- |
| Week 5 | Audit Trail | Every agent invocation must log from the first day; retrofitting logs is expensive. | Every invocation writes: timestamp, agent name, input hash, output hash, outcome, duration. |
| Week 5 | Monitoring | Introduced with infrastructure; dashboards read from the log pipeline. | At least one dashboard visible to the Governance Architect; includes error rate and throughput per agent. |
| Week 6 | Alerting | Real data means real failure modes; alerts become useful. | Every SEV level has at least one defined alert; routing tested end-to-end. |
| Week 7 | Override | When the roster runs together, the Workflow Owner must be able to pause individual agents without stopping the whole system. | Per-agent pause / resume control in the state spreadsheet; verified by Workflow Owner. |
| Week 7 | Escalation | SEV-2 and SEV-3 events start firing; routing rules need to be tested. | Escalation matrix documented; at least one SEV-2 escalated end-to-end during the week. |
| Week 8 | Rollback | The incident response drill requires a working rollback. | Rollback demonstrated live in under 10 minutes; audit log shows pre- and post-rollback state. |
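The Audit Trail “done” definition lists six fields per invocation. A minimal sketch of a wrapper that writes them follows. The hashing scheme (SHA-256 over canonical JSON), the field names, and the `audited` helper are assumptions made for illustration, not the Studio implementation.

```python
import hashlib
import json
import time
from datetime import datetime, timezone

def sha256_hex(payload):
    """Hash any JSON-serialisable payload deterministically."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audited(agent_name, agent_fn, payload, log):
    """Run agent_fn(payload) and append one audit record, on success or failure."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    output, outcome = None, "ok"
    try:
        output = agent_fn(payload)
    except Exception as exc:  # sketch choice: record the failure instead of re-raising
        outcome = f"error: {type(exc).__name__}"
    log.append({
        "timestamp": started.isoformat(),
        "agent": agent_name,
        "input_hash": sha256_hex(payload),
        "output_hash": sha256_hex(output),
        "outcome": outcome,
        "duration_s": round(time.perf_counter() - t0, 6),
    })
    return output

# Hypothetical agents: one succeeds, one fails on a missing key.
log = []
audited("rss_poller", lambda p: {"rows": len(p["feeds"])}, {"feeds": ["a", "b"]}, log)
audited("rss_poller", lambda p: p["missing"], {"feeds": []}, log)
```

Logging failures through the same path as successes is what lets the audit log answer “what happened and when” without a second data source.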

The state spreadsheet as governance surface (USP-009)

The state spreadsheet is not just a database. In Umbra Studio sprints it functions as the primary governance surface for the Workflow Owner. Certain columns are writeable by humans only (approvals, overrides, flags); certain columns are writeable by agents only (status, timestamps, output hashes). The column discipline is enforced at the agent layer; the Workflow Owner can always see and, in narrow cases, edit the human columns. This is the pattern that lets humans and agents share authority over the same data without a separate admin UI.

The Governance Architect is responsible for naming and defending this schema. Schema changes after Week 5 require Sprint Lead sign-off and a note in the audit log.
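Column discipline enforced at the agent layer can be sketched as a guard on every write. The column names below are illustrative (approvals, notes, and priority on the human side; status, a timestamp, and an output hash on the agent side), and `apply_update` is a hypothetical helper rather than part of any template.

```python
# Hypothetical column ownership for one state row.
HUMAN_COLUMNS = {"priority", "approval", "notes"}
AGENT_COLUMNS = {"status", "updated_at", "output_hash"}

class ColumnDisciplineError(Exception):
    """Raised when a writer touches a column it does not own."""

def apply_update(row, updates, writer):
    """Return a copy of row with updates applied, rejecting foreign columns."""
    if writer == "human":
        allowed = HUMAN_COLUMNS
    elif writer == "agent":
        allowed = AGENT_COLUMNS
    else:
        allowed = set()
    forbidden = set(updates) - allowed
    if forbidden:
        raise ColumnDisciplineError(f"{writer} may not write: {sorted(forbidden)}")
    return {**row, **updates}

row = {"priority": 1, "approval": "", "notes": "",
       "status": "queued", "updated_at": "", "output_hash": ""}
row = apply_update(row, {"status": "drafted"}, writer="agent")

try:
    apply_update(row, {"approval": "yes"}, writer="agent")
    rejected = False
except ColumnDisciplineError:
    rejected = True  # an agent may not approve its own work
```

Returning a copy rather than mutating in place keeps the pre-write row available for the audit log.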

Governance readiness matrix (end of Week 8)

| Component | Instrumented | Tested | Demonstrated | Status |
| --- | --- | --- | --- | --- |
| Monitoring | Dashboards live | End-to-end data flow verified | Friday Week 5 | Ready |
| Alerting | SEV-1/2/3 routed | Test alerts fired weekly | Friday Week 6 | Ready |
| Audit Trail | All invocations logged | Random spot checks | Friday Week 5 | Ready |
| Escalation | Routing matrix | SEV-2 walked live | Friday Week 7 | Ready |
| Override | Pause / resume | Workflow Owner exercise | Friday Week 7 | Ready |
| Rollback | Snapshot + restore | Drill, Week 8 | Friday Week 8 | Ready |
§ Governance Runbook
The Governance Runbook template (lighthouse-governance-runbook.docx) is populated week-by-week as each component comes online. By end of Week 8, the Runbook is the document the Workflow Owner reads on day one of Handoff.
§07

Incident response drill.

Week 8 · Thursday · a rehearsed failure

The incident response drill is a rehearsed, deliberate failure — an event injected into the running system to test whether monitoring catches it, alerting routes it, the right human responds, rollback works, and the audit log reads cleanly afterward. It is not a check-the-box exercise; it is the single most load-bearing governance event of the sprint.

The drill is scheduled on Thursday of Week 8, 14:00–16:00 client-local. Governance Architect injects the incident; Sprint Lead watches and does not intervene. Workflow Owner is on call and responds as they would in production. Full transcript is captured for Friday’s demo and for the Gate 3 deck.

Choose the scenario at Week 8 Monday planning

The Governance Architect proposes the scenario; Sprint Lead and Workflow Owner agree. The scenario should match one of the top three risks in the Risk Register — no fictional or low-probability failures. Below are three canonical scenarios. Pick one; do not combine.

▸ Drill scenario A

Agent produces bad output at scale.

Injection: Governance Architect seeds the editorial-writing (or analogous content-producing) agent with a deliberately corrupt input that will cause it to produce output far below the Week-6 baseline. Expected response: Monitoring notices the quality-score drop; SEV-2 fires to Workflow Owner within 5 minutes; Workflow Owner pauses the agent via Override; Agent Engineer investigates; rollback restores the last good state; audit log shows the incident end-to-end. Pass criterion: Workflow Owner reaches a paused, rolled-back state within 30 minutes of injection, without Sprint Lead intervention.

▸ Drill scenario B

Credential revoked mid-run.

Injection: Technical Counterpart revokes one of the API keys used by a live agent. Expected response: Agent invocation fails; Monitoring catches the spike in failure rate; Alerting routes a SEV-2 to Technical Counterpart and a SEV-3 to Workflow Owner; Technical Counterpart provisions a replacement credential; agent resumes; audit log shows the outage window clearly. Pass criterion: System recovers in under 45 minutes; the audit log can answer “what happened and when” without Studio assistance.
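The monitoring-to-alerting step in scenario B can be sketched as a failure-rate check that fans out to the two recipients named above. The 50% threshold, the outcome encoding, and the function names are assumptions; a real routing matrix would come from the Governance Runbook.

```python
def failure_rate(outcomes):
    """Share of failed invocations; outcomes is a list of 'ok' / 'error' strings."""
    if not outcomes:
        return 0.0
    return sum(o != "ok" for o in outcomes) / len(outcomes)

def route_alerts(outcomes, threshold=0.5):
    """Return (severity, recipient) pairs once the recent failure rate
    reaches the threshold. The threshold value is hypothetical."""
    if failure_rate(outcomes) < threshold:
        return []
    return [("SEV-2", "Technical Counterpart"), ("SEV-3", "Workflow Owner")]

healthy = route_alerts(["ok"] * 9 + ["error"])      # 10% failures: below threshold
revoked = route_alerts(["error"] * 4 + ["ok"])      # 80% failures: alerts fire
```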

▸ Drill scenario C

State spreadsheet corrupted.

Injection: Governance Architect deletes or malforms several rows in the state spreadsheet (USP-009), simulating a well-meaning human error by the Workflow Owner. Expected response: Monitoring catches schema-shape mismatches; SEV-1 fires; Rollback procedure invoked; snapshot restored in under 10 minutes; agents resume against the restored state. Pass criterion: Rollback executes cleanly; no agent re-runs the same invocation twice after restore (idempotency verified).
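The idempotency check in scenario C can be run against the audit log after the restore. The sketch below assumes each audit record carries an agent name, an input hash, and a timestamp, and treats a repeated (agent, input hash) pair after the restore as a duplicate run. Field names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

def duplicate_invocations(audit_records, restored_at):
    """Return (agent, input_hash) pairs that appear more than once
    at or after the restore timestamp."""
    seen = Counter(
        (r["agent"], r["input_hash"])
        for r in audit_records
        if r["timestamp"] >= restored_at
    )
    return sorted(key for key, n in seen.items() if n > 1)

restored_at = datetime(2026, 4, 23, 14, 20)
records = [
    {"agent": "writer", "input_hash": "aa", "timestamp": datetime(2026, 4, 23, 14, 5)},   # before restore
    {"agent": "writer", "input_hash": "aa", "timestamp": datetime(2026, 4, 23, 14, 25)},
    {"agent": "writer", "input_hash": "bb", "timestamp": datetime(2026, 4, 23, 14, 26)},
    {"agent": "writer", "input_hash": "bb", "timestamp": datetime(2026, 4, 23, 14, 40)},  # repeat after restore
]
dups = duplicate_invocations(records, restored_at)
```

An empty result supports the pass criterion; any entry names the agent and input to investigate.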

Drill artifacts

  1. Drill transcript — minute-by-minute record of events, people, and actions. Authored by Sprint Lead as observer.
  2. Alert trace — every alert that fired, the routing path, the timestamp, the human who responded.
  3. Audit log excerpt — the slice of the audit log covering the incident window, with annotations.
  4. Remediation actions — any gaps discovered during the drill are logged as Risk Register items with owners and due dates.
  5. Drill outcome memo — half-page summary, included verbatim in the Gate 3 deck.
§ Non-negotiable
A sprint cannot pass Gate 3 without a completed drill. If Week 8 Thursday is compromised (client unavailable, system broken), reschedule to Week 9 Monday and accept that Gate 3 may move to the following Tuesday. Do not waive the drill.
§08

Gate 3 · production readiness.

Week 9 Friday · decision to enter Handoff

Gate 3 decides whether the system is ready for Handoff. It is a formal review attended by the Workflow Owner, Technical Counterpart, and Executive Sponsor on the client side, and the full Studio team. The outcome is one of three: Pass, Conditional Pass (with named conditions to be met before Handoff begins), or Hold (Week 10 added; no new agents).

Gate 3 · Production Readiness

Is the system ready to leave Studio’s hands and enter supervised Handoff under the Workflow Owner’s authority?

The sprint passes Gate 3 when the system satisfies all of the following:

  • Agent roster matches the Gate 2 contract. All contracted agents built, running, and passing their Agent Spec success criteria. No net-new agents added since Gate 2 without Sprint Lead sign-off.
  • Governance Wrapper v1.0 fully instrumented. All 6 components operational; readiness matrix (see §06) shows “Ready” on every row.
  • Incident response drill completed. Drill transcript, alert trace, and audit excerpt attached to the Gate 3 deck. Any remediation items carried as Risk Register entries with owners.
  • 72-hour autonomy window closed clean. No Sprint Lead interventions during the window; any fired alerts resolved by the correct on-call human within SLO.
  • Outcome metrics vs. baseline stated. Throughput, quality, human-hour, and error-rate deltas vs. the Discovery baseline. Each expressed as a ratio or percentage, with confidence interval or sample size.
  • Risk Register v2.1 current. Every risk has status: discharged, accepted, or open-with-treatment. No silent risks.
  • Pattern Library candidates resolved. Each candidate confirmed (tagged USP-###), abandoned (with note), or carried forward (with rationale).
  • Workflow Owner ready statement. Workflow Owner answers on the record: “Am I ready to take supervised operation on Monday?” A “no” triggers Conditional Pass with named conditions.
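The criteria above can be tracked as a simple checklist. The sketch below is one possible reading, not policy: it assumes an unmet criterion can be carried as a Conditional Pass only when its condition is named, dated, and owned, and that an incomplete drill always yields a Hold. The criterion keys and the `recommend` function are hypothetical.

```python
GATE3_CRITERIA = (
    "roster_matches_contract",
    "governance_wrapper_ready",
    "drill_completed",
    "autonomy_window_clean",
    "outcome_metrics_stated",
    "risk_register_current",
    "pattern_candidates_resolved",
    "workflow_owner_ready",
)

def recommend(results, conditions):
    """results: {criterion: bool}; conditions: {criterion: {"name", "due", "owner"}}.
    Returns (recommendation, unmet criteria)."""
    unmet = [c for c in GATE3_CRITERIA if not results.get(c, False)]
    if not unmet:
        return "Pass", unmet
    if "drill_completed" in unmet:
        return "Hold", unmet  # the drill cannot be waived

    def specific(criterion):
        cond = conditions.get(criterion, {})
        return all(cond.get(k) for k in ("name", "due", "owner"))

    if all(specific(c) for c in unmet):
        return "Conditional Pass", unmet
    return "Hold", unmet  # a condition without specifics is a Hold

all_met = {c: True for c in GATE3_CRITERIA}
owner_not_ready = dict(all_met, workflow_owner_ready=False)
condition = {"workflow_owner_ready": {
    "name": "Runbook walkthrough", "due": "Week 10 Tuesday", "owner": "Sprint Lead"}}
```

Calling `recommend(owner_not_ready, condition)` yields a Conditional Pass, while the same result with no condition attached yields a Hold.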

Gate 3 deck · required structure

  1. Cover — sprint name, client, Gate 3 date, recommendation.
  2. Contract recap — what Gate 2 committed to; one-liner per agent.
  3. Outcome metrics vs. Discovery baseline — the headline numbers (throughput, quality, human hours, error rate).
  4. Governance readiness matrix — the six-row table from §06.
  5. Incident response drill summary — scenario, timeline, pass / fail per criterion.
  6. 72-hour autonomy window log — alerts fired, humans responding, any interventions.
  7. Risk Register v2.1 summary — count by status; attach full register as appendix.
  8. Pattern Library update — confirmed USP entries; candidates carried forward.
  9. Handoff readiness statement — Workflow Owner’s on-the-record answer.
  10. Recommendation — Pass / Conditional Pass / Hold, with named conditions and dates.
§ Decision discipline
Gate 3 is a decision, not a presentation. If conditions are attached, they must be named, dated, owned, and written into the decision memo. A Conditional Pass without specifics is a Hold dressed up; do not sign one off.
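
The rule can be expressed as a small check on the decision memo. This is a sketch under stated assumptions: the `Condition` record and the `effective_decision` helper are hypothetical names, and the example condition and date are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Condition:
    name: str             # what must be true
    owner: str            # who is accountable
    due: Optional[date]   # when it must be closed


def effective_decision(recommendation: str, conditions: List[Condition]) -> str:
    """Treat a Conditional Pass as a Hold unless every condition
    is named, owned, and dated."""
    if recommendation != "Conditional Pass":
        return recommendation
    if not conditions:
        return "Hold"
    for c in conditions:
        if not c.name.strip() or not c.owner.strip() or c.due is None:
            return "Hold"
    return "Conditional Pass"


# Hypothetical example: one fully specified condition.
memo = [Condition("Close open treatment plan", "Workflow Owner", date(2026, 5, 15))]
print(effective_decision("Conditional Pass", memo))  # Conditional Pass
print(effective_decision("Conditional Pass", []))    # Hold
```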
§09

Worked example · Indietheka.

Five weeks of real Build — numbers, surprises, discharges

The Indietheka Build phase ran from Week 5 to Week 9 of the internal sprint. Ten agents came online. Throughput moved from 0.8 reviews per week (human-only) to roughly 6 per week (agent-supervised), meeting the 7× target. Human hours per review fell from 6.4 to 0.5, a 92% reduction. The governance wrapper caught two real incidents during the sprint, including one that would have published a broken album review without human review.

Week 5
Infra

State spreadsheet stood up Day 22 — the Review Queue tab with 18 columns, 7 writeable by humans (priority, approval, notes), 11 writeable by agents (status, timestamps, URLs, hashes). First agent: the RSS Poller — ingests 50 indie-music feeds and writes candidate rows to the queue. Demo Friday: 82 rows written from a 30-minute run.
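
The human/agent column split is easiest to enforce as a write guard in front of the sheet. The sketch below assumes the split is applied in code rather than by spreadsheet permissions, which the source does not specify. Only `priority`, `approval`, `notes`, and `status` are named in the text; `updated_at`, `draft_url`, and `content_hash` are hypothetical stand-ins for the timestamp, URL, and hash columns.

```python
# Partial column sets: the real tab has 7 human and 11 agent columns.
HUMAN_COLUMNS = {"priority", "approval", "notes"}
AGENT_COLUMNS = {"status", "updated_at", "draft_url", "content_hash"}


def can_write(writer_kind: str, column: str) -> bool:
    """Return True if this kind of writer owns the column."""
    if writer_kind == "human":
        return column in HUMAN_COLUMNS
    if writer_kind == "agent":
        return column in AGENT_COLUMNS
    raise ValueError(f"unknown writer kind: {writer_kind!r}")


def apply_update(row: dict, writer_kind: str, changes: dict) -> dict:
    """Return a new row with the changes applied, or raise if the
    writer touches a column it does not own."""
    rejected = sorted(c for c in changes if not can_write(writer_kind, c))
    if rejected:
        raise PermissionError(f"{writer_kind} may not write: {rejected}")
    return {**row, **changes}


row = {"priority": 2, "status": "queued"}
print(apply_update(row, "agent", {"status": "drafted"}))
# {'priority': 2, 'status': 'drafted'}
```

Rejecting the whole update, rather than silently dropping the disallowed columns, keeps the audit trail unambiguous about what was attempted.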

Credentials provisioned: WordPress app password, Spotify client, Album of the Year scraper config. Surprise: Spotify OAuth flow took 3 days; escalated to Workflow Owner Day 22, resolved Day 24.

Week 6
First production

Two agents promoted to production: Research Compiler (pulls artist bios, discography, press coverage into a research brief) and Editorial Writer (drafts the album review in Spanish). Weekly Friday demo ran 14 side-by-side comparisons — Workflow Owner rated 11 as “publishable with light edit,” 3 as “needs rework.” Baseline for the same task (human-only) was 6.4 hours per review; agent draft time was under 3 minutes.

Risk discharged: R-003 (editorial voice drift). Workflow Owner confirmed Week-6 drafts stayed in the site’s voice. Risk added: R-011 (translation artefacts from English sources bleeding into Spanish output) — treatment plan drafted same day.

Week 7
Scale-out

Full roster live: RSS Poller, Review Queue Manager, Research Compiler, Editorial Writer, Cover Art Retriever, SEO Metadata Builder, WordPress Publisher, Social Syndication, Spotify Sync, Performance Analytics — ten agents. Integration run Wednesday: end-to-end cycle from RSS row to published draft took 8 minutes 40 seconds.

Friday demo: throughput measurement on record at 6 reviews per week sustained, versus a Discovery baseline of 0.8. Executive Sponsor attended. Surprise: the Cover Art Retriever occasionally returned the wrong pressing’s cover; flagged as R-014 and added to the Risk Register, with a treatment plan of falling back to the Album of the Year (AOTY) canonical source.

Week 8
Governance

All six governance components instrumented. Drill scenario A chosen (bad-output-at-scale): Governance Architect seeded the Editorial Writer with a deliberately corrupt research brief Thursday 14:07. Monitoring caught the quality score drop at 14:09; SEV-2 fired to Workflow Owner at 14:10; Workflow Owner paused the agent at 14:13 via state-spreadsheet Override column; rollback restored at 14:19. Full incident end-to-end: 12 minutes. Audit log reviewed clean.
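
The drill timeline reduces to simple interval arithmetic, which is worth scripting so the Gate 3 deck quotes consistent figures. A minimal sketch using the timestamps above; the event labels and the helper name are illustrative.

```python
from datetime import datetime

# Timestamps as recorded in the Week 8 drill.
DRILL_EVENTS = [
    ("corrupt brief injected", "14:07"),
    ("quality drop detected", "14:09"),
    ("SEV-2 fired", "14:10"),
    ("agent paused", "14:13"),
    ("rollback restored", "14:19"),
]


def minutes_since_start(events):
    """Map each event label to whole minutes elapsed since the first event."""
    fmt = "%H:%M"
    start = datetime.strptime(events[0][1], fmt)
    return {
        label: int((datetime.strptime(stamp, fmt) - start).total_seconds() // 60)
        for label, stamp in events
    }


for label, minutes in minutes_since_start(DRILL_EVENTS).items():
    print(f"T+{minutes:>2} min  {label}")
# T+ 0 min  corrupt brief injected
# T+ 2 min  quality drop detected
# T+ 3 min  SEV-2 fired
# T+ 6 min  agent paused
# T+12 min  rollback restored
```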

Remediation: added a “research brief coherence check” pre-condition on the Editorial Writer, which rejects a corrupted brief before the agent runs. Confirmed as USP-014 (Pre-Condition Guards on LLM-Class Agents).
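
A pre-condition guard of this kind can be sketched as a wrapper that validates the input before the agent is invoked. The actual coherence check is not described in the source, so the one below (all three brief sections present and non-empty) is an assumption. `guarded`, `brief_is_coherent`, and `PreconditionFailed` are hypothetical names, and `editorial_writer` is a stand-in for the real agent call.

```python
class PreconditionFailed(Exception):
    """Raised when an agent's input fails its pre-condition."""


# Sections the Research Compiler is described as producing.
REQUIRED_BRIEF_FIELDS = ("artist_bio", "discography", "press_coverage")


def brief_is_coherent(brief: dict) -> bool:
    """Assumed check: every required section is a non-empty string."""
    return all(
        isinstance(brief.get(field), str) and brief[field].strip()
        for field in REQUIRED_BRIEF_FIELDS
    )


def guarded(precondition):
    """Wrap an agent so it only runs when its input passes the check."""
    def wrap(agent_fn):
        def run(brief):
            if not precondition(brief):
                raise PreconditionFailed(f"{agent_fn.__name__}: input rejected")
            return agent_fn(brief)
        return run
    return wrap


@guarded(brief_is_coherent)
def editorial_writer(brief: dict) -> str:
    # Stand-in for the drafting step; no model call is made here.
    return f"Draft built from {len(brief)} brief sections"


good = {"artist_bio": "bio", "discography": "albums", "press_coverage": "press"}
print(editorial_writer(good))  # Draft built from 3 brief sections
```

Raising instead of returning an empty draft means the failure surfaces to monitoring as a rejected input, not as a low quality score after the fact.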

Week 9
Readiness

72-hour autonomy window ran from Monday 14:00 to Thursday 14:00. 47 agent invocations occurred; 3 alerts fired (all SEV-3, all resolved by Workflow Owner inside SLO); zero Studio interventions. Two incidents caught by the wrapper: (1) a partial Spotify API outage handled by back-off-and-retry; (2) a malformed RSS feed row handled by the queue’s schema-validator.
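
The back-off-and-retry handling mentioned for the Spotify outage follows a common pattern: retry the call with exponentially growing delays, then give up and let the alert fire. The sketch below is generic and not the sprint's actual code; the attempt count, base delay, and use of `ConnectionError` as the transient failure are all assumptions.

```python
import time


def retry_with_backoff(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on ConnectionError wait base_delay * 2**attempt and retry.
    The final failure is re-raised so the caller can alert."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake outage that clears on the third call.
# Delays are recorded instead of slept so the example runs instantly.
state = {"calls": 0}
recorded_delays = []


def flaky_sync():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("partial outage")
    return "synced"


print(retry_with_backoff(flaky_sync, sleep=recorded_delays.append))  # synced
print(recorded_delays)  # [1.0, 2.0]
```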

Outcome metrics vs. baseline: 7.2× throughput; 92% human-hour reduction; error rate from 22% to 4%; cycle time from 5.2 days to 7 hours. Gate 3: Pass — no conditions. System entered Handoff Monday of Week 10.
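
The headline deltas can be reproduced from the figures quoted in this section. The sketch below uses the rounded numbers in the text, so it yields 7.5× for throughput where the deck reports 7.2×; the difference presumably comes from unrounded measurements, which the source does not give. Cycle time is left out because the conversion from 5.2 days to hours (calendar or working days) is not stated.

```python
def improvement_ratio(before: float, after: float) -> float:
    """Multiple by which a higher-is-better metric grew."""
    return after / before


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop in a lower-is-better metric."""
    return (before - after) / before * 100


throughput_x = improvement_ratio(0.8, 6)   # reviews per week
human_hours_cut = pct_reduction(6.4, 0.5)  # hours per review
error_rate_cut = pct_reduction(22, 4)      # error rate, in percent

print(f"throughput:  {throughput_x:.1f}x")               # 7.5x
print(f"human hours: {human_hours_cut:.1f}% lower")      # 92.2% lower
print(f"error rate:  {error_rate_cut:.1f}% lower (relative)")  # 81.8% lower
```

Note that the error-rate figure here is a relative reduction; the absolute change is 18 percentage points. Stating which one is meant avoids ambiguity in the Gate 3 deck.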

§ Pattern harvest
The Build phase confirmed four pattern candidates as USP entries: USP-009 (Spreadsheet-as-State-Machine), USP-011 (Weekly Demo as Trust-Earning Mechanism), USP-013 (72-Hour Autonomy Window before Gate 3), USP-014 (Pre-Condition Guards on LLM-Class Agents). These are now templates for every future sprint.
§10

Build Lead checklist.

Run this every Monday morning

Twelve items. If any answer is “no,” address it before Monday planning starts.

  1. Friday demo decision log from last week is posted in the shared channel, with the Workflow Owner’s verbatim answer to the demo question.
  2. Risk Register delta from last week is merged into v2.x — new risks scored, discharged risks marked.
  3. Pattern Library candidates from last week are tagged (confirmed, abandoned, carried forward).
  4. This week’s theme is named (see §04) and the Friday demo target is pre-drafted.
  5. The week’s agent and governance owners are clear to every Studio team member.
  6. Any client-side blocker from last week has an escalation path ready for Monday 11:00.
  7. The state spreadsheet schema is unchanged, or any change has a signed-off audit-log entry.
  8. Standup attendance for Tue–Thu is confirmed; 09:30–09:50 holds are protected on every calendar.
  9. The Tuesday–Thursday calendar is free of non-emergency meetings before 14:00.
  10. The Friday demo slot (15:00–15:45) has the Workflow Owner confirmed; Executive Sponsor invited each week (attendance required in Weeks 7 and 9).
  11. Any Conditional Go items from Gate 2 that remain open are tracked, each with a due date inside Week 5.
  12. For Week 8: drill scenario selected, Governance Architect ready to inject, Workflow Owner on call Thursday 14:00–16:00. For Week 9: autonomy window start and end named; permitted intervention list drafted.
§ Closing note
The weekly cadence is the product. A Sprint Lead who protects Monday planning, Tuesday–Thursday focus, and Friday demos for five weeks in a row will almost certainly reach Gate 3. Protect the rhythm. The system follows.