The Build Playbook.
Weekly-cadence operational playbook for Weeks 5–9 of a Lighthouse Sprint. Covers the Monday-planning / Tuesday–Thursday-build / Friday-demo rhythm, per-week theme progression, governance build-out, incident response drill, and the Gate 3 production readiness review — the phase where the Redesign Blueprint becomes a running system.
Paired with the Agent Spec Sheet, Redesign Blueprint, Risk Register, and the Governance Runbook templates. Read after the Redesign Playbook; keep open Weeks 5 through 9.
How to use this playbook.
Build is the construction phase of a Lighthouse Sprint. Discovery produced evidence; Redesign produced a contract; Build produces a running system. Five working weeks — twenty-five working days — organised around a repeating weekly rhythm, one theme per week, and one public demonstration every Friday at 15:00 client-local time.
The playbook is written for the Sprint Lead, who now stands behind the Agent Engineer and the Governance Architect rather than in front of them. Most of the work in Build is authored by those two roles. The Sprint Lead’s job is to protect the cadence, run the demos, and enforce the gates — not to write agents. If the Sprint Lead is writing code after Day 25, something has gone wrong upstream.
If the Gate 2 Redesign Blueprint has any unresolved Conditional Go items, those must be closed in Week 5 before net-new agent construction begins. Build on an unstable contract and every week after will be spent re-scoping instead of shipping.
Inputs required on Day 21
| Artifact from Redesign | Used for |
|---|---|
| Redesign Blueprint v1.0 | Canonical architecture — every agent, every data path, every governance component. |
| Agent Spec Sheets (one per agent) | Contract-level specification each agent is built against. The Sprint Lead refuses to start construction if a spec is missing fields. |
| Governance Wrapper design | Tells the Governance Architect which components to instantiate, in what order, for which agents. |
| Risk Register v1.0 | Build-week risks become live; treatment plans activate as construction begins. |
| Integration map | Identifies the client systems that must be read from or written to — and thus which credentials / access grants are needed in Week 5 Day 1. |
| Gate 2 decision memo | Including any Conditional Go items that must be closed by Week 5 Friday. |
| Pattern candidates (from Discovery / Redesign) | Tagged USP-XXX in agent specs — confirmed or abandoned during Build. |
Reading order
Read §02–§05 before Day 21. §06 is read by the Governance Architect on Day 21 and revisited at every Friday demo. §07 (Incident Response Drill) must be read by the full Studio team in Week 8 — it is the prerequisite for Gate 3. §08 is Gate 3 itself; §09 is the Indietheka worked example; §10 is the weekly checklist the Sprint Lead runs against every Monday morning.
Build at a glance.
Five weeks, twenty-five working days. Each week has one dominant theme — infrastructure, first agents, scale-out, governance hardening, readiness — and one Friday demo that makes that week’s progress visible to the Workflow Owner and the Executive Sponsor.
Themes are not rigid sequences. Week 6 and Week 7 overlap; agent construction continues through Week 8. But the centre of gravity shifts each week, and the Friday demo is scoped to the theme, so the sprint has a legible shape for stakeholders who are not sitting in standup.
The five-week arc
| Week | Theme | Demo focus | Exit state |
|---|---|---|---|
| Week 5 · Days 21–25 | Infrastructure & skeleton | Credentials work; first agent stub runs end-to-end on a trivial input | Environment stood up, spreadsheet-as-state-machine live, one agent executing in supervised mode |
| Week 6 · Days 26–30 | First production agents | Two agents on real client data; baseline comparison vs. human output | Priority-1 bottleneck addressed; Risk Register items for those agents discharged or documented |
| Week 7 · Days 31–35 | Scale-out & integration | Full agent roster running together; integration surface tested | 6–10 agents operating in supervised mode; data paths between agents verified |
| Week 8 · Days 36–40 | Governance hardening | Alerts firing to the right humans; rollback demonstrated live; audit log reviewed | All 6 governance components instrumented; incident response drill completed |
| Week 9 · Days 41–45 | Production readiness | Gate 3 review — the system runs for 72 hours with zero Sprint Lead interventions | Gate 3 passed; system enters Handoff under Workflow Owner authority |
Deliverables produced during Build
- Working agent roster: 6–10 named agents, each matching its Agent Spec, running on real client data by end of Week 7. Demonstrated at the Friday demos of Weeks 5–9.
- Governance Wrapper v1.0 — all 6 components instrumented and verified through drill. Wired in Week 5, hardened Week 8.
- State spreadsheet (USP-009 pattern) — the system-of-record the agents read from and write to; editable by the Workflow Owner. Stood up Day 22, populated with real rows by Week 6.
- Risk Register v2.0 — treatment outcomes recorded; new risks surfaced by construction are added and scored. Updated each Friday demo.
- Incident Response Drill report — transcript and remediation actions from the Week 8 drill. Required for Gate 3.
- Pattern Library candidates — patterns tagged during Build, confirmed or abandoned by Gate 3. Tracked in the Pattern Extraction Tracker.
- Gate 3 Production Readiness deck (lighthouse-production-readiness.docx): the deliverable that carries the sprint into Handoff. Authored Week 9, presented Day 45.
The weekly cadence.
Every week in Build runs the same rhythm. The predictability is a feature — it lets the Workflow Owner plan their own calendar around the sprint, and it tells the Agent Engineer exactly when to start, when to go heads-down, and when to present.
Do not skip demos. Do not skip Monday planning. If a Friday demo would be embarrassing because the week didn’t produce enough, hold it anyway and name the miss. A missed demo that goes on the record is more valuable than a skipped demo that quietly rolls forward into next week.
Plan the week.
One meeting, 60 minutes, full Studio team plus Workflow Owner. Review last Friday’s demo, confirm the week’s theme, split the agent work into ownership slots, surface any governance items. Output: a named list of deliverables with owners, posted in the shared channel by 11:00.
Build heads down.
Deep-work days. No meetings longer than 30 minutes. The Agent Engineer writes and tests; the Governance Architect instruments components; the Sprint Lead clears blockers and keeps the stakeholder loop asynchronous. One 20-minute standup at 09:30; otherwise, leave the team alone.
Demonstrate what shipped.
One meeting, 45 minutes, full Studio team plus Workflow Owner plus Executive Sponsor (invited every week; attendance required in Weeks 7 and 9). Live demo, no slides. Decisions logged, Risk Register updated, Pattern Library reviewed, Monday’s plan foreshadowed.
Monday planning · detailed agenda
| Block | Duration | Content |
|---|---|---|
| Retrospective | 0–10 min | What shipped Friday? What went wrong? What surprised the team? No blame — just observed facts. Two to three items maximum; extended retro happens at Gate 3. |
| Week theme recap | 10–15 min | Sprint Lead names this week’s theme (see §04) and the Friday-demo target. Workflow Owner confirms business-side availability. |
| Agent ownership split | 15–35 min | Agent Engineer reads out the week’s agent work; Governance Architect reads out the week’s wrapper work. Each item gets an owner and an estimated Thursday exit state. |
| Risk & blocker scan | 35–50 min | Walk the Risk Register for live items. Any blockers that require client-side action are escalated to the Workflow Owner by 11:00 with a date. |
| Demo target & close | 50–60 min | Agree what Friday’s demo will show. Sprint Lead posts the week-plan note in the shared channel before end of meeting. |
Tuesday–Thursday standup · detailed agenda
Twenty minutes, 09:30, full Studio team only. Workflow Owner is welcome but not obligated. Each participant answers three questions in under 3 minutes:
| Question | What a good answer sounds like |
|---|---|
| What did you ship yesterday? | Concrete, demonstrable; “I wired the RSS poller to write into the state spreadsheet and I can show a row” — not “I worked on the poller.” |
| What will you ship today? | Named output, estimated duration, explicit definition of done — “I’ll complete the review-queue pagination; done means the queue renders for 200 rows without time-out.” |
| What are you blocked on? | Specific blocker with a specific owner. If the owner is client-side, Sprint Lead takes the item and escalates by 11:00. |
Week-by-week themes.
The five Build weeks are sequenced so that each week’s Friday demo proves a specific capability and unlocks the next week’s work. Week 5 proves that plumbing works; Week 6 proves that one real agent beats the baseline; Week 9 proves the system runs without a Studio driver in the seat. Read each week’s theme before the Monday planning meeting.
Week 5 is about plumbing, not agents. The team stands up the environment, instantiates the spreadsheet-as-state-machine (USP-009), wires the first credentials, and writes the thinnest possible first agent — one that does almost nothing but proves the pipeline is alive. If Week 5 tries to ship a real production agent, it will skip the infrastructure work, and the sprint will pay for it in Week 7.
Monday planning focus
- Credentials inventory — every API key, service account, OAuth flow needed for Week 6. Technical Counterpart ownership. Must be resolved by Friday.
- State schema lock — the spreadsheet columns that will persist through the whole sprint. Once locked, schema changes require a Sprint Lead sign-off.
- First-agent choice — pick the agent with the smallest useful surface area, not the highest business value. Usually a read-only poller or fetcher.
Thursday exit state
- Development environment live; team can pull, run, and test locally.
- State spreadsheet exists with locked schema; at least 5 columns implemented.
- First agent stub executes end-to-end on a trivial input and writes at least one row to the state spreadsheet.
- Logging pipeline wired; every agent invocation writes a log record readable by the Governance Architect.
Friday demo target
“Watch this agent run, write a row, and log its invocation.” The demo does not need to be impressive — it needs to prove the plumbing exists. Workflow Owner watches; Executive Sponsor is not required for this demo.
Common Week-5 pitfalls
- Building two agents in parallel before infrastructure is proven. Resist this — Week 6 will go faster for it.
- Skipping the state spreadsheet in favour of a database. USP-009 exists because the Workflow Owner has to be able to read and edit the state; databases without a UI layer break that property.
- Leaving credentials to Week 6. Clients take 2–5 business days to provision access; request on Day 21.
Week 6 is where the sprint stops being infrastructure and starts being a product. Two agents come online against real client data — typically the agent addressing the highest-priority bottleneck from Discovery, plus a supporting agent. The Friday demo shows a side-by-side comparison of agent output against the Discovery-era human baseline, and the Workflow Owner renders an opinion.
Monday planning focus
- Agent selection — typically Priority-1 bottleneck (from Redesign Blueprint) plus one supporting agent it depends on. Must both be A-category or B-category allocations.
- Output schema — what exactly does a “successful” agent run produce? Define before construction, not after.
- Baseline comparison protocol — how will the Workflow Owner judge agent output against the Discovery baseline? Usually a small hand-scored sample (10–20 runs).
Thursday exit state
- Two agents matching their Agent Specs; both running against real client data.
- At least 20 agent runs executed end-to-end; output reviewed by Agent Engineer for quality.
- Governance Wrapper v0.5 — basic logging and audit trail instrumented for both agents.
- Baseline comparison material prepared for Friday demo.
Friday demo target
“Here is agent output alongside the Discovery baseline. Workflow Owner, what do you see?” This is the first demo where the Workflow Owner’s judgment is the headline. Expected outcome: green-light both agents, or flag quality gaps for Week 7.
Common Week-6 pitfalls
- Over-polishing the first agent. Ship at 80% quality; Week 7 will tune it.
- Demoing without a baseline. The demo needs the Discovery numbers on screen; otherwise it’s “look at the output” theatre with no decision surface.
- Not running enough samples. 20 runs is the floor for a credible quality read; fewer and the Workflow Owner can’t reason about it.
Week 7 is the widest week: the remaining agents come online, the integration surface between agents is exercised, and the system runs as a roster for the first time rather than as a set of isolated pieces. This is where the sprint earns or loses its 7× throughput claim.
Monday planning focus
- Remaining agent sequence — order matters. Dependencies first, dependents second. The Redesign Blueprint’s integration map is the reference.
- Integration checkpoints — at least one mid-week integration run where all live agents execute against the state spreadsheet simultaneously.
- Quality tune-up — any Week-6 agent that needed a tune-up gets a named owner and a Thursday done-by.
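The dependencies-first rule can be checked mechanically rather than by eye. The sketch below is a minimal illustration using Python's standard `graphlib`; the agent names and dependency edges are hypothetical placeholders, and the real edges would come from the Redesign Blueprint's integration map.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each agent maps to the agents it depends on.
# Replace with the edges recorded in the integration map.
DEPENDS_ON = {
    "poller": set(),
    "compiler": {"poller"},
    "writer": {"compiler"},
    "publisher": {"writer"},
}


def build_order(depends_on):
    """Return agent names with every dependency ahead of its dependents."""
    # static_order() raises graphlib.CycleError if the map contains a cycle,
    # which is itself a useful finding before construction starts.
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

A cycle error here usually means two agents each expect the other's output, which is worth resolving at Monday planning rather than mid-week.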
Thursday exit state
- Full agent roster constructed — typically 6–10 agents, all matching their Agent Specs.
- Full-roster integration run completed; data paths verified end-to-end.
- State spreadsheet handling real-world data volumes.
- Throughput measurement taken: invocations per hour per agent, end-to-end pipeline latency.
Friday demo target
“Here is the full roster running together. Here is what an end-to-end cycle looks like. Here is the throughput versus Discovery baseline.” The Executive Sponsor should attend this demo — it is the first time the system is visible as a coherent whole, and the throughput number is the first read on whether the sprint will hit its outcome target.
Common Week-7 pitfalls
- Agent Engineer tries to add an eleventh agent. Decline. New ideas go to Pattern Library or post-sprint backlog.
- Integration skipped in favour of more construction. A roster that has never run together is an architectural fiction; force the integration run even if it reveals bugs.
- Workflow Owner overwhelmed. If the roster is too big to review in one session, split the demo into two — but still hold both on Friday.
Week 8 is when the system stops being demonstrated and starts being trusted. The six governance components — Monitoring, Alerting, Audit Trail, Escalation, Override, Rollback — are all instrumented, tested, and exercised in a live incident response drill. Without this week, the system is a prototype, not a production system.
Monday planning focus
- Governance gap scan — Governance Architect walks each of the 6 components; names which agents each one covers and which ones it does not.
- Alert routing confirmation — who gets paged on a SEV-1? Who sees a SEV-2? Workflow Owner confirms.
- Drill scenario — which incident type the Thursday drill will exercise (see §07).
Thursday exit state
- All 6 governance components instrumented across every agent.
- Alerts tested end-to-end against every SEV level.
- Rollback procedure demonstrated live — revert a production state to a known-good snapshot in under 10 minutes.
- Audit log reviewable by Workflow Owner without Studio assistance.
- Incident response drill completed; remediation actions recorded in Risk Register v2.
Friday demo target
“We deliberately broke the system on Thursday. Here is what the alerts looked like. Here is who got paged. Here is how fast we rolled back. Here is what we changed as a result.” This is the demo that earns Gate 3 eligibility — the one that shows the system fails gracefully and recovers quickly.
Common Week-8 pitfalls
- Deferring the drill to Week 9. It must happen in Week 8; Week 9 is for remediation and Gate 3, not for drill execution.
- Alerting routed to the wrong humans. Very common — fix it in Week 8, not in production.
- “Rollback works” claimed without demonstration. If it has not been demonstrated live with the Workflow Owner watching, it does not count toward Gate 3.
Week 9 is the readiness week. No new agents. No new governance components. Week 9 exists to prove the system runs without a Sprint Lead in the seat — a 72-hour autonomy window between Monday afternoon and Thursday afternoon where Studio touches nothing except in response to a fired alert. If that window completes clean, Gate 3 passes Friday.
Monday planning focus
- Conditional Go items from Gate 2 — all must be closed before the 72-hour autonomy window starts. Sprint Lead confirms.
- Autonomy window scoping — exact start and end times; list of permitted interventions (“respond to fired alerts only”); list of forbidden interventions (“do not touch state spreadsheet, do not push code”).
- Gate 3 deck authoring ownership — Sprint Lead drafts the Production Readiness deck across Mon–Wed.
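The autonomy-window rules lend themselves to a simple check over the intervention log. The following is a sketch only: the log shape (`at`, `who`, `alert_id`), the alert identifiers, and the calendar date are assumptions made for illustration, not a prescribed format.

```python
from datetime import datetime, timedelta

AUTONOMY_WINDOW = timedelta(hours=72)


def window_end(start):
    """End of the autonomy window for a given start time."""
    return start + AUTONOMY_WINDOW


def forbidden_interventions(start, interventions, fired_alert_ids):
    """Return interventions inside the window that did not answer a fired alert."""
    end = window_end(start)
    return [
        item for item in interventions
        if start <= item["at"] < end
        and item.get("alert_id") not in fired_alert_ids
    ]


# Illustrative data: 2024-01-01 is a Monday; the entries are invented.
start = datetime(2024, 1, 1, 14, 0)
log = [
    {"at": datetime(2024, 1, 2, 9, 30), "who": "Workflow Owner", "alert_id": "A-1"},
    {"at": datetime(2024, 1, 3, 16, 0), "who": "Sprint Lead", "alert_id": None},
]
violations = forbidden_interventions(start, log, fired_alert_ids={"A-1"})
print(window_end(start), violations)
```

In this example the second entry is flagged because it is not tied to any fired alert, which is the kind of intervention that would invalidate the window.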
Thursday exit state
- 72-hour autonomy window completed.
- Gate 3 deck drafted: SLOs met, alerts fired and resolved, outcome metrics captured, Pattern Library finalised, Risk Register v2.1 closed or accepted.
- Handoff Playbook pre-read circulated to Workflow Owner.
Friday demo target
“Here is the system, running on its own for 72 hours. Here is what happened. Here is the outcome delta vs. baseline. Gate 3: yes / no / conditional.” Executive Sponsor mandatory. This is the decisive moment of the sprint.
Common Week-9 pitfalls
- Sprint Lead intervenes during the autonomy window. Even well-meaning fixes invalidate the window. If a SEV-2 fires, respond to it and note the intervention; do not quietly patch the code.
- Outcome numbers unpolished. The Discovery baseline is the measuring stick; every headline metric must be stated as delta vs. baseline.
- Handoff not foreshadowed. The Workflow Owner should already know what Handoff will ask of them before Gate 3 Friday.
The Friday demo.
The Friday demo is the most load-bearing ritual of the Build phase. It is not a status update. It is a demonstration — a live run of the system — in front of the humans who will own it after Gate 4. Everything else about the week orbits this 45-minute window.
Attendees: Sprint Lead, Agent Engineer, Governance Architect, Workflow Owner. Executive Sponsor attends Week 7 and Week 9 at minimum; optional for Weeks 5, 6, 8. No additional client stakeholders without Workflow Owner pre-approval — the demo is not a marketing event.
45-minute run-of-show
Sprint Lead names the week’s theme, recaps the Monday plan, states what this demo will show and what question it is asking the Workflow Owner to answer.
Live run. Real data. No slides except one — the Discovery baseline reference card, kept visible on the right half of the screen. Agent Engineer drives; Governance Architect narrates the wrapper behaviour in parallel. The screen is a split view: agent UI / state spreadsheet so the Workflow Owner can see cause and effect.
Workflow Owner answers the question framed in minute 00. “Does this output beat the baseline?” / “Does this alert route to the right person?” / “Would you trust this to run overnight?” Sprint Lead records the answer verbatim in the decision log.
Governance Architect reads the Risk Register delta for the week — new risks, discharged risks, re-scored risks. Agent Engineer reads the Pattern Library delta — patterns confirmed, patterns abandoned, pattern candidates raised.
Sprint Lead names next Monday’s theme and the Week-N+1 demo target. No commitments past next Friday. Post decision log and updated registers in shared channel within 30 minutes of demo close.
What counts as a valid demo artifact
Live execution.
The agent runs in front of everyone, on real data, hitting real systems. Any failure is visible. This is the baseline demo artifact.
Recorded run.
A pre-recorded run shown in the demo — acceptable only when the agent’s natural cadence is longer than 45 minutes (e.g. a nightly process). Recording must include timestamps, raw input, raw output.
State snapshot.
A before / after view of the state spreadsheet with the rows the agent wrote or updated. Acceptable for governance demos where the event was an alert that already fired during the week.
Governance build-out.
In Redesign, the six governance components were designed. In Build, they are instantiated, wired, and tested. The Governance Architect owns this work; the Sprint Lead reviews it every Friday. Do not defer governance to Week 9 — by then, the system has too much surface area to instrument cleanly.
Instrumentation order (Weeks 5–8)
| Week | Component | Why this week | “Done” definition |
|---|---|---|---|
| Week 5 | Audit Trail | Every agent invocation must log from the first day; retrofitting logs is expensive. | Every invocation writes: timestamp, agent name, input hash, output hash, outcome, duration. |
| Week 5 | Monitoring | Introduced with infrastructure — dashboards read from the log pipeline. | At least one dashboard visible to the Governance Architect; includes error rate and throughput per agent. |
| Week 6 | Alerting | Real data means real failure modes — alerts become useful. | Every SEV level has at least one defined alert; routing tested end-to-end. |
| Week 7 | Override | When the roster runs together, the Workflow Owner must be able to pause individual agents without stopping the whole system. | Per-agent pause / resume control in the state spreadsheet; verified by Workflow Owner. |
| Week 7 | Escalation | SEV-2 and SEV-3 events start firing — routing rules need to be tested. | Escalation matrix documented; at least one SEV-2 escalated end-to-end during the week. |
| Week 8 | Rollback | The incident response drill requires a working rollback. | Rollback demonstrated live in under 10 minutes; audit log shows pre- and post-rollback state. |
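The Audit Trail "done" definition maps directly onto a small record type. The sketch below shows one way to capture the six fields around any agent call; the function and field names are hypothetical, and SHA-256 is an assumed choice of hash, since the playbook does not name one.

```python
import hashlib
import time
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    timestamp: str      # UTC, ISO 8601
    agent_name: str
    input_hash: str
    output_hash: str
    outcome: str        # "ok" or "error: <exception type>"
    duration_s: float


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_with_audit(agent_name, agent_fn, raw_input):
    """Run agent_fn on raw_input and return (output, AuditRecord)."""
    started = time.perf_counter()
    output, outcome = "", "ok"
    try:
        output = agent_fn(raw_input)
    except Exception as exc:  # a failed run is still logged
        outcome = f"error: {type(exc).__name__}"
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_name=agent_name,
        input_hash=sha256_hex(raw_input),
        output_hash=sha256_hex(output),
        outcome=outcome,
        duration_s=round(time.perf_counter() - started, 6),
    )
    return output, record


output, record = run_with_audit("echo", str.upper, "abc")
print(record)
```

Hashing the input and output, rather than storing them, keeps the log compact while still letting a reviewer confirm that two runs saw the same data.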
The state spreadsheet as governance surface (USP-009)
The state spreadsheet is not just a database. In Umbra Studio sprints it functions as the primary governance surface for the Workflow Owner. Certain columns are writeable by humans only (approvals, overrides, flags); certain columns are writeable by agents only (status, timestamps, output hashes). The column discipline is enforced at the agent layer; the Workflow Owner can always see and, in narrow cases, edit the human columns. This is the pattern that lets humans and agents share authority over the same data without a separate admin UI.
The Governance Architect is responsible for naming and defending this schema. Schema changes after Week 5 require Sprint Lead sign-off and a note in the audit log.
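Column discipline enforced at the agent layer could look roughly like the sketch below. The column names, the `"paused"` value, and the function names are assumptions for illustration; the actual schema is whatever the Governance Architect locks in Week 5.

```python
# Hypothetical column ownership, following the human-only / agent-only split.
HUMAN_COLUMNS = {"approval", "override", "flag"}
AGENT_COLUMNS = {"status", "updated_at", "output_hash"}


class ColumnOwnershipError(PermissionError):
    """Raised when a writer touches a column it does not own."""


def apply_write(row, writer, updates):
    """Return a copy of row with updates applied, if writer owns every column."""
    allowed = {"human": HUMAN_COLUMNS, "agent": AGENT_COLUMNS}[writer]
    blocked = sorted(set(updates) - allowed)
    if blocked:
        raise ColumnOwnershipError(f"{writer} may not write: {', '.join(blocked)}")
    return {**row, **updates}


def agent_may_run(row):
    """Per-agent pause: agents check the human-owned override column first."""
    return row.get("override") != "paused"


row = {"status": "queued", "override": ""}
done = apply_write(row, "agent", {"status": "done"})
paused = apply_write(row, "human", {"override": "paused"})
print(done, agent_may_run(paused))
```

Because the agent reads the override column before every run, the Workflow Owner can pause an agent by editing one cell, with no separate admin UI.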
Governance readiness matrix (end of Week 8)
| Component | Instrument | Tested | Demonstrated | Status |
|---|---|---|---|---|
| Monitoring | Dashboards live | End-to-end data flow verified | Friday Week 5 | Ready |
| Alerting | SEV-1/2/3 routed | Test alerts fired weekly | Friday Week 6 | Ready |
| Audit Trail | All invocations logged | Random spot checks | Friday Week 5 | Ready |
| Escalation | Routing matrix | SEV-2 walked live | Friday Week 7 | Ready |
| Override | Pause / resume | Workflow Owner exercise | Friday Week 7 | Ready |
| Rollback | Snapshot + restore | Drill — Week 8 | Friday Week 8 | Ready |
Incident response drill.
The incident response drill is a rehearsed, deliberate failure — an event injected into the running system to test whether monitoring catches it, alerting routes it, the right human responds, rollback works, and the audit log reads cleanly afterward. It is not a check-the-box exercise; it is the single most load-bearing governance event of the sprint.
The drill is scheduled on Thursday of Week 8, 14:00–16:00 client-local. Governance Architect injects the incident; Sprint Lead watches and does not intervene. Workflow Owner is on call and responds as they would in production. Full transcript is captured for Friday’s demo and for the Gate 3 deck.
Choose the scenario at Week 8 Monday planning
The Governance Architect proposes the scenario; Sprint Lead and Workflow Owner agree. The scenario should match one of the top three risks in the Risk Register — no fictional or low-probability failures. Below are three canonical scenarios. Pick one; do not combine.
Agent produces bad output at scale.
Injection: Governance Architect seeds the editorial-writing (or analogous content-producing) agent with a deliberately corrupt input that will cause it to produce output far below the Week-6 baseline. Expected response: Monitoring notices the quality-score drop; SEV-2 fires to Workflow Owner within 5 minutes; Workflow Owner pauses the agent via Override; Agent Engineer investigates; rollback restores the last good state; audit log shows the incident end-to-end. Pass criterion: Workflow Owner reaches a paused, rolled-back state within 30 minutes of injection, without Sprint Lead intervention.
Credential revoked mid-run.
Injection: Technical Counterpart revokes one of the API keys used by a live agent. Expected response: Agent invocation fails; Monitoring catches the spike in failure rate; Alerting routes a SEV-2 to Technical Counterpart and a SEV-3 to Workflow Owner; Technical Counterpart provisions a replacement credential; agent resumes; audit log shows the outage window clearly. Pass criterion: System recovers in under 45 minutes; the audit log can answer “what happened and when” without Studio assistance.
State spreadsheet corrupted.
Injection: Governance Architect deletes or malforms several rows in the state spreadsheet (USP-009), simulating a well-meaning human error by the Workflow Owner. Expected response: Monitoring catches schema-shape mismatches; SEV-1 fires; the Rollback procedure is invoked; the snapshot is restored in under 10 minutes; agents resume against the restored state. Pass criterion: Rollback executes cleanly; no agent re-runs the same invocation twice after restore (idempotency verified).
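The idempotency criterion in the last scenario is easier to verify if each invocation carries a stable key. One possible approach, sketched under the assumption that completed keys are stored alongside the state and restored with the snapshot (the names below are hypothetical):

```python
import hashlib


def invocation_key(agent_name, raw_input):
    """Stable key for one (agent, input) pair."""
    return hashlib.sha256(f"{agent_name}\n{raw_input}".encode("utf-8")).hexdigest()


def run_once(agent_name, raw_input, completed_keys, agent_fn):
    """Run agent_fn unless this exact invocation is already recorded as done."""
    key = invocation_key(agent_name, raw_input)
    if key in completed_keys:
        return None  # skipped: already processed
    result = agent_fn(raw_input)
    completed_keys.add(key)
    return result


calls = []


def fake_agent(text):
    calls.append(text)
    return text.upper()


completed = set()
first = run_once("writer", "row-1", completed, fake_agent)
second = run_once("writer", "row-1", completed, fake_agent)  # skipped
print(first, second, len(calls))
```

During the drill, comparing the count of agent calls against the count of distinct keys gives a quick read on whether any invocation ran twice after the restore.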
Drill artifacts
- Drill transcript — minute-by-minute record of events, people, and actions. Authored by Sprint Lead as observer.
- Alert trace — every alert that fired, the routing path, the timestamp, the human who responded.
- Audit log excerpt — the slice of the audit log covering the incident window, with annotations.
- Remediation actions — any gaps discovered during the drill are logged as Risk Register items with owners and due dates.
- Drill outcome memo — half-page summary, included verbatim in the Gate 3 deck.
Gate 3 · production readiness.
Gate 3 decides whether the system is ready for Handoff. It is a formal review attended by the Workflow Owner, Technical Counterpart, and Executive Sponsor on the client side, and the full Studio team. The outcome is one of three: Pass, Conditional Pass (with named conditions to be met before Handoff begins), or Hold (Week 10 added; no new agents).
Is the system ready to leave Studio’s hands and enter supervised Handoff under the Workflow Owner’s authority?
The sprint passes Gate 3 when the system satisfies all of the following:
- Agent roster matches the Gate 2 contract. All contracted agents built, running, and passing their Agent Spec success criteria. No net-new agents added since Gate 2 without Sprint Lead sign-off.
- Governance Wrapper v1.0 fully instrumented. All 6 components operational; readiness matrix (see §06) shows “Ready” on every row.
- Incident response drill completed. Drill transcript, alert trace, and audit excerpt attached to the Gate 3 deck. Any remediation items carried as Risk Register entries with owners.
- 72-hour autonomy window closed clean. No Sprint Lead interventions during the window; any fired alerts resolved by the correct on-call human within SLO.
- Outcome metrics vs. baseline stated. Throughput, quality, human-hour, and error-rate deltas vs. the Discovery baseline. Each expressed as a ratio or percentage, with confidence interval or sample size.
- Risk Register v2.1 current. Every risk has status: discharged, accepted, or open-with-treatment. No silent risks.
- Pattern Library candidates resolved. Each candidate confirmed (tagged USP-###), abandoned (with note), or carried forward (with rationale).
- Workflow Owner ready statement. Workflow Owner answers on the record: “Am I ready to take supervised operation on Monday?” A “no” triggers Conditional Pass with named conditions.
Gate 3 deck · required structure
- Cover — sprint name, client, Gate 3 date, recommendation.
- Contract recap — what Gate 2 committed to; one-liner per agent.
- Outcome metrics vs. Discovery baseline — the headline numbers (throughput, quality, human hours, error rate).
- Governance readiness matrix — the six-row table from §06.
- Incident response drill summary — scenario, timeline, pass / fail per criterion.
- 72-hour autonomy window log — alerts fired, humans responding, any interventions.
- Risk Register v2.1 summary — count by status; attach full register as appendix.
- Pattern Library update — confirmed USP entries; candidates carried forward.
- Handoff readiness statement — Workflow Owner’s on-the-record answer.
- Recommendation — Pass / Conditional Pass / Hold, with named conditions and dates.
Worked example · Indietheka.
The Indietheka Build phase ran from Week 5 to Week 9 of the internal sprint. Ten agents came online. Throughput moved from 0.8 reviews per week (human-only) to roughly 6 per week (agent-supervised), clearing the 7× target. Human hours per review fell from 6.4 to 0.5, a 92% reduction. The governance wrapper caught two real incidents during the sprint, including one that would have published a broken album review without human review.
Infra
State spreadsheet stood up Day 22 — the Review Queue tab with 18 columns, 7 writeable by humans (priority, approval, notes), 11 writeable by agents (status, timestamps, URLs, hashes). First agent: the RSS Poller — ingests 50 indie-music feeds and writes candidate rows to the queue. Demo Friday: 82 rows written from a 30-minute run.
Credentials provisioned: WordPress app password, Spotify client, Album of the Year scraper config. Surprise: Spotify OAuth flow took 3 days; escalated to Workflow Owner Day 22, resolved Day 24.
First production
Two agents promoted to production: Research Compiler (pulls artist bios, discography, press coverage into a research brief) and Editorial Writer (drafts the album review in Spanish). Weekly Friday demo ran 14 side-by-side comparisons — Workflow Owner rated 11 as “publishable with light edit,” 3 as “needs rework.” Baseline for the same task (human-only) was 6.4 hours per review; agent draft time was under 3 minutes.
Risk discharged: R-003 (editorial voice drift). Workflow Owner confirmed Week-6 drafts stayed in the site’s voice. Risk added: R-011 (translation artefacts from English sources bleeding into Spanish output) — treatment plan drafted same day.
Scale-out
Full roster live: RSS Poller, Review Queue Manager, Research Compiler, Editorial Writer, Cover Art Retriever, SEO Metadata Builder, WordPress Publisher, Social Syndication, Spotify Sync, Performance Analytics — ten agents. Integration run Wednesday: end-to-end cycle from RSS row to published draft took 8 minutes 40 seconds.
Friday demo: throughput measurement on record — 6 reviews-per-week sustained, versus Discovery baseline of 0.8. Executive Sponsor attended. Surprise: the Cover Art Retriever occasionally returned the wrong pressing’s cover; flagged as R-014, added to Risk Register, treatment plan: fallback to AOTY canonical source.
Governance
All six governance components instrumented. Drill scenario A chosen (bad-output-at-scale): Governance Architect seeded the Editorial Writer with a deliberately corrupt research brief Thursday 14:07. Monitoring caught the quality score drop at 14:09; SEV-2 fired to Workflow Owner at 14:10; Workflow Owner paused the agent at 14:13 via state-spreadsheet Override column; rollback restored at 14:19. Full incident end-to-end: 12 minutes. Audit log reviewed clean.
Remediation: added a “research brief coherence check” pre-condition on the Editorial Writer — prevents the corrupted-input class of failure entirely. Confirmed as USP-014 (Pre-Condition Guards on LLM-Class Agents).
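The drill timings above can be checked against the scenario A criteria with a few lines of arithmetic. This sketch uses the timestamps from the transcript; measuring the SEV-2 delay from the moment of injection is an assumption, since the scenario text does not say where the 5-minute clock starts.

```python
from datetime import datetime


def minutes_between(start, end, fmt="%H:%M"):
    """Whole minutes between two same-day clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


timeline = {
    "injected": "14:07",
    "detected": "14:09",
    "alerted": "14:10",
    "paused": "14:13",
    "rolled_back": "14:19",
}

alert_delay = minutes_between(timeline["injected"], timeline["alerted"])      # 3
total = minutes_between(timeline["injected"], timeline["rolled_back"])        # 12
passed = alert_delay <= 5 and total <= 30
print(alert_delay, total, passed)
```

Both limits are met with room to spare: 3 minutes to alert against a 5-minute limit, and 12 minutes to a rolled-back state against a 30-minute limit.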
Readiness
72-hour autonomy window ran from Monday 14:00 to Thursday 14:00. 47 agent invocations occurred; 3 alerts fired (all SEV-3, all resolved by Workflow Owner inside SLO); zero Studio interventions. Two incidents caught by the wrapper: (1) a partial Spotify API outage handled by back-off-and-retry; (2) a malformed RSS feed row handled by the queue’s schema-validator.
Outcome metrics vs. baseline: 7.2× throughput; 92% human-hour reduction; error rate from 22% to 4%; cycle time from 5.2 days to 7 hours. Gate 3: Pass — no conditions. System entered Handoff Monday of Week 10.
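For readers reproducing the deltas, the arithmetic is sketched below using the before and after figures quoted in this example. The helper names are hypothetical, and treating 5.2 days as 24-hour calendar days is an assumption.

```python
def pct_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100


def speedup(before, after):
    """How many times smaller the after value is."""
    return before / after


human_hours = pct_reduction(6.4, 0.5)      # about 92.2 percent
error_relative = pct_reduction(22, 4)      # about 81.8 percent (relative)
error_points = 22 - 4                      # 18 percentage points
cycle_speedup = speedup(5.2 * 24, 7)       # about 17.8x, assuming 24-hour days

print(round(human_hours, 1), round(error_relative, 1), error_points, round(cycle_speedup, 1))
```

Stating whether an error-rate change is relative (about 82%) or absolute (18 percentage points) avoids ambiguity when the numbers reach the Gate 3 deck.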
Build Lead checklist.
Twelve items. If any answer is “no,” address it before Monday planning starts.
- Friday demo decision log from last week is posted in the shared channel, with the Workflow Owner’s verbatim answer to the demo question.
- Risk Register delta from last week is merged into v2.x — new risks scored, discharged risks marked.
- Pattern Library candidates from last week are tagged (confirmed, abandoned, carried forward).
- This week’s theme is named (see §04) and the Friday demo target is pre-drafted.
- The week’s agent and governance owners are clear to every Studio team member.
- Any client-side blocker from last week has an escalation path ready for Monday 11:00.
- The state spreadsheet schema is unchanged, or any change has a signed-off audit-log entry.
- Standup attendance for Tue–Thu is confirmed; 09:30–09:50 holds are protected on every calendar.
- The Tuesday–Thursday calendar is free of non-emergency meetings before 14:00.
- The Friday demo slot (15:00–15:45) has the Workflow Owner confirmed; Executive Sponsor invited per week (required Weeks 7, 9).
- Any Conditional Go items from Gate 2 still open are tracked, with due dates inside Week 5 if not yet closed.
- For Week 8: drill scenario selected, Governance Architect ready to inject, Workflow Owner on call Thursday 14:00–16:00. For Week 9: autonomy window start and end named; permitted intervention list drafted.