Operations Manual
Running instructions for the system after sprint close: daily operations, human checkpoints, troubleshooting, performance monitoring, contacts, and the change log.
Drafted during Build and finalized at Handoff, this manual is the operator's first read on Day 51.
File: lighthouse-operations-manual.docx. To fill it in for a live engagement, open the DOCX file.
Overview
OPERATIONS MANUAL
Lighthouse Sprint — Stage 4: Handoff
Your team's complete guide to running the agentic workflow day-to-day.
| Client | [Client Name] |
|---|---|
| Workflow | [Workflow Name] |
| Prepared by | [Sprint Lead Name] |
| Date | [YYYY-MM-DD] |
| Version | [1.0] |
| Classification | Confidential — Client Internal |
1. System Overview
Provide a plain-language summary of what the agentic workflow does. Write this for the operators who will run it daily — not for engineers.
1.1 What This System Does
[Describe the workflow in 2-3 sentences. What goes in, what comes out, who benefits.]
1.2 Workflow Architecture
Paste or reference the workflow diagram from the Redesign Blueprint. Label every agent, human checkpoint, and data flow.
[Insert workflow diagram or reference to diagram file]
1.3 Agents in This Workflow
List every agent deployed in this workflow. One row per agent.
| Agent ID | Agent Name | What It Does | Classification |
|---|---|---|---|
| [AGT-0X] | [Name] | [One-line description] | [Classification] |
| [AGT-0X] | [Name] | [One-line description] | [Classification] |
| [AGT-0X] | [Name] | [One-line description] | [Classification] |
| [AGT-0X] | [Name] | [One-line description] | [Classification] |
| [AGT-0X] | [Name] | [One-line description] | [Classification] |
2. Daily Operations
What does an operator need to do every day to keep this workflow running?
2.1 Morning Checklist
List the checks an operator should run at the start of each day.
[1. Check monitoring dashboard for overnight alerts]
[2. Review agent queue — any stuck or failed items?]
[3. Verify data inputs are flowing — check source systems]
[4. Review any items flagged for human approval]
[5. Check governance dashboard — any escalations pending?]
2.2 During the Day
What ongoing tasks does the operator manage?
[Describe the steady-state operation — what the operator monitors, when they intervene, how they handle the human-approval queue.]
2.3 End of Day
What wrap-up activities are needed?
[1. Review daily throughput against the baseline and target in Section 5.1]
[2. Clear any remaining approval queue items]
[3. Note any anomalies in the daily log]
[4. Confirm all agents completed their scheduled runs]
2.4 Weekly Tasks
[List any weekly maintenance, reporting, or review tasks.]
3. Human Checkpoints
For every agent-initiated, human-approved task — what does the operator need to review, and how do they approve or reject?
3.1 Approval Queue
[Describe where approval requests appear — dashboard, email, Slack, etc. How does the operator access them?]
3.2 Review Criteria
For each type of approval, what should the operator check before approving?
| Agent | What to Review | Approve If | Reject If |
|---|---|---|---|
| [Agent Name] | [What to check] | [Criteria for approval] | [Criteria for rejection] |
| [Agent Name] | [What to check] | [Criteria for approval] | [Criteria for rejection] |
| [Agent Name] | [What to check] | [Criteria for approval] | [Criteria for rejection] |
3.3 Handling Rejections
[What happens when an operator rejects an agent's output? Does it retry, escalate, or require manual completion?]
4. Troubleshooting
Common issues and their solutions. Written for operators, not engineers.
4.1 Common Issues
| Symptom | Likely Cause | Fix | Escalate If |
|---|---|---|---|
| [What the operator sees] | [Root cause] | [Step-by-step fix] | [When to call for help] |
| [What the operator sees] | [Root cause] | [Step-by-step fix] | [When to call for help] |
| [What the operator sees] | [Root cause] | [Step-by-step fix] | [When to call for help] |
| [What the operator sees] | [Root cause] | [Step-by-step fix] | [When to call for help] |
| [What the operator sees] | [Root cause] | [Step-by-step fix] | [When to call for help] |
4.2 Escalation Path
[Level 1: Operator tries the fix from the table above]
[Level 2: Contact [Team Lead / Technical Support] via [channel]]
[Level 3: Contact Umbra Studio support (within the 30-day support window) via [channel]]
5. Performance Monitoring
5.1 Key Metrics to Track
| Metric | Baseline | Target | Current | Status |
|---|---|---|---|---|
| End-to-End Cycle Time | [from baseline] | [target] | [measure] | [on/off track] |
| Throughput | [from baseline] | [target] | [measure] | [on/off track] |
| Error Rate | [from baseline] | [target] | [measure] | [on/off track] |
| Human Hours per Cycle | [from baseline] | [target] | [measure] | [on/off track] |
| Agent Uptime | N/A | 99.5% | [measure] | [on/off track] |
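A minimal sketch of how the Status column could be filled in, assuming a metric is "on track" when its current value meets or beats its target, and assuming cycle time, error rate, and human hours per cycle are lower-is-better while throughput and uptime are higher-is-better. The `Metric` class, the `status` and `downtime_budget_hours` helpers, and every number shown are hypothetical illustrations, not part of the delivered system. The last line also shows what the 99.5% uptime target allows in practice: over a 30-day month (720 hours), 0.5% is 3.6 hours of downtime.

```python
from dataclasses import dataclass

# Assumption: these metrics improve as they go down; all others
# (e.g. Throughput, Agent Uptime) are treated as higher-is-better.
LOWER_IS_BETTER = {"End-to-End Cycle Time", "Error Rate", "Human Hours per Cycle"}


@dataclass
class Metric:
    name: str
    target: float
    current: float


def status(metric: Metric) -> str:
    """Return 'on track' if the current value meets or beats the target."""
    if metric.name in LOWER_IS_BETTER:
        met = metric.current <= metric.target
    else:
        met = metric.current >= metric.target
    return "on track" if met else "off track"


def downtime_budget_hours(uptime_target: float, period_hours: float) -> float:
    """Hours of downtime allowed in a period before uptime falls below target."""
    return (1.0 - uptime_target) * period_hours


if __name__ == "__main__":
    # Hypothetical values for illustration; use your own baseline data.
    rows = [
        Metric("End-to-End Cycle Time", target=4.0, current=3.5),  # hours
        Metric("Throughput", target=120, current=110),             # items per day
        Metric("Agent Uptime", target=0.995, current=0.998),       # fraction
    ]
    for m in rows:
        print(f"{m.name}: {status(m)}")
    # 99.5% uptime over a 30-day month (30 * 24 = 720 hours)
    print(f"Downtime budget: {downtime_budget_hours(0.995, 30 * 24):.1f} h")
```

Whether each metric is lower- or higher-is-better should be confirmed against how its target was set during the sprint; adjust `LOWER_IS_BETTER` to match.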
5.2 Monitoring Dashboard
[URL or access instructions for the monitoring dashboard]
[What to look at, how often, what normal looks like]
6. Contacts & Support
| Workflow Owner | [Name, email, phone] |
|---|---|
| Primary Operator(s) | [Names and contact info] |
| Technical Support | [Internal team contact] |
| Umbra Studio Contact | [Sprint Lead name, email — for 30-day support window] |
| Support Window Ends | [Date — 30 days after handoff] |
| Emergency Contact | [For critical system failures outside business hours] |
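The "Support Window Ends" date can be computed from the handoff date. The sketch below assumes the 30 days are counted as calendar days (not business days) starting the day after handoff; `support_window_end` and the example handoff date are hypothetical. It prints in the same YYYY-MM-DD format used elsewhere in this manual.

```python
from datetime import date, timedelta

# From this manual: Umbra Studio support runs 30 days after handoff.
# Assumption: calendar days, not business days.
SUPPORT_WINDOW_DAYS = 30


def support_window_end(handoff: date, days: int = SUPPORT_WINDOW_DAYS) -> date:
    """Return the date the support window ends, counted from the handoff date."""
    return handoff + timedelta(days=days)


if __name__ == "__main__":
    # Hypothetical handoff date for illustration.
    print(support_window_end(date(2025, 3, 3)).isoformat())
```

If the engagement defines the window in business days instead, the calculation needs to skip weekends and holidays and should be agreed with the Sprint Lead.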
7. Change Log
Track all updates to this manual.
| Date | Version | Changed By | Description |
|---|---|---|---|
| [YYYY-MM-DD] | [1.0] | [Sprint Lead] | Initial version, drafted during Build and finalized at Handoff |