Template · sample data · v1.0
Lighthouse Watch · Monthly Health Report

Acme Corp · Marketing Operations Agent. Month 04 of 12.

Reporting period · 2026-04-01 to 2026-04-30 · Tier · Watch · Standard
Report 2026.04 · Signed · L. Hartwell
Lead engineer
Acceptance avg: 87% (▲ +2pts vs Mar)
Eval pass-rate: 94% (flat)
+1 improvement: Shipped (promoted week 04)
Incidents: 2 (1 P1 resolved · 1 P2 acknowledged, scoping next)

01 · Model status · 3 agents

All agents in production are running on contract-spec model versions. One model swap test was conducted this cycle (Claude N+1 release on 2026-04-12); recommendation: stay on current version. Cost would have increased 9% with no eval improvement.

Agent | Model in production | Last eval | Status | Recommended action
Brief Reviewer | claude-sonnet-4-6 | 2026-04-22 | Healthy | Stay
Performance Summarizer | claude-sonnet-4-6 | 2026-04-22 | Healthy | Stay
Localization QA | claude-haiku-4-5 | 2026-04-22 | Watch | Run side-by-side with Sonnet 4.6 next month; Haiku showing 4pt acceptance drop on long-form variants

02 · Acceptance & eval trends · 90 days

Acceptance trends, 90-day view. Brief Reviewer has climbed steadily since the W2 reviewer-UI tweak; Performance Summarizer is roughly flat; Localization QA is the watchlist item.

Brief Reviewer · Acceptance %: 91% (▲ +6)
Performance Summarizer · Acceptance %: 86% (▲ +2)
Localization QA · Acceptance %: 73% (▼ -9)

03 · Incidents this month · 2 total

04-08 · P1 · High · Resolved 04-10
Brief Reviewer acceptance dropped to 71% for 5 consecutive days after a prompt-template change was merged into the shared library. Root cause: the tone parameter was inadvertently shifted from "advisory" to "directive". Fix: revert the change and add an eval check to catch tone drift.

04-19 · P2 · Standard · Acknowledged · scoping next
Drift score 0.27 on Localization QA outputs vs. baseline. Cause: an upstream candidate model upgrade (Haiku 4.5 → 4.5.1) changed phrasing patterns in long-form variants. No reviewer escalation, but flagged in this report's recommendation.
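
The fix for the 04-08 incident calls for an eval check that catches tone drift. A minimal sketch of what such a check could look like, assuming the prompt template can be read as a dictionary with a `tone` key. The template shape, the `EXPECTED_TONE` constant, and the `check_tone` function are hypothetical and are not taken from the shared library.

```python
# Hypothetical sketch: the template shape and all names here are assumptions,
# not the shared library's real interface.
EXPECTED_TONE = "advisory"  # the intended value named in the incident write-up


def check_tone(template: dict, expected: str = EXPECTED_TONE) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    actual = template.get("tone")
    if actual is None:
        return ["tone parameter is missing"]
    if actual != expected:
        return [f"tone drifted: expected {expected!r}, found {actual!r}"]
    return []


# Example run against a good template and the drifted one from the incident.
print(check_tone({"tone": "advisory"}))
print(check_tone({"tone": "directive"}))
```

Running a check like this against every template change before merge would surface a silent parameter shift as a failed eval rather than as a multi-day acceptance drop.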

04 · The +1 improvement delivered · Month 04

Pattern deployment · governance.policy-checklist v2

Reviewer single-pass approval for low-risk briefs.

Replaced the bespoke two-stage gating logic on Brief Reviewer with the canonical governance.policy-checklist pattern from the Studio library. The new pattern routes briefs through a deterministic risk scorer; low-risk briefs auto-approve with a 24-hr post-hoc audit trail, freeing reviewers to focus on the 30% of briefs that warrant human judgment.
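
The routing described above can be sketched roughly as follows. This is an illustration only: the report does not say how the risk scorer works or where its cut-off sits, so the `risk_score` field, the `LOW_RISK_THRESHOLD` value, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the report does not state the scorer's threshold.
LOW_RISK_THRESHOLD = 0.3


@dataclass
class Brief:
    brief_id: str
    risk_score: float  # assumed output of the deterministic risk scorer, 0.0 to 1.0


def route(brief: Brief, audit_log: list[str]) -> str:
    """Auto-approve low-risk briefs and record them for the post-hoc audit;
    send everything else to a human reviewer."""
    if brief.risk_score < LOW_RISK_THRESHOLD:
        audit_log.append(brief.brief_id)  # picked up later by the audit pass
        return "auto-approved"
    return "human-review"


audit: list[str] = []
print(route(Brief("b-001", 0.10), audit))
print(route(Brief("b-002", 0.75), audit))
print(audit)
```

Because the scorer is deterministic, the same brief always takes the same route, which is what makes a post-hoc audit of the auto-approved set meaningful.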

Eval gate: a 12-check eval suite expansion was committed before promotion. All 12 passed at ≥95%. Reviewer feedback in shadow mode confirmed the auto-approval set was 96% identical to the prior approve queue.
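
As an illustration of the eval gate, the promotion condition (every check at or above 95%) can be written as a small predicate. The check names and pass rates below are made-up sample values, and `gate_passes` and `failing_checks` are hypothetical helpers, not part of the Studio library.

```python
PASS_THRESHOLD = 0.95  # from the report: each check must pass at >= 95%


def failing_checks(check_rates: dict[str, float],
                   threshold: float = PASS_THRESHOLD) -> list[str]:
    """Names of checks whose pass rate falls below the threshold."""
    return sorted(name for name, rate in check_rates.items() if rate < threshold)


def gate_passes(check_rates: dict[str, float],
                threshold: float = PASS_THRESHOLD) -> bool:
    """True only if the suite is non-empty and no check is below the threshold."""
    return bool(check_rates) and not failing_checks(check_rates, threshold)


# Made-up sample data for three of the checks.
sample = {"tone-advisory": 0.98, "risk-score-stable": 0.95, "audit-trail-written": 0.91}
print(gate_passes(sample))
print(failing_checks(sample))
```

Treating an empty suite as a failure is a deliberate choice in this sketch, so that a misconfigured run cannot promote by default.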

Before: ~32 min · Avg time-to-approve per low-risk brief
After: ~3 min · Time-to-approve with auto-route + spot-check audit

05 · Pattern-library updates available · 3 candidates

New patterns added to @umbra/patterns since last report. These are eligible for future +1 improvements.

USP-007
research.evidence-with-verification v3
Dual-source verification with confidence scoring. Could replace current single-source citation pattern in Brief Reviewer.
Recommend · Q3
USP-012
throttle.bounded-action-per-run v1
Caps maximum agent actions per cycle to prevent runaway loops. Useful for Performance Summarizer if scope expands.
Hold · not yet relevant
USP-014
telemetry.periodic-recap v2
Auto-generated weekly recap of agent activity for sponsors who don't want to read every report. Good fit for Acme's CMO.
Recommend · Q3

06 · Next month preview · Month 05

The +1 on deck and any expected events.

+1 improvement · proposed

Localization QA model side-by-side test (Sonnet 4.6 vs Haiku 4.5)

The 9-point acceptance drop on Localization QA over 90 days warrants a controlled swap test before reviewers escalate further. Side-by-side run for 1 month, then promote whichever wins on acceptance + cost. Eval gate: must beat current state by ≥3pts acceptance to promote Sonnet.
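
The acceptance part of that gate can be sketched as a simple decision rule. The report does not say how cost is weighed against acceptance, so this sketch leaves cost out and checks only the 3-point margin; `decide` and its return strings are hypothetical names.

```python
MIN_GAIN_PTS = 3.0  # from the report: candidate must beat current by >= 3 points


def decide(current_acceptance: float, candidate_acceptance: float,
           min_gain: float = MIN_GAIN_PTS) -> str:
    """Compare acceptance percentages and apply the promotion margin.
    Cost is intentionally not modeled here (not specified in the report)."""
    gain = candidate_acceptance - current_acceptance
    return "promote candidate" if gain >= min_gain else "stay on current"


# Current Localization QA acceptance is 73%; the candidate figures are made up.
print(decide(73.0, 77.5))
print(decide(73.0, 75.0))
```

A margin like this guards against promoting on noise: a candidate that is only a point or two ahead after one month stays in test rather than replacing the current model.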

Quarterly architecture review · scheduled

Q2 review · 2026-05-21

90-min call with Acme CMO + Marketing Ops Director + Studio principal. Agenda: acceptance trends, scope of the localization issue, proposed Q3 patterns, year-end renewal preview.

Foundation watch

No model migrations expected

No deprecation dates on any in-scope dependency in May. Anthropic earnings call 2026-05-08 may surface a new release; if so, swap test runs the following week as standard.

Signed
L. Hartwell
Lead engineer · Umbra Studio · Watch
Reviewed
A. Saca
Studio principal · Umbra Studio