Lighthouse Watch · Monthly Health Report
Acme Corp · Marketing Operations Agent. Month 04 of 12.
Reporting period · 2026-04-01 to 2026-04-30 · Tier · Watch · Standard
Report 2026.04
Signed · L. Hartwell
Lead engineer
Acceptance avg
87%
▲ +2pts vs Mar
Eval pass-rate
94%
— flat
+1 improvement
Shipped
Promoted week 04
Incidents
2
1 P1 · 1 P2 · P1 resolved, P2 acknowledged
01 · Model status · 3 agents
All agents in production are running on contract-spec model versions. One model swap test ran this cycle (Claude N+1 release, 2026-04-12). Recommendation: stay on the current version, since the swap would have raised cost by 9% with no eval improvement.
| Agent | Model in production | Last eval | Status | Recommended action |
| --- | --- | --- | --- | --- |
| Brief Reviewer | claude-sonnet-4-6 | 2026-04-22 | Healthy | Stay |
| Performance Summarizer | claude-sonnet-4-6 | 2026-04-22 | Healthy | Stay |
| Localization QA | claude-haiku-4-5 | 2026-04-22 | Watch | Run side-by-side with Sonnet 4.6 next month; Haiku showing 4pt acceptance drop on long-form variants |
02 · Acceptance & eval trends · 90 days
90-day acceptance trends. Brief Reviewer has climbed steadily since the W2 reviewer-UI tweak; Performance Summarizer is roughly flat; Localization QA is the watchlist item.
Brief Reviewer · Acceptance %
91%
▲ +6
Performance Summarizer · Acceptance %
86%
▲ +2
Localization QA · Acceptance %
73%
▼ -9
03 · Incidents this month · 2 total
04-08
P1 · High
Brief Reviewer acceptance dropped to 71% for 5 consecutive days after a prompt-template change was merged to the shared library. Root cause: the tone parameter was inadvertently shifted from "advisory" to "directive". Fix: revert the change and add an eval check to catch tone drift.
Resolved 04-10
04-19
P2 · Standard
Drift score of 0.27 on Localization QA outputs vs. baseline. Cause: an upstream candidate model upgrade (Haiku 4.5 → 4.5.1) changed phrasing patterns in long-form variants. No reviewer escalation, but flagged in this report's recommendations.
Acknowledged · scoping next
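The P1 fix above mentions an eval check for tone drift. A minimal sketch of what such a check might look like follows, assuming the template exposes its settings as a plain dictionary with a `tone` key. The names `EXPECTED_TONE`, `check_tone`, and the `brief_reviewer` key are hypothetical and do not come from the shared library.

```python
# Hypothetical eval check: flag a prompt template whose configured tone
# differs from the value the agent is expected to run with.
# The agent key and the parameter name "tone" are assumptions.
EXPECTED_TONE = {"brief_reviewer": "advisory"}


def check_tone(agent: str, template_params: dict) -> list[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    expected = EXPECTED_TONE.get(agent)
    actual = template_params.get("tone")
    # Only agents with a registered expectation are checked.
    if expected is not None and actual != expected:
        failures.append(f"{agent}: tone is {actual!r}, expected {expected!r}")
    return failures
```

This sketch only guards the configured parameter; it does not score the tone of generated text, which would need a separate output-level eval.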
04 · The +1 improvement delivered · Month 04
Pattern deployment · governance.policy-checklist v2
Reviewer single-pass approval for low-risk briefs.
Replaced the bespoke two-stage gating logic on Brief Reviewer with the canonical governance.policy-checklist pattern from the Studio library. The new pattern routes briefs through a deterministic risk scorer; low-risk briefs auto-approve with a 24-hr post-hoc audit trail, freeing reviewers to focus on the 30% of briefs that warrant human judgment.
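The routing described above could be sketched roughly as follows. This is an illustration only, not the governance.policy-checklist implementation: the risk signals, weights, and the `LOW_RISK_THRESHOLD` cutoff are invented for the example, and the 24-hr audit is interpreted here as an audit due 24 hours after auto-approval, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

LOW_RISK_THRESHOLD = 0.3            # assumed cutoff, not taken from the report
AUDIT_WINDOW = timedelta(hours=24)  # post-hoc audit window (interpretation)


@dataclass(frozen=True)
class Brief:
    brief_id: str
    mentions_pricing: bool
    mentions_legal_claims: bool
    new_market: bool


def risk_score(brief: Brief) -> float:
    """Deterministic score: the same brief always gets the same score.
    The signals and weights are illustrative placeholders."""
    score = 0.0
    if brief.mentions_pricing:
        score += 0.3
    if brief.mentions_legal_claims:
        score += 0.5
    if brief.new_market:
        score += 0.2
    return score


def route(brief: Brief, now: datetime) -> dict:
    """Auto-approve low-risk briefs with an audit deadline; send the rest to a reviewer."""
    score = risk_score(brief)
    if score < LOW_RISK_THRESHOLD:
        return {"brief_id": brief.brief_id, "decision": "auto_approve",
                "score": score, "audit_due": now + AUDIT_WINDOW}
    return {"brief_id": brief.brief_id, "decision": "human_review",
            "score": score, "audit_due": None}
```

For example, `route(Brief("b-1", False, False, False), datetime.now(timezone.utc))` returns an `auto_approve` decision with an audit deadline, while a brief with `mentions_legal_claims=True` is sent to `human_review`.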
Eval gate: a 12-check eval suite expansion was committed before promotion. All 12 passed at ≥95%. Reviewer feedback in shadow mode confirmed the auto-approval set was 96% identical to the prior approve queue.
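As a rough sketch of the gate logic described above, a promotion gate of this kind can be expressed as "every check must meet the threshold". The function name `eval_gate` and the shape of the input (check name mapped to pass rate) are assumptions for illustration, not the suite's actual interface.

```python
PASS_THRESHOLD = 0.95  # the ≥95% bar stated for this promotion


def eval_gate(check_pass_rates: dict[str, float],
              threshold: float = PASS_THRESHOLD) -> tuple[bool, list[str]]:
    """Return (gate_passed, failing_check_names).

    The gate passes only if every check's pass rate is at or above the threshold.
    """
    failing = sorted(name for name, rate in check_pass_rates.items()
                     if rate < threshold)
    return (not failing, failing)
```

Returning the names of failing checks alongside the verdict makes it easier to see which check blocked a promotion.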
Before
~32 min
Avg time-to-approve per low-risk brief
After
~3 min
Time-to-approve with auto-route + spot-check audit
05 · Pattern-library updates available · 3 candidates
New patterns added to @umbra/patterns since last report. These are eligible for future +1 improvements.
USP-007
research.evidence-with-verification v3
Dual-source verification with confidence scoring. Could replace current single-source citation pattern in Brief Reviewer.
Recommend · Q3
USP-012
throttle.bounded-action-per-run v1
Caps maximum agent actions per cycle to prevent runaway loops. Useful for Performance Summarizer if scope expands.
Hold · not yet relevant
USP-014
telemetry.periodic-recap v2
Auto-generated weekly recap of agent activity for sponsors who don't want to read every report. Good fit for Acme's CMO.
Recommend · Q3
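To illustrate the idea behind USP-012 (capping agent actions per cycle), here is a minimal sketch of a per-run action budget. It is not the throttle.bounded-action-per-run pattern itself; the class and function names (`ActionBudget`, `ActionBudgetExceeded`, `run_cycle`) are hypothetical.

```python
from typing import Callable


class ActionBudgetExceeded(RuntimeError):
    """Raised when a run tries to take more actions than its cap allows."""


class ActionBudget:
    def __init__(self, max_actions: int):
        if max_actions < 0:
            raise ValueError("max_actions must be non-negative")
        self.max_actions = max_actions
        self.used = 0

    def spend(self, n: int = 1) -> None:
        """Reserve n actions, or raise before the cap is exceeded."""
        if self.used + n > self.max_actions:
            raise ActionBudgetExceeded(
                f"cap of {self.max_actions} actions reached ({self.used} used)")
        self.used += n


def run_cycle(actions: list[Callable[[], object]], budget: ActionBudget) -> list:
    """Run actions in order, charging the budget before each one executes."""
    results = []
    for action in actions:
        budget.spend()
        results.append(action())
    return results
```

Charging the budget before each action runs means a runaway loop stops at the cap rather than one action past it.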
06 · Next month preview · Month 05
The +1 on deck and any expected events.
+1 improvement · proposed
Localization QA model side-by-side test (Sonnet 4.6 vs Haiku 4.5)
The 9-point acceptance drop on Localization QA over 90 days warrants a controlled swap test before reviewers escalate further. Side-by-side run for 1 month, then promote whichever wins on acceptance + cost. Eval gate: must beat current state by ≥3pts acceptance to promote Sonnet.
Quarterly architecture review · scheduled
Q2 review · 2026-05-21
90-min call with Acme CMO + Marketing Ops Director + Studio principal. Agenda: acceptance trends, scope of the localization issue, proposed Q3 patterns, year-end renewal preview.
Foundation watch
No model migrations expected
No deprecation dates on any in-scope dependency in May. Anthropic earnings call 2026-05-08 may surface a new release; if so, swap test runs the following week as standard.
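The promotion rule for the Localization QA side-by-side test (promote Sonnet only if it beats the current state by at least 3 points of acceptance) can be sketched as below. The function name and return labels are hypothetical, and the sketch covers only the acceptance gate; how cost is weighed in the final decision is not specified in this report, so it is left out.

```python
MIN_ACCEPTANCE_GAIN_PTS = 3.0  # the ≥3pt gate stated for the side-by-side test


def promotion_decision(current_acceptance_pct: float,
                       candidate_acceptance_pct: float,
                       min_gain_pts: float = MIN_ACCEPTANCE_GAIN_PTS) -> str:
    """Return "promote_candidate" if the candidate clears the acceptance gate, else "stay"."""
    gain = candidate_acceptance_pct - current_acceptance_pct
    return "promote_candidate" if gain >= min_gain_pts else "stay"
```

With the current 73% acceptance as the baseline, a candidate would need to reach at least 76% to clear this gate.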
Signed
L. Hartwell
Lead engineer · Umbra Studio · Watch
Reviewed
A. Saca
Studio principal · Umbra Studio