Lighthouse Watch · Monthly Health Report

Acme Corp · Marketing Operations Agent. Month 04 of 12.

Reporting period · 2026-04-01 to 2026-04-30 · Tier · Watch · Standard

Report 2026.04 · Signed · L. Hartwell
Lead engineer

Acceptance avg

87%

▲ +2pts vs Mar

Eval pass-rate

94%

— flat

+1 improvement

Shipped

Promoted week 04

Incidents

1 P1 · 1 P2 · all resolved

01 · Model status · 3 agents

All agents in production are running on contract-spec model versions. One model swap test was conducted this cycle (Claude N+1 release on 2026-04-12); recommendation: stay on current version. Cost would have increased 9% with no eval improvement.

Agent	Model in production	Last eval	Status	Recommended action
Brief Reviewer	`claude-sonnet-4-6`	2026-04-22	Healthy	Stay
Performance Summarizer	`claude-sonnet-4-6`	2026-04-22	Healthy	Stay
Localization QA	`claude-haiku-4-5`	2026-04-22	Watch	Run side-by-side with Sonnet 4.6 next month — Haiku showing 4pt acceptance drop on long-form variants

02 · Acceptance & eval trends · 90 days

90-day acceptance trends. Brief Reviewer has climbed steadily since the W2 reviewer-UI tweak; Performance Summarizer is flat; Localization QA is the watchlist item.

Brief Reviewer · Acceptance %

91%

▲ +6

Performance Summarizer · Acceptance %

86%

▲ +2

Localization QA · Acceptance %

73%

▼ -9

03 · Incidents this month · 2 total

04-08

P1 · High

Brief Reviewer acceptance dropped to 71% for 5 consecutive days after a prompt-template change was merged to the shared library. Root cause: the tone parameter was inadvertently shifted from "advisory" to "directive". Fix: revert the change and add an eval check to catch tone drift.

Resolved 04-10
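
One simple form the tone-drift check could take is a regression test that pins the template's tone parameter. The sketch below is illustrative only: the function name, the parameter key `tone`, and the shape of the input are assumptions, not the check that was actually shipped.

```python
# Hypothetical regression check: pin the tone parameter of shared templates.
EXPECTED_TONE = "advisory"  # the value named in the incident root cause


def tone_drift_failures(template_params: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of templates whose 'tone' parameter is missing or
    differs from the expected value. An empty list means the check passes."""
    return sorted(
        name
        for name, params in template_params.items()
        if params.get("tone") != EXPECTED_TONE
    )
```

Run against the shared library on every merge, a non-empty result would block the change before it reaches reviewers.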

04-19

P2 · Standard

Drift score 0.27 on Localization QA outputs vs. baseline. Cause: candidate model upgrade upstream (Haiku 4.5 → 4.5.1) changed long-form variant phrasing patterns. No reviewer escalation, but flagged in this report's recommendation.

Acknowledged · scoping next

04 · The +1 improvement delivered · Month 04

Pattern deployment · governance.policy-checklist v2

Reviewer single-pass approval for low-risk briefs.

Replaced the bespoke two-stage gating logic on Brief Reviewer with the canonical governance.policy-checklist pattern from the Studio library. The new pattern routes briefs through a deterministic risk scorer; low-risk briefs auto-approve with a 24-hr post-hoc audit trail, freeing reviewers to focus on the 30% of briefs that warrant human judgment.
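
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the governance.policy-checklist implementation: the score range, the `LOW_RISK_MAX_SCORE` threshold, and all class and function names are assumptions, since the report does not state how the risk scorer defines "low-risk".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LOW_RISK_MAX_SCORE = 0.2            # hypothetical cutoff; not stated in the report
AUDIT_WINDOW = timedelta(hours=24)  # from the report: 24-hr post-hoc audit trail


@dataclass(frozen=True)
class Brief:
    brief_id: str
    risk_score: float  # output of a deterministic scorer, assumed to lie in [0, 1]


@dataclass(frozen=True)
class Decision:
    brief_id: str
    route: str                     # "auto_approve" or "human_review"
    audit_due: Optional[datetime]  # audit deadline, set only for auto-approvals


def route_brief(brief: Brief, now: datetime) -> Decision:
    """Auto-approve low-risk briefs and schedule a post-hoc audit;
    send everything else to a human reviewer."""
    if brief.risk_score <= LOW_RISK_MAX_SCORE:
        return Decision(brief.brief_id, "auto_approve", now + AUDIT_WINDOW)
    return Decision(brief.brief_id, "human_review", None)
```

Because the scorer is deterministic, the same brief always takes the same route, which keeps the audit trail reproducible.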

Eval gate: a 12-check eval suite expansion was committed before promotion. All 12 passed at ≥95%. Reviewer feedback in shadow mode confirmed the auto-approval set was 96% identical to the prior approve queue.
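
As a rough sketch, the promotion rule reduces to "every check at or above 95%". The function and check names below are hypothetical, and treating an empty suite as a failed gate is an added assumption.

```python
PASS_THRESHOLD = 0.95  # from the report: all checks must pass at >= 95%


def eval_gate(
    check_pass_rates: dict[str, float],
    threshold: float = PASS_THRESHOLD,
) -> tuple[bool, list[str]]:
    """Return (promote, failing_checks).

    Promotion requires a non-empty suite in which every check's pass rate
    is at or above the threshold."""
    failing = sorted(
        name for name, rate in check_pass_rates.items() if rate < threshold
    )
    promote = bool(check_pass_rates) and not failing
    return promote, failing
```

A single check below the threshold blocks promotion and is returned by name, so the failing check can be triaged directly.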

Before

~32 min

Avg time-to-approve per low-risk brief

After

~3 min

Time-to-approve with auto-route + spot-check audit

05 · Pattern-library updates available · 3 candidates

New patterns added to @umbra/patterns since last report. These are eligible for future +1 improvements.

USP-007
research.evidence-with-verification v3

Dual-source verification with confidence scoring. Could replace current single-source citation pattern in Brief Reviewer.

Recommend · Q3

USP-012
throttle.bounded-action-per-run v1

Caps maximum agent actions per cycle to prevent runaway loops. Useful for Performance Summarizer if scope expands.

Hold · not yet relevant

USP-014
telemetry.periodic-recap v2

Auto-generated weekly recap of agent activity for sponsors who don't want to read every report. Good fit for Acme's CMO.

Recommend · Q3

06 · Next month preview · Month 05

The +1 on deck and any expected events.

+1 improvement · proposed

Localization QA model side-by-side test (Sonnet 4.6 vs Haiku 4.5)

The 9-point acceptance drop on Localization QA over 90 days warrants a controlled swap test before reviewers escalate further. Side-by-side run for 1 month, then promote whichever wins on acceptance + cost. Eval gate: must beat current state by ≥3pts acceptance to promote Sonnet.
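
The acceptance half of that gate can be written down as a one-line rule. The sketch below is illustrative: the function name is hypothetical, and cost is deliberately left out because the report does not say how acceptance and cost will be weighed against each other.

```python
MIN_ACCEPTANCE_GAIN_PTS = 3.0  # from the report: candidate must win by >= 3 pts


def pick_model(current_acceptance: float, candidate_acceptance: float) -> str:
    """Return "candidate" only if it beats the current model's acceptance rate
    by at least the minimum gain (in percentage points); otherwise "current"."""
    gain = candidate_acceptance - current_acceptance
    return "candidate" if gain >= MIN_ACCEPTANCE_GAIN_PTS else "current"
```

With Haiku 4.5 at its current 73% acceptance, Sonnet 4.6 would need to reach at least 76% in the side-by-side run to clear the gate.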

Quarterly architecture review · scheduled

Q2 review · 2026-05-21

90-min call with Acme CMO + Marketing Ops Director + Studio principal. Agenda: acceptance trends, scope of the localization issue, proposed Q3 patterns, year-end renewal preview.

Foundation watch

No model migrations expected

No deprecation dates on any in-scope dependency in May. Anthropic earnings call 2026-05-08 may surface a new release; if so, swap test runs the following week as standard.

Signed

L. Hartwell

Lead engineer · Umbra Studio · Watch

Reviewed

A. Saca

Studio principal · Umbra Studio