# McKinsey "One Year of Agentic AI" → Umbra Studio

**Source:** Yee, Chui, Roberts (with Xu), *One year of agentic AI: Six lessons from the people doing the work*, McKinsey & Company, September 2025. Based on 50+ McKinsey-led agentic builds plus dozens more in the marketplace.

**Why this matters for Studio:** Almost every claim in this report is a public-record validation of the Lighthouse Sprint thesis. McKinsey is, in effect, telling the market what Umbra Studio is selling. Read the report as both (a) a credibility lever for outreach and the deck and (b) a checklist that exposes gaps in the current operating framework: two material ones (formal evaluations and tool-selection guidance) and smaller ones in step-level observability, pattern reuse, and the reviewer UI.

---

## The six lessons, mapped

### Lesson 1 — "It's not about the agent; it's about the workflow"

**McKinsey's claim.** Organizations that focus on agents produce great-looking demos that don't move workflow outcomes. Value comes from reimagining the entire workflow — people, process, technology — and dropping the right tool into each step.

**Key line:** *"Agentic AI efforts that focus on fundamentally reimagining entire workflows… are more likely to deliver a positive outcome."*

**Studio mapping.** This is the Lighthouse Sprint thesis verbatim. "We sell workflow redesign, not AI implementation." Discovery → Redesign → Build → Handoff is McKinsey's prescription, packaged.

**Gap.** None — but the deck and landing copy should now lean harder on this validation. The phrase "agents bolted onto legacy workflows" should appear in our outbound copy *next to a McKinsey citation*, not just as our opinion.

**Action.** Add a "What the analysts see" callout to studio.html and the deck — paraphrased lesson + footnote citation.

---

### Lesson 2 — "Agents aren't always the answer"

**McKinsey's claim.** Different work demands different tools: rules-based automation, predictive analytics, gen AI, or agents. The trap is binary "agent / no agent" thinking. The report includes a tool-selection rubric:

- Rule-based + structured input → rule-based automation
- Unstructured input, extractive/generative task → gen AI / NLP / predictive
- Classification or forecasting from past data → predictive analytics or gen AI
- Synthesis, judgment, creative interpretation → gen AI
- Multistep decision-making with long-tail variable inputs → AI agents

Also: *"low-variance, high-standardization workflows… agents based on nondeterministic LLMs could add more complexity and uncertainty than value."*
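A minimal sketch of how the rubric could be encoded for the Redesign Blueprint's tool-selection step. The `StepProfile` fields, the `Tool` labels, and the check order (cheapest tool first, agent last) are our assumptions, not McKinsey's; the report gives the rubric rows, not an algorithm.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    RULES = "rule-based automation"
    PREDICTIVE = "predictive analytics or gen AI"
    GEN_AI = "gen AI / NLP"
    AGENT = "AI agent"


@dataclass
class StepProfile:
    """Hypothetical Discovery-stage description of one workflow step."""
    name: str
    rule_based: bool                # can the logic be written down as explicit rules?
    structured_input: bool          # are inputs structured (forms, fields, tables)?
    forecasting_from_history: bool  # is it classification/forecasting from past data?
    multistep_decisions: bool       # does the step chain several decisions?
    long_tail_inputs: bool          # do inputs vary widely, with many rare cases?


def select_tool(step: StepProfile) -> Tool:
    """Check rubric rows from simplest tool to most complex; an agent must be earned."""
    if step.rule_based and step.structured_input:
        return Tool.RULES
    if step.forecasting_from_history:
        return Tool.PREDICTIVE
    if step.multistep_decisions and step.long_tail_inputs:
        return Tool.AGENT
    # Remaining rows (unstructured extraction/generation, synthesis, judgment): gen AI.
    return Tool.GEN_AI


# Hypothetical steps from a Discovery interview.
kyc_check = StepProfile("KYC field validation", True, True, False, False, False)
exception_triage = StepProfile("claims exception triage", False, False, False, True, True)
print(select_tool(kyc_check).value)         # rule-based automation
print(select_tool(exception_triage).value)  # AI agent
```

Ordering the checks this way means a step only lands on `Tool.AGENT` if it fails every cheaper row, which is the "defend each agent assignment" discipline expressed as code.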

**Studio mapping.** Stage 2 Redesign already has an "agent / human allocation framework." This needs a sibling: an **agent / non-agent technology allocation framework**. The Redesign Blueprint template should force the team to defend each "agent" assignment against rules / predictive / plain-LLM alternatives.

**Gap.** The current framework risks the very mistake McKinsey calls out — assuming agents are the answer because Studio is an agentic-design firm. We need explicit guardrails so we don't over-prescribe agents into low-variance steps.

**Action.**
1. Add a *Tool Selection Matrix* section to the Redesign Blueprint template (USL-T03) using McKinsey's variance × standardization framing.
2. Add a Discovery question: "Where does this workflow have low variance + high standardization?" — those steps become candidates for rules, not agents.
3. In sales materials, lead with "we'll tell you where agents are *not* the answer" — counterintuitive positioning, builds trust.

---

### Lesson 3 — "Stop 'AI slop': invest in evaluations and build trust"

**McKinsey's claim.** Demo-impressive agents frustrate real users; trust collapses; adoption stalls. The fix is to onboard agents like employees — clear job description, evaluations, continual feedback. Experts must label desired/undesired outputs (sometimes thousands of pairs). The report lists eight eval types:

1. Task success rate (end-to-end)
2. F1 / precision / recall
3. Retrieval accuracy
4. Semantic similarity
5. LLM-as-judge
6. Bias detection (confusion matrices)
7. Hallucination rate
8. Calibration error (confidence vs. accuracy)
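A minimal scoring-harness sketch for four of the eight types (task success rate, precision/recall/F1, hallucination rate, calibration error), assuming each expert-labeled output is stored as a record with the hypothetical fields below. Calibration error is computed here as expected calibration error (ECE) with equal-width confidence bins, one common definition; McKinsey does not specify which one it means. Retrieval accuracy, semantic similarity, LLM-as-judge, and bias detection need extra machinery and are left out.

```python
from dataclasses import dataclass


@dataclass
class LabeledOutput:
    """One expert-labeled agent output (field names are hypothetical)."""
    predicted_positive: bool  # agent flagged/extracted the item
    actually_positive: bool   # expert label says it should have
    task_succeeded: bool      # expert judged the end-to-end task correct
    hallucinated: bool        # output contains an unsupported claim
    confidence: float         # agent's self-reported confidence in [0, 1]


def task_success_rate(rows):
    return sum(r.task_succeeded for r in rows) / len(rows)


def precision_recall_f1(rows):
    tp = sum(r.predicted_positive and r.actually_positive for r in rows)
    fp = sum(r.predicted_positive and not r.actually_positive for r in rows)
    fn = sum(not r.predicted_positive and r.actually_positive for r in rows)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def hallucination_rate(rows):
    return sum(r.hallucinated for r in rows) / len(rows)


def expected_calibration_error(rows, n_bins=10):
    """Size-weighted gap between average confidence and task success per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for r in rows:
        bins[min(int(r.confidence * n_bins), n_bins - 1)].append(r)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(r.confidence for r in b) / len(b)
            accuracy = sum(r.task_succeeded for r in b) / len(b)
            ece += len(b) / len(rows) * abs(avg_conf - accuracy)
    return ece


# Toy labeled set (invented for illustration).
rows = [
    LabeledOutput(True, True, True, False, 0.90),
    LabeledOutput(True, False, False, True, 0.80),
    LabeledOutput(False, True, False, False, 0.30),
    LabeledOutput(True, True, True, False, 0.95),
]
print(task_success_rate(rows), precision_recall_f1(rows))
print(hallucination_rate(rows), expected_calibration_error(rows))
```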

**Key line:** *"Onboarding agents is more like hiring a new employee versus deploying software."*

**Studio mapping.** This is **the biggest gap in the current framework.** The Lighthouse Sprint Operating Framework v1.0 mentions quality gates and a Governance Wrapper but has no explicit eval workstream — no eval-design activity in Discovery, no eval suite as a Build deliverable, no eval-types catalog for the Sprint Lead to draw from.

**Gap.** Real and material. If a client asks "how will you prove the agent is good?" today, we'd answer with monitoring + audit trail, not with a labeled eval set scored against task success / F1 / hallucination rate. McKinsey is implicitly raising the bar for what "production-ready" means.

**Action — add an Eval Workstream to the framework:**
1. New Stage 1 (Discovery) activity: *Define eval criteria with experts* — surface tacit knowledge of "what separates a good output from a bad one."
2. New Stage 2 (Redesign) artifact: *Eval Plan* — for every agent in the redesign, which of the eight eval types apply and what the target threshold is.
3. New Stage 3 (Build) deliverable: *Eval Suite* — the labeled dataset + scoring harness, handed over alongside the agent.
4. New template: **USL-T12 Eval Plan & Suite** (currently missing from the 12-template index).
5. Update the Governance Runbook to require eval re-runs as part of the rollback / monitor cycle.

This is the single most important change suggested by the report. It strengthens Studio's "we ship working systems, not pilots" claim by giving us a defensible answer to "working how well?".
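A minimal sketch of what one Eval Plan entry could look like in machine-checkable form, so the same plan drives the Build handoff and the Governance re-runs. The `EvalPlan`/`Target` structure, the agent name, and the thresholds are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    threshold: float
    higher_is_better: bool = True


@dataclass
class EvalPlan:
    """Hypothetical USL-T12 entry: which eval types apply to one agent, and at what bar."""
    agent: str
    targets: dict[str, Target] = field(default_factory=dict)

    def failures(self, scores: dict[str, float]) -> list[str]:
        """Reasons the agent misses its plan; an empty list means it passes."""
        problems = []
        for metric, target in self.targets.items():
            if metric not in scores:
                problems.append(f"{metric}: not measured")
                continue
            value = scores[metric]
            if target.higher_is_better:
                ok, op = value >= target.threshold, ">="
            else:
                ok, op = value <= target.threshold, "<="
            if not ok:
                problems.append(f"{metric}: {value:.2f} (target {op} {target.threshold:.2f})")
        return problems


plan = EvalPlan(
    agent="claims-intake-agent",  # hypothetical agent name
    targets={
        "task_success_rate": Target(0.90),
        "f1": Target(0.85),
        "hallucination_rate": Target(0.02, higher_is_better=False),
    },
)

# One re-run during the monitor cycle; what happens on failure is a runbook decision.
print(plan.failures({"task_success_rate": 0.93, "f1": 0.81, "hallucination_rate": 0.01}))
# ['f1: 0.81 (target >= 0.85)']
```

Treating an unmeasured metric as a failure is a deliberate choice: a plan that names an eval type should not pass silently when that eval was skipped.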

---

### Lesson 4 — "Make it easy to track and verify every step"

**McKinsey's claim.** At small scale, errors are easy to spot. At hundreds of agents, only step-level observability lets you debug. Outcome-only tracking is the trap.

**Example from the report:** an alternative-dispute-resolution provider noticed an accuracy drop on new cases, traced it via observability tools to upstream user segments submitting lower-quality data, fixed the upstream collection, and recovered.

**Studio mapping.** Strong alignment with the Governance Wrapper's six components (monitoring, alerting, audit trail, escalation, override, rollback). The framing is right; the language could borrow McKinsey's clearer phrasing.

**Gap.** The Governance Runbook describes monitoring at the workflow level. Make it explicit that monitoring is **per-step**, not per-outcome. Add a runbook section on "diagnosing accuracy drift to its upstream cause" using the McKinsey example as the canonical pattern.
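A minimal sketch of the upstream-cause pattern, assuming every step execution is logged with hypothetical `step`, `segment` (e.g. intake channel or user segment), and `correct` fields. It compares a baseline window with the current one and walks steps in pipeline order, returning the earliest step and segment whose accuracy fell. Step names and data are invented; the shape mirrors the dispute-resolution example, not its actual tooling.

```python
from collections import defaultdict


def accuracy_by_step_and_segment(records):
    """Accuracy per (step, segment) pair; each record is one checked step execution."""
    totals = defaultdict(lambda: [0, 0])  # (step, segment) -> [correct, total]
    for rec in records:
        key = (rec["step"], rec["segment"])
        totals[key][0] += rec["correct"]
        totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in totals.items()}


def first_degraded_step(baseline, current, step_order, tolerance=0.05):
    """Walk steps in pipeline order; return the earliest (step, segment, drop) past tolerance."""
    before = accuracy_by_step_and_segment(baseline)
    after = accuracy_by_step_and_segment(current)
    for step in step_order:
        drops = [
            (before[key] - after[key], key[1])
            for key in after
            if key[0] == step and key in before and before[key] - after[key] > tolerance
        ]
        if drops:
            worst_drop, segment = max(drops)
            return step, segment, worst_drop
    return None  # no step-level regression beyond tolerance


# Hypothetical logs: two steps, two intake segments.
baseline = [
    {"step": "extract", "segment": "portal", "correct": True},
    {"step": "extract", "segment": "email", "correct": True},
    {"step": "decide", "segment": "portal", "correct": True},
    {"step": "decide", "segment": "email", "correct": True},
]
current = [
    {"step": "extract", "segment": "portal", "correct": True},
    {"step": "extract", "segment": "email", "correct": False},  # bad upstream data
    {"step": "decide", "segment": "portal", "correct": True},
    {"step": "decide", "segment": "email", "correct": False},   # downstream symptom
]

print(first_degraded_step(baseline, current, ["extract", "decide"]))
# ('extract', 'email', 1.0)
```

An outcome-only view of the same logs would show overall accuracy falling from 100% to 50% with no hint of where; the step-level view points at the email segment's extraction step, which is where the upstream fix belongs.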

**Action.**
1. Update Governance Runbook (USL-T11): add "step-level observability" as a required wrapper component, distinct from outcome metrics.
2. Add a worked example to the Operations Manual using the upstream-data-quality pattern.
3. In the deck's Governance section, explicitly contrast Studio's per-step instrumentation against typical "outcome-only" agent deployments.

---

### Lesson 5 — "The best use case is the reuse case"

**McKinsey's claim.** Companies that build a unique agent for each task create massive duplication. Reusable components, centralized services (LLM observability, preapproved prompts), and shared assets eliminate **30–50% of nonessential work**.

**Studio mapping.** Direct validation of the Pattern Library moat. Studio's pricing logic ("the second workflow costs less than the first, the third less than the second") is exactly this dynamic; McKinsey puts a number on it.

**Gap.** The Pattern Library exists as a concept with 8 seed patterns. It is not yet a *product the client receives.* Each engagement should explicitly hand over a "patterns inherited / patterns contributed" inventory.

**Action.**
1. Add to every Outcome Report (USL-T10): a "Patterns inherited from library" section and "Patterns contributed back" section — makes the compounding moat visible to the client.
2. Quote the **30–50%** number directly in pricing conversations: "engagements two and three are priced lower because the library eliminates 30–50% of nonessential work: McKinsey's number, not ours."
3. Build a one-page **Pattern Library Public Index** as a credibility asset on studio.html (titles + 1-line descriptions of each USP-XXX pattern, no IP given away).

---

### Lesson 6 — "Humans remain essential, but their roles and numbers will change"

**McKinsey's claim.** Humans stay — for oversight, edge cases, judgment, compliance — but the *number* of humans in a redesigned workflow generally drops. The redesign must explicitly place humans at the right inflection points. Human–agent interfaces matter: bounding boxes, highlights, click-to-source — one P&C insurer reached **~95% acceptance** on AI summaries by investing in interaction design.

**Key line:** *"Without [deliberate human-agent collaboration design], even the most advanced agentic programs risk silent failures, compounding errors, and user rejection."*

**Studio mapping.** Studio's "human above the loop" language is correct but currently abstract. The 95% acceptance number is gold — it gives a concrete target for what "good" looks like.

**Gap.** Studio has no explicit deliverable for the **human-agent UI**. The Build stage talks about agents and orchestration, not the reviewer interface that determines acceptance.

**Action.**
1. Add to the Build playbook an explicit *Reviewer UI* workstream — bounding boxes, source-linking, click-to-context, confidence display.
2. Add an outcome metric to the Outcome Report: **acceptance rate** — % of agent outputs accepted without human edit. Target ≥80%; cite McKinsey's 95% as best-in-class.
3. Reframe "human above the loop" in sales copy: "we design the reviewer UI, not just the agent — that's the difference between 40% acceptance and 95%."
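A minimal sketch of the acceptance-rate metric as defined for the Outcome Report (share of agent outputs accepted without human edit), assuming a hypothetical reviewer-UI event log. An output that is accepted only after editing counts against the rate. McKinsey's ~95% figure may not use exactly this definition, so treat the comparison as directional.

```python
from dataclasses import dataclass


@dataclass
class ReviewEvent:
    """One reviewer decision on an agent output (hypothetical log schema)."""
    output_id: str
    accepted: bool  # reviewer kept the output
    edited: bool    # reviewer changed it before keeping it


def acceptance_rate(events):
    """Share of agent outputs accepted without any human edit."""
    if not events:
        raise ValueError("no review events")
    clean = sum(e.accepted and not e.edited for e in events)
    return clean / len(events)


TARGET = 0.80  # Studio's proposed Outcome Report target (>= 80%)

events = [
    ReviewEvent("a", True, False),
    ReviewEvent("b", True, True),   # accepted after an edit: not counted
    ReviewEvent("c", False, False),  # rejected
    ReviewEvent("d", True, False),
    ReviewEvent("e", True, False),
]
rate = acceptance_rate(events)
print(f"{rate:.0%} accepted clean; meets target: {rate >= TARGET}")
# 60% accepted clean; meets target: False
```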

---

## Quote bank — for deck, landing, and outreach

Use sparingly and always cite; two or three quotes per asset are enough.

| Use it for | Quote / stat | Source location in PDF |
|---|---|---|
| Opener / problem statement | *"Many [companies] are finding it challenging to see value from their investments. In some cases, they are even retrenching — rehiring people where agents have failed."* | p. 2 |
| Studio thesis validation | *"Agentic AI efforts that focus on fundamentally reimagining entire workflows… are more likely to deliver a positive outcome."* | p. 3 |
| Tool-selection / "we'll tell you when not to use agents" | *"On one level, these issues are straightforward. For example, low-variance, high-standardization workflows… agents based on nondeterministic LLMs could add more complexity and uncertainty than value."* | p. 4 |
| Eval workstream pitch | *"Onboarding agents is more like hiring a new employee versus deploying software."* | p. 5 |
| Pattern Library pricing | "30 to 50 percent of the nonessential work" eliminated via reusable services and assets | p. 7 |
| UI / human-collab pitch | "user acceptance levels near 95 percent" via interactive visual elements | p. 9 |
| Closing / urgency | *"Unless companies approach their agentic programs with learning in mind (and in practice), they're likely to repeat mistakes and slow their progress."* | p. 9 |

---

## Asset update queue — what changes downstream

Ranked by leverage, highest (P0) to lowest (P2).

| Priority | Asset | Change |
|---|---|---|
| P0 | **studio.html / studio deck** | Add "What McKinsey just published" sidebar with two quotes (workflow + 30–50%). Reframe hero subhead around "we'll tell you where agents are *not* the answer." |
| P0 | **Lighthouse Sprint Operating Framework v1.0** | New section: *Eval Workstream*. New artifact: Eval Plan (Stage 2). New deliverable: Eval Suite (Stage 3). |
| P0 | **Template index** | Add **USL-T12 Eval Plan & Suite**. |
| P1 | **Redesign Blueprint (USL-T03)** | Add Tool Selection Matrix (variance × standardization). Force defense of each agent assignment. |
| P1 | **Governance Runbook (USL-T11)** | Make step-level observability explicit. Add upstream-cause diagnosis worked example. |
| P1 | **Outcome Report (USL-T10)** | Add Patterns Inherited / Contributed sections. Add Acceptance Rate metric. |
| P2 | **Build playbook** | Add Reviewer UI workstream. |
| P2 | **Pattern Library Public Index** | New one-pager on studio.html — titles + 1-liners for USP-001..008. |
| P2 | **Outreach emails** | Bake the 30–50% reuse stat into pricing-conversation email; bake the 95% acceptance stat into "what good looks like" email. |

---

## How to use this doc going forward

1. Read it before any Studio outbound copy edit, deck revision, or framework refactor.
2. Treat the Eval Workstream gap as the next sprint of internal work — it's the single biggest credibility upgrade available.
3. When a prospect asks "what's different about Umbra Studio?", the answer is now: *"McKinsey published the recipe. We're the boutique that ships it — with our own properties as the proof points."*

— Synthesized April 28, 2026.
