Shape the Field

AI Agents
in Product

Early use cases. How thinking evolved. What's next.

May 9, 2026 · CPO Council · Tenni Theurer

co-builder

$ whoami

Tenni Theurer. PM who's been running AI agents as an actual operating system.

1 year. 8 projects. 250+ sessions. Startups, advisory, enterprise.

Not experimenting. Operating.

$ cat approach.md

Built the whole thing with a product manager's lens.

Backlogs. Feedback loops. Interaction contracts. Value checks.

That framing turned out to matter more than the AI itself.

1

Capture

Week 1

2

Automate

Week 2

3

Leverage

Week 3

4

Share

What's next

Act 1

Early Use Cases

The solo builder phase

where-everyone-starts

$ describe --phase=1

Task automation. Auto-bookers, calendar sync, receipt scanners.

The agent does what a human would do, just faster.

This is where most orgs are right now.

Then you build for real businesses

early-stage-startup

$ ls agents/

email-pipeline/ Triages VIP contacts, classifies, ingests into a 200+ document knowledge base

data-room-agent/ One-line email -> investor doc edits -> deck upload -> confirmation

financial-models/ 3 generations, 4 vendor comparisons, unit economics by location

ops-app/ Field feedback -> prompts -> schema migrations -> user onboarding

pm-agent/ Runs every 6h. Reads all state. Writes reports. Pushes back.

vision-review-agent

$ describe evaluator

Vision-based agent reviews field documentation against compliance standards.

Key unlock: full document sets, not samples.

Caught location mismatches 85 miles off. Signature gaps. Backdated records.

Batch of 29 real submissions. Correctly triaged pass / hold / incomplete.

Heading to production.
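A minimal sketch of how a pass / hold / incomplete triage could be expressed once the vision agent has extracted fields from a full document set. Everything here is an assumption for illustration: the `Submission` fields, the `MAX_LOCATION_DRIFT_MILES` tolerance, and the rule order are hypothetical, not the evaluator's actual logic. Only the haversine distance formula is standard.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Tuple

EARTH_RADIUS_MILES = 3958.8
MAX_LOCATION_DRIFT_MILES = 1.0  # hypothetical tolerance, not from the talk


def miles_between(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


@dataclass
class Submission:
    claimed_location: Tuple[float, float]   # (lat, lon) stated on the form
    recorded_location: Tuple[float, float]  # (lat, lon) read from photo metadata
    signed: bool
    pages_present: int
    pages_required: int


def triage(sub: Submission) -> str:
    """Return 'incomplete', 'hold', or 'pass' for one submission."""
    if sub.pages_present < sub.pages_required:
        return "incomplete"  # missing documents: cannot be judged yet
    drift = miles_between(*sub.claimed_location, *sub.recorded_location)
    if not sub.signed or drift > MAX_LOCATION_DRIFT_MILES:
        return "hold"  # complete, but a human should look at it
    return "pass"
```

The point of the shape: the agent only sorts. Anything on hold still goes to a person.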

agent-built-software

$ describe --loop

plan Agent reads context, backlog, and constraints. Writes implementation plan.

build Agent executes the plan. Writes code. Runs tests. Commits.

eval Second agent reviews the diff. Flags gaps. Feeds back to plan.

Loop runs until eval passes. Human approves the merge.
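The plan / build / eval loop above can be sketched as a small control loop. This is a hedged illustration, not the actual harness: `run_loop`, the stub functions, and the `max_rounds` cap are hypothetical names and choices (the slide only says the loop runs until eval passes), and real agents would replace the stubs.

```python
from typing import Callable, Dict, List


def run_loop(
    plan: Callable[[List[str]], List[str]],
    build: Callable[[List[str]], str],
    evaluate: Callable[[str], List[str]],
    max_rounds: int = 5,  # assumption: a safety cap so the loop cannot spin forever
) -> Dict[str, object]:
    """Iterate plan -> build -> eval until eval reports no gaps."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        steps = plan(feedback)   # planner reads context plus prior gaps
        diff = build(steps)      # builder produces a diff
        gaps = evaluate(diff)    # second agent reviews the diff
        if not gaps:
            # Eval passed. The merge still waits for a human.
            return {"status": "awaiting_human_approval", "diff": diff, "rounds": round_number}
        feedback = gaps          # gaps feed back into the next plan
    return {"status": "escalate_to_human", "diff": None, "rounds": max_rounds}


# Stub agents, purely for demonstration.
def stub_plan(feedback: List[str]) -> List[str]:
    return ["add endpoint"] + [f"fix: {gap}" for gap in feedback]


def stub_build(steps: List[str]) -> str:
    return "\n".join(steps)


def stub_eval(diff: str) -> List[str]:
    return [] if "fix: missing tests" in diff else ["missing tests"]


result = run_loop(stub_plan, stub_build, stub_eval)
```

With these stubs the first round fails eval, the gap is fed back into the plan, and the second round passes. The loop never merges on its own; it only returns a status for a human to act on.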

$ stats

Multiple production apps shipped this way. Real users. Real revenue.

The agent doesn't just help you manage work - it does the work.

You become the evaluator, not the operator.

I approached the agent the way a PM approaches a product. What's the user need? What's the feedback loop? Where does the system break? What's the minimum that delivers value?

turning-point

$ diff session-001 session-250

+ persistent memory loads every session

+ task system the agent reads AND writes to

+ PM layer: every task gets a value check

+ - what artifact was produced?

+ - what decision was made?

+ - what metric moved?

+ tasks with no tangible output get flagged
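A rough sketch of the value check, assuming tasks are simple records with three optional fields that mirror the three questions above. The `Task` class and its field names are hypothetical; the real task system is not shown in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    title: str
    artifact: str = ""      # what artifact was produced?
    decision: str = ""      # what decision was made?
    metric_moved: str = ""  # what metric moved?


def has_tangible_output(task: Task) -> bool:
    """A task passes if it answers at least one of the three questions."""
    answers = (task.artifact, task.decision, task.metric_moved)
    return any(answer.strip() for answer in answers)


def flag_low_value(tasks: List[Task]) -> List[str]:
    """Return titles of tasks with no tangible output."""
    return [task.title for task in tasks if not has_tangible_output(task)]


tasks = [
    Task("Draft pricing memo", artifact="pricing-memo-v2.md"),
    Task("Sync with vendor"),
    Task("Pick onboarding flow", decision="Ship flow B"),
    Task("Research competitors", artifact="   "),
]
flagged = flag_low_value(tasks)
```

Here `flagged` holds the two tasks that produced nothing: the vendor sync and the research task whose artifact field is only whitespace.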

The agent stopped being a tool I use

and became a system I work inside.

Under the Hood

The Stack

Same layers, different scale.

Model Layer

Memory

Tools / MCPs

Skills

Automations

Outputs

The Human

The complexity tension

Capability

Complexity

Human load

Each layer makes the system more capable but also more complex. The human - me - has limited memory and computing power. So every layer needs to make my interface simpler, not just the system more powerful. I become the evaluator, not the operator. The system proposes; I judge.

Live Simulation

Agent in Action

What a morning triage actually looks like

co-builder - morning triage

Act 2

What Changed

From solo OS to manager OS

The pattern survived the transition to enterprise. The implementation had to be completely rewritten. Three weeks to get it load-bearing.

W1

Bootstrap

Trust-building

W2

OS Overhaul

Real work

W3

Leverage

Load-bearing

The honest version

What works

The handoff pattern

AI prepares; human reviews and sends. Never delegate the send button.

Privacy as structure

Two-repo split makes leaks structurally impossible.

Investigation before draft

Replies grounded in real history, not generic AI tone.

Curated > comprehensive

A 5-observation synthesis lands. A 20-page one doesn't.

Always-on enrichment

Work flows back into the right files. Memory compounds.
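The handoff pattern and investigation-before-draft can be sketched together. This is one possible shape under stated assumptions: `Draft`, `prepare_draft`, `approve`, and `send` are hypothetical names, and the "send" here just appends to an in-memory outbox rather than calling any real mail API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Draft:
    recipient: str
    body: str
    sources: List[str] = field(default_factory=list)  # history the draft is grounded in
    approved_by: Optional[str] = None                 # set only by a human reviewer


def prepare_draft(recipient: str, thread_history: List[str]) -> Draft:
    """Agent side: investigate first, then draft. The agent never sends."""
    if not thread_history:
        raise ValueError("no thread history to ground the reply in")
    body = f"Following up on: {thread_history[-1]}"
    return Draft(recipient=recipient, body=body, sources=list(thread_history))


def approve(draft: Draft, reviewer: str) -> Draft:
    """Human side: record who reviewed the draft."""
    draft.approved_by = reviewer
    return draft


def send(draft: Draft, outbox: List[Tuple[str, str]]) -> None:
    """Refuse to send anything a human has not reviewed."""
    if draft.approved_by is None:
        raise PermissionError("draft has not been reviewed by a human")
    outbox.append((draft.recipient, draft.body))
```

The design choice is that approval is a precondition enforced in code, not a habit. An unapproved draft cannot leave the system.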

Where it breaks

Signal-to-noise

The first version of any pipeline is too noisy. It needs an allowlist pass.

Tooling fragility

MCP servers drop. Auth walls appear constantly.

The "drafted vs sent" gap

AI surfaces what's owed. Can't make you close the loop.

Judgment doesn't scale

5-10x on inputs. 2-3x on leverage. That's the ceiling.

Single-user shape

A teammate can't fork without rebuilding their own layer.
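For the signal-to-noise problem, an allowlist pass might look like the sketch below. It assumes messages are plain dictionaries with a `sender` key and that the allowlist is a set of sender addresses; both are illustrative assumptions, since the talk does not say what the allowlist keys on.

```python
from typing import Dict, Iterable, List, Tuple

Message = Dict[str, str]


def allowlist_pass(
    messages: Iterable[Message], allowed_senders: Iterable[str]
) -> Tuple[List[Message], List[Message]]:
    """Split messages into (kept, dropped) by case-insensitive sender match."""
    allowed = {sender.lower() for sender in allowed_senders}
    kept: List[Message] = []
    dropped: List[Message] = []
    for message in messages:
        target = kept if message["sender"].lower() in allowed else dropped
        target.append(message)
    return kept, dropped


inbox = [
    {"sender": "ceo@example.com", "subject": "Board prep"},
    {"sender": "noreply@vendor.example", "subject": "Weekly digest"},
    {"sender": "CEO@example.com", "subject": "Re: Board prep"},
]
kept, dropped = allowlist_pass(inbox, ["ceo@example.com"])
```

Keeping the dropped list, rather than discarding it, makes it possible to audit what the filter removed and widen the allowlist over time.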

The System

The Manager OS

8 use cases. Built with PM discipline.

Not a chatbot. A manager OS. Two repos: one shareable, one private. AI handles inputs and drafts; the human owns judgment and the send button.
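One way to back the two-repo split with a check on the shareable side is a scan that blocks a commit when staged text looks like personal data. This is a sketch only: the two regular expressions are illustrative and far from a complete PII detector, and `find_pii` and `safe_to_commit` are hypothetical helpers, not the guards actually used.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only. A real guard would need a much broader set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_label) for every suspicious line."""
    hits: List[Tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, label))
    return hits


def safe_to_commit(staged_files: Dict[str, str]) -> bool:
    """staged_files maps path -> content. True only if no file has hits."""
    return all(not find_pii(content) for content in staged_files.values())
```

A hook on the shareable repo could call `safe_to_commit` and refuse the commit on `False`, so anything sensitive has to stay in the private repo.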

manager-os --list-capabilities

$ list --capabilities

Interactive

Where Are You?

Which of these are still manual in your org?

Daily intelligence triage

Meeting prep & capture

Drafting with context

People intelligence

Knowledge base

Strategic synthesis

Backlog & follow-through

Operational hygiene

Check the ones your PMs still do manually.

Every checked box is an agent opportunity. The question isn't whether to automate them - it's what order, and who designs the system.

Act 3

12 Months Out

Four predictions grounded in 250+ sessions of real usage

🔒

Prediction 1

Memory becomes the moat, not the model

Proved this twice - once solo (250+ sessions of accumulated context), once at enterprise scale. Every product org will have access to the same foundation models. The differentiation is what your agent knows about this user, this workflow, this org.

CPO Implication

Your agent strategy should start with "what's the memory architecture?" not "which model?"

⚙

Prediction 2

The Manager OS is the first real agent product category

Not "chat with your data." Not "autocomplete in your IDE." A system that does daily triage, preps you for meetings, drafts in your voice, tracks follow-through, and maintains a running model of your people and priorities. I built it by hand. It works. Someone will productize it.

CPO Implication

If you're building agent products, look at what managers actually do all day. That's the TAM, not developer productivity.

▲

Prediction 3

The ceiling is judgment, not automation

5-10x on inputs and drafts. 2-3x on actual leverage. That's the honest number. The compression is real. The thinking still requires you.

CPO Implication

Design for the 2-3x, not the 5-10x. Build your products around the handoff, not the automation.

⚡

Prediction 4

Agent-native PMs will create an uncomfortable performance gap

The PMs who build agent systems around themselves will operate at a visibly different level. This isn't a tool adoption curve. It's a skill gap. The difference between a PM who uses agents and a PM who builds agent systems is the difference between someone who uses Excel and someone who builds models.

CPO Implication

You need to decide if this is something you encourage, require, or let happen organically. In 12 months, you'll be able to tell which PMs built agent systems and which didn't - just from the quality of their work.

Close

The Shape

1

Capture

Watch, ingest, summarize

2

Automate

Pipelines, skills, agents

3

Leverage

Load-bearing, daily ops

4

Share

Team infrastructure

Every CPO in this room can start week 1 tomorrow. The question isn't whether agents work in product leadership - they do. The question is whether you're willing to invest in the trust-building it takes to make the system load-bearing.

takeaway

The hard part isn't the AI.

It's designing the system around the AI -

the memory, the contracts, the feedback loops,

the human-agent boundaries.

Building great agent systems is product work.

It's PM work.

The people best equipped to lead this transition

are already in your org.

Questions

"What about hallucination / trust?"

The handoff pattern handles this. AI never sends - it drafts. Investigation before draft means replies are grounded in real thread history. Trust is a design problem, not a model problem.

"Does it scale beyond one person?"

Not yet - that's the honest answer. The patterns are transferable but the infrastructure isn't forkable. Someone will productize this.

"What model do you use?"

Claude (Opus), but that's the least interesting part. The value is in the memory architecture, the interaction contracts, and the workflow design.

"How do you handle sensitive data?"

Privacy as structure: two-repo split, PII guards on every commit, session memory rotation. Make leaks structurally impossible rather than relying on vigilance.

"What should I try first?"

Daily intelligence triage. One sweep of everything that came in overnight. Lowest risk, highest immediate value, builds the trust that unlocks everything else.
