Shape the Field

AI Agents
in Product

Early use cases. How thinking evolved. What's next.

May 9, 2026 · CPO Council · Tenni Theurer

co-builder
$ whoami
Tenni Theurer. PM who's been running AI agents as an actual operating system.
1 year. 8 projects. 250+ sessions. Startups, advisory, enterprise.
Not experimenting. Operating.
$ cat approach.md
Built the whole thing with a product manager's lens.
Backlogs. Feedback loops. Interaction contracts. Value checks.
That framing turned out to matter more than the AI itself.
1 Capture · Week 1
2 Automate · Week 2
3 Leverage · Week 3
4 Share · What's next

Act 1

Early Use Cases

The solo builder phase

where-everyone-starts
$ describe --phase=1
Task automation. Auto-bookers, calendar sync, receipt scanners.
The agent does what a human would do, just faster.
This is where most orgs are right now.

Then you build for real businesses

early-stage-startup
$ ls agents/
email-pipeline/ Triages VIP contacts, classifies, ingests into 200+ doc KB
data-room-agent/ One-line email -> investor doc edits -> deck upload -> confirmation
financial-models/ 3 generations, 4 vendor comparisons, unit economics by location
ops-app/ Field feedback -> prompts -> schema migrations -> user onboarding
pm-agent/ Runs every 6h. Reads all state. Writes reports. Pushes back.
vision-review-agent
$ describe evaluator
Vision-based agent reviews field documentation against compliance standards.
Key unlock: full document sets, not samples.
Caught location mismatches 85 miles off. Signature gaps. Backdated records.
Batch of 29 real submissions. Correctly triaged pass / hold / incomplete.
Heading to production.
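
A minimal sketch of the pass / hold / incomplete triage described above. The decision rules, the `REQUIRED_DOCS` set, and the `Submission` fields are assumptions for illustration; the real evaluator's rules are not given in this deck. The idea shown is only that the agent looks at the full document set first, then at whatever issues the vision review raised.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical list of documents a complete submission must contain.
REQUIRED_DOCS = {"site_photo", "signed_form", "date_record"}


@dataclass
class Submission:
    doc_types: Set[str]                               # documents actually present
    issues: List[str] = field(default_factory=list)   # findings from the vision review


def triage(sub: Submission) -> str:
    """Return 'incomplete', 'hold', or 'pass' for one submission."""
    if REQUIRED_DOCS - sub.doc_types:
        return "incomplete"   # something is missing from the full set
    if sub.issues:
        return "hold"         # complete, but a human should look at the findings
    return "pass"


batch = [
    Submission({"site_photo", "signed_form", "date_record"}),
    Submission({"site_photo", "signed_form", "date_record"},
               issues=["location mismatch"]),
    Submission({"site_photo"}),
]
results = [triage(s) for s in batch]
```

Checking completeness before issues means a submission with missing documents is never reported as merely "on hold".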
agent-built-software
$ describe --loop
plan Agent reads context, backlog, and constraints. Writes implementation plan.
build Agent executes the plan. Writes code. Runs tests. Commits.
eval Second agent reviews the diff. Flags gaps. Feeds back to plan.
Loop runs until eval passes. Human approves the merge.
$ stats
Multiple production apps shipped this way. Real users. Real revenue.
The agent doesn't just help you manage work - it does the work.
You become the evaluator, not the operator.
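
The plan / build / eval loop above can be sketched as a small control loop. This is an illustration under assumptions, not the actual harness: `run_loop`, `EvalResult`, and the three toy agent functions are hypothetical names, and the real agents would be model calls rather than string functions. The point is the shape: eval gaps feed back into planning, and the loop stops at "awaiting human approval" instead of merging.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EvalResult:
    passed: bool
    gaps: List[str] = field(default_factory=list)


def run_loop(
    plan: Callable[[List[str]], str],
    build: Callable[[str], str],
    evaluate: Callable[[str], EvalResult],
    max_rounds: int = 5,
) -> dict:
    """Repeat plan -> build -> eval until eval passes or rounds run out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        implementation_plan = plan(feedback)  # planner sees the previous eval gaps
        diff = build(implementation_plan)     # builder turns the plan into a diff
        result = evaluate(diff)               # second agent reviews the diff
        if result.passed:
            # The loop never merges. It only queues the diff for a human.
            return {"status": "awaiting_human_approval",
                    "rounds": round_no, "diff": diff}
        feedback = result.gaps
    return {"status": "needs_human_help", "rounds": max_rounds, "diff": None}


# Toy stand-ins for the three agents, for illustration only.
def toy_plan(feedback: List[str]) -> str:
    return "add tests" if "missing tests" in feedback else "write feature"


def toy_build(plan_text: str) -> str:
    return f"diff({plan_text})"


def toy_eval(diff: str) -> EvalResult:
    if "add tests" in diff:
        return EvalResult(passed=True)
    return EvalResult(passed=False, gaps=["missing tests"])


outcome = run_loop(toy_plan, toy_build, toy_eval)
```

The `max_rounds` cap is an added safety assumption so a loop that never converges escalates to a human instead of running forever.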
I approached the agent the way a PM approaches a product. What's the user need? What's the feedback loop? Where does the system break? What's the minimum that delivers value?
turning-point
$ diff session-001 session-250
+ persistent memory loads every session
+ task system the agent reads AND writes to
+ PM layer: every task gets a value check
+ - what artifact was produced?
+ - what decision was made?
+ - what metric moved?
+ tasks with no tangible output get flagged
The agent stopped being a tool I use
and became a system I work inside.
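
The value check in the diff above can be sketched as a filter over the task list. The `Task` fields mirror the three questions; the class and function names are hypothetical, and how the real system stores tasks is not stated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    title: str
    artifact: Optional[str] = None   # what artifact was produced?
    decision: Optional[str] = None   # what decision was made?
    metric: Optional[str] = None     # what metric moved?


def has_tangible_output(task: Task) -> bool:
    """True if at least one of the three value questions has a non-blank answer."""
    answers = (task.artifact, task.decision, task.metric)
    return any(a and a.strip() for a in answers)


def flag_low_value(tasks: List[Task]) -> List[str]:
    """Titles of tasks with no tangible output."""
    return [t.title for t in tasks if not has_tangible_output(t)]


tasks = [
    Task("Draft investor update", artifact="update-v2.md"),
    Task("Pick vendor", decision="Vendor B"),
    Task("Think about roadmap"),
    Task("Tidy notes", artifact="   "),
]
flagged = flag_low_value(tasks)
```

A blank or whitespace-only answer counts as no answer, so a task cannot pass the check with an empty field.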

Under the Hood

The Stack

Same layers, different scale.

Model Layer
Memory
Tools / MCPs
Skills
Automations
Outputs
The Human

The complexity tension

Capability
Complexity
Human load
Each layer makes the system more capable but also more complex. The human - me - has limited memory and computing power. So every layer needs to make my interface simpler, not just the system more powerful. I become the evaluator, not the operator. The system proposes; I judge.

Live Simulation

Agent in Action

What a morning triage actually looks like

co-builder - morning triage
$ _

Act 2

What Changed

From solo OS to manager OS

The pattern survived the transition to enterprise. The implementation had to be completely rewritten. Three weeks to get it load-bearing.

W1 · Bootstrap · Trust-building
W2 · OS Overhaul · Real work
W3 · Leverage · Load-bearing

The honest version

What works

The handoff pattern
AI prepares; human reviews and sends. Never delegate the send button.
Privacy as structure
Two-repo split makes leaks structurally impossible.
Investigation before draft
Replies grounded in real history, not generic AI tone.
Curated > comprehensive
A 5-observation synthesis lands. 20 pages don't.
Always-on enrichment
Work flows back into the right files. Memory compounds.
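
One way to make the handoff pattern structural rather than a habit, sketched under assumptions: the agent-facing method can only create drafts, it refuses to draft without source material (investigation before draft), and sending requires a named human approver. `Outbox`, `Draft`, and the method names are hypothetical and stand in for whatever mail or chat integration is actually used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    DRAFTED = "drafted"
    SENT = "sent"


@dataclass
class Draft:
    to: str
    body: str
    sources: Tuple[str, ...]          # thread history the draft is grounded in
    status: Status = Status.DRAFTED


class Outbox:
    def __init__(self) -> None:
        self.sent: List[Draft] = []

    def agent_prepare(self, to: str, body: str, sources: Tuple[str, ...]) -> Draft:
        """The only entry point given to the agent. It cannot send."""
        if not sources:
            raise ValueError("investigate before drafting: no sources given")
        return Draft(to=to, body=body, sources=sources)

    def human_send(self, draft: Draft, approved_by: str) -> Draft:
        """Sending needs an explicit human approver."""
        if not approved_by:
            raise PermissionError("a human must approve the send")
        draft.status = Status.SENT
        self.sent.append(draft)
        return draft


outbox = Outbox()
draft = outbox.agent_prepare(
    to="partner@example.com",
    body="Following up on the pricing thread.",
    sources=("thread: pricing discussion",),
)
```

Until `human_send` is called, `outbox.sent` stays empty, which also makes the "drafted vs sent" gap easy to report on.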

Where it breaks

Signal-to-noise
First version of any pipeline is too noisy. Needs allowlist pass.
Tooling fragility
MCP servers drop. Auth walls appear constantly.
The "drafted vs sent" gap
AI surfaces what's owed. Can't make you close the loop.
Judgment doesn't scale
5-10x on inputs. 2-3x on leverage. That's the ceiling.
Single-user shape
A teammate can't fork without rebuilding their own layer.
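
The allowlist pass mentioned under signal-to-noise might look like the sketch below. It assumes incoming items are dicts with a `sender` key and that the allowlist is a set of sender addresses; both are illustrative choices, since the deck does not say what the allowlist keys on.

```python
from typing import Dict, Iterable, List, Tuple

Item = Dict[str, str]


def allowlist_pass(items: Iterable[Item],
                   allowed_senders: Iterable[str]) -> Tuple[List[Item], List[Item]]:
    """Split items into (kept, dropped) by sender, case-insensitively."""
    allowed = {s.lower() for s in allowed_senders}
    kept: List[Item] = []
    dropped: List[Item] = []
    for item in items:
        target = kept if item["sender"].lower() in allowed else dropped
        target.append(item)
    return kept, dropped


inbox = [
    {"sender": "CEO@example.com", "subject": "Board prep"},
    {"sender": "newsletter@example.org", "subject": "Weekly digest"},
    {"sender": "lead@example.com", "subject": "Launch risk"},
]
kept, dropped = allowlist_pass(inbox, {"ceo@example.com", "lead@example.com"})
```

Returning the dropped items as well makes it possible to review what the filter threw away while tuning the list.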

The System

The Manager OS

8 use cases. Built with PM discipline.

Not a chatbot. A manager OS. Two repos: one shareable, one private. AI handles inputs and drafts; the human owns judgment and the send button.
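
A rough sketch of a guard for the shareable repo: scan staged file contents for PII-like patterns and block the commit if anything is found. The two regexes, the function names, and the dict-of-files input are assumptions for illustration; a real guard would need patterns tuned to the organization and would run from a commit hook.

```python
import re
from typing import Dict, List

# Hypothetical patterns. Real guards need a broader, tuned set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> List[str]:
    """Names of the PII patterns that match the text, sorted."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))


def guard_commit(staged_files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map path -> PII kinds found. An empty dict means safe to commit."""
    findings: Dict[str, List[str]] = {}
    for path, text in staged_files.items():
        hits = scan_text(text)
        if hits:
            findings[path] = hits
    return findings


staged = {
    "notes/standup.md": "Ship the triage skill by Friday.",
    "people/1on1.md": "Follow up with jane@example.com re: 123-45-6789",
}
findings = guard_commit(staged)
```

Pattern matching like this catches only what the patterns describe, so it complements the two-repo split rather than replacing it.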

manager-os --list-capabilities
$ list --capabilities

Interactive

Where Are You?

Which of these are still manual in your org?

Daily intelligence triage
Meeting prep & capture
Drafting with context
People intelligence
Knowledge base
Strategic synthesis
Backlog & follow-through
Operational hygiene
Check the ones your PMs still do manually.
Every checked box is an agent opportunity. The question isn't whether to automate them - it's what order, and who designs the system.

Act 3

12 Months Out

Four predictions grounded in 250+ sessions of real usage

Prediction 1
Memory becomes the moat, not the model

Proved this twice - once solo (250+ sessions of accumulated context), once at enterprise scale. Every product org will have access to the same foundation models. The differentiation is what your agent knows about this user, this workflow, this org.

CPO Implication

Your agent strategy should start with "what's the memory architecture?" not "which model?"

Prediction 2
The Manager OS is the first real agent product category

Not "chat with your data." Not "autocomplete in your IDE." A system that does daily triage, preps you for meetings, drafts in your voice, tracks follow-through, and maintains a running model of your people and priorities. I built it by hand. It works. Someone will productize it.

CPO Implication

If you're building agent products, look at what managers actually do all day. That's the TAM, not developer productivity.

Prediction 3
The ceiling is judgment, not automation

5-10x on inputs and drafts. 2-3x on actual leverage. That's the honest number. The compression is real. The thinking still requires you.

CPO Implication

Design for the 2-3x, not the 5-10x. Build your products around the handoff, not the automation.
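
One possible way to read the gap between 5-10x and 2-3x, offered as an interpretation and not as the speaker's stated model, is Amdahl's law: if only a share of the work (inputs and drafts) is accelerated and the rest (judgment) is not, the overall gain is capped by the part that stays manual. The function names and the example shares below are assumptions.

```python
def overall_speedup(accelerated_share: float, local_speedup: float) -> float:
    """Amdahl's law: overall gain when only part of the work gets faster."""
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / local_speedup)


def share_needed(target_overall: float, local_speedup: float) -> float:
    """Share of work that must be accelerated to reach a target overall gain."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / local_speedup)


# If 60% of the week is inputs and drafts, sped up 10x:
gain_60 = overall_speedup(0.60, 10)    # about 2.17x overall
# Share of work that would have to be input/draft work to reach 2x and 3x:
share_for_2x = share_needed(2, 10)     # about 0.56
share_for_3x = share_needed(3, 10)     # about 0.74
```

Under these assumptions, a 10x gain on inputs and drafts lands at 2-3x overall only when roughly 56% to 74% of the work is input and draft work; the remaining judgment time sets the ceiling.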

Prediction 4
Agent-native PMs will create an uncomfortable performance gap

The PMs who build agent systems around themselves will operate at a visibly different level. This isn't a tool adoption curve. It's a skill gap. The difference between a PM who uses agents and a PM who builds agent systems is the difference between someone who uses Excel and someone who builds models.

CPO Implication

You need to decide if this is something you encourage, require, or let happen organically. In 12 months, you'll be able to tell which PMs built agent systems and which didn't - just from the quality of their work.

Close

The Shape

1 Capture · Watch, ingest, summarize
2 Automate · Pipelines, skills, agents
3 Leverage · Load-bearing, daily ops
4 Share · Team infrastructure
Every CPO in this room can start week 1 tomorrow. The question isn't whether agents work in product leadership - they do. The question is whether you're willing to invest in the trust-building it takes to get to the point where it's load-bearing.
takeaway
The hard part isn't the AI.
It's designing the system around the AI -
the memory, the contracts, the feedback loops,
the human-agent boundaries.
Building great agent systems is product work.
It's PM work.
The people best equipped to lead this transition
are already in your org.


Questions

"What about hallucination / trust?"
The handoff pattern handles this. AI never sends - it drafts. Investigation before draft means replies are grounded in real thread history. Trust is a design problem, not a model problem.
"Does it scale beyond one person?"
Not yet - that's the honest answer. The patterns are transferable but the infrastructure isn't forkable. Someone will productize this.
"What model do you use?"
Claude (Opus), but that's the least interesting part. The value is in the memory architecture, the interaction contracts, and the workflow design.
"How do you handle sensitive data?"
Privacy as structure: two-repo split, PII guards on every commit, session memory rotation. Make leaks structurally impossible rather than relying on vigilance.
"What should I try first?"
Daily intelligence triage. One sweep of everything that came in overnight. Lowest risk, highest immediate value, builds the trust that unlocks everything else.