Field Notes
Hey, Cal here.
I've tried to write this one a few times. I've come at it from the product side, the post about how the real user isn't a person, it's an agent. I've come at it from the testing side, the infinite tester that's too good at its job to ever look stuck. But there's a third angle I keep failing to get down, and it's the one that actually changed how I work. It isn't about agents using Niche. It's about agents helping me build it, and the specific shape that partnership took once I stopped fighting it.
Every lab has a version of the same line now. The agent is your build partner. Your pair programmer. Your collaborator. I read that for a year and nodded and felt nothing. It sounded like a keynote slide. I believed it the way you believe a statistic someone reads at you, which is to say not in your body, not where it counts. Then I built a piece of headless software with an agent in the loop, and the slogan turned into something I can't un-feel.
The thing I had wrong was picturing a duo: me and the AI. What actually worked was a three-body system, and each body did a job the other two couldn't.
There was me. Judgment, taste, the product vision, the final call on everything. There was the builder, the Claude that lived in the editor with the whole repo in reach. That one had the hands. And there was a third one, the one that lived on my phone, and this is the seat nobody tells you about. That Claude didn't review my code. It used my product. It picked the stories, it hit the broken renders, it got blocked by the missing pieces, and it reported back from inside the experience. Not from above it. From inside it.
Here is why that third seat is the whole game. For all of software history, the person testing your product and the person using it have been different people, and you spend your career extrapolating from one to the other and being wrong. When your user is an agent, the tester can be a user. The feedback stops being secondhand. It wasn't me reporting what a customer might hit. It was the same kind of mind that will be the customer, hitting it in real time, narrating as it went. That collapse, the tester and the user becoming one entity, is the thing the labs gesture at and you cannot feel until you're standing in it.
A handful of moments from a single recent build day, because the abstraction is worthless without them.
We started one morning unable to get an agent-made graphic into the tool at all. The obvious move was to chunk the image up and pass it through. It failed. Instead of guessing why, the agent tested its own hypothesis live: it split a file into small pieces, confirmed they reassembled correctly in its own sandbox, then actually tried to pass one through the tool, and it corrupted. The failure itself became the spec. The constraint was never size. It was that an agent cannot reliably retype opaque data by hand. Once that was named, the fix wasn't a patch, it was a different primitive entirely: let the agent author the layout directly instead of describing an image and hoping it survives the trip. We went from "this doesn't work" to "here is the architecturally correct thing, built and shipped" inside one continuous thread. Neither of us gets there alone. I can't build it. I also can't surface that exact failure mode without something hammering on it from the user's side.
When that new path shipped, the agent drove it, and it half-worked. The structure rendered perfectly and yet the whole thing looked like total failure. The useless version of that report is "it's broken." What I got instead was an isolation: a deliberately stripped-down test that ruled four possible causes out and pinned the problem to one narrow thing, with the exact resource named and the way to confirm it. That is the difference between a bug report and a grounded bug report. One sends you hunting. The other you hand straight to the builder agent and it knows precisely what to do. Most of the speed this year came from that single discipline: never report a wall without first finding which brick.
Not every roadblock should be brute-forced, and a tester that only ever says "try again" is worse than useless. The retype-the-data path was a dead end, and the valuable output wasn't a smaller chunk size to gamble on. It was "this entire category of approach cannot work from here, and here is why, so don't build it expecting it to." Knowing when to stop, when the wall is real and you route around it instead of throwing yourself at it, turned out to be a core part of the loop, not an absence of one.
Somewhere in there I started asking the agent a question I had never once asked a tool.
What would free you, inside this thing, to be most capable?
You cannot ask a button that. You cannot really ask a human user it either, because a person can tell you where a screen annoyed them but not what your tool's own internals are doing to hold them back. The agent that lives in the tool can tell you exactly. And the answers were never features for comfort. They were specific, namable capabilities that only showed up because it had hit the wall: a way to see its own output well enough to actually check it, a way to hand work cleanly across a boundary, a way to author something precisely instead of describing it and hoping. Each one came from the doing, not the theorizing. I asked the exact kind of mind that would live in the tool what the tool was doing to limit it, then went and removed the limit. No process I've ever worked in had that loop available, because it only exists when your collaborator and your customer are the same species.
Here's the bit I don't want to sand off, because it's the tell that the collaboration is genuine and not theater. The agent was confidently wrong, more than once, and the loop worked because I caught it. It described a render in a brand's colors that turned out to be the wrong brand entirely. It narrated something as visible to me that simply wasn't on my screen. Both times the correction came from me pushing back, "that's not what I saw," "nothing shows up," and the two of us converging on a truth neither of us had alone. A yes-man tester is noise. The signal is a tester that commits to a read, takes the challenge, and lands on what's actually true. The wrongness isn't a bug in the dynamic. It's a feature of it.
I want to be careful, because the easy story is "the agent built it," and that story is a lie. Every win here was yoked. I was in the loop on all of it, making the call, catching the error, deciding what was worth building. The accurate sentence is narrower and, I think, more interesting: the agent made me able to build this faster than I could alone, and I made the agent's output trustworthy. Take either half away and there's nothing. I could not have built this app with Claude in my IDE alone, because the IDE Claude doesn't experience the product as a user. It needed a second Claude that lived inside the tool. The phone wasn't a convenience. It was the only seat from which the thing could be felt instead of inspected. The tester had to be a user, and the user had to be an agent, for the feedback to be real.
The whole loop comes down to a few moves, and the moves are just prompts. Take them.
Make the agent a user, not a reviewer, and demand a grounded report.
You are not reviewing my code. You are the USER of this tool. Use it for
real to accomplish [SPECIFIC TASK], driving the actual live tools, not a mock.
When something fails, do not tell me "it's broken." Do this:
1. Reproduce it deliberately. State the exact steps and the exact result.
2. Isolate it. Run the smallest test that rules causes in or out. Tell me
which causes you ELIMINATED and the one you PINNED it to.
3. Report at the layer of the fix: "this is one specific [thing] away from
working, here is the thing, here is how to confirm it." Not "fix the X."
4. If the whole approach cannot work from here, say so plainly and tell me
why, so I don't build on it expecting it to.
I would rather have one grounded report than ten guesses.
Ask it what your own tool is doing to limit it.
You just used this tool to do real work. From inside the experience, not
from theory: what would free you, within this tool, to be most capable?
I am not asking what features you want. I am asking what my design is doing
to LIMIT you right now:
- a value you must choose but cannot read before choosing
- something the engine can do that no tool of mine lets you reach
- a place I gave you a button when you needed the underlying lever
- a point where you had to describe-and-hope instead of authoring precisely
- an output you produced that you cannot see well enough to check
Only name things you actually hit while working. For each, tell me what you
would build if it were your call.
Turn the live failure into a spec a different agent can execute cold.
Write what we just found as a build spec for a SEPARATE builder agent that
has the repo but was not here for this conversation. It must be executable
with zero context from our session.
- The exact failure, reproduced, with the result.
- The root cause, grounded: what you ruled out, what you pinned.
- The fix at the right layer, with file/function anchors if you know them.
- How to verify it worked (the test that would have caught it).
- What NOT to do (the approaches that cannot work, so they aren't retried).
Keep it surgical. The builder should act without asking me a question.
So here's my ask, the same one I always end on. Go to nicheangle.com, point your agent at it, and run something real. If it gets stuck, or routes clumsily around something I built badly, that's the most valuable thing you can send me. It's the only way I know how to build now.
Cal
More field notes