Will W.

Reviewing agent sessions?

A lot of people are drowning in reviews these days. We don't quite trust AI output, so we rely on humans to sign off on the artifacts (code/PRs) that agents make. Unfortunately, the rate at which artifacts are produced exceeds the rate at which humans can review them, and we need better tooling or a better perspective here.

I've been wondering recently if it'd be worth reviewing agent sessions alongside (or maybe even instead of) the artifacts they make. By agent session, I literally mean the chat log with Claude that led up to the PR being shared.

That sounds like twice the review work, but I think it might actually speed things up. If we can look into the process by which a thing was made, we can reach a shared understanding of it much faster. I'll be able to see how it was iterated on, what went wrong along the way and how it was corrected, which key decisions had to be made, and where the important parts are.

Seeing the process also builds trust in the end artifact. If I see you've one-shotted a multi-thousand-line file, I'm going to be a little more skeptical than if I see you've used a reasonable harness to build it (although, in either case, it's probably best not to shove multi-thousand-line changes into people's faces).

Reviewing sessions is akin, in some ways, to pair programming. When pairing, we can talk aloud, share context, and intertwine our production process live. We come away from it having shared understanding, one of the most important things we're trying to get from code review in the first place. Reviewing a session is like replaying a pairing session.

There's novelty here. Without agents, people generally don't record their working process. You work independently until it comes time to share an artifact. With agents, the working process is always recorded. Why not share it?