I gave Claude Design one short prompt. “Build a prototype for a tool called Agent Room — an observability dashboard for AI coding agents.” That was the entire input.
A few hours later I had a working, self-hosted app on GitHub that spawns real coding agents, streams their tool calls over websockets, and tracks token burn per session. No mocks. Real agents, real cost tracking, real bugs caught and fixed by other agents.
I recorded the whole thing if you'd rather watch it; the link is at the end.
Below is the written version — what I tried, what worked, and the parts that are still rough.
The Idea (One Short Prompt)
The whole pitch was: a live grid of AI coding agents you can supervise from one place. Think htop, but for the fleet of Claude/GPT/etc. instances chewing on your repos.
I typed that into Claude Design and hit send. That was it.
Claude Design Asks Better Questions Than I Do
This is the part that surprised me. Claude Design didn’t just generate something — it ran a multi-step refinement before committing. It asked me things like:
- How many agents in a typical view? (10–50)
- Which signals matter most? (status, last action, cost, blockers)
- What actions should be available? (replay-a-step, approve/reject, steer)
- Aesthetic? (terminal/monospace)
- How “alive” should it feel?
- Tweaks to expose? (light/dark, accent color, density, live-update speed)
- Mock task types? (bug fix, feature build, refactor, test writing, etc.)
- Novelty mix? (conventional + novel)
- A metaphor — “treat the agent room as…” (I picked the room metaphor)
And critically: every question had a “decide for me” opt-out. That single design choice is more important than it sounds. Half of prototyping is admitting you don’t have a strong opinion yet. The model is happy to pick a default and let you react to it later.
This is also where I realized Claude Design is doing something other prototyping tools don’t: turning vague taste into structured decisions before the model spends a single token on output. The questions ARE the spec.
The Prototype Editor
Claude churned for a few minutes and dropped me into a fully interactive prototype. Multiple agent terminals streaming fake tool calls. A human-feedback queue. Filter to “failed.” A message input to steer agents. The tweaks panel — light/dark, accent color, density, live-update speed — all wired up.
Three editing tools live on top of the prototype:
Comment. I dictated through Super Whisper: “Please simplify the agent view so it’s not as busy and we can comprehend what’s going on when looking at many agents at one time.” Claude rewrote the components. This worked well — felt natural, agentic, fast.
Edit. A Figma-style HTML element panel. Click any element, get a properties pane, change fonts/spacing/colors directly. Designers and front-end engineers will love this. It’s the manual escape hatch when you don’t want to write a prompt for a one-pixel change.
Draw. A lasso tool — circle a region, type instructions. I tried it on some shared-file tags I found distracting. Claude removed my entire terminal pane and reverted to a busier layout. Not great. This is the only feature I’d avoid for now. It’s a research preview and feels like one.
Net of the three: comment is excellent, edit is excellent, draw is shaky. Stick with comment + edit and you’ll have a great time.
Exporting to Claude Code
The “hand off to Claude Code” button generates a copy-paste command (or a zip download, if you’d rather). The command points your local agent at a share URL, pulls a readme, and unpacks the design files into your repo.
I dropped them into a prototype/ folder under a fresh agent_room directory in VS Code and asked Claude Code to plan the real build.
For the engine, I picked pi.dev — a minimal terminal coding harness with extensions, skills, themes, and packages. Wanted something light and self-contained for a developer-machine app, not a full SaaS framework.
Plan → Act → Iterate
Claude Code ran a planning session before writing any code. Decisions:
- Distribution: self-host. Each user runs it locally, no shared infra.
- Execution env: git worktrees, so concurrent agents don’t stomp on each other’s working tree.
- API keys: bring-your-own.
- Scope: standalone executable for developer machines. Implement the whole thing in one pass.
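The worktree decision is the load-bearing one. Here's a minimal sketch of what per-agent isolation might look like — a hypothetical helper, not Agent Room's actual code, and the `.worktrees` directory name is my assumption:

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Derive a stable, collision-free worktree path for an agent session.
// (Hypothetical naming scheme; any non-filesystem-safe chars become "-".)
function worktreePathFor(repoRoot: string, agentId: string): string {
  const safe = agentId.replace(/[^a-zA-Z0-9_-]/g, "-");
  return path.join(repoRoot, ".worktrees", safe);
}

// Create an isolated worktree on a fresh branch, so this agent's edits
// never touch another agent's working tree.
function createAgentWorktree(repoRoot: string, agentId: string): string {
  const dir = worktreePathFor(repoRoot, agentId);
  execFileSync("git", ["worktree", "add", "-b", `agent/${agentId}`, dir], {
    cwd: repoRoot,
  });
  return dir;
}
```

Each agent gets its own checkout and its own branch; merging back is a normal git operation instead of a file-lock negotiation.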
Going from prototype to functioning product is still a hurdle. Claude Design’s output is opinionated about layout but isn’t aware of pi-coding-agent’s data model, so the plan has to bridge them. Some of the prototype’s components didn’t map cleanly to real data shapes and had to be rewritten.
It’s not push-button. But the plan was solid, and once I approved it Claude Code executed end-to-end: server, UI, build, CLI.
(Side note: my Claude Code TUI keeps glitching out into garbled rendering after running for a while. If you’ve solved this, please tell me. Different terminals, different machines, same issue.)
The Working App
npm run build, fire up the CLI, open the room. Empty grid waiting for agents. The tweaks panel from the prototype made it through the migration — green accent, comfy density, dark mode. That continuity from sketch to product is genuinely satisfying.
I configured the Anthropic provider, pointed an agent at the agent_room repo itself, and asked it to write unit tests. It generated 62 tests across 6 files — and incidentally surfaced a real bug in the model pricing logic.
So I fed the bug back as a fix task and watched a second agent pick it up. Websocket stream into the UI: workspace path, files read, files edited, tool calls, duration, cost, burn rate. Total tokens, files touched, calls across the fleet up top.
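For a sense of what that stream carries, here's a hypothetical event shape modeled on the fields the UI displays — field names are my assumptions, not Agent Room's actual wire format:

```typescript
// Hypothetical websocket event; names are assumptions based on
// what the dashboard shows (workspace, files, tool calls, cost).
interface AgentEvent {
  agentId: string;
  workspace: string; // worktree path the agent is operating in
  kind: "tool_call" | "file_read" | "file_edit" | "status";
  tool?: string; // set for tool_call events
  file?: string; // set for file_read / file_edit events
  durationMs: number;
  costUsd: number; // incremental cost of this step
}

// Fold a stream of events into the fleet-level totals shown up top.
function tally(events: AgentEvent[]) {
  return {
    totalCostUsd: events.reduce((s, e) => s + e.costUsd, 0),
    filesTouched: new Set(
      events.filter((e) => e.kind === "file_edit").map((e) => e.file)
    ).size,
    toolCalls: events.filter((e) => e.kind === "tool_call").length,
  };
}
```

The nice property of a flat event stream like this is that every aggregate in the UI is just a fold over it.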
The fix was a substring-collision bug: pricing keys weren’t sorted longest-first, so gpt-4o-mini lookups were getting matched against gpt-4o and miscalculating cost. Sort longer keys first, update GPT-4o pricing to current, validate, all green. Same agent run also fixed a stuck “agent done” callback I’d been meaning to look at.
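The fix is a one-liner in spirit. A sketch of the pattern, with an illustrative pricing table (not the real data, and prices change over time):

```typescript
type Pricing = { inputPerMTok: number; outputPerMTok: number };

// Illustrative prices only; real values drift.
const PRICING: Record<string, Pricing> = {
  "gpt-4o": { inputPerMTok: 2.5, outputPerMTok: 10 },
  "gpt-4o-mini": { inputPerMTok: 0.15, outputPerMTok: 0.6 },
};

// Match pricing keys longest-first, so a dated model name like
// "gpt-4o-mini-2024-07-18" resolves to gpt-4o-mini and is never
// swallowed by the shorter prefix "gpt-4o".
function pricingFor(model: string): Pricing | undefined {
  const key = Object.keys(PRICING)
    .sort((a, b) => b.length - a.length)
    .find((k) => model.startsWith(k));
  return key ? PRICING[key] : undefined;
}
```

Without the sort, iteration order decides which prefix wins — exactly the kind of bug that miscalculates cost silently until a test trips over it.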
That’s the loop. Idea → Claude Design questions → interactive prototype → export → Claude Code plan → working app → spawn agents → catch bugs → fix bugs with more agents → ship.
What I’d Tell an Enterprise Team
A few things worth internalizing if you’re considering this kind of workflow:
- The questions are the spec. Claude Design’s refinement step is where the value is — not the rendered output. Treat it like a structured intake.
- “Decide for me” is a real advantage. Defaults you can react to beat decisions you have to make in a vacuum — and most prototyping tools don’t give you that opt-out at all.
- Comment + edit: good. Draw: skip. Until the lasso tool stabilizes, don't bet on it.
- Prototype-to-real-app is still work. The plan + execute step in Claude Code is where most of the actual engineering happens. Budget for it.
- Spawn agents to test agents. Once the app was running, the fastest way to harden it was to point another coding agent at it. 62 tests + a real bug found, in one shot.
Try It Yourself
Full code is on GitHub if you want to clone Agent Room and run your own fleet. Video walkthrough is here on YouTube if seeing it move helps.
Going from one sentence to a real, self-hosted app in an afternoon is now genuinely possible. The seams are still visible, but they're a lot smaller than they were a year ago, and they're closing fast.