Eleven tools, one prompt: the case for agent-first product design

Published: April 22, 2026

The single most important product decision we made was exposing the entire SwiftyClip pipeline as MCP tools on day one. Not in a later version. Not as a "pro" add-on. Day one. We now have eleven clip.* tools, and the design of each one has taught us something about what agent-first product design actually means in 2026.

What "agent-first" means, concretely

The cheap definition: "my product has an API." That's been true of most SaaS tools for fifteen years. What's new in 2026 is that the API surface is not for developers writing curl scripts — it's for agents writing prompts for each other. The asymmetry is real. Agents need tool names that read like intents, not implementation details. They need schemas that compose. They need error messages they can reason about.

Concretely, that means we named our tools clip.ingest, clip.analyze, clip.scoreSegments — verb-first, aligned with the pipeline stage. We didn't name them createProject, postTranscription, or any of the REST-flavored habits we could have defaulted to. Claude Code reads clip.ingest and knows what it does. The name is the documentation.
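In MCP terms, each tool is advertised to the agent with a name, a description, and a JSON Schema for its input. A hedged sketch of what a clip.ingest descriptor might look like — the field values here are illustrative assumptions, not the shipped schema:

```json
{
  "name": "clip.ingest",
  "description": "Import a video file and return a stable project ID for later pipeline calls.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "Absolute path to the source video"
      }
    },
    "required": ["path"]
  }
}
```

When the name and description already read like an intent, the agent rarely needs anything beyond this descriptor to use the tool correctly.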

Eleven tools, not one

An early draft of the MCP server had three tools: ingest, do, and schedule. "Do" took a big enum and a project ID and ran whatever the next pipeline stage was. It was elegant to us. Agents hated it.

The problem: agents plan. They want to say "after transcribing, score segments; if the top scored segment is under 0.6, try again with different parameters." A single do tool means the agent has to treat the entire pipeline as a black box. Splitting into granular tools — one per stage — restores the agent's ability to reason about what each call accomplishes.
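That planning pattern only works when scoring is its own tool. A minimal sketch of the retry logic from the sentence above, assuming a generic callTool helper and illustrative argument names (neither is part of the shipped API):

```typescript
// Hypothetical agent-side plan: re-score with different parameters when the
// best candidate is weak. callTool is a stand-in for any MCP client call.
type Segment = { id: string; score: number };

async function pickSegments(
  callTool: (name: string, args: Record<string, unknown>) => Promise<Segment[]>,
  projectId: string
): Promise<Segment[]> {
  let segments = await callTool("clip.scoreSegments", { projectId, targetCount: 3 });
  const best = Math.max(...segments.map((s) => s.score));
  if (best < 0.6) {
    // Retry with different parameters -- here, a wider candidate pool.
    segments = await callTool("clip.scoreSegments", { projectId, targetCount: 10 });
  }
  return segments;
}
```

With the monolithic do tool, this branch is impossible to express: the agent can't inspect the score between stages, so it can't decide to retry.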

Here's the current roster, with the reason each one earned its own tool:

  1. clip.ingest — imports a video and returns a project ID. Agents want a stable handle they can reference.
  2. clip.listProjects — simple enumeration. Agents ask "what do I have to work with?" before planning.
  3. clip.listSegments — after analysis, agents want to see candidates before choosing.
  4. clip.queueStatus — polls batch state. Agents need it so they don't step on the user's active work.
  5. clip.transcribe — isolated because sometimes agents only need the transcript.
  6. clip.analyze — heavy computation. Agents want to kick it off and do other work.
  7. clip.scoreSegments — separable from analyze because agents often re-score with different targetCount values.
  8. clip.render — renders one segment at a time, so agents can track per-segment success.
  9. clip.exportToDesktop — convenience variant that skips picking a directory. Saves the agent a question.
  10. clip.schedule — publishes to a platform. Separate because scheduling is the one action that reaches the public internet.
  11. clip.registerWebhook — lets agents subscribe to events. Closes the feedback loop.

Eleven tools, not fifteen, not three. Each one exists because an agent wanted to reason about it independently.

What we'd design differently

A few things look obvious in retrospect:

First, we'd start with WebSocket instead of stdio. Stdio is wonderful for local CLI agents but excludes hosted agents that can't spawn subprocesses. We added WebSocket in v1.0.4 after realizing this, but if we were doing it over we'd ship both from the start.

Second, we'd expose richer project metadata earlier. The first draft of listProjects returned just IDs and titles. Agents immediately asked for duration, creation date, and source URL. If you don't surface metadata, agents have to make a second call for every project they list. That's wasted tokens.
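A sketch of the richer shape we'd ship from the start — field names here are illustrative assumptions, not the actual schema:

```typescript
// Hypothetical listProjects entry: enough metadata that an agent can plan
// without a follow-up call per project.
interface ProjectSummary {
  id: string;
  title: string;
  durationSeconds: number; // lets agents skip sources that are too short
  createdAt: string;       // ISO 8601, so agents can pick the newest import
  sourceUrl?: string;      // absent for local-only imports
}

const example: ProjectSummary = {
  id: "proj_42",
  title: "Episode 118",
  durationSeconds: 3720,
  createdAt: "2026-04-01T09:30:00Z",
};
```

The design rule we took away: anything an agent would plausibly branch on belongs in the list response, not behind a per-item fetch.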

Third, we'd namespace better. clip.* works today because we only have one domain. The moment we add a second (transcript library, asset manager, whatever), we'll wish we'd used video.clip.* / video.project.*. Cost of adding a second dot later: non-zero.

On-device + MCP is a category shift

The standard combo in 2026 is cloud-first product + MCP wrapper around the public API. That's fine. It works. It's also fundamentally constrained: the agent is another client of the same service, subject to the same per-call pricing, the same rate limits, the same data-policy concerns.

On-device + MCP is qualitatively different. The agent drives the user's own hardware. There's no per-call cost. There's no cloud upload. The privacy posture is "same as desktop app" instead of "same as SaaS." For creators with sensitive content — unreleased podcasts, NDA footage, legal depositions — this is the only workable model.

The economics also tilt hard in favor of on-device over the next 3-5 years. Apple Silicon performance per chip improves faster than cloud GPU cost-per-token drops. We're betting on that curve. Our marginal cost is $0 per clip and stays there. Cloud clippers can match our feature set but not our cost structure. See the cloud-clipper death march for the full argument.

Try it yourself

If you have Claude Code installed, this is the fastest path:

  1. Install SwiftyClip from swiftyclip.com/download.
  2. Add the MCP config.
  3. Prompt: "Use swiftyclip to clip ~/Movies/podcast.mp4. Score and render the top 3 segments to ~/Desktop/."
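For step 2, a stdio-transport entry in Claude Code's MCP config typically looks like this — the command path is an assumption here; check the install docs for the real binary location:

```json
{
  "mcpServers": {
    "swiftyclip": {
      "command": "/Applications/SwiftyClip.app/Contents/MacOS/swiftyclip-mcp"
    }
  }
}
```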

The agent will chain six tool calls and drop three finished vertical clips on your desktop. You never opened the Mac app. That's agent-first product design in production.
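One plausible decomposition of those six calls, written out as data — the argument names are illustrative, and the placeholder strings stand for values the agent threads from one result into the next:

```typescript
// Roughly the chain behind the prompt above: one ingest, one analyze,
// one scoring pass, three renders. Argument names are illustrative.
const plan: Array<[string, Record<string, unknown>]> = [
  ["clip.ingest", { path: "~/Movies/podcast.mp4" }],
  ["clip.analyze", { projectId: "<from ingest>" }],
  ["clip.scoreSegments", { projectId: "<from ingest>", targetCount: 3 }],
  ["clip.render", { segmentId: "<top segment 1>", outputDir: "~/Desktop/" }],
  ["clip.render", { segmentId: "<top segment 2>", outputDir: "~/Desktop/" }],
  ["clip.render", { segmentId: "<top segment 3>", outputDir: "~/Desktop/" }],
];
```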

More patterns: /agents/examples. Full tool reference: /docs/mcp. Machine-readable schemas: /api/mcp/schema + /api/mcp/openapi.json.