Jul 01, 2026 · 10 min read · Buyer guides · By Alexander Vinokurov

How We Build AI Agents: Our AI Agent Development Process, From Feasibility to Production

How we design and build AI agents that survive production, not just the demo — feasibility proof, behavior prototypes, tested action maps, provider fallbacks, and an outcome-based model where you pay only when the agent works.

Most AI agent projects do not fail because the model cannot do the task. They fail because the agent was built for the demo — it dazzles in a controlled pitch and then falls apart on the messy inputs of production — or because the client paid upfront for something no one had proven would work. Our entire process is built to remove that risk before it reaches you.

This is how we design and build AI agents, from the first feasibility call to a version running in production: what we check before committing, how we prove behavior early, how we make an agent reliable instead of merely impressive, and why our payment model means you never pay for an agent that does not work.

We de-risk feasibility first

Not every AI project needs a research phase. Some are obviously safe: a landing page to sell something, or an app whose core is a clean connection to an API or a database — we can scope those directly from experience. But when the task is more serious — an agent that has to reason, act, and be trusted with real consequences — we start with a fast proof-of-concept to confirm the model can actually do it reliably. We would rather find the hard part in week one than promise a scope we cannot deliver.

We prototype the behavior before building it

Before committing to a full build, we get the agent behaving on real examples — the actual inputs it will face, not a curated happy path. A behavior prototype turns an abstract idea into something you can watch work (or watch fail) early, while changing direction is still cheap. It is the fastest way to agree on what the agent should actually do before engineering time is spent making it robust.
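As a rough illustration of what a behavior prototype can look like in code, the sketch below runs a stand-in agent over a handful of real-style inputs and records which ones pass. Everything here is hypothetical: `Example`, `run_prototype`, and `toy_agent` are names made up for this sketch, and `toy_agent` is a keyword matcher standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One real-style input paired with a check on the agent's output."""
    name: str
    user_input: str
    check: Callable[[str], bool]


def run_prototype(agent: Callable[[str], str], examples: list[Example]) -> dict[str, bool]:
    """Run the agent on every example and record pass/fail per example."""
    results: dict[str, bool] = {}
    for ex in examples:
        try:
            results[ex.name] = ex.check(agent(ex.user_input))
        except Exception:
            # A crash counts as a failure, not as a reason to stop the run.
            results[ex.name] = False
    return results


def toy_agent(text: str) -> str:
    """Hypothetical stand-in for a model call: naive keyword routing."""
    return "refund" if "refund" in text.lower() else "unknown"


examples = [
    Example("clean request", "I want a refund for order 1042", lambda out: out == "refund"),
    Example("messy request", "pls send my money back!!", lambda out: out == "refund"),
]

results = run_prototype(toy_agent, examples)
pass_rate = sum(results.values()) / len(results)
print(results, pass_rate)
```

In this toy run the clean request passes and the messy one fails, which is the point of the exercise: the gap shows up while changing direction is still cheap.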

Not all agents are equal — mapping actions and tests

Some agents are simple: the actions they can take are obvious, and the work is mostly wiring them up cleanly. Others are not. When an agent has many possible actions with real dependencies between them, we map that space explicitly — a matrix of what the agent can do, under what conditions, and what each action depends on — and then cover it with tests grouped by use case. This is the difference between an agent that works in a demo and one that holds up across the long tail of real usage. Plenty of AI shops prompt-and-pray; the test coverage is where reliability actually comes from.
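One way to picture an action map is as a small table of actions, their preconditions, and their dependencies, with test cases grouped by use case on top. The sketch below is an assumption-heavy illustration, not our production tooling: the action names, the `Action` dataclass, and the helper functions are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    requires: frozenset = frozenset()  # actions that must already have run
    condition: str = "always"          # human-readable precondition


# Hypothetical action map for a support agent.
ACTION_MAP = {
    a.name: a
    for a in [
        Action("look_up_order"),
        Action("check_refund_policy", requires=frozenset({"look_up_order"})),
        Action(
            "issue_refund",
            requires=frozenset({"look_up_order", "check_refund_policy"}),
            condition="order is within the refund window",
        ),
        Action("escalate_to_human"),
    ]
}


def allowed_actions(done: set) -> set:
    """Actions not yet taken whose dependencies are all satisfied."""
    return {
        name
        for name, action in ACTION_MAP.items()
        if name not in done and action.requires <= done
    }


def validate_plan(plan: list) -> bool:
    """A plan is valid if every step is allowed at the moment it runs."""
    done: set = set()
    for step in plan:
        if step not in allowed_actions(done):
            return False
        done.add(step)
    return True


# Test cases grouped by use case: (plan, expected validity).
USE_CASE_TESTS = {
    "refund": [
        (["look_up_order", "check_refund_policy", "issue_refund"], True),
        (["issue_refund"], False),  # skips both dependencies
    ],
    "escalation": [
        (["escalate_to_human"], True),
    ],
}


def run_use_case_tests() -> dict:
    """Return one pass/fail flag per use case."""
    return {
        use_case: all(validate_plan(plan) == expected for plan, expected in cases)
        for use_case, cases in USE_CASE_TESTS.items()
    }


print(run_use_case_tests())
```

Grouping the cases by use case keeps a failure readable: a red "refund" group points at the refund path rather than at a single anonymous assertion.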

Reliability: built to survive a provider outage

An agent that depends on a single LLM provider inherits that provider's downtime. We build agents so that if one provider's API goes down or degrades, the agent keeps working — the request routes to an alternative instead of failing. Reliability like this is invisible when everything is fine and decisive on the day a provider has an outage, which at any real scale it eventually will.
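The routing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular SDK: `complete_with_fallback`, `primary`, and `secondary` are hypothetical names, and the two provider functions are local stand-ins rather than real API clients.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


# A provider is a (name, call) pair; `call` takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary(prompt: str) -> str:
    """Hypothetical provider that is currently down."""
    raise TimeoutError("simulated outage")


def secondary(prompt: str) -> str:
    """Hypothetical backup provider."""
    return f"echo: {prompt}"


used, answer = complete_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
print(used, answer)
```

A real setup would usually add timeouts, retries with backoff, and a check that the backup model's output is good enough for the task, since a fallback that answers badly is only marginally better than one that does not answer.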

Where the agent lives

The right home for an agent depends on the product, so we build across surfaces rather than forcing everything into one. We have shipped agents as standalone services (an autonomous trading bot, client under NDA, with strategy logic in Python and Rust), as native iOS apps, as voice assistants (a port of a GPT model into a Meta AI surface), and as Telegram bots — from a personal assistant with long-term memory to lighter conversational products. The stack follows the job: a Node or Python web service, a mobile app, a standalone worker, chat, or voice.

You don't pay for an agent that doesn't work

This is the core of how we remove risk. Our work is outcome-based and split into phases: there is no upfront payment, and you pay at the end of a phase only when you are satisfied it works. For agents specifically, this pairs naturally with sandbox validation — the same way our trading bot ran every strategy in a paper mode with no real money before a single deliberate click promoted it to live trading. You watch the agent prove itself before it touches anything that matters, and before you pay. You can read more about how our outcome-based process works.
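The paper-mode pattern can be sketched as an executor that simulates every action by default and only switches to real execution after an explicit confirmation. This is an illustrative sketch, not the trading bot's actual code: `Executor`, `Mode`, and the `"PROMOTE"` confirmation string are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    PAPER = "paper"  # actions are simulated and logged only
    LIVE = "live"    # actions are actually carried out


@dataclass
class Executor:
    mode: Mode = Mode.PAPER  # safe by default
    paper_log: list = field(default_factory=list)
    live_log: list = field(default_factory=list)

    def execute(self, action: str) -> str:
        """Simulate the action in paper mode; perform it only in live mode."""
        if self.mode is Mode.PAPER:
            self.paper_log.append(action)
            return f"simulated: {action}"
        self.live_log.append(action)
        return f"executed: {action}"

    def promote_to_live(self, confirmation: str) -> None:
        """Switch to live mode only on an explicit, deliberate confirmation."""
        if confirmation != "PROMOTE":
            raise PermissionError("explicit confirmation required")
        self.mode = Mode.LIVE


executor = Executor()
print(executor.execute("buy 1 unit"))   # runs in paper mode
executor.promote_to_live("PROMOTE")     # the single deliberate step
print(executor.execute("buy 1 unit"))   # now runs live
```

The design choice worth noting is the default: the executor starts in paper mode, so going live is always an action someone takes on purpose, never a flag someone forgot to unset.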

Design matters as much as the model

A reliable agent that users do not trust still fails — they double-check its work, undo it, or quietly stop using it. So building the agent is only half of it; the other half is designing how much control the user hands over and making that safe. We treat that as a first-class part of the work, not an afterthought. If you want the design side in depth, we wrote a field guide on designing AI agents users trust.

How we approach agent projects

AI agent work at 99 Francs is founder-led and run by a small senior team, which is what makes a phase-by-phase pace realistic. We scope the feasibility, prove the behavior, build for reliability, and validate before production, and you pay per outcome along the way. If you are building an AI agent or assistant, or shaping an AI product more broadly, that is the work: making something that survives contact with real users.

FAQ

Frequently asked questions.

We de-risk feasibility (often with a quick proof-of-concept), prototype the behavior on real examples, map the agent's actions and dependencies, cover them with grouped use-case tests, build on LLM providers with fallbacks, and validate in a sandbox before production. The work is phased and outcome-based, so you pay only when a phase works.

It depends on complexity and feasibility. Simple agents with obvious actions move fast; agents with many interdependent actions need action mapping and test coverage, which takes longer. Because the work is split into phases, you see and approve progress at each step instead of waiting for one big delivery.

You do not pay for it. There is no upfront payment, and you pay at the end of a phase only when you are satisfied it works. A feasibility proof-of-concept at the start surfaces the hard parts early, before scope and budget are committed.

We build on LLM providers with fallbacks so one provider's outage does not stop the agent, and deploy wherever the product needs it — a Node or Python web service, a native iOS app, a standalone service, Telegram, or voice. Heavier logic is built in Python or Rust where it fits.

We map the agent's possible actions and their dependencies, then cover them with tests grouped by use case, and validate the agent in a sandbox before it reaches production. Test coverage — not a better prompt alone — is what makes an agent reliable across real usage.

Both. We design how much control the user hands over and how the agent earns trust, and we build the agent to production across web, mobile, standalone services, chat, and voice.

Work with 99 Francs

Need this done instead of just read about it?

99 Francs is a subscription-based design studio: one flat monthly rate, unlimited requests, first delivery in 1–2 days. Start with pricing or book a free intro call.

