Building Real AI CX Agents: What Works, What Doesn’t, and When to Build Your Own
A few months ago, we watched a support team at a large consumer brand cross an interesting line.
Their AI agents were resolving just over half of their tier‑1 tickets across chat and email. Deflection sat around 55 percent. Cost per resolution on those intents had dropped by more than 70 percent. CSAT on AI‑handled interactions was slightly higher than the human baseline. Voice was coming next.
This is not a demo. It is a production system, wired into real billing, order management, and identity systems, answering customers all day.
CX is not the only place AI is working, but it is the place where we see this pattern most often right now. The work is structured. The stakes are clear. The metrics are unforgiving. That makes it a good proving ground.
What follows is what we have learned building and deploying CX agents across chat, email, and voice, both for CX platforms and for enterprises that consider support part of the product.

Why CX is structurally friendly to AI
If you strip away the hype, CX has a few properties that let it show ROI sooner than most other AI bets.
1. The work is SOP‑driven.
A large share of inbound volume is some variation of the same workflows: check an order, adjust a subscription, process a refund, move a booking, update an address. Humans today are following documented procedures or tribal knowledge. That is exactly the kind of work you can encode as repeatable “steps” for an agent.
2. The systems already exist.
CX teams live in CRMs, helpdesks, billing systems, logistics portals, internal tools. There is a source of truth to read from and write to. You do not have to invent new data models to let an agent act.
3. Success is measurable.
You can look at deflection, first response time, average handle time, first contact resolution, CSAT, cost per resolution. You can measure these per intent, not just in aggregate. That makes it easy for a CFO to decide if this is working.
4. The blast radius is bounded.
A bad CX response is painful, but it is not the same as an AI mis‑allocating capital or pushing the wrong production deployment. Most teams can afford to start here, learn, and then move into riskier domains.
There are limits.
Tax and compliance flows, high‑emotion complaints, and edge cases that require judgment do not deflect at the same rates as order status or plan changes. In the data, you consistently see simple intents in the 60–80 percent deflection range and complex, sentiment‑heavy ones below 30 percent. Programs that pretend otherwise usually have fuzzy definitions of “resolved” or quietly route the hard stuff away from the agent.
The implication for a CTO or CEO is simple: CX is a good first place to prove out agents, but only if you scope them to the right slices of work.
From chatbots to agents that do real work
Most leaders have been burned at least once by “AI support” that was just a smarter FAQ.
The difference now is not the marketing. It is that the agent can actually act.
In the deployments we see working, an agent typically does three things well:
1. Takes actions, not just answers questions.
The agent can call tools that:
- Look up orders, subscriptions, or policies
- Issue refunds or credits under clear rules
- Book, reschedule, or cancel appointments
- Update profile data
- Trigger internal workflows or handoffs
That is the core shift from chatbots to agents. The system is part of the workflow, not a side channel.
2. Maintains memory across channels and time.
Customers start in chat, follow up over email, then call. Modern platforms keep a conversation and identity graph so the agent knows this is the same person and does not restart from zero. That alone can move CSAT.
3. Runs as one “brain” across chat, email, and voice.
A single policy surface (what the agent is allowed to do) drives behavior across channels. Voice is no longer a separate IVR tree. It is just another place the same agent logic shows up.
Enterprise CX platforms are building this as their core product. Specialized voice platforms are doing the same thing focused on phones. The point is not which brand you pick. The point is that the successful ones all converge on these properties.
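Here is a minimal sketch of the cross-channel memory described in point 2. It assumes each channel gives the agent an identifier (a chat session, an email address, a phone number). It also assumes some verification step, such as a logged-in session or a confirmed email, tells you when two identifiers belong to the same person. `IdentityGraph` and its methods are hypothetical names. The linking uses a standard union-find structure. A production identity graph would also need confidence scores, unlinking, and persistence.

```python
from typing import Dict, List, Tuple


class IdentityGraph:
    """Links channel identifiers (chat session, email, phone) to one customer."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}
        self._events: List[Tuple[str, str]] = []  # (identifier, message) in arrival order

    def _find(self, ident: str) -> str:
        # Union-find root lookup with path halving.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same customer."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_customer(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def log(self, ident: str, message: str) -> None:
        self._find(ident)  # registers the identifier if it is new
        self._events.append((ident, message))

    def history(self, ident: str) -> List[str]:
        """All messages from every identifier linked to this one, oldest first."""
        root = self._find(ident)
        return [msg for who, msg in self._events if self._find(who) == root]


graph = IdentityGraph()
graph.log("chat:123", "Where is my order?")
graph.log("email:ana@example.com", "Following up on my order")
graph.link("chat:123", "email:ana@example.com")  # e.g. the customer verified this email in chat
graph.log("phone:+15550100", "Calling about my order")
graph.link("email:ana@example.com", "phone:+15550100")
print(graph.history("phone:+15550100"))
# ['Where is my order?', 'Following up on my order', 'Calling about my order']
```

When the phone call arrives, the agent can load the chat and email context instead of restarting from zero.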
Inside a modern CX agent, in business language
You do not need to be deep in LangGraph or orchestration libraries to reason about how a CX agent works. At the level a CTO or CEO should care about, there are a handful of moving parts.
1. Systems of record and tools
The agent needs a narrow, explicit set of things it can do:
- Read: customer record, orders, invoices, entitlements, past tickets
- Write: create ticket, update address, issue refund, book slot, log note
This is your integration surface. In practice, it is a list of internal and external APIs with clear contracts and guardrails.
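One way to make that integration surface explicit is a registry. It exposes only named tools, marks each one as read or write, and runs a parameter guard before anything executes. This is a minimal sketch. `Tool`, `ToolRegistry`, the two example tools, and the $100 refund cap are all hypothetical. The lambdas stand in for calls to your real order and billing APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    kind: str                    # "read" or "write"
    handler: Callable[..., Any]  # wraps the real API call
    guard: Callable[..., bool] = lambda **kwargs: True  # parameter check run before the call


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.kind not in ("read", "write"):
            raise ValueError(f"unknown tool kind: {tool.kind}")
        self._tools[tool.name] = tool

    def write_tools(self) -> List[str]:
        """Names of tools that change state; these deserve the strictest review."""
        return sorted(t.name for t in self._tools.values() if t.kind == "write")

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool not exposed to the agent: {name}")
        if not tool.guard(**kwargs):
            raise PermissionError(f"guardrail rejected {name} with {kwargs}")
        return tool.handler(**kwargs)


MAX_REFUND = 100.0  # hypothetical policy cap

registry = ToolRegistry()
registry.register(Tool("get_order", "read",
                       lambda order_id: {"order_id": order_id, "status": "shipped"}))
registry.register(Tool("issue_refund", "write",
                       lambda order_id, amount: {"order_id": order_id, "refunded": amount},
                       guard=lambda order_id, amount: 0 < amount <= MAX_REFUND))
```

Anything the agent tries that is not registered, or that fails its guard, raises before it reaches a real system.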
2. SOPs as the product surface
Every meaningful intent has a procedure behind it. For example:
- “If the customer requests a refund within 30 days and the item is not in the excluded list, issue a refund up to $X. Otherwise escalate.”
- “If the call is after hours and the issue matches categories A/B/C, book the earliest available slot within the next 48 hours with an on‑call tech.”
CX AI platforms talk about “Agent Operating Procedures” or similar. Underneath, this is where your playbooks live. It is also where most of the product work is on a CX agent: getting real SOPs into a form the agent can follow and the business can change.
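Getting an SOP into a form the business can change often means separating the policy values from the logic that applies them. Below is a minimal sketch of the first refund rule above. It assumes "up to $X" means a request above the cap escalates rather than being partially paid. `RefundSOP`, `apply_refund_sop`, the $50 stand-in for $X, and the SKU names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class RefundSOP:
    """Policy values the business can change without touching the logic."""
    window_days: int = 30
    max_refund: float = 50.0                    # placeholder for "$X"
    excluded_skus: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Decision:
    action: str         # "refund" or "escalate"
    amount: float = 0.0
    reason: str = ""


def apply_refund_sop(sop: RefundSOP, purchased_on: date, requested_on: date,
                     sku: str, amount: float) -> Decision:
    if (requested_on - purchased_on).days > sop.window_days:
        return Decision("escalate", reason=f"outside {sop.window_days}-day window")
    if sku in sop.excluded_skus:
        return Decision("escalate", reason="excluded item")
    if amount > sop.max_refund:
        return Decision("escalate", reason="amount above cap")
    return Decision("refund", amount=amount)


sop = RefundSOP(excluded_skus=frozenset({"GIFT-CARD"}))
print(apply_refund_sop(sop, date(2024, 1, 1), date(2024, 1, 20), "SHOE-1", 40.0).action)  # refund
print(apply_refund_sop(sop, date(2024, 1, 1), date(2024, 3, 1), "SHOE-1", 40.0).reason)   # outside 30-day window
```

Changing the window or the cap then becomes a policy edit, not a code change.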
3. Tone and brand voice
If your vans are wrapped and your website looks like 2026, your agent cannot sound like a 2005 IVR. In the voice deployments we see succeed, someone owns:
- Baseline voice and tone for the brand
- How that changes in sensitive flows (refund, outage, compliance)
- Concrete examples of “what we would never say”
This is not fluff. It is part of adoption. One founder we work with talks about the “voice sommelier” role inside their CX team: someone who tunes the agent so it actually sounds like them.
4. Confidence bands and safe fallbacks
The agent needs to know when it is guessing.
Practically, this means:
- Scoring how confident it is that it understood the intent
- Scoring how confident it is that a proposed action is valid
- Routing low‑confidence cases to humans or constrained flows
The details vary, but the product pattern is stable: let the agent be bold inside a safe box, and conservative at the edges.
5. Evals, not vibes
The deployments that stick build an evaluation harness up front:
- A test set of real conversations and expected outcomes
- Deterministic checks for tool use (did the agent call the right thing, with safe parameters)
- An “LLM as judge” layer that scores quality, tone, and policy adherence
- Automatic promotion of edge cases from production into the eval set
This is unglamorous work. It is also where most of the risk sits. If you expose tools that touch money, identity, or policy, you should not ship an agent without an eval harness that matches your risk appetite.
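The deterministic layer of such a harness can be small. This sketch makes one assumption: the agent is a function that returns the tool it would call and that tool's parameters. The harness replays test conversations through that function. For each case, it checks the tool name and then a safe-parameter predicate. `EvalCase`, `run_evals`, and the keyword-matching `stub_agent` are hypothetical stand-ins. The LLM-as-judge layer and production promotion are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


@dataclass
class EvalCase:
    conversation: str
    expected_tool: str
    param_check: Callable[[Dict[str, Any]], bool]  # True if the parameters are safe


def run_evals(agent: Callable[[str], ToolCall], cases: List[EvalCase]) -> Dict[str, Any]:
    failures = []
    for case in cases:
        tool, params = agent(case.conversation)
        if tool != case.expected_tool:
            failures.append((case.conversation, f"called {tool}, expected {case.expected_tool}"))
        elif not case.param_check(params):
            failures.append((case.conversation, f"unsafe params {params}"))
    pass_rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return {"pass_rate": pass_rate, "failures": failures}


def stub_agent(text: str) -> ToolCall:
    # Hypothetical agent under test; a real one would call your model and tools.
    if "refund" in text:
        return "issue_refund", {"amount": 500.0 if "everything" in text else 20.0}
    return "get_order", {"order_id": "A1"}


cases = [
    EvalCase("where is my order", "get_order", lambda p: "order_id" in p),
    EvalCase("I want a refund", "issue_refund", lambda p: p.get("amount", 0) <= 100),
    EvalCase("refund everything", "issue_refund", lambda p: p.get("amount", 0) <= 100),
]
report = run_evals(stub_agent, cases)
print(round(report["pass_rate"], 2))  # 0.67
```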
6. Human in the loop as a first‑class design choice
The point is not to remove humans. The point is to shift them.
In good systems, you see:
- Clear escalation paths for high‑risk intents
- “Shadow” periods where humans watch agent suggestions before allowing auto‑action
- Agent assist views that tee up drafts for humans to approve
- Feedback channels for humans to correct the agent in context
The best implementations do not hide this from customers. They say when they are escalating, and why.
If you cannot point to these pieces in your plan, you do not have an agent. You have a chatbot with aspirations.
Platforms vs custom: three patterns that actually work
In real conversations with CTOs and CEOs, the build–buy question comes up fast. It is not a religious argument. It is a sequencing and control question.
We see three patterns work.
1. Adopt a CX AI platform when you want speed
If you have:
- Standard helpdesk and CRM tooling
- A large volume of repetitive inbound
- Limited internal AI capacity
then using a mature CX AI platform is the most direct path to value.
You get:
- A prebuilt orchestration layer
- A library of tools for common systems (Zendesk, Salesforce, billing, logistics)
- A UI for CX operators to define procedures and tone
- Built‑in testing, analytics, and QA
You give up some control over low‑level behavior and roadmap, but you gain speed. This is the model behind the big names in enterprise CX AI. It is also what they need to justify per‑conversation or per‑resolution pricing.
2. Plug in a voice‑first platform when phones are the pain
In home services, transportation, healthcare, and a lot of B2C, phones are still king. Missed calls are missed revenue.
Voice‑first platforms exist purely to solve this problem:
- Answer every call within two rings
- Qualify the customer
- Book or reschedule directly into a field‑service or CRM system
- Escalate to a human when something looks risky or unusual
They often layer outbound campaigns and coaching for human CSRs on top.
If your call center is the bottleneck, starting here is rational. It lets you get 24/7 coverage and higher booking rates without rebuilding your entire CX stack.
3. Build your own agent layer when support is the product
Sometimes the platform is not the hard part. The way support works is the differentiator.
We see this at scaled SaaS and fintech companies where:
- Cases are complex and domain‑specific
- Wrong answers have real financial or legal cost
- The company already operates its own internal support tools
- CX is framed as a core part of the product, not a cost center
In that scenario, teams often build their own agent layer:
- An orchestration engine (LangGraph or similar)
- Internal tools exposed over a narrow interface
- A homegrown eval harness
- Deep integration with existing routing, SLAs, and reporting
The build–buy call here is not “platform or nothing.” It is “where do we buy commodity building blocks, and where do we invest in proprietary logic.”
In one recent conversation with a scaled B2B platform company, for example, the question was not “Should we buy an off‑the‑shelf CX agent?” The question was “What is the cycle time per case type if we keep arming our own agent, and is there anything the platforms have that we cannot replicate?”
That is the right frame if you already see support as the product.
How larger orgs are actually thinking about this
There is a pattern in how scaled organizations talk about AI in CX once you get past slideware.
A few themes that keep coming up:
1. Support is not a pure cost center.
In payroll, healthcare, financial services, and HR, the ability to say “we will handle this for you” is half the product. Leaders in these spaces are allergic to blunt “let’s deflect everything” narratives. They care about churn, NPS, and trust.
2. Build vs buy is about secret sauce.
They are happy to buy plumbing and generic tooling. They are not happy to outsource case logic that encodes their hard‑won domain knowledge.
3. Success is cycle time per case type.
They care less about gross deflection numbers and more about:
- How long it takes to arm a new case type
- How fast they can change policy in the agent
- How quickly production issues show up in evals
4. Their worries are concrete.
When they hesitate, it is usually around:
- Safety and correctness on tax, compliance, or regulated flows
- Whether their evals are mature enough to catch regressions
- Brand tone and the risk of a “generic AI” voice
- Whether they can realistically staff the internal platform work
This is where an external team that has shipped CX agents across multiple platforms can add real value. Not by pushing a platform, but by walking through what actually worked, what failed, and what that implies for their context.
Product lessons from the field
Across CX AI platform work and direct enterprise deployments, the same product patterns show up.
A few that matter.
1. Map the real workflow, not the wiki
The documented process and what agents actually do are rarely the same.
Before we scope an agent, we now insist on a working session where frontline agents literally draw how they handle a case today. Who they ping. Which screens they ignore. What they do when the system is wrong.
That is where the value and the sharp edges live.
2. Treat tone and voice as design, not decoration
If you do not define tone, the model will improvise. Sometimes it will be right. Sometimes it will sound like a generic assistant talking to someone who is furious about a tax notice.
The teams that get this right:
- Write explicit voice guides, the way marketing teams do
- Keep separate patterns for sensitive flows
- Listen to transcripts regularly and tune
This is one of the places where CX leaders, not engineers, should own the spec.
3. Keep humans in the loop where the cost of error is high
There is a strong temptation to “flip to auto” once an agent looks good on a few cases. That is how you get quiet failures.
A safer pattern:
- Start with agent suggestions only, humans always approve
- Move to auto‑action only for intents where you have high‑quality evals and low blast radius
- Keep a standing rule that certain categories never auto‑action (for example, multi‑period tax adjustments, large money movements, dispute resolutions)
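That rollout pattern can be written down as a per-intent gate, so nobody flips an intent to auto by hand after a few good cases. This is a minimal sketch. It assumes each intent has an eval pass rate and a blast-radius label. The 0.95 threshold and the intent names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SUGGEST = "suggest"  # agent drafts, a human always approves
    AUTO = "auto"        # agent may act without approval, inside tool constraints


# Categories that never auto-action, whatever the evals say.
NEVER_AUTO = frozenset({"multi_period_tax_adjustment", "large_money_movement", "dispute_resolution"})


def allowed_mode(intent: str, eval_pass_rate: float, blast_radius: str,
                 min_pass_rate: float = 0.95) -> Mode:
    if intent in NEVER_AUTO:
        return Mode.SUGGEST
    if eval_pass_rate >= min_pass_rate and blast_radius == "low":
        return Mode.AUTO
    return Mode.SUGGEST  # the default is suggestion-only


print(allowed_mode("order_status", 0.98, "low"))        # Mode.AUTO
print(allowed_mode("dispute_resolution", 0.99, "low"))  # Mode.SUGGEST
```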
4. Use confidence bands, not a single threshold
Most systems support some notion of confidence. The naive approach is to pick a number and treat everything above it as safe.
A better approach is to have bands:
- High: auto‑action allowed within narrow tool constraints
- Medium: draft only, human must approve
- Low: escalate or ask a clarifying question
You can tune these per intent based on observed behavior and risk tolerance.
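Here is a minimal sketch of per-intent bands. It assumes the agent produces two scores between 0 and 1, one for how well it understood the intent and one for whether the proposed action is valid. It also assumes the weaker of the two decides the band. The thresholds and intent names are hypothetical placeholders to be tuned from observed behavior.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Bands:
    high: float    # at or above: auto-action within tool constraints
    medium: float  # at or above (but below high): draft for human approval


DEFAULT_BANDS = Bands(high=0.9, medium=0.6)

# Hypothetical per-intent tuning: riskier intents demand more confidence.
BANDS_BY_INTENT: Dict[str, Bands] = {
    "order_status": Bands(high=0.85, medium=0.5),
    "refund": Bands(high=0.95, medium=0.75),
}


def route(intent: str, intent_confidence: float, action_confidence: float) -> str:
    bands = BANDS_BY_INTENT.get(intent, DEFAULT_BANDS)
    # A well-understood intent with a doubtful action is still doubtful.
    score = min(intent_confidence, action_confidence)
    if score >= bands.high:
        return "auto_action"
    if score >= bands.medium:
        return "draft_for_approval"
    return "escalate_or_clarify"


print(route("order_status", 0.9, 0.88))  # auto_action
print(route("refund", 0.9, 0.88))        # draft_for_approval
```

The same scores land in different bands depending on the intent. That is the point of not using a single threshold.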
5. Measure resolution, not just deflection
Deflection is easy to inflate. You send someone a link and count the ticket as “handled.” That tells you very little about whether their problem is solved.
The better metric is whether the customer:
- Achieved the intended outcome
- Stayed satisfied
- Avoided re‑contact on the same issue
That is harder to measure, but if you do not, you will optimize for the wrong thing.
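The gap between the two metrics is easy to show. This sketch assumes each agent-handled interaction records four things: whether a human touched it, whether the intended outcome actually happened, whether the customer came back on the same issue, and an optional CSAT score. The field names and the CSAT cutoff of 4 are assumptions. A missing survey is treated as not disqualifying.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interaction:
    deflected: bool             # closed without a human touching it
    outcome_achieved: bool      # e.g. the refund actually posted
    recontacted: bool           # customer came back on the same issue
    csat: Optional[int] = None  # 1-5 survey score, if answered


def deflection_rate(items: List[Interaction]) -> float:
    return sum(i.deflected for i in items) / len(items) if items else 0.0


def resolution_rate(items: List[Interaction], min_csat: int = 4) -> float:
    def resolved(i: Interaction) -> bool:
        satisfied = i.csat is None or i.csat >= min_csat
        return i.deflected and i.outcome_achieved and not i.recontacted and satisfied
    return sum(resolved(i) for i in items) / len(items) if items else 0.0


sample = [
    Interaction(deflected=True, outcome_achieved=False, recontacted=False),        # sent a help link
    Interaction(deflected=True, outcome_achieved=True, recontacted=True, csat=5),  # came back anyway
    Interaction(deflected=True, outcome_achieved=True, recontacted=False, csat=5),
    Interaction(deflected=True, outcome_achieved=True, recontacted=False),
]
print(deflection_rate(sample), resolution_rate(sample))  # 1.0 0.5
```

Every interaction here counts as deflected, but only half of them solved the customer's problem.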
6. Accept that integrations are the bottleneck and the moat
Every CX AI company will tell you they have great models. Very few will tell you that most of their work is wrangling CRMs, billing systems, logistics platforms, identity providers, and bespoke internal tools.
In practice:
- Integrations take one to six months per enterprise
- They are where deals stall
- They are also what makes a deployment hard to rip out once it works
For a buyer, this is a warning and an opportunity. If you want real value, you need to budget for integration and pick partners who have actually done this work before.
A 90‑day playbook for a first production CX agent
A lot of AI CX projects die because they try to boil the ocean. The teams that ship pick a narrow wedge and go deep.
One pattern we see work in about three months:
1. Weeks 1–2: pick one intent, one channel, and two systems
- Choose a single, high‑volume, low‑risk intent (for example, order status, basic plan change, simple booking).
- Start in one channel (chat or in‑app messaging is usually easiest).
- Limit integrations to the systems you absolutely need (often CRM plus order or scheduling).
Define what “good” looks like numerically before you start.
2. Weeks 2–4: build the first version and the eval harness together
- Implement the minimal tool surface the agent needs.
- Encode the real SOP, not the wishful one.
- Stand up an eval set with real transcripts and expected outcomes.
- Decide the first confidence bands and escalation rules.
Run the agent in a staging environment against historical conversations. Fix the obvious issues there.
3. Weeks 4–6: shadow production with humans in the loop
- Turn the agent on for a slice of real traffic, but keep humans approving or editing every action.
- Capture where humans override or correct the agent.
- Promote those cases into the eval set.
This is where you will find gaps in your SOPs, tools, and data.
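Capturing overrides is mostly bookkeeping. This is a minimal sketch. It assumes the agent's proposed action and the human's final action can both be represented as plain dictionaries. `ShadowLog` and its fields are hypothetical. When the human changes the action, the case becomes a new eval case, with the human's action as the expected outcome.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ShadowLog:
    records: List[Dict[str, Any]] = field(default_factory=list)
    eval_set: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, conversation_id: str, agent_action: Dict[str, Any],
               human_action: Dict[str, Any]) -> bool:
        """Log one shadowed case; return True if the human overrode the agent."""
        overridden = agent_action != human_action
        self.records.append({"conversation_id": conversation_id, "overridden": overridden})
        if overridden:
            # Promote the correction: the human's action is the expected outcome from now on.
            self.eval_set.append({"conversation_id": conversation_id,
                                  "agent_action": agent_action,
                                  "expected": human_action})
        return overridden

    def override_rate(self) -> float:
        if not self.records:
            return 0.0
        return sum(r["overridden"] for r in self.records) / len(self.records)


log = ShadowLog()
log.record("c1", {"tool": "get_order", "order_id": "A1"}, {"tool": "get_order", "order_id": "A1"})
log.record("c2", {"tool": "issue_refund", "amount": 80.0}, {"tool": "issue_refund", "amount": 40.0})
log.record("c3", {"tool": "get_order", "order_id": "B7"}, {"tool": "get_order", "order_id": "B7"})
print(len(log.eval_set), round(log.override_rate(), 2))  # 1 0.33
```

Tracking the override rate per intent over the shadow period is one reasonable input to the decision on when to allow partial auto-action.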
4. Weeks 6–8: partial auto‑action with tight constraints
- For high‑confidence, low‑risk cases, allow the agent to act without human approval.
- Keep humans fully in the loop on medium and low confidence.
- Monitor resolution, re‑contact, and CSAT for this intent daily.
If something drifts, roll back fast. You have the evals to see why.
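A daily drift check can compare each day's numbers against the targets you set in weeks 1–2 and flag a rollback on any breach. This is a minimal sketch. `Targets`, `DailyMetrics`, and the threshold values are hypothetical. A real check might also look at trends over several days rather than a single day.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Targets:
    min_resolution_rate: float
    max_recontact_rate: float
    min_csat: float


@dataclass(frozen=True)
class DailyMetrics:
    resolution_rate: float
    recontact_rate: float
    csat: float


def breaches(targets: Targets, today: DailyMetrics) -> List[str]:
    """Names of the metrics that fell outside their targets today."""
    failed = []
    if today.resolution_rate < targets.min_resolution_rate:
        failed.append("resolution")
    if today.recontact_rate > targets.max_recontact_rate:
        failed.append("recontact")
    if today.csat < targets.min_csat:
        failed.append("csat")
    return failed


def should_roll_back(targets: Targets, today: DailyMetrics) -> bool:
    return bool(breaches(targets, today))


targets = Targets(min_resolution_rate=0.6, max_recontact_rate=0.1, min_csat=4.2)
print(breaches(targets, DailyMetrics(0.65, 0.08, 4.4)))  # []
print(breaches(targets, DailyMetrics(0.62, 0.15, 4.0)))  # ['recontact', 'csat']
```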
5. Weeks 8–12: widen carefully
Only once the first intent is stable should you consider:
- Adding a second intent
- Adding a second channel
- Relaxing confidence bands on the first intent
Each expansion should run through the same sequence: define, integrate, eval, shadow, partial auto, then broaden.
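The expansion sequence works like a small state machine. Each new intent or channel moves through the same stages and advances only when its gate passes. This is a minimal sketch. `Expansion` and the stage names are hypothetical labels for the steps listed above. What counts as a passed gate is up to you.

```python
from dataclasses import dataclass

STAGES = ("define", "integrate", "eval", "shadow", "partial_auto", "broaden")


@dataclass
class Expansion:
    name: str             # e.g. a second intent or a second channel
    stage_index: int = 0

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self, gate_passed: bool) -> str:
        """Move to the next stage only if the current stage's gate passed."""
        if gate_passed and self.stage_index < len(STAGES) - 1:
            self.stage_index += 1
        return self.stage

    def roll_back(self) -> str:
        """On drift, drop back to shadow so humans approve every action again."""
        shadow = STAGES.index("shadow")
        if self.stage_index > shadow:
            self.stage_index = shadow
        return self.stage


exp = Expansion("second_intent")
for passed in (True, True, True, True):
    exp.advance(passed)
print(exp.stage)        # partial_auto
print(exp.roll_back())  # shadow
```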
At the end of this, you have more than a demo. You have:
- A production agent handling a real slice of volume
- An eval harness tied to outcomes
- A working pattern for how your org builds and operates agents
From there, you can decide whether to keep expanding, to bring in a platform, or to invest in a deeper custom stack.
How modern CX agents are built
If there is one through‑line across the CX work we do at Lazer, it is that the model is rarely the bottleneck. The hard problems are almost always product problems: which workflows to encode, how to express them as procedures, where to keep humans in the loop, how to measure success, and how to plug into systems you already run.
Get those right, and CX is one of the cleanest places to make AI real.




