Andrew Xia

Dir. of Engineering

Building Real AI CX Agents: What Works, What Doesn’t, and When to Build Your Own

A few months ago, we watched a support team at a large consumer brand cross an interesting line.

Their AI agents were resolving just over half of their tier‑1 tickets across chat and email. Deflection sat around 55 percent. Cost per resolution on those intents had dropped by more than 70 percent. CSAT on AI‑handled interactions was slightly higher than the human baseline. Voice was coming next.

This is not a demo. It is a production system, wired into real billing, order management, and identity systems, answering customers all day.

CX is not the only place AI is working, but it is the place where we see this pattern most often right now. The work is structured. The stakes are clear. The metrics are unforgiving. That makes it a good proving ground.

What follows is what we have learned building and deploying CX agents across chat, email, and voice, both for CX platforms and for enterprises that consider support part of the product.

Why CX is structurally friendly to AI

If you strip away the hype, CX has a few properties that make it show ROI before a lot of other AI bets.

1. The work is SOP‑driven.

A large share of inbound volume is some variation of the same workflows: check an order, adjust a subscription, process a refund, move a booking, update an address. Humans today are following documented procedures or tribal knowledge. That is exactly the kind of work you can encode as repeatable “steps” for an agent.

2. The systems already exist.

CX teams live in CRMs, helpdesks, billing systems, logistics portals, internal tools. There is a source of truth to read from and write to. You do not have to invent new data models to let an agent act.

3. Success is measurable.

You can look at deflection, first response time, average handle time, first contact resolution, CSAT, cost per resolution. You can measure these per intent, not just in aggregate. That makes it easy for a CFO to decide if this is working.

4. The blast radius is bounded.

A bad CX response is painful, but it is not the same as an AI mis‑allocating capital or pushing the wrong production deployment. Most teams can afford to start here, learn, and then move into riskier domains.

There are limits.

Tax and compliance flows, high‑emotion complaints, and edge cases that require judgment do not deflect at the same rates as order status or plan changes. In the data we see, simple intents consistently land in the 60–80 percent deflection range, while complex, sentiment‑heavy ones stay below 30 percent. Programs that claim otherwise usually have fuzzy definitions of “resolved” or quietly route the hard stuff away from the agent.

The implication for a CTO or CEO is simple: CX is a good first place to prove out agents, but only if you scope them to the right slices of work.

From chatbots to agents that do real work

Most leaders have been burned at least once by “AI support” that was just a smarter FAQ.

The difference now is not the marketing. It is that the agent can actually act.

In the deployments we see working, an agent typically does three things well:

1. Takes actions, not just answers questions.

The agent can call tools that:

Look up orders, subscriptions, or policies
Issue refunds or credits under clear rules
Book, reschedule, or cancel appointments
Update profile data
Trigger internal workflows or handoffs

That is the core shift from chatbots to agents. The system is part of the workflow, not a side channel.

2. Maintains memory across channels and time.

Customers start in chat, follow up over email, then call. Modern platforms keep a conversation and identity graph so the agent knows this is the same person and does not restart from zero. That alone can move CSAT.

3. Runs as one “brain” across chat, email, and voice.

A single policy surface (what the agent is allowed to do) drives behavior across channels. Voice is no longer a separate IVR tree. It is just another place the same agent logic shows up.

Enterprise CX platforms are building this as their core product. Specialized voice platforms are doing the same thing focused on phones. The point is not which brand you pick. The point is that the successful ones all converge on these properties.

Inside a modern CX agent, in business language

You do not need to be deep in LangGraph or orchestration libraries to reason about how a CX agent works. At the level a CTO or CEO should care about, there are a handful of moving parts.

1. Systems of record and tools

The agent needs a narrow, explicit set of things it can do:

Read: customer record, orders, invoices, entitlements, past tickets
Write: create ticket, update address, issue refund, book slot, log note

This is your integration surface. In practice, it is a list of internal and external APIs with clear contracts and guardrails.
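
As a rough illustration of what a narrow, explicit tool surface can look like, here is a minimal sketch. The `ToolRegistry` class, the tool names, and the in-memory `ORDERS` store are all hypothetical stand-ins, not any platform's actual API. The idea is only that the agent can call what is on the allowlist, and that write tools need an explicit opt-in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                   # True if the tool changes a system of record
    handler: Callable[..., Any]


class ToolRegistry:
    """Explicit allowlist: the agent can only call what is registered here."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, allow_writes: bool = False, **kwargs: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool not on the allowlist: {name}")
        if tool.writes and not allow_writes:
            raise PermissionError(f"write tool needs explicit approval: {name}")
        return tool.handler(**kwargs)


# Hypothetical in-memory stand-in for a real order system.
ORDERS = {"o-1": {"status": "shipped", "address": "1 Old Street"}}


def get_order(order_id: str) -> dict:
    return dict(ORDERS[order_id])


def update_address(order_id: str, address: str) -> dict:
    ORDERS[order_id]["address"] = address
    return dict(ORDERS[order_id])


registry = ToolRegistry()
registry.register(Tool("get_order", writes=False, handler=get_order))
registry.register(Tool("update_address", writes=True, handler=update_address))
```

Splitting reads from writes at the registry level is one design choice among several. It keeps the guardrail in one place instead of scattering it across prompts.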

2. SOPs as the product surface

Every meaningful intent has a procedure behind it. For example:

“If the customer requests a refund within 30 days and the item is not in the excluded list, issue a refund up to $X. Otherwise escalate.”
“If the call is after hours and the issue matches categories A/B/C, book the earliest available slot within the next 48 hours with an on‑call tech.”

CX AI platforms talk about “Agent Operating Procedures” or similar. Underneath, this is where your playbooks live. It is also where most of the product work is on a CX agent: getting real SOPs into a form the agent can follow and the business can change.
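
To make the refund example above concrete, here is one way that SOP could be written as a plain function the business can read and change. This is a sketch under assumptions: `RefundPolicy` and `decide_refund` are hypothetical names, the cap stays a parameter because the SOP only says "$X", and we assume that a request above the cap escalates instead of being partially refunded.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class RefundPolicy:
    window_days: int               # "within 30 days" in the example SOP
    max_amount: float              # the "$X" cap, set by the business
    excluded_skus: FrozenSet[str]  # the excluded list


def decide_refund(policy: RefundPolicy, purchased_on: date, requested_on: date,
                  sku: str, amount: float) -> str:
    """Return "refund" or "escalate" for a single refund request."""
    days_since_purchase = (requested_on - purchased_on).days
    if days_since_purchase < 0:
        return "escalate"          # inconsistent dates: let a human look
    within_window = days_since_purchase <= policy.window_days
    allowed_item = sku not in policy.excluded_skus
    under_cap = amount <= policy.max_amount  # assumption: over the cap escalates
    if within_window and allowed_item and under_cap:
        return "refund"
    return "escalate"


# Placeholder values for illustration only, not a recommended policy.
example_policy = RefundPolicy(
    window_days=30,
    max_amount=100.0,
    excluded_skus=frozenset({"gift-card"}),
)
```

Keeping the policy values in data and the rule in a short function means a policy change is a config edit, not a prompt rewrite.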

3. Tone and brand voice

If your vans are wrapped and your website looks like 2026, your agent cannot sound like a 2005 IVR. In the voice deployments we see succeed, someone owns:

Baseline voice and tone for the brand
How that changes in sensitive flows (refund, outage, compliance)
Concrete examples of “what we would never say”

This is not fluff. It is part of adoption. One founder we work with talks about the “voice sommelier” role inside their CX team: someone who tunes the agent so it actually sounds like them.

4. Confidence bands and safe fallbacks

The agent needs to know when it is guessing.

Practically, this means:

Scoring how confident it is that it understood the intent
Scoring how confident it is that a proposed action is valid
Routing low‑confidence cases to humans or constrained flows

The details vary, but the product pattern is stable: let the agent be bold inside a safe box, and conservative at the edges.

5. Evals, not vibes

The deployments that stick build an evaluation harness up front:

A test set of real conversations and expected outcomes
Deterministic checks for tool use (did the agent call the right thing, with safe parameters)
An “LLM as judge” layer that scores quality, tone, and policy adherence
Automatic promotion of edge cases from production into the eval set

This is unglamorous work. It is also where most of the risk sits. If you expose tools that touch money, identity, or policy, you should not ship an agent without an eval harness that matches your risk appetite.
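
The deterministic layer is the easiest part of a harness to start with. Below is a minimal sketch of a tool-use check, assuming the agent's run can be captured as a list of tool calls. `ToolCall`, `EvalCase`, and the `amount` argument name are hypothetical, and a real harness would check more than these two conditions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalCase:
    case_id: str
    expected_tool: str
    max_amount: Optional[float] = None  # upper bound on any "amount" argument


def check_tool_use(case: EvalCase, calls: List[ToolCall]) -> List[str]:
    """Return a list of failure messages; an empty list means the case passed."""
    failures: List[str] = []
    names = [call.name for call in calls]
    if case.expected_tool not in names:
        failures.append(
            f"{case.case_id}: expected a call to {case.expected_tool}, got {names}"
        )
    if case.max_amount is not None:
        for call in calls:
            amount = call.args.get("amount")
            if amount is not None and amount > case.max_amount:
                failures.append(
                    f"{case.case_id}: {call.name} amount {amount} exceeds {case.max_amount}"
                )
    return failures
```

Checks like this run without a model in the loop, so they are cheap to run on every change. The "LLM as judge" layer sits on top of them and does not replace them.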

6. Human in the loop as a first‑class design choice

The point is not to remove humans. The point is to shift them.

In good systems, you see:

Clear escalation paths for high‑risk intents
“Shadow” periods where humans watch agent suggestions before allowing auto‑action
Agent assist views that tee up drafts for humans to approve
Feedback channels for humans to correct the agent in context

The best implementations do not hide this from customers. They say when they are escalating, and why.

If you cannot point to these pieces in your plan, you do not have an agent. You have a chatbot with aspirations.

Platforms vs custom: three patterns that actually work

In real conversations with CTOs and CEOs, the build–buy question comes up fast. It is not a religious argument. It is a sequencing and control question.

We see three patterns work.

1. Adopt a CX AI platform when you want speed

If you have:

Standard helpdesk and CRM tooling
A large volume of repetitive inbound
Limited internal AI capacity

then using a mature CX AI platform is the most direct path to value.

You get:

A prebuilt orchestration layer
A library of tools for common systems (Zendesk, Salesforce, billing, logistics)
A UI for CX operators to define procedures and tone
Built‑in testing, analytics, and QA

You give up some control over low‑level behavior and roadmap, but you gain speed. This is the model behind the big names in enterprise CX AI. It is also what they need to justify per‑conversation or per‑resolution pricing.

2. Plug in a voice‑first platform when phones are the pain

In home services, transportation, healthcare, and a lot of B2C, phones are still king. Missed calls are missed revenue.

Voice‑first platforms exist purely to solve this problem:

Answer every call within two rings
Qualify the customer
Book or reschedule directly into a field‑service or CRM system
Escalate to a human when something looks risky or unusual

They often layer outbound campaigns and coaching for human CSRs on top.

If your call center is the bottleneck, starting here is rational. It lets you get 24/7 coverage and higher booking rates without rebuilding your entire CX stack.

3. Build your own agent layer when support is the product

Sometimes the platform is not the hard part. The way support works is the differentiator.

We see this at scaled SaaS and fintech companies where:

Cases are complex and domain‑specific
Wrong answers have real financial or legal cost
The company already operates its own internal support tools
CX is framed as a core part of the product, not a cost center

In that scenario, teams often build their own agent layer:

An orchestration engine (LangGraph or similar)
Internal tools exposed over a narrow interface
A homegrown eval harness
Deep integration with existing routing, SLAs, and reporting

The build–buy call here is not “platform or nothing.” It is “where do we buy commodity building blocks, and where do we invest in proprietary logic.”

In one recent conversation with a scaled B2B platform company, for example, the question was not “Should we buy an off‑the‑shelf CX agent?” The question was “What is the cycle time per case type if we keep arming our own agent, and is there anything the platforms have that we cannot replicate?”

That is the right frame if you already see support as the product.

How larger orgs are actually thinking about this

There is a pattern in how scaled organizations talk about AI in CX once you get past slideware.

A few themes that keep coming up:

1. Support is not a pure cost center.

In payroll, healthcare, financial services, and HR, the ability to say “we will handle this for you” is half the product. Leaders in these spaces are allergic to blunt “let’s deflect everything” narratives. They care about churn, NPS, and trust.

2. Build vs buy is about secret sauce.

They are happy to buy plumbing and generic tooling. They are not happy to outsource case logic that encodes their hard‑won domain knowledge.

3. Success is cycle time per case type.

They care less about gross deflection numbers and more about:

How long it takes to arm a new case type
How fast they can change policy in the agent
How quickly production issues show up in evals

4. Their worries are concrete.

When they hesitate, it is usually around:

Safety and correctness on tax, compliance, or regulated flows
Whether their evals are mature enough to catch regressions
Brand tone and the risk of a “generic AI” voice
Whether they can realistically staff the internal platform work

This is where an external team that has shipped CX agents across multiple platforms can add real value. Not by pushing a platform, but by walking through what actually worked, what failed, and what that implies for their context.

Product lessons from the field

Across CX AI platform work and direct enterprise deployments, the same product patterns show up. A few that matter:

1. Map the real workflow, not the wiki

The documented process and what agents actually do are rarely the same.

Before we scope an agent, we now insist on a working session where frontline agents literally draw how they handle a case today. Who they ping. Which screens they ignore. What they do when the system is wrong.

That is where the value and the sharp edges live.

2. Treat tone and voice as design, not decoration

If you do not define tone, the model will improvise. Sometimes it will be right. Sometimes it will sound like a generic assistant talking to someone who is furious about a tax notice.

The teams that get this right:

Write explicit voice guides, the way marketing teams do
Keep separate patterns for sensitive flows
Listen to transcripts regularly and tune

This is one of the places where CX leaders, not engineers, should own the spec.

3. Keep humans in the loop where the cost of error is high

There is a strong temptation to “flip to auto” once an agent looks good on a few cases. That is how you get quiet failures.

A safer pattern:

Start with agent suggestions only, humans always approve
Move to auto‑action only for intents where you have high‑quality evals and low blast radius
Keep a standing rule that certain categories never auto‑action (for example, multi‑period tax adjustments, large money movements, dispute resolutions)
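
The standing rule is simple enough to express as a small gate that runs before anything else. This is a sketch, not a prescribed design: `Mode`, `NEVER_AUTO`, and `action_mode` are hypothetical names, and the category labels just mirror the examples in the list above.

```python
from enum import Enum
from typing import FrozenSet


class Mode(Enum):
    SUGGEST = "suggest"  # agent drafts, a human always approves
    AUTO = "auto"        # agent may act without approval


# Categories that never auto-action, regardless of how good the evals look.
NEVER_AUTO: FrozenSet[str] = frozenset({
    "multi_period_tax_adjustment",
    "large_money_movement",
    "dispute_resolution",
})


def action_mode(intent: str, promoted_intents: FrozenSet[str]) -> Mode:
    """Default to suggestions; allow auto only for explicitly promoted intents."""
    if intent in NEVER_AUTO:
        return Mode.SUGGEST
    if intent in promoted_intents:
        return Mode.AUTO
    return Mode.SUGGEST
```

Because the never-auto check comes first, adding an intent to the promoted set by mistake cannot override it.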

4. Use confidence bands, not a single threshold

Most systems support some notion of confidence. The naive approach is to pick a number and treat everything above it as safe.

A better approach is to have bands:

High: auto‑action allowed within narrow tool constraints
Medium: draft only, human must approve
Low: escalate or ask a clarifying question

You can tune these per intent based on observed behavior and risk tolerance.
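
A minimal sketch of per-intent bands follows. The threshold numbers are placeholders we made up for illustration, not recommendations, and `Bands`, `BANDS_BY_INTENT`, and `route` are hypothetical names. How the confidence score itself is produced is out of scope here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Bands:
    high: float    # at or above: auto-action allowed
    medium: float  # at or above (but below high): draft only

    def __post_init__(self) -> None:
        if not 0.0 <= self.medium <= self.high <= 1.0:
            raise ValueError("expected 0 <= medium <= high <= 1")


# Placeholder thresholds; tune per intent from observed behavior and risk.
DEFAULT_BANDS = Bands(high=0.95, medium=0.75)
BANDS_BY_INTENT: Dict[str, Bands] = {
    "order_status": Bands(high=0.85, medium=0.60),
    "refund": Bands(high=0.97, medium=0.80),
}


def route(intent: str, confidence: float) -> str:
    """Map a confidence score to one of the three bands for this intent."""
    bands = BANDS_BY_INTENT.get(intent, DEFAULT_BANDS)
    if confidence >= bands.high:
        return "auto_action"
    if confidence >= bands.medium:
        return "draft_for_human"
    return "escalate_or_clarify"
```

With these placeholder values, the same score of 0.9 auto-actions an order status lookup but only drafts a refund for human approval. That is the point of tuning per intent.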

5. Measure resolution, not just deflection

Deflection is easy to inflate. You send someone a link and count the ticket as “handled.” That tells you very little about whether their problem is solved.

The better metric is whether the customer:

Achieved the intended outcome
Stayed satisfied
Avoided re‑contact on the same issue

That is harder to measure, but if you do not, you will optimize for the wrong thing.
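
To show how far the two numbers can drift apart, here is a small sketch that computes both from the same interactions. The `Interaction` fields and the CSAT cutoff are assumptions for illustration. In particular, we assume a missing survey does not count against resolution, which a stricter team might reverse.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interaction:
    agent_only: bool        # no human touched it, i.e. it counts as deflected
    outcome_achieved: bool  # the customer got the intended outcome
    csat: Optional[int]     # 1-5 survey score, None if no survey was answered
    recontacted: bool       # came back about the same issue


def deflection_rate(items: List[Interaction]) -> float:
    if not items:
        return 0.0
    return sum(i.agent_only for i in items) / len(items)


def resolution_rate(items: List[Interaction], min_csat: int = 4) -> float:
    """Share of agent-only interactions that were actually resolved."""
    handled = [i for i in items if i.agent_only]
    if not handled:
        return 0.0

    def resolved(i: Interaction) -> bool:
        satisfied = i.csat is None or i.csat >= min_csat  # assumption, see above
        return i.outcome_achieved and satisfied and not i.recontacted

    return sum(resolved(i) for i in handled) / len(handled)
```

A program can report high deflection on a set of interactions where only a fraction of customers had their problem solved. Tracking both side by side makes that gap visible.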

6. Accept that integrations are the bottleneck and the moat

Every CX AI company will tell you they have great models. Very few will tell you that most of their work is wrangling CRMs, billing systems, logistics platforms, identity providers, and bespoke internal tools.

In practice:

Integrations take one to six months per enterprise
They are where deals stall
They are also what makes a deployment hard to rip out once it works

For a buyer, this is a warning and an opportunity. If you want real value, you need to budget for integration and pick partners who have actually done this work before.

A 90‑day playbook for a first production CX agent

A lot of AI CX projects die because they try to boil the ocean. The teams that ship pick a narrow wedge and go deep.

One pattern we see work in about three months:

1. Weeks 1–2: pick one intent, one channel, and two systems

Choose a single, high‑volume, low‑risk intent (for example, order status, basic plan change, simple booking).
Start in one channel (chat or in‑app messaging is usually easiest).
Limit integrations to the systems you absolutely need (often CRM plus order or scheduling).

Define what “good” looks like numerically before you start.

2. Weeks 2–4: build the first version and the eval harness together

Implement the minimal tool surface the agent needs.
Encode the real SOP, not the wishful one.
Stand up an eval set with real transcripts and expected outcomes.
Decide the first confidence bands and escalation rules.

Run the agent in a staging environment against historical conversations. Fix the obvious issues there.

3. Weeks 4–6: shadow production with humans in the loop

Turn the agent on for a slice of real traffic, but keep humans approving or editing every action.
Capture where humans override or correct the agent.
Promote those cases into the eval set.

This is where you will find gaps in your SOPs, tools, and data.
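
The capture-and-promote step can be sketched as a small function. This assumes shadow traffic is logged with both the agent's proposed action and what the human actually did. `ShadowRecord`, `promote_overrides`, and the dictionary-based eval set are hypothetical, and the human's action is treated as the expected outcome.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ShadowRecord:
    conversation_id: str
    transcript: str
    agent_action: str  # what the agent proposed
    human_action: str  # what the human approved or did instead


def promote_overrides(records: Iterable[ShadowRecord],
                      eval_set: Dict[str, dict]) -> int:
    """Add every human override to the eval set; return how many were added."""
    added = 0
    for record in records:
        if record.agent_action == record.human_action:
            continue  # human agreed with the agent, nothing to learn here
        if record.conversation_id in eval_set:
            continue  # already promoted
        eval_set[record.conversation_id] = {
            "transcript": record.transcript,
            "expected_action": record.human_action,
        }
        added += 1
    return added
```

In practice someone should review promoted cases before they become ground truth, since humans make mistakes too.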

4. Weeks 6–8: partial auto‑action with tight constraints

For high‑confidence, low‑risk cases, allow the agent to act without human approval.
Keep humans fully in the loop on medium and low confidence.
Monitor resolution, re‑contact, and CSAT for this intent daily.

If something drifts, roll back fast. You have the evals to see why.

5. Weeks 8–12: widen carefully

Only once the first intent is stable should you consider:

Adding a second intent
Adding a second channel
Relaxing confidence bands on the first intent

Each expansion should run through the same sequence: define, integrate, eval, shadow, partial auto, then broaden.

At the end of this, you have more than a demo. You have:

A production agent handling a real slice of volume
An eval harness tied to outcomes
A working pattern for how your org builds and operates agents

From there, you can decide whether to keep expanding, to bring in a platform, or to invest in a deeper custom stack.

Conclusion

How Modern CX Agents are Built

If there is one through‑line across the CX work we do at Lazer, it is that the model is rarely the bottleneck. The hard problems are almost always product problems: which workflows to encode, how to express them as procedures, where to keep humans in the loop, how to measure success, and how to plug into systems you already run.

Get those right, and CX is one of the cleanest places to make AI real.
