
OpenAI vs. Human vs. Voice AI: A Cost Comparison (5/5)

  • Xuchen Yao
  • Saturday, Oct 12, 2024

This is a series of 5 articles exploring customer communication strategies for small businesses, focusing on answering services:

  1. Why Small Businesses Need an Answering Service?: Discover the importance and benefits of answering services.

  2. Outsourcing vs. In-house Live Receptionists: What are live receptionists? Should you outsource or hire in-house?

  3. Automated Phone Answering Systems (Interactive Voice Response IVR vs. Voice AI Agents): What is an automated answering service? Should you use Interactive Voice Response or Voice AI agents?

  4. Decision: Should My Small Businesses Use Live Receptionists or Automated Answering Services?: You’ve learned all about answering services from our series. Now it’s time to decide which type of service is best for your business.

  5. (This Article) OpenAI vs. Human vs. Voice AI: A Cost Comparison: Wonder if you should switch to the latest voice AI technology? Let’s take a look at the real costs.


TLDR:

  1. Both OpenAI and humans can be expensive:
    1. OpenAI’s Realtime API can power an AI voice agent at about $1 per minute.
    2. On-demand virtual receptionists (human) are also priced around $1 per minute.
  2. But there are balanced choices with caveats:
    1. When employed long-term, human agents with good English can cost as little as $5 per hour ($0.08 per minute).
    2. Voice AI agents offered by startups can cost as little as $7.20 per hour ($0.12 per minute).

Real-World Cost of GPT-4o’s Realtime API

OpenAI released its Realtime API for GPT-4o on October 1, 2024, about five months after the release of GPT-4o, its first "omni" (natively multimodal) large language model. The performance is stunning: the Realtime model sounds like a human, responds like a human, and is robust against noise and interruptions.

However, is the GPT-4o Realtime API affordable?

At first glance, OpenAI’s Realtime API appears about 30x more expensive than GPT-4o mini for text ($5 vs. $0.15 per 1M input tokens).


October 2024 pricing for GPT-4o mini

October 2024 pricing for the GPT-4o Realtime API

OpenAI claims it costs roughly $0.06 per minute for audio input and $0.24 per minute for audio output. Adding these up suggests it shouldn’t exceed $0.30 per minute, right?

We conducted a real-world test of the GPT-4o Realtime API and found that it costs approximately $1 per minute.


Screenshot of the cost for one test of the GPT-4o Realtime API

We carried out a 5-minute voice conversation with the GPT-4o Realtime API and found that it cost $5.38. The conversation contained about 142 seconds of transcribed audio (think of it as audio input); the rest was mostly audio output.
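
To make the gap concrete, here is a minimal Python sketch that compares what the published per-minute audio rates would predict for this 5-minute test with the amount actually billed. It assumes that everything other than the 142 seconds of transcribed audio counts as output audio, which is a simplification; the function and variable names are ours, not OpenAI's.

```python
# Published per-minute audio rates quoted earlier in this article (USD).
AUDIO_INPUT_PER_MIN = 0.06
AUDIO_OUTPUT_PER_MIN = 0.24


def list_price_estimate(input_seconds: float, output_seconds: float) -> float:
    """Estimate the cost of a call from the per-minute audio rates alone."""
    input_cost = (input_seconds / 60) * AUDIO_INPUT_PER_MIN
    output_cost = (output_seconds / 60) * AUDIO_OUTPUT_PER_MIN
    return input_cost + output_cost


# Figures from the 5-minute test described above.
call_minutes = 5
input_seconds = 142
# Assumption: all remaining call time is billed as output audio.
output_seconds = call_minutes * 60 - input_seconds

estimated = list_price_estimate(input_seconds, output_seconds)
billed = 5.38

print(f"List-price estimate: ${estimated:.2f}")
print(f"Actually billed:     ${billed:.2f}")
print(f"Billed per minute:   ${billed / call_minutes:.2f}")
print(f"Billed vs. estimate: {billed / estimated:.1f}x")
```

Under these assumptions the list-price estimate comes to roughly $0.77, while the bill was $5.38 (about $1.08 per minute), so the real cost was about 7 times what the headline rates suggest.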

In another test, a simple 10-minute conversation cost about $10.

Yikes, that’s expensive. It’s actually about 10 times more expensive than Seasalt.ai’s own voice agents.

Developers who are only testing the API, but are putting meaningful effort into building a voice AI agent that actually does something, can easily spend hundreds of dollars in a day!

GPT-4o’s Realtime API vs. Human Agents – which is more affordable?

So a voice AI agent built on GPT-4o’s Realtime API costs about $1 per minute, or $60 per hour.

How much does a human agent cost?

If you hire in-house, such as a front desk receptionist, wages can range anywhere from the minimum wage ($7.25 per hour federally, $16 in California) to perhaps $20 to $30 per hour.

If you use an outsourced agency, the price can vary: some start at $349/month for 200 minutes plus a setup fee. Seasalt.ai has written a detailed survey on this: Cost of Live Receptionists: In-house vs. Outsourcing.


Live Receptionist Vendor Summary by Seasalt.ai
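
To put the figures from this section on a common footing, the following Python sketch converts each option into a cost per hour of talk time. It is a rough illustration only: it assumes in-house staff spend every paid hour on calls (in practice wages are paid whether or not the phone rings), it ignores the agency setup fee, and the labels and helper function are our own.

```python
MINUTES_PER_HOUR = 60


def per_minute_to_hourly(rate_per_minute: float) -> float:
    """Convert a per-minute price into the cost of one full hour of talk time."""
    return rate_per_minute * MINUTES_PER_HOUR


# All figures are the ones quoted in this article; none are vendor quotes.
hourly_costs = {
    "OpenAI Realtime API (~$1/min)": per_minute_to_hourly(1.00),
    "Outsourced agency ($349 / 200 min, setup fee excluded)": per_minute_to_hourly(349 / 200),
    "In-house, federal minimum wage": 7.25,
    "In-house, California minimum wage": 16.00,
    "In-house, upper estimate": 30.00,
}

# Print the options from cheapest to most expensive.
for label, cost in sorted(hourly_costs.items(), key=lambda item: item[1]):
    print(f"{label:<56} ${cost:7.2f} per hour")
```

On these numbers, the entry-level outsourced plan works out to about $1.75 per minute, or roughly $105 per hour of talk time, and the Realtime API to $60 per hour, both well above in-house wages.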

GPT-4o’s Realtime API vs. other Voice AI Agents – what is the difference?

GPT-4o’s Realtime API represents a significant advancement in voice AI technology, offering several key differences compared to other voice AI agents:

  • Responsiveness: It provides near real-time interactions, with response times averaging 2 to 3 seconds.
  • Robustness: The API handles interruptions and redirection during conversations, allowing for a more natural dialogue flow.
  • End-to-end: The API does not require duct-taping separate components together, such as speech-to-text (Azure, Deepgram, etc.) and text-to-speech (Azure, ElevenLabs).

But the caveat here is the cost: GPT-4o’s Realtime API costs roughly $1 per minute, while other voice AI agents can cost as little as $0.12 per minute.


Seasalt.ai vs. Bland AI vs. Smith.ai vs. Synthflow.ai vs. Retell AI vs. Slang AI vs. Gridspace for voice AI agents

Voice AI Agent Product Comparison by Seasalt.ai

There’s a 10 times difference in price, but is there a 10 times difference in performance? That’s for the client to judge.
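
For a sense of what that gap means on a monthly bill, here is a small Python sketch using the two per-minute prices from this article. The monthly call volumes are hypothetical examples (200 minutes matches the entry-level agency plan mentioned earlier), and the names are ours.

```python
OPENAI_REALTIME_PER_MIN = 1.00  # approximate measured cost from the tests in this article
VOICE_AI_PER_MIN = 0.12         # lowest startup price cited in this article


def monthly_cost(minutes: int, rate_per_minute: float) -> float:
    """Cost of a month of calls at a flat per-minute rate."""
    return minutes * rate_per_minute


price_ratio = OPENAI_REALTIME_PER_MIN / VOICE_AI_PER_MIN
print(f"Price ratio: {price_ratio:.1f}x")

# Hypothetical monthly call volumes, in minutes.
for minutes in (200, 1000, 5000):
    openai_bill = monthly_cost(minutes, OPENAI_REALTIME_PER_MIN)
    voice_ai_bill = monthly_cost(minutes, VOICE_AI_PER_MIN)
    print(
        f"{minutes:>5} min/month: Realtime API ${openai_bill:>8.2f}"
        f" vs. voice AI agent ${voice_ai_bill:>7.2f}"
    )
```

At the lowest quoted price the ratio is a little over 8x, which rounds to the "10 times" figure used here; at 1,000 minutes a month that is roughly $1,000 versus $120.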

Verdict

For business owners, there are basically 4 options:

  1. In-house human agents
  2. Outsource to a different company, either onshore or offshore
  3. Use an affordable voice AI agent
  4. Build with the most advanced/expensive OpenAI Realtime API

I have summarized the pros and cons of each option below:

  • OpenAI Realtime API offers the fastest and most natural experience but requires technical expertise and is expensive.
  • Onshore On-Demand Human Agents are good for basic tasks in perfect English but have limited integration.
  • Offshore Long-Term Human Agents are the most affordable but can be unreliable due to infrastructure issues and high turnover.
  • Integrated Voice AI Agents offer a balance between cost, features, and ease of use, but may be slightly less responsive and have integration quirks.

Different Options for Phone Answer Services: human vs. OpenAI vs. voice AI startups


As a practitioner in the field of speech recognition and natural language processing, my two cents are:

  1. Use the integrated voice AI agents on the market, like the one I proudly built with SeaChat. They are mature and affordable.
  2. Give the OpenAI Realtime API another year while the guinea pigs test it out. Hopefully the price will drop to a more affordable $10/hour, and then it will become truly amazing. Watch out, human agents!

Learn More

If you’d like to start by exploring AI voice technology for customer service at a reasonable price, you can visit SeaChat or book a demo with us.
