Tom's Guide

Moshi Chat's GPT-4o advanced voice competitor tried to argue with me — OpenAI doesn't need to worry just yet

By Ryan Morrison

1 day ago

Moshi Chat is a new native speech AI model from French startup Kyutai, promising a similar experience to GPT-4o where it understands your tone of voice and can be interrupted.

Unlike GPT-4o, Moshi is a smaller model and can be installed locally and run offline. This could be perfect for the future of smart home appliances — if they can improve the responsiveness.

I had several conversations with Moshi. Each lasts up to five minutes in the current online demo, and every one ended with it repeating the same word over and over, losing coherence.

In one of the conversations it started to argue with me, flat-out refusing to tell me a story and insisting instead on stating a fact, and it wouldn’t let up until I said “tell me a fact.”

This is all likely an issue of context window size and compute resources that can be solved over time. While OpenAI doesn’t need to worry about competition from Moshi yet, it does show that others are catching up, just as Luma Labs, Runway and others are pressing against Sora on quality.

What is Moshi Chat?

Moshi Chat is the brainchild of the Kyutai research lab and was built from scratch six months ago by a team of eight researchers. The goal is to make it open and build on the new model over time, and it is already the first openly accessible native generative voice AI.

“This new type of technology makes it possible for the first time to communicate in a smooth, natural and expressive way with an AI,” the company said in a statement.

Its core functionality is similar to OpenAI’s GPT-4o but comes from a much smaller model. It is also available to use today, whereas GPT-4o advanced voice won’t be widely available until fall.

The team suggests Moshi could be used in roleplay scenarios or even as a coach to spur you on while you train. The plan is to work with the community and make it open so others can build on top of and further fine-tune the AI.

Under the hood is a 7B-parameter multimodal model called Helium, trained on text and audio codecs, while Moshi itself works natively as speech in, speech out. It can run on an Nvidia GPU, Apple's Metal or a CPU.
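To give a rough sense of why a 7B-parameter model is a plausible candidate for running locally, here is a back-of-the-envelope sketch of how much memory the weights alone would need at a few common numeric precisions. The precisions listed are illustrative assumptions, not formats Kyutai has confirmed, and the estimate ignores activations, the audio codec and any runtime overhead.

```python
# Rough estimate of weight storage for a 7B-parameter model.
# Assumption: the precisions below are illustrative only; the article does
# not say which formats Moshi actually ships in.

PARAMS = 7_000_000_000  # "7B" as stated in the article

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>8}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
```

By this estimate the weights range from about 28 GB at full 32-bit precision down to about 3.5 GB at 4 bits per parameter, which is why smaller or compressed formats matter so much for laptops and home devices.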

What happens next with Moshi?

Kyutai hopes that the community support will be used to enhance Moshi's knowledge base and factuality. These have been limited because it is a lightweight base model, but it is hoped that expanding these aspects in combination with native speech will create a powerful assistant.

The next stage is to further refine the model and scale it up to allow for more complex and longer form conversations with Moshi.

In using it and from watching the demos I’ve found it incredibly fast and responsive for the first minute or so, but the longer the conversation goes on the more incoherent it becomes. Its lack of knowledge is also obvious, and if you call it out for making a mistake it gets flustered and goes into a loop of "I’m sorry, I’m sorry, I’m sorry."

This isn’t a direct competitor for OpenAI’s GPT-4o advanced voice yet, even though advanced voice isn’t currently available. But offering an open, locally running model that has the potential to work in much the same way is a significant step forward for open-source AI development.
