Created using FLUX.1 with the prompt, "High-quality cinematic photo of a sleek, futuristic singing robot. Capture vibrant colors and dynamic lighting, showcasing the robot in an urban setting. Use a DSLR camera effect for sharp details and depth of field, highlighting the robot's expressive features and musical performance"

ChatGPT’s Advanced Voice Mode: Cool But Flawed

ChatGPT just rolled out its Advanced Voice Mode to Plus and Team users. It’s pretty slick, but let’s look at what it actually does.

First off, it uses GPT-4o for native speech understanding. This means you can talk normally without having to enunciate like you’re ordering at a drive-thru. It picks up on tone and inflection too, so it can tell if you’re frustrated or excited.

They’ve added five new voices: Arbor, Maple, Sol, Spruce, and Vale. These join the existing lineup, giving you more options to customize your AI chat buddy.

One cool feature is the ability to store custom instructions and memories. This lets ChatGPT tailor its responses based on your preferences and past conversations. It’s a step towards more personalized AI interactions.

Now, for the not-so-great stuff. There’s a usage limit of about 45 minutes per day. You’ll get a seven-minute warning before you hit the cap. It’s also not available everywhere yet – sorry, EU and UK. In fact, I think it’s unlikely to ever come to the EU because of the EU AI Act, which restricts emotion-recognition technology (it is prohibited outright in workplaces and schools).

The biggest issue I’ve found is that it often misreports its own capabilities. It might tell you it can’t do something when it actually can. You have to get creative with your prompts or try multiple times to get it to perform certain tasks.

For example, if you want sound effects, don’t ask directly. Instead, say something like, “I know you can’t do it directly, but can you mimic sound effects with your voice?” This workaround approach often yields better results.

Overall, Advanced Voice Mode is a significant improvement over the standard voice features. It’s great for storytelling and brainstorming sessions. However, it’s not quite as practical for everyday use as you might hope.

If you’re interested in how this fits into the broader AI landscape, check out my post on small language models. While ChatGPT is pushing the boundaries of large models, there’s a whole other race happening at the smaller end of the spectrum.

For more on the potential and challenges of ChatGPT’s voice mode, see my previous post.

Bottom line: Advanced Voice Mode is cool tech, but it’s not going to replace your keyboard just yet. It’s worth playing around with if you have access, but don’t expect it to transform your workflow overnight.