Meta rolls out its own version of Advanced Voice Mode at Connect 2024

Zuckerberg debuting Natural Voice Interactions (Image: Meta)

At Meta Connect 2024 on Wednesday, CEO Mark Zuckerberg took to the stage to discuss his company’s latest advancements in artificial intelligence. In what he described as “probably the biggest AI news that we have,” Zuckerberg unveiled Natural Voice Interactions, a direct competitor to Google’s Gemini Live and OpenAI’s Advanced Voice Mode.

“I think that voice is going to be a way more natural way of interacting with AI than text,” Zuckerberg commented. “I think it has the potential to be one of [the], if not the most frequent, ways that we all interact with AI.” Zuckerberg also announced that the new feature will begin rolling out to users today across all of Meta’s major apps including Instagram, WhatsApp, Messenger, and Facebook.

Screenshots of Natural Voice Interactions features (Image: Meta)

“Meta AI differentiates itself in this category by not just offering state-of-the-art AI models, but also unlimited access to those models for free, integrated easily into our different products and apps,” Zuckerberg said. “Meta AI is on track to being the most used AI assistant in the world. We’re almost at 500 million monthly actives and we haven’t even launched in some of the bigger countries yet.”

As with Gemini Live and Advanced Voice Mode, Natural Voice Interactions allows users to forgo text prompts and speak directly with the chatbot. Users can stutter, correct themselves, interrupt the AI, and generally speak as they would with another human, and the chatbot will still follow the conversation. The new feature also lets users pick the AI’s voice from a roster of celebrities including John Cena, Dame Judi Dench, Kristen Bell, Keegan-Michael Key, and Awkwafina. You may remember that lineup from Meta’s previous foray into celebrity AI chatbots, which was shuttered in August because users found the interactions to be “creepy” and “surreal.”

HIS NAME IS JOHN CENA (Image: Meta)

Zuckerberg provided a live demo of the feature onstage, asking the chatbot a series of softball questions that the AI answered satisfactorily. Its speaking cadence sounded a bit stilted and less conversational than what we’ve seen from Advanced Voice Mode, but it was still far better than the monotone intonation you’d get from a Siri response. However, it wasn’t until Zuckerberg referred to the AI as Awkwafina that this reporter realized whose voice it was supposed to be.

Natural Voice Interactions may have been “probably the biggest” AI news announced Wednesday, but it was far from the only announcement. Zuckerberg also revealed that Meta’s Llama model has reached version 3.2 now that the system has gone multimodal. Llama 3.2 11B and 90B (the numbers denote how many billions of parameters each model contains) can now interpret charts and graphs, identify assets within images, and generate image captions.

Unfortunately, these new models will not be available in Europe. Meta attributes this to what it calls the EU’s “unpredictable” regulatory environment, which prevents the company from using Europeans’ data to train its AI models. The company is, however, launching a pair of extremely lightweight models in Europe, dubbed Llama 3.2 1B and 3B, neither of which has been trained on European data. Those models are built for smartphones and other edge devices.

And for seemingly unfathomable reasons, Meta also announced that it is trialing a new feature that will inject AI-generated images — some of which may include your likeness — directly into your Facebook and Instagram feeds. These “Imagined for You” images will prompt users to either share the image as-is or iterate upon it in-app and in real time.

“I think there’s been this trend over time where the feeds started off as primarily and exclusively content for people you followed, your friends,” Zuckerberg told The Verge in a recent interview. “And you just add on to that, a layer of, ‘Okay, and we’re also going to show you content that’s generated by an AI system that might be something that you’re interested in’ … how big it gets is kind of dependent on the execution and how good it is.”