Gemini AI is making robots in the office far more useful

Andrew Tarantola

July 11, 2024 at 2:55 p.m.

An Everyday Robot navigating through an office. — Everyday Robot

Lost in an unfamiliar office building, big box store, or warehouse? Just ask the nearest robot for directions.

A team of Google researchers combined the powers of natural language processing and computer vision to develop a novel means of robotic navigation as part of a new study published Wednesday.

A post shared by Google DeepMind (@googledeepmind)

Essentially, the team set out to teach a robot (in this case an Everyday Robot) how to navigate through an indoor space using natural language prompts and visual inputs. Robotic navigation used to require researchers not only to map out the environment ahead of time but also to provide specific physical coordinates within the space to guide the machine. Recent advances in what's known as vision-language navigation have enabled users to simply give robots natural language commands, like "go to the workbench." Google's researchers are taking that concept a step further by incorporating multimodal capabilities, so that the robot can accept natural language and image instructions at the same time.

For example, a user in a warehouse would be able to show the robot an item and ask, “what shelf does this go on?” Leveraging the power of Gemini 1.5 Pro, the AI interprets both the spoken question and the visual information to formulate not just a response but also a navigation path to lead the user to the correct spot on the warehouse floor. The robots were also tested with commands like, “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer,” and “I want to store something out of sight from public eyes. Where should I go?”

Or, in the Instagram Reel above, a researcher activates the system with an “OK robot” before asking to be led somewhere where “he can draw.” The robot responds with “give me a minute. Thinking with Gemini …” before setting off briskly through the 9,000-square-foot DeepMind office in search of a large wall-mounted whiteboard.

To be fair, these trailblazing robots were already familiar with the office's layout. The team used a technique known as "Multimodal Instruction Navigation with demonstration Tours (MINT)." This involved first manually guiding the robot around the office and pointing out specific areas and features using natural language, though the same effect can be achieved by simply recording a video of the space with a smartphone. From there, the AI generates a topological graph of the space and works to match what its cameras are seeing with the "goal frame" from the demonstration video.
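The demonstration-tour idea can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not DeepMind's implementation: each tour frame is assumed to have already been reduced to a numeric embedding vector (how that happens is out of scope here), consecutive frames are linked because a tour is one continuous walk, and nearly identical frames can optionally be linked as a crude stand-in for recognizing a revisited spot. The made-up `build_topological_graph` and `localize` functions then match the robot's current camera view to the most similar tour frame by cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_topological_graph(tour_embeddings, link_threshold=None):
    """Build an undirected graph whose nodes are tour-frame indices.

    Assumption: consecutive frames are linked because the tour is one continuous walk.
    If link_threshold is given, non-adjacent frames whose embeddings are at least that
    similar are also linked, a crude stand-in for noticing a revisited location.
    """
    n = len(tour_embeddings)
    graph = {i: set() for i in range(n)}
    for i in range(n - 1):
        graph[i].add(i + 1)
        graph[i + 1].add(i)
    if link_threshold is not None:
        for i in range(n):
            for j in range(i + 2, n):
                if cosine(tour_embeddings[i], tour_embeddings[j]) >= link_threshold:
                    graph[i].add(j)
                    graph[j].add(i)
    return graph


def localize(camera_embedding, tour_embeddings):
    """Return the index of the tour frame most similar to the current camera view."""
    return max(
        range(len(tour_embeddings)),
        key=lambda i: cosine(camera_embedding, tour_embeddings[i]),
    )


# Toy example with made-up 3-D "embeddings" for four tour frames.
tour = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
graph = build_topological_graph(tour)
print(graph)                             # {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(localize([0.1, 0.95, 0.0], tour))  # 2
```

Real systems would use learned image features rather than hand-written vectors; the point here is only the shape of the idea: a graph of places seen on the tour, plus a way to tell which of those places the robot is looking at now.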

Then the team applies a hierarchical Vision-Language-Action (VLA) navigation policy, "combining the environment understanding and common sense reasoning," that tells the AI how to translate user requests into navigational actions.
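The hierarchical split can likewise be sketched as a hypothetical toy rather than the team's code: a high-level step picks a goal frame from the tour, and a low-level step plans a route to it over the graph. Here `high_level_select_goal` is a crude keyword match over invented tour captions that merely stands in for the Gemini 1.5 Pro reasoning, and `low_level_plan` is an ordinary breadth-first search. `TOUR_CAPTIONS`, `GRAPH`, and all function names are made up for illustration.

```python
import re
from collections import deque

# Hypothetical tour: one short caption per demonstration-tour frame.
TOUR_CAPTIONS = [
    "lobby entrance",                     # frame 0
    "hallway",                            # frame 1
    "kitchen with hand sanitizer",        # frame 2
    "hallway near meeting rooms",         # frame 3
    "conference room with double doors",  # frame 4
    "whiteboard wall",                    # frame 5
]
# Hypothetical undirected topological graph over those frames.
GRAPH = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4, 5], 4: [3], 5: [3]}


def words(text):
    """Lowercase alphabetic tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def high_level_select_goal(instruction, captions):
    """Placeholder for the multimodal model: choose the frame whose caption
    shares the most words with the instruction (ties go to the lowest index)."""
    query = words(instruction)
    return max(range(len(captions)), key=lambda i: len(query & words(captions[i])))


def low_level_plan(graph, start, goal):
    """Breadth-first search: return the shortest list of frames from start to goal,
    or None if the goal is unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def navigate(instruction, current_frame):
    """High level picks the goal frame; low level plans the route to it."""
    goal = high_level_select_goal(instruction, TOUR_CAPTIONS)
    return low_level_plan(GRAPH, current_frame, goal)


print(navigate("Take me to the conference room with the double doors", 0))  # [0, 1, 3, 4]
print(navigate("Where can I borrow some hand sanitizer?", 5))               # [5, 3, 1, 2]
```

A keyword match cannot handle a request like "somewhere I can draw," since no caption contains those words; that kind of common-sense leap from drawing to a whiteboard is exactly what the multimodal model is there to supply.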

The results were strong, with the robots achieving "86 percent and 90 percent end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real world environment," the researchers wrote.

However, they recognize that there is still room for improvement, pointing out that the robot cannot (yet) autonomously perform its own demonstration tour and noting that the AI's ungainly inference time (how long it takes to formulate a response) of 10 to 30 seconds makes interacting with the system a study in patience.
