Friday, May 24, 2024

Google I/O Showed Gemini Still Needs Time to Bake


During the kickoff keynote for Google I/O 2024, the general tone seemed to be, “Can we have an extension?” Google’s promised AI improvements are definitely taking center stage here, but with a few exceptions, most are still in the oven.

That’s not too surprising—this is a developer conference, after all. But it seems like consumers will have to wait a while longer for their promised “Her” moment. Here’s what you can expect once Google’s new features start to arrive.

AI in Google Search


Credit: Google/YouTube

Maybe the most impactful addition for most people will be expanded Gemini integration in Google Search. While Google already has a “generative search” feature that can jot out a quick paragraph or two, it’ll soon be joined by “AI Overviews.”

AI Overviews will optionally extend generative search into an entire page, with answers to your questions as well as suggestions based on the context of your search.

For instance, if you live in a sunny area with good weather and ask for “restaurants near you,” Overviews might give you a few basic suggestions, but also a separate, unprompted subheading with restaurants that have good patio seating.

In the more traditional search results page, you’ll instead be able to use “AI organized search results,” which eschew traditional SEO to intelligently recommend web pages to you based on highly specific prompts.

For instance, you can ask Google to “create a gluten free three-day meal plan with lots of veggies and at least two desserts,” and the search page will create several subheadings with links to appropriate recipes under each.

Google is also bringing AI to how you search, with an emphasis on multimodality—meaning you can use it with more than text. Specifically, an “Ask with Video” feature is in the works that will allow you to simply point your phone camera at an object, ask for identification or repair help, and get answers via generative search.

Google didn’t directly address how it’s handling criticism that AI search results essentially steal content from sources around the web, since users never need to click through to the original source. That said, demonstrators highlighted multiple times that these features bring you to useful links you can check out yourself, perhaps covering their bases in the face of these critiques.

AI Overviews are already rolling out to Google’s experimental Search Labs, with AI Organized Search Results and Ask with Video set for “the coming weeks.”

Search your photos with AI

Ask Photos demo


Credit: Google/YouTube

Another of the more concrete features in the works is “Ask Photos,” which plays with multimodality to help you sort through the hundreds of gigabytes of images on your phone.

Say your daughter took swimming lessons last year and you’ve lost track of your first photos of her in the water. Ask Photos will let you simply ask, “When did my daughter learn to swim?” Your phone will automatically know who you mean by “your daughter,” and surface images from her first swimming lesson.

That’s similar to searching your photo library for pictures of your cat by just typing “cat,” sure, but the idea is that the multimodal AI can support more detailed questions and understand what you’re asking with greater context, powered by Gemini and the data already stored on your phone.

Other details are light, with Ask Photos set to debut “in the coming months.”

Project Astra: an AI agent in your pocket

Project Astra in action


Credit: Google/YouTube

Here’s where we get into more pie-in-the-sky stuff. Project Astra is the most C-3PO we’ve seen AI get yet. The idea is you’ll be able to load up the Gemini app on your phone, open your camera, point it around, and ask questions or get help based on what your phone sees.

For instance, point at a speaker, and Astra will be able to tell you what parts are in the hardware and how they’re used. Point at a drawing of a cat with dubious vitality, and Astra will answer your riddle with “Schrödinger’s Cat.” Ask it where your glasses are, and if Astra was looking at them earlier in your shot, it will be able to tell you.

This is maybe the classic dream when it comes to AI, and quite similar to OpenAI’s recently announced GPT-4o, so it makes sense that it’s not ready yet. Astra is set to come “later this year,” but curiously, it’s supposed to work on AR glasses as well as phones. Perhaps we’ll be learning of a new Google wearable soon.

Make a custom podcast hosted by robots

Setting up a robot podcast in NotebookLM


Credit: Google/YouTube

It’s unclear when this feature will be ready, since it seems to be more of an example for Google’s improved AI models than a headliner, but one of the more impressive (and possibly unsettling) demos Google showed off during I/O involved creating a custom podcast hosted by AI voices.

Say your son is studying physics in school, but is more of an audio learner than a text-oriented one. Supposedly, Gemini will soon let you dump written PDFs into Google’s NotebookLM app and ask Gemini to make an audio program discussing them. The app will generate what feels like a podcast, hosted by AI voices talking naturally about the topics from the PDFs.

Your son will then be able to interrupt the hosts at any time to ask for clarification.

Hallucination is obviously a major concern here, and the naturalistic language might be a little “cringe,” for lack of a better word. But there’s no doubt it’s an impressive showcase…if only we knew when we’ll be able to recreate it.

Paid features

Gemini side panel


Credit: Google/YouTube

There are a few other tools in the works that seem purpose-built for your typical consumer, but for now, they’re going to be limited to Google’s paid Workspace plans.

The most promising of these is Gmail integration, which takes a three-pronged approach. The first is summaries, which can read through a Gmail thread and break down key points for you. That’s not too novel, nor is the second prong, which allows AI to suggest contextual replies for you based on information in your other emails.

But Gemini Q&A seems genuinely transformative. Imagine you’re looking to get some roofing work done and you’ve already emailed three different construction firms for quotes. Now, you want to make a spreadsheet of each firm, their quoted price, and their availability. Instead of having to sift through each of your emails with them, you can instead ask a Gemini box at the bottom of Gmail to make that spreadsheet for you. It will search your Gmail inbox and generate a spreadsheet within minutes, saving you time and perhaps helping you find missed emails.

This sort of contextual spreadsheet building will also be coming to apps outside of Gmail, but Google was also proud to show off its new “Virtual Gemini Powered Teammate.” Still in the early stages, this upcoming Workspace feature is kind of like a mix between a typical Gemini chat box and Astra. The idea is that organizations will be able to add AI agents to their Slack equivalents that will be on call to answer questions and create documents on a 24/7 basis.

Gmail’s Gemini features will be rolling out this month to Workspace Labs users.

Gems

Gems on stage


Credit: Google/YouTube

Earlier this year, OpenAI replaced ChatGPT plugins with “GPTs,” allowing users to create custom versions of its ChatGPT chatbot built to handle specific questions. Gems are Google’s answer to this and work in much the same way. You’ll be able to create a number of Gems, each with its own page within your Gemini interface and each following a specific set of instructions. In Google’s demo, suggested Gems included “Yoga Bestie,” which offers exercise advice.

Gems are another feature that won’t see the light of day until a few months from now, so for now, you’ll have to stick with GPTs.

Agents

Sundar Pichai on stage


Credit: Google/YouTube

Fresh off the muted reception to the Humane AI Pin and Rabbit R1, AI aficionados were hoping that Google I/O would show Gemini’s answer to the promises behind these devices, i.e. the ability to go beyond simply collating information and actually interact with websites for you. What we got was a light tease with no set release date.

In a pitch from Google CEO Sundar Pichai, we saw the company’s intention to make AI Agents that can “think multiple steps ahead.” For example, Pichai talked about the possibility for a future Google AI Agent to help you return shoes. It could go from “searching your inbox for the receipt,” all the way to “filling out a return form,” and “scheduling a pickup,” all under your supervision.

All of this had a huge caveat in that it wasn’t a demo, just an example of something Google wants to work on. “Imagine if Gemini could” did a lot of heavy lifting during this part of the event.

New Google AI Models

Veo slide on stage


Credit: Google/YouTube

In addition to highlighting specific features, Google also touted the release of new AI models and updates to its existing ones. From generative models like Imagen 3 to larger and more contextually intelligent builds of Gemini, these aspects of the presentation were intended more for developers than end users, but there are still a few interesting points to pull out.

The key standouts are the introduction of Veo and Music AI Sandbox, which generate AI video and sound respectively. There aren’t many details on how they work yet, but Google brought out big stars like Donald Glover and Wyclef Jean for promising quotes like, “Everybody’s gonna become a director” and, “We digging through the infinite crates.”

For now, the best demos we have for these generative models are in examples posted to celebrity YouTube channels.

Google also wouldn’t stop talking about Gemini 1.5 Pro and 1.5 Flash during its presentation. These are new versions of its LLM, primarily meant for developers, that support larger token counts and so can take in more context at once. They probably won’t matter much to you, but pay attention to Gemini Advanced.

Gemini Advanced is already on the market as Google’s paid Gemini plan, and allows a larger number of questions, some light interaction with Gemini 1.5, integration with various apps such as Docs (separate from Workspace-exclusive features), and uploads of files like PDFs.

Some of Google’s promised features sound like they’ll need you to have a Gemini Advanced subscription, specifically those that want you to upload documents so the chatbot can answer questions related to them or riff off them with its own content. We don’t know for sure yet what will be free and what won’t, but it’s yet another caveat to keep in mind for Google’s “keep your eye on us” promises this I/O.

That’s a wrap on Google’s general announcements for Gemini. That said, the company also announced new AI features in Android, including a new Circle to Search ability and using Gemini for scam detection. (Not Android 15 news, however: That comes tomorrow.)
