Google unveils Gemini Omni, a multimodal model that generates video from text, images, and audio
Google has introduced Gemini Omni, a multimodal AI model that can process and generate content across text, images, audio, and video. The first version, Omni Flash, enables users to create and edit videos through natural conversation.
Google announced Gemini Omni, a new multimodal AI model capable of reasoning across text, images, audio, and video. The model can generate and edit videos based on simple conversational inputs. The first iteration, Omni Flash, is now available to developers and enterprise customers.
Gemini Omni processes multiple input types simultaneously, allowing users to upload an image, provide a text prompt, or speak a command to generate a video. The model understands context across modalities, enabling it to create coherent video sequences that match the user's intent.
Omni Flash can edit existing videos by adding or removing elements, changing backgrounds, or altering the narrative flow. For example, a user could upload a video of a park and ask the model to add a dog running through the scene, and the model would generate the appropriate frames.
The model builds on Google's previous work in multimodal AI, but Gemini Omni represents a significant step toward seamless integration of different data types. Google emphasized that the model is designed for creative and professional use cases, such as content creation, advertising, and education.
Google has not disclosed the full technical specifications of Gemini Omni, but the company stated that it uses a unified architecture to process and generate multimodal content. The model is trained on a large dataset of paired text, images, audio, and video.
Omni Flash is available through Google Cloud's Vertex AI platform. Pricing is based on usage, with costs varying depending on the complexity and length of the generated video. Google also plans to release a consumer-facing version in the future.
Developers can access Omni Flash via API, with support for Python and JavaScript. Google provides documentation and sample code to help integrate the model into applications. The company also offers a web-based demo for testing.
Google said that Gemini Omni will continue to evolve, with future updates expected to improve video quality and reduce generation time. The company also noted that safety measures are in place to prevent misuse, including content filters and usage guidelines.
Google unveils Gemini app overhaul at IO 2026 to rival ChatGPT and Claude
At its IO 2026 developer conference, Google announced a major update to its Gemini app, positioning it as a comprehensive AI hub. The overhaul includes new multimodal capabilities, deeper integration with Google services, and a redesigned interface to compete with ChatGPT and Claude.
Google took the stage at its annual IO developer conference on Wednesday to unveil a significant overhaul of its Gemini app. The update marks a strategic shift for the company, aiming to transform Gemini from a standalone chatbot into an all-purpose AI hub. The move directly targets competitors like OpenAI's ChatGPT and Anthropic's Claude, which have gained traction in the consumer AI space.
The new Gemini app introduces multimodal capabilities, allowing users to input and receive responses across text, images, audio, and video. Google demonstrated a feature where users could point their phone camera at a plant and ask Gemini to identify it, receiving a detailed botanical description. The app also supports real-time voice conversations with a natural-sounding assistant, similar to ChatGPT's voice mode.
Deep integration with Google's ecosystem is a key differentiator. Gemini can now access Gmail, Calendar, Drive, and Maps to perform tasks like summarizing emails, scheduling events, or finding nearby restaurants. Google emphasized that user data remains private and is not used for training models without consent. The company also announced a new plugin system for third-party services, including Spotify and Uber.
The redesigned interface features a persistent sidebar for quick access to tools and a new "Spaces" feature for organizing conversations by topic. Google claims the underlying Gemini Ultra model has been updated with improved reasoning and coding abilities, scoring higher than GPT-4o on several benchmarks. The company did not disclose specific benchmark numbers.
Google also introduced Gemini Live, a subscription tier priced at $19.99 per month that includes priority access to the latest models, longer context windows, and advanced data analysis tools. A free tier remains available with standard features. The company plans to roll out the update to all users starting next week, with a staggered release across Android and iOS devices.
During the keynote, Google executives positioned Gemini as a tool for both consumers and businesses. The app now includes a "Workspace" mode tailored for productivity, with features like document drafting and spreadsheet analysis. Google also announced that Gemini will be integrated into Android Auto and Google TV later this year.
Industry analysts noted that the update brings Gemini closer to the functionality offered by ChatGPT Plus and Claude Pro. However, Google's advantage lies in its vast ecosystem of services and its ability to offer the app across multiple platforms. The company confirmed that Gemini will remain free for basic use, with no immediate plans to charge for the core experience.
"We're building an AI that works for you, across every part of your day," said Google CEO Sundar Pichai during the presentation. "Gemini is no longer just a chatbot—it's your personal AI assistant." The updated app will be available for download starting June 1, 2026, on the Google Play Store and Apple App Store.
Google Search shifts to AI-powered conversational answers and agents
Google is redesigning Search to feature AI-generated conversational answers and autonomous agents, moving away from traditional link lists. This transformation may further decrease web traffic to publishers.
Google announced a major overhaul of its Search engine, transitioning from a list of links to an AI-driven experience. The update introduces conversational answers, autonomous agents, and interactive interfaces. This shift is expected to reduce traffic to publishers across the web.
The new Search experience leverages Google's advanced language models to generate direct answers to user queries. Instead of presenting a list of links, the system provides synthesized responses that draw from multiple sources. Users can ask follow-up questions and receive coherent, context-aware replies.
Autonomous agents are another key component of the redesign. These AI systems can perform tasks on behalf of users, such as booking reservations or making purchases. The agents operate within the search interface, streamlining complex workflows without requiring users to visit multiple websites.
Interactive interfaces will allow users to refine results through dynamic elements like sliders and dropdowns. For example, a search for "best laptops" might include filters for price range, brand, and specifications, with results updating in real time. This approach aims to keep users within Google's ecosystem longer.
The company emphasized that these changes are designed to improve user experience by providing faster, more accurate answers. However, the shift poses challenges for publishers who rely on search traffic. Google acknowledged that click-through rates to external sites may decline as users find answers directly on the search results page.
Google plans to roll out the new features gradually, starting with a limited set of queries in the United States. The company will test the system with a small percentage of users before expanding. No specific timeline was provided for a broader release.
Publishers have expressed concern about the potential impact on their revenue. Google stated it will continue to prioritize high-quality content and may introduce new ways for publishers to appear in AI-generated answers. The company also noted that it will provide tools for publishers to manage how their content is used.
The transformation marks a significant departure from Google's traditional search model, which has been the primary driver of web traffic for decades. As AI becomes more integrated into search, the relationship between Google and content creators is likely to evolve further.
Google launches AI agents for proactive topic monitoring and alerts
Google is introducing AI-powered information agents that monitor topics in the background and proactively alert users to updates. The feature aims to go beyond standard searches by providing ongoing, personalized notifications.
Google has begun rolling out a new feature called AI agents, designed to monitor specific topics and send users proactive alerts when updates occur. The system operates in the background, tracking changes across web sources without requiring manual searches. Users can set up these agents to follow subjects like news events, product launches, or research developments.
The AI agents leverage Google's language models to understand user interests and identify relevant changes. Once configured, the agents continuously scan for new information and deliver notifications through Google's existing alert infrastructure. This shifts the search experience from a pull model—where users actively query—to a push model that surfaces updates automatically.
According to Google, the agents are built on the same technology powering its generative search features. They can parse complex queries and discern meaningful updates from noise. For example, a user tracking a specific company's earnings reports would receive alerts only when new financial data is published, not for unrelated press releases.
The rollout began this week for select users in the United States who have opted into Google's Search Labs program. The feature is accessible through the Google app on Android and iOS, as well as desktop browsers. Users can create agents by entering a topic and specifying alert frequency—options include real-time, daily, or weekly digests.
Google emphasized that the agents respect user privacy and do not store personal data beyond what is necessary for alert configuration. The company also noted that users retain full control over which topics are monitored and can pause or delete agents at any time. Alerts are delivered via the Google app notifications or email, depending on user preference.
This launch comes as Google faces increasing competition from AI-powered search tools like Perplexity and Microsoft's Copilot. By offering proactive monitoring, Google aims to differentiate its search ecosystem and deepen user engagement. The company plans to expand the feature to more regions and languages in the coming months.
Early testers have reported mixed results, with some praising the convenience of automated tracking while others noted occasional irrelevant alerts. Google acknowledged that the system is still learning and will improve over time based on user feedback. The company encouraged users to provide feedback through the Search Labs interface.
Google's AI agents are available now for Search Labs participants in the US. The feature is free to use and does not require a subscription. Google stated that it will continue to refine the agents' accuracy and expand their capabilities based on user needs.
Google Unveils Gemini 3.5 Flash, Emphasizing Autonomous AI Agents Over Chatbots
Google introduced Gemini 3.5 Flash at its developer conference, positioning it as a powerful model for coding and agentic tasks. The model can autonomously execute complex operations and build software from scratch.
Google announced Gemini 3.5 Flash during its annual developer conference, marking a strategic shift toward autonomous AI agents rather than conversational chatbots. The company described the model as its most capable offering for coding and agentic workflows.
Gemini 3.5 Flash is designed to execute complex tasks independently, including building software applications from the ground up. Google emphasized the model's ability to reason through multi-step problems and take actions without human intervention.
The model represents a departure from earlier AI products that focused primarily on text generation and question-answering. Google's leadership framed the release as a move toward more practical, action-oriented AI systems.
Developers will gain access to Gemini 3.5 Flash through Google's AI platform, Vertex AI, and via the Gemini API. The company highlighted the model's improved performance on coding benchmarks compared to previous versions.
Google also introduced new safety features tailored for agentic AI, including guardrails to prevent unintended actions. The company stated that these measures are critical as AI systems gain the ability to interact with external tools and services.
Pricing for Gemini 3.5 Flash will be competitive with other high-performance models, though specific rates were not disclosed at the event. Google plans to offer tiered access based on usage volume.
The release comes as competitors like OpenAI and Anthropic also push toward agentic AI capabilities. Google's strategy appears to focus on integrating these agents deeply with its cloud ecosystem and developer tools.
Gemini 3.5 Flash is available immediately for developers in preview. Google expects to roll out broader access in the coming months, with a general availability date yet to be announced.








