Created using FLUX.1 with the prompt, "Cinematic photo of a single futuristic robot with eight arms, each engaged in different tasks. One arm painting on a canvas, another typing on a holographic keyboard, a third arm manipulating 3D objects in mid-air, and a fourth arm examining a complex circuit board. The robot's central body houses a glowing AI core. Soft, dramatic lighting emphasizes the robot's metallic surfaces. Shot with an ARRI Alexa 65, wide-angle lens, high dynamic range."

GPT-5: A Leap Towards Multimodal AI in 2025

OpenAI’s rapid progress and growing resources suggest GPT-5 could arrive as early as 2025, despite recent personnel departures, and bring significant advancements across multiple domains. Like GPT-4o, GPT-5 is expected to be inherently multimodal, excelling not just at language processing but also at image generation, 3D modeling, possibly video creation, and of course audio synthesis.
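To ground what “inherently multimodal” means in practice, here’s a minimal sketch of sending text and an image to GPT-4o in a single request using the current OpenAI Python SDK; the image URL is just a placeholder, and a GPT-5 endpoint would presumably look similar:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and an image; the model reasons over both together.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what this robot is doing."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/robot.jpg"},  # placeholder
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```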

While GPT-4o introduced some multimodal capabilities, GPT-5 aims to push those boundaries further. It’s likely to surpass current benchmarks in areas like image generation, potentially rivaling or outperforming specialized models. Though music generation isn’t explicitly mentioned in GPT-5 trademark filings, GPT-4o’s limited singing abilities hint at potential advancements in this area.
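For a sense of what “rivaling specialized models” would change, image generation today still routes through a dedicated model behind its own endpoint. Here’s a minimal sketch with the current OpenAI Python SDK and DALL-E 3; a natively multimodal GPT-5 would presumably generate images within the conversation itself rather than through a separate call like this:

```python
from openai import OpenAI

client = OpenAI()

# Today's path: a specialized image model behind a separate endpoint.
result = client.images.generate(
    model="dall-e-3",
    prompt="Cinematic photo of a futuristic eight-armed robot at work",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```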

One of the most intriguing aspects of GPT-5 is its potential for enhanced agentic capabilities and embodiment. OpenAI has been fine-tuning its models for tool use, as seen with the DALL-E integration and Code Interpreter. The company’s acquisition of a remote computer use startup and its plans for a computer-using agent suggest GPT-5 could serve as a general-purpose agent brain, excelling at computer interaction and tool-use benchmarks.
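To make “agent brain” concrete: tool use in the current API works by declaring functions the model may call, receiving structured calls back instead of prose, executing them, and looping. Here’s a minimal sketch with the current OpenAI Python SDK; the click_element tool is purely illustrative, not an actual OpenAI computer-use API:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical tool for a computer-using agent; name and schema are illustrative.
tools = [
    {
        "type": "function",
        "function": {
            "name": "click_element",
            "description": "Click a UI element on the remote desktop by its visible label.",
            "parameters": {
                "type": "object",
                "properties": {
                    "label": {"type": "string", "description": "Visible text of the element"},
                },
                "required": ["label"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",  # a future agent model would plug into the same loop
    messages=[{"role": "user", "content": "Open the settings menu."}],
    tools=tools,
)

# The model emits structured tool calls; the agent executes them and feeds
# the results back on the next turn.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```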

OpenAI’s partnership with the robotics company Figure adds another layer of possibilities. Figure’s initial demo used GPT-4 Vision to control an embodied robot, and their recent Figure 02 release appears to use a specialized version of GPT-4o mini for speech-to-speech reasoning and embodiment. This collaboration hints at GPT-5’s potential applications in robotics and physical task execution. Here’s a more in-depth post on that: [https://adam.holter.com/figure-02-a-robot-powered-by-gpt-4o-mini/]
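To see why a specialized speech-to-speech model matters for a robot, compare it to the naive pipeline developers wire up today: transcribe the audio, reason over text, then synthesize a reply, paying latency at every hop. Here’s a minimal sketch with the current OpenAI Python SDK (command.wav is a placeholder recording); a natively speech-to-speech model collapses all three stages into a single pass:

```python
from openai import OpenAI

client = OpenAI()

# Stage 1: speech -> text
with open("command.wav", "rb") as audio:  # placeholder recording
    text = client.audio.transcriptions.create(model="whisper-1", file=audio).text

# Stage 2: text -> text reasoning
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": text}],
).choices[0].message.content

# Stage 3: text -> speech
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("reply.mp3")
```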

It’s important to note that while GPT-5 will be a major step forward, it’s not going to be AGI, nor will it completely replace human capabilities. However, its expected advancements in multimodal processing, tool use, and potential embodiment make it one of the most anticipated developments in the field of AI.

This post revises the GPT-5 expectations I set before we saw GPT-4o, which in hindsight were too low. You can refer to my previous post: [https://adam.holter.com/what-im-looking-for-in-gpt-5/]. As our understanding of AI capabilities grows, it’s crucial to stay informed about these developments and their potential impact on various industries and applications.