OpenAI’s rapid progress and growing resources suggest GPT-5 could arrive as early as 2025, despite recent personnel departures, and it should bring significant advances across multiple domains. Like GPT-4o, GPT-5 is expected to be natively multimodal, excelling not just at language processing but also at image generation, 3D modeling, audio synthesis, and possibly video creation.
While GPT-4o introduced some multimodal capabilities, GPT-5 aims to push those boundaries further. It’s likely to exceed current benchmarks in areas like image generation, potentially rivaling or surpassing specialized models. Though music generation isn’t explicitly mentioned in GPT-5 trademark filings, GPT-4o’s limited singing abilities hint at potential advancements in this area as well.
One of the most intriguing aspects of GPT-5 is its potential for enhanced agentic capabilities and embodiment. OpenAI has been fine-tuning its models for tool use, as seen with the DALL-E integration and Code Interpreter. The company’s acquisition of a remote-computer-use startup and its plans for a computer-using agent suggest GPT-5 could serve as a general-purpose agent brain, excelling at computer interaction and tool-use benchmarks.
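To make “tool use” concrete, here’s a minimal sketch of how it already works today through OpenAI’s Chat Completions API in the Python SDK. The `get_weather` function is a made-up example of mine, and the assumption that GPT-5 would be reachable through the same tools interface (just with a different model name) is speculation on my part, not anything OpenAI has confirmed.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Describe a tool the model is allowed to call (hypothetical example function)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",  # swap in a GPT-5 model name if/when one ships
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call the tool; your code runs it and returns the result
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(message.content)
```

An agentic GPT-5 would presumably just do this loop (decide on a tool, get the result, decide again) more reliably and over longer horizons, including controlling a remote computer rather than a single function.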
OpenAI’s partnership with robotics company Figure adds another layer of possibilities. Figure’s initial demo used GPT-4 Vision to control an embodied robot, while their recent Figure 02 release appears to utilize a specialized version of GPT-4o mini for speech-to-speech reasoning and embodiment. This collaboration hints at GPT-5’s potential applications in robotics and physical task execution. Here’s a more in-depth post on that: [https://adam.holter.com/figure-02-a-robot-powered-by-gpt-4o-mini/]
It’s important to note that while GPT-5 should be a major step forward, it won’t be AGI, and it won’t completely replace human capabilities. Still, its expected advances in multimodal processing, tool use, and potential embodiment make it a highly anticipated development in the field of AI.
This post is meant to revise the GPT-5 expectations I laid out before we saw GPT-4o, which now look too low. You can refer to my previous post: [https://adam.holter.com/what-im-looking-for-in-gpt-5/]. As our understanding of AI capabilities grows, it’s crucial to stay informed about these developments and their potential impact on various industries and applications.