The latest State of AI Survey for 2025 paints a revealing picture of organizations’ AI implementation strategies. While multi-model approaches are becoming commonplace, the automation of prompt engineering is lagging. This suggests both a maturing AI adoption landscape and areas where teams struggle to get full value from their AI investments.
Adoption Metrics: The Rise of Multi-LLM Deployments
Perhaps the most significant takeaway from the survey is that 42% of AI teams now employ two or more large language models (LLMs) in production environments. This multi-model approach signifies a distinct departure from earlier practices, where businesses typically focused on a single provider.
The rationale for this shift appears calculated rather than exploratory. Teams are intentionally pairing models based on complementary strengths – dedicating particular models to handle technical tasks while assigning others to creative content generation or customer interactions. As someone who has built and deployed AI workflows, I can say definitively that a one-size-fits-all approach rarely yields optimal results.
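To make that pairing pattern concrete, here is a minimal sketch of task-based model assignment. The model names, task types, and the `call_model` helper are illustrative placeholders, not any specific vendor’s API.

```python
# Minimal sketch of task-based model routing. Names are placeholders.
TASK_MODEL_MAP = {
    "code_review": "model-a-technical",        # model strong at technical tasks
    "marketing_copy": "model-b-creative",      # model strong at creative writing
    "support_reply": "model-c-conversational", # model tuned for customer interaction
}

def call_model(model_name: str, prompt: str) -> str:
    """Placeholder for whatever provider SDK or HTTP call you actually use."""
    raise NotImplementedError

def route(task_type: str, prompt: str) -> str:
    # Unknown task types fall back to a designated default model.
    model = TASK_MODEL_MAP.get(task_type, "model-d-general")
    return call_model(model, prompt)
```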
Examining the primary providers used by these organizations reveals a marketplace that is both dynamic and concentrated. The leading five LLM providers account for around 88% of market usage, with models like Claude 3 Opus and GPT-4o achieving strong performance scores across critical benchmarks. OpenAI’s model naming is a complete mess right now. With 9 different models to choose from, here’s what you need to know:
- The model with the biggest number is NOT always the best
- Mini versions are worse but faster, except for o3-mini-high, which is somehow the second-best model
- The actual hierarchy of the o-series is: o1 Pro > o3-mini-high > o1 > o3-mini
This concentration suggests that while the industry supports numerous contenders, smaller entrants face an uphill battle without specialized capabilities or significant cost advantages. The market is expressing preferences through adoption, and frontrunners have emerged. Whether OpenAI’s new model lineup featuring GPT-4.1 and the o-series is actually any good is still up for debate among organizations, and that debate should factor into deciding which provider to use.
Live AI Applications: Emphasis on Customer-Facing Features
The survey highlights significant implementation rates for customer-facing AI, with 70% of organizations stating that they have at least one such feature deployed in production. This marks a substantial change from prior years when AI was largely limited to back-office functions or internal resources.
Upon examining specific applications, distinct trends emerge regarding what companies consider valuable enough to deploy:
- Document parsing and analysis (60%) tops the list, confirming that extracting structured data from unstructured text remains among AI’s most sought-after capabilities.
- Chatbots and conversational support (51%) continue to be widely used, illustrating businesses’ focus on scalable customer support.
- Natural language analytics (44%) rounds out the top three, showing the value of making data insights more accessible through conversational interfaces.
The preference for these functional applications over more experimental projects implies that organizations are prioritizing tangible return on investment over projects that are merely impressive on the surface. Applying natural language interfaces to existing business processes appears to offer the most immediate benefits. I think it’s a mistake to try to build proprietary AI models internally, especially since wrappers that add value on top of existing models already offer so much reward.
Development Practices: Prompt Engineering’s Prevailing Role
One of the most unexpected revelations involves how organizations build AI applications. A striking 86% rely on prompt engineering with Retrieval-Augmented Generation (RAG) pipelines instead of fine-tuning large models. This strategy lets teams iterate rapidly without taking on the cost of retraining models, both financially and in the data required.
The preference for prompt+RAG approaches makes sense for several reasons (a minimal sketch of the pattern follows this list):
- Fewer technical requirements
- More rapid iteration cycles
- Easier budget planning (absent the uncertainty of training costs)
- Simple content updates instead of model retraining
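As referenced above, here is a minimal sketch of what prompt+RAG looks like in practice: retrieve a few relevant snippets, then inject them into the prompt. The naive keyword-overlap retriever is purely for illustration; real pipelines typically use embedding search over a vector store.

```python
# Minimal prompt+RAG sketch with a deliberately naive retriever.

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by crude keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Updating the system then mostly means editing the document store or the template, which is exactly why iteration stays cheap compared to retraining.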
Nevertheless, the preference comes with its own set of issues, specifically around managing these prompt-oriented systems at scale.
Toolchain Strategies: Build Internally or Buy Pre-Made?
The survey suggests that 63% of organizations are building internal AI toolchains rather than using third-party platforms (29%). This could suggest that most organizations regard their AI implementation requirements as unique enough to warrant custom infrastructure.
Opting for custom toolchains may reflect the relative immaturity of commercial platforms, along with particular demands for data security, compliance, and compatibility with existing infrastructure. It also creates an added maintenance burden and can lead to redundant work across organizations. It also highlights widespread confusion about what an AI agent is versus a workflow. The definitions are vague, and some people build workflows but incorrectly label them as agents. I personally prefer the definitions Anthropic uses:
- Workflows are systems where AI models and tools follow predefined paths
- Agents are systems where AI models control their own processes and tool usage independently
For pretty much all tasks, workflows are better. For many business processes, full agents simply are not needed.
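A rough sketch of the distinction, in hypothetical Python: the workflow’s steps are fixed in code, while the agent lets the model decide which tool to call next. The `call_model` helper and tool functions are placeholders, not a real SDK.

```python
def call_model(prompt: str) -> str:
    raise NotImplementedError  # replace with your provider call

def summarize(text: str) -> str:
    return call_model(f"Summarize:\n{text}")

def extract_entities(text: str) -> str:
    return call_model(f"List the named entities in:\n{text}")

# Workflow: the sequence of steps is fixed in code; the model only fills in content.
def workflow(document: str) -> dict:
    return {"summary": summarize(document), "entities": extract_entities(document)}

# Agent: the model chooses which tool to invoke next until it decides it is done.
TOOLS = {"summarize": summarize, "extract_entities": extract_entities}

def agent(goal: str, document: str, max_steps: int = 5) -> str:
    state = document
    for _ in range(max_steps):
        decision = call_model(
            f"Goal: {goal}\nCurrent state:\n{state}\n"
            f"Reply with one of {list(TOOLS)} or FINISH."
        ).strip()
        if decision == "FINISH":
            break
        if decision in TOOLS:
            state = TOOLS[decision](state)
    return state
```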
Worst of all, only 24% have CI/CD pipelines automated specifically for prompt management. Even as companies invest substantially in AI, prompt engineering is still not fully treated as a discipline on par with software development, with the appropriate testing, oversight, and deployment procedures that implies.
Organizational Priorities: Necessitating Cross-Functional Collaboration
The survey stresses that effectively implementing AI requires close collaboration between engineering, product management, and design personnel. This cross-functional strategy ensures that AI features support the needs of end users and align with business objectives rather than applying technology just because it is there.
When anticipating priorities for next year, organizations are centering on:
- Expanding customer-facing AI features.
- Building capabilities for more complex tasks.
- Upskilling staff to work with rapidly evolving technology.
These goals represent a move away from experimentation toward integration, as organizations focus on getting results from their AI investments by tackling more intricate use cases and developing in-house skill sets.
Implications for Your Team: Best Practices
According to the survey findings, AI teams hoping to maximize their performance should consider several implications:
Multi-LLM Strategies: Establishing Best Practices
When 42% of teams are already employing multiple models, having a clear approach for model selection and combination becomes vital. Effective methodologies could include (see the sketch after this list):
- Assigning particular models to different tasks based on their strengths (Claude for in-depth analysis, for example, and GPT models for content generation).
- Keeping primary and fallback models available to maintain availability and handle cases where the primary model fails.
- Executing cost-optimized routing that sends requests to cheaper models whenever heavy processing is not required.
- Building evaluation tooling to regularly assess provider performance against your task requirements.
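A minimal sketch of the fallback and cost-routing items above, assuming hypothetical model names, a caller-supplied `heavy` flag, and a placeholder `call_model` function:

```python
# Sketch of primary/fallback routing with a cheap-model shortcut for light requests.
CHEAP_MODEL = "small-model"
PRIMARY_MODEL = "frontier-model"
FALLBACK_MODEL = "frontier-model-alt"

def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError  # replace with your provider call

def complete(prompt: str, heavy: bool = False) -> str:
    # Route light requests to the cheaper model to control spend.
    if not heavy:
        return call_model(CHEAP_MODEL, prompt)
    # Try the primary model first, then fall back if it errors out.
    try:
        return call_model(PRIMARY_MODEL, prompt)
    except Exception:
        return call_model(FALLBACK_MODEL, prompt)
```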
When teams have a well-thought-out multi-model strategy, they can mitigate provider risk and extract the most from each model’s strengths. The approach also provides negotiating leverage with providers and reduces the risk of vendor lock-in.
Balancing the Urgency of Speed-to-Market with Governance
AI teams face a well-known tension in balancing rapid deployment with proper oversight. While the survey indicates that some organizations prioritize speed, dependable AI programs require both. Common methods include (a sketch of the monitoring piece follows this list):
- Implementing staged releases with progressively larger user groups.
- Creating lightweight governance plans that scale with risk profiles (more scrutiny for customer-facing features).
- Documenting model behavior, limitations, and test results.
- Building monitoring tools that can flag irregularities or potential misuse.
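For the monitoring item above, here is a lightweight sketch of a response check that flags irregularities. The thresholds and blocklist terms are assumptions for illustration, not survey recommendations.

```python
import logging

logger = logging.getLogger("ai_monitor")

# Crude misuse signal, example only; real systems would use richer classifiers.
BLOCKLIST = ("ignore previous instructions",)

def check_response(prompt: str, response: str, max_chars: int = 8000) -> list[str]:
    """Return a list of flags describing anything irregular about this interaction."""
    flags = []
    if not response.strip():
        flags.append("empty_response")
    if len(response) > max_chars:
        flags.append("unusually_long_response")
    if any(term in prompt.lower() for term in BLOCKLIST):
        flags.append("possible_prompt_injection")
    for flag in flags:
        logger.warning("flagged interaction: %s", flag)
    return flags
```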
The organizations that do best build oversight into the development process instead of treating it as a separate obstacle to overcome. That integration keeps services both fast and safe. Governance also matters less when models can solve the same problems with simpler outputs and fewer tokens.
Moving from Prototype to Production: Critical Steps
Given that only 24% of organizations have automated CI/CD for prompts, considerable room remains for improving how AI prototypes move into dependable production systems. Critical methods include (a sketch of an automated prompt check follows this list):
- Setting up version control for the models used, the data, and the final results of your evaluations.
- Creating automated tests that validate prompt output before anything goes live.
- Setting up production-like development environments for thorough testing.
- Setting up monitoring systems that track metrics on model performance.
- Designing rollback plans that take effect should issues arise.
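As referenced above, here is a sketch of an automated prompt check that could gate deployment in CI. The file paths, eval-case format, pass-rate threshold, and `call_model` helper are all hypothetical assumptions.

```python
import hashlib
import json
import pathlib

def call_model(prompt: str) -> str:
    raise NotImplementedError  # replace with your provider call

def prompt_version(path: str) -> str:
    """Content hash so each prompt revision is traceable in logs and evals."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()[:12]

def run_eval(prompt_template: str, cases: list[dict]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = 0
    for case in cases:
        output = call_model(prompt_template.format(**case["inputs"]))
        passed += case["expected"].lower() in output.lower()
    return passed / len(cases)

if __name__ == "__main__":
    # Hypothetical repo layout: prompts and eval cases live in version control.
    template = pathlib.Path("prompts/support_reply.txt").read_text()
    cases = json.loads(pathlib.Path("evals/support_cases.json").read_text())
    score = run_eval(template, cases)
    print(f"prompt {prompt_version('prompts/support_reply.txt')} pass rate: {score:.0%}")
    assert score >= 0.9, "eval pass rate below threshold; blocking deployment"
```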
These methods apply quality control similar to that of software engineering while keeping prompt engineering practical, creating production AI systems that operate with the dependability their prototypes demonstrated.
The lack of proper prompt CI/CD reflects a substantial maturity gap across many AI deployments. Just as conventional software developers would not ship code without proper oversight, quality control, and version management, AI teams should not allow prompts to be deployed without proper safeguards. This is clearly the new frontier for operational quality in modern AI.
Looking Ahead: Shifting from Initial Adoption to Optimization
The State of AI Survey 2025 depicts a shift away from initial adoption toward refining for scale. The high rate of customer-facing AI implementations (70%) means that many organizations are past their exploratory phases and are concentrating on producing real business value. Automation increases how productive people are while expanding what they can do.
Expect emphasis to shift soon toward the operational side of AI: more sophisticated governance structures, better tooling for prompt management, and deeper integration with existing business processes. The goal will be to turn AI from a specialized capability usable only by experts into a routine element of all software development.
The survey offers useful benchmarks and direction for organizations that are still in the early stages of AI implementation. The strong lean toward prompt+RAG approaches over model fine-tuning shows that barriers to entry are lower than some may believe. Likewise, the prevalence of strategies built around more than one model suggests organizations are better off evaluating multiple vendors than committing to a single choice early.
Above all, the survey shines a light on the cross-functional collaboration required to make AI succeed. The most effective implementations pair solid engineering with thoughtful design considerations to serve the business appropriately, ultimately acting as a tool that satisfies people’s needs and requirements rather than simply applying whatever technology happens to be available.