AI tools are frustrating people left and right, and there’s a clear pattern. Great technology gets terrible reviews because users expect it to be magic. They assume AI just “knows” things about their context, preferences, and specific situations. It doesn’t.
Here’s my test: If you put a human in a box, isolated them, and only gave them the exact information you’re giving the AI, could they do the task you expect? If the answer is no, then you need to give the AI more context. It’s that simple.
The Magic Thinking Problem: Why AI Isn’t a Mind Reader
AI models don’t have secret knowledge about you or your situation. They don’t know your company’s internal processes, your personal preferences, or the context that seems obvious to you. They know what’s in their training data and what you explicitly tell them.
This “magic thinking” problem shows up everywhere:
- People asking ChatGPT to “write a proposal for my client” without explaining what the client wants, their industry, the project scope, or even the desired tone.
- Developers expecting coding agents to understand their specific architecture without documentation, code snippets, or even a basic explanation of the tech stack.
- Businesses frustrated that AI can’t read their internal systems and processes, despite never providing access to databases, internal wikis, or process maps.
- Users complaining about inaccurate AI summaries of documents they haven’t uploaded or linked.
- Marketers asking AI to “make this campaign better” without defining “better” (more leads, higher conversions, better brand perception?) or providing performance data.
The core issue is a fundamental misunderstanding of how AI, particularly large language models, operates. They are sophisticated pattern-matching machines, not sentient beings with intuition. They can only work with the data they’ve been trained on and the specific input you provide in the moment. Any perceived “intelligence” beyond that is a reflection of well-structured queries and robust underlying systems, not inherent clairvoyance.
The Human in a Box test: If a human can’t do the task with your limited context, AI can’t either.
Case Study: When Context Actually Matters – Kimi K2 and Specialized Training
Look at Kimi K2 – this trillion-parameter model that just dropped. The benchmarks are impressive: 53.7% on LiveCodeBench versus GPT-4.1’s 44.7%. But here’s what matters more: it works well specifically because it was trained for tool calling and agent frameworks.
That training mitigates the context problem: a model that can call tools can acquire the context it needs dynamically instead of depending on the prompt alone. Instead of building a general model and hoping it would work for coding agents, the model was specifically optimized for the tools and workflows where it would be deployed. This is a critical distinction. It’s not just a bigger, smarter model; it’s a model designed to interact with its environment and pull in information as needed, much like a human would if given the right tools and access.
This approach highlights a crucial shift in AI development: moving from monolithic, black-box models to more modular, context-aware systems. When a model is designed with specific applications in mind, it can be trained on relevant data and equipped with the mechanisms to fetch additional context dynamically. This means less reliance on users to provide every single detail upfront and more on the AI’s ability to intelligently seek out the information it needs.
The Human in a Box Test: Your Litmus Test for AI Interactions
This is the most important concept I can share: Before you get frustrated with an AI tool, ask yourself if a human could do what you’re asking with only the information you’ve provided. This simple thought experiment cuts through the hype and grounds your expectations in reality.
Examples of failing this test (and what to do instead):
- “Make this report better” (without saying what’s wrong with it, what “better” means, or providing the report itself). Instead: “Make this report more concise, focus on financial metrics, remove the marketing fluff.” (Context: paste in the full report text).
- “Write code for my project” (without explaining the architecture, dependencies, or desired functionality). Instead: “Write a React component that displays user profiles in a card layout, using our existing design system.” (Context: paste in your current styling files and documentation for the design system).
- “Analyze my data” (without explaining what you’re looking for, the data format, or the business objective). Instead: “Analyze this sales data to identify seasonal patterns and recommend inventory adjustments.” (Context: paste in the sales data, e.g., as a CSV or JSON, along with any relevant data schema and supplier info).
- “Summarize this document” (without providing the document). Instead: “Summarize the key takeaways from this research paper, focusing on the methodology and conclusions.” (Context: upload the PDF or paste the text of the research paper).
- “Generate marketing copy” (without specifying the product, target audience, or campaign goal). Instead: “Generate three short social media posts for our new eco-friendly water bottle, targeting Gen Z, with a call to action to visit our product page.” (Context: provide product details, brand voice guidelines, and a link to the product page).
The difference is context. Specific, explicit context.
It’s about being a good communicator. Just as you wouldn’t expect a new employee to be productive without an onboarding process and access to company resources, you shouldn’t expect an AI to perform complex tasks without the necessary information and access to tools.
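One way to make the Human in a Box test routine is to check for missing context before a prompt is ever sent. The sketch below is only an illustration under my own assumptions: `TaskRequest`, `missing_context`, `build_prompt`, and the context keys `report_text` and `goal` are hypothetical names, not part of any particular AI tool.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRequest:
    """An instruction plus the named context a human would need to act on it."""
    instruction: str
    # Hypothetical context keys, e.g. {"report_text": "...", "goal": "..."}
    context: dict[str, str] = field(default_factory=dict)


def missing_context(request: TaskRequest, required: list[str]) -> list[str]:
    """Return the required context keys that are absent or blank."""
    return [key for key in required if not request.context.get(key, "").strip()]


def build_prompt(request: TaskRequest, required: list[str]) -> str:
    """Assemble a prompt, refusing to proceed while required context is missing."""
    gaps = missing_context(request, required)
    if gaps:
        raise ValueError(f"Missing context: {', '.join(gaps)}")
    sections = [f"## {name}\n{text}" for name, text in request.context.items()]
    return f"Task: {request.instruction}\n\n" + "\n\n".join(sections)


# A vague request fails the check before it ever reaches a model.
vague = TaskRequest("Make this report better")
print(missing_context(vague, ["report_text", "goal"]))  # ['report_text', 'goal']
```

The list of required keys is where the judgment lives: it is the written-down answer to "what would a human in a box need for this task?"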
Why This Matters More Than Benchmarks: The Real Performance Gap
Everyone gets excited about model benchmarks – how many points GPT-5 scored on a coding test, or how many questions Claude 4 answered correctly. But the real performance gap in practical applications isn’t between models; it’s between people who understand context and people who don’t.
I’ve seen developers get amazing results from older models with good prompting and a well-structured environment, while others struggle with frontier models because they assume the AI should just “figure it out.” The model’s raw intelligence is only one part of the equation. Its ability to perform effectively is heavily dependent on the quality and relevance of the input it receives.
Smarter models can indeed “figure it out” more effectively if you give them the right tools to access your data and tell them where to look. This can be incredibly helpful when you don’t want to write an extremely long and detailed prompt for every single task. Models really are getting better, and these frameworks genuinely do help bridge that gap, so it’s important not to downplay their capabilities. Think of it like this: a brilliant chef still needs ingredients and kitchen tools. The more organized and accessible those are, the better the meal.
Good context with older models often outperforms poor context with newer models.
This chart illustrates the point: a powerful, cutting-edge model with poor context will often underperform an older, less sophisticated model that receives excellent context. The ultimate performance ceiling is raised by better models, but the floor is set by the quality of context provided. It’s the difference between giving a brilliant strategist a clear mission brief and all necessary intelligence versus just telling them “win the war.”
Examples of Tools That Do This Well: Designing for Human-AI Collaboration
Perplexity’s Deep Research feature shows what happens when you design around real user workflows. While AI can automate research for many tasks, Perplexity built an interactive research feature that lets you course-correct mid-process, which is particularly useful for more complex tasks where users want to monitor the process.
You can jump in, provide additional context, and redirect it when it goes off track. They treat research like the collaborative, iterative process it actually is rather than pretending AI can read your mind. This approach acknowledges that human expertise is still critical, especially in nuanced tasks like research. The AI acts as an assistant, not a replacement, enhancing the user’s capabilities by providing a flexible framework for information gathering and refinement.
The result feels more powerful than tools making bigger claims because it works with how humans actually work, not how we wish AI worked. This is the essence of effective human-AI teaming: designing systems that complement human strengths rather than attempting to mimic them perfectly.
Another example is a well-designed internal AI agent framework for a company. Instead of a generic chatbot, these systems are built with access to internal databases, APIs, and documentation. They are given specific “tools” to fetch information about customer orders, product specifications, or employee records. When a user asks a question, the AI doesn’t “know” the answer, but it knows how to find it by using its predefined tools to query the company’s data sources. This is a crucial distinction and a significant step beyond simple prompting.
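To make the "predefined tools" idea concrete, here is a minimal sketch of a tool registry and dispatcher. Everything in it is an assumption for illustration: the `ORDERS` dictionary stands in for a real database, `get_order_status` is a hypothetical tool, and the `{"name": ..., "arguments": ...}` call shape is just one plausible format rather than any specific vendor's API.

```python
from typing import Callable

# Hypothetical in-memory stand-in for a company order database.
ORDERS = {"A-1001": {"status": "shipped", "items": 3}}


def get_order_status(order_id: str) -> str:
    """Look up an order; a real tool would query a database or internal API."""
    order = ORDERS.get(order_id)
    if order is None:
        return f"No order found with id {order_id}"
    return f"Order {order_id} is {order['status']} ({order['items']} items)"


# Registry of the only functions the agent is allowed to call.
TOOLS: dict[str, Callable[..., str]] = {"get_order_status": get_order_status}


def run_tool_call(call: dict) -> str:
    """Execute a tool call shaped like {"name": ..., "arguments": {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"Unknown tool: {call['name']}"
    return tool(**call["arguments"])


# In a real system the model would emit this call; here it is hard-coded.
call = {"name": "get_order_status", "arguments": {"order_id": "A-1001"}}
print(run_tool_call(call))  # Order A-1001 is shipped (3 items)
```

The model never holds the order data itself. It only needs to know which tool exists and what arguments it takes, and the result is fed back to it as fresh context.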
The Testing Problem Gets Worse: Evaluating AI’s Contextual Awareness
This misunderstanding about context also makes the AI testing problem harder. In my experience building and testing agents, defining what an AI should do is straightforward, but testing whether it actually does it becomes far more complex when people expect the AI to be magical.
If you expect an AI to infer everything, how do you test for that? You can’t. You end up with vague test cases that yield inconsistent results. This is where robust testing comes in. I’ve found that we need to write test cases that specifically challenge agents with scenarios where users don’t provide all the necessary context. The goal isn’t just for the AI to answer correctly, but for it to recognize when it lacks information and to ask clarifying questions, just like a human would. This shifts the focus from simply getting the right answer to assessing the AI’s ability to manage ambiguity and gather missing context. This is a far more valuable metric for real-world AI deployment.
For instance, instead of just testing if an AI can answer “What’s the sales forecast for Q3?” when all data is provided, test it when only partial data is given. Does it ask for the missing regional breakdown? Does it prompt for historical sales data to build a more accurate model? An AI that can intelligently seek out missing context is far more useful than one that simply fails or hallucinates when faced with incomplete information.
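A test along these lines might look like the sketch below. It assumes a simplified interface in which an agent is a function that takes a question plus whatever data is available and returns a dictionary. `stub_forecast_agent`, `REQUIRED_FIELDS`, and the reply format are all hypothetical stand-ins for a real agent and its responses.

```python
REQUIRED_FIELDS = ("historical_sales", "regional_breakdown")


def stub_forecast_agent(question: str, data: dict) -> dict:
    """Stand-in for a real agent: asks for missing inputs instead of guessing."""
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        return {"type": "clarification", "needs": missing}
    return {"type": "answer", "text": "forecast based on supplied data"}


def overconfident_agent(question: str, data: dict) -> dict:
    """Counter-example: always answers, even when inputs are incomplete."""
    return {"type": "answer", "text": "Q3 sales will grow"}


def check_asks_for_missing_context(agent) -> bool:
    """Pass only if the agent requests the regional breakdown when it is absent."""
    partial_data = {"historical_sales": [120, 135, 150]}  # no regional breakdown
    reply = agent("What's the sales forecast for Q3?", partial_data)
    return reply["type"] == "clarification" and "regional_breakdown" in reply.get("needs", [])


print(check_asks_for_missing_context(stub_forecast_agent))  # True
print(check_asks_for_missing_context(overconfident_agent))  # False
```

With a real model the reply would be free text, so the check would need some way to classify a response as a clarifying question. The structure of the test stays the same: withhold a piece of context on purpose and pass only when the agent notices.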
Stop Looking for Tricks, Start Building Systems: The Path to Practical AI Success
The companies succeeding with AI aren’t the ones with the most flashy marketing claims or the most impressive benchmark scores in isolation. They’re the ones that understand the context problem and build systems around it. They treat AI not as a standalone oracle, but as a component within a larger information architecture.
Good AI systems are about information architecture, not just model quality. They’re about giving the AI the right context, the right tools, and the right success criteria. This means structured data, well-documented processes, accessible knowledge bases, and clear definitions of what constitutes a successful outcome. It means building robust retrieval-augmented generation (RAG) systems that allow the AI to access and synthesize information from vast, specific datasets, rather than relying solely on its pre-trained knowledge.
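The retrieval step at the heart of RAG can be sketched in a few lines. This toy version ranks documents by word overlap with the query, which is a deliberate simplification I am assuming for illustration; production systems typically use embeddings and a vector index. The `DOCS` contents and the function names are made up for the example.

```python
import re

# Hypothetical knowledge base; real systems would index far larger collections.
DOCS = {
    "returns-policy": "Customers may return products within 30 days of delivery.",
    "shipping": "Standard shipping takes 5 business days within the country.",
    "warranty": "Hardware products carry a one year limited warranty.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return ids of the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        docs,
        key=lambda doc_id: len(query_words & tokenize(docs[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(query: str, docs: dict[str, str], k: int = 1) -> str:
    """Prepend the retrieved passages so the model answers from supplied context."""
    context = "\n".join(docs[doc_id] for doc_id in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "How many days do customers have to return products?"
print(retrieve(question, DOCS))  # ['returns-policy']
```

The point is the shape of the system rather than the scoring function: the model's answer is grounded in text that was fetched for this specific question, not in whatever it happened to absorb during training.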
The gold rush is real, but the tools that actually work are usually the ones that help you provide better context, not the ones promising to read your mind. This includes advanced prompting techniques, integration with internal enterprise systems, and the development of specialized AI agents that are trained for specific tasks and can interact with real-world tools and data sources.
In this respect, AI is no different from any other tool. It’s a very sophisticated one, but it works only as well as the information you give it. Once you understand that, everything else becomes much clearer.
Instead of chasing the latest model releases or getting frustrated when AI doesn’t meet unrealistic expectations, focus on building better systems for providing context. Document your processes. Structure your data. Be explicit about what you want. Implement robust RAG systems. Design agents with clear tool-use capabilities. This groundwork is less glamorous than a new model launch, but it’s where the real, tangible value from AI is generated.
The difference between AI that helps and AI that frustrates isn’t usually the model – it’s the human behind it who understands how to communicate effectively with a very literal, very powerful tool that needs clear instructions to do its best work. This is the secret to getting AI to perform not like magic, but like an incredibly capable, context-aware assistant.
Key Takeaways for Practical AI Implementation:
- Prioritize Context: Always ask if a human with the same information could do the task. If not, augment the context.
- Build Information Architecture: Treat your data and processes as critical inputs for AI. Structured, accessible information is gold.
- Focus on Tooling and Agents: Leverage AI models designed for tool use and integrate them into workflows that allow dynamic context acquisition.
- Rethink Evaluation: Test AI not just for correct answers, but for its ability to identify missing information and ask clarifying questions.
- Train Your Teams: Educate users on effective prompting and the importance of context. The human element in guiding AI is paramount.
The future of AI isn’t about AI becoming sentient. It’s about humans learning to speak its language: the language of explicit context and well-defined objectives. Stop looking for the magic wand; start building the infrastructure that makes AI truly powerful.

