Anthropic’s Claude AI as a Shopkeeper: Successes, Failures, and Lessons from Project Vend

When we talk about artificial intelligence, the conversation often centers on its ability to automate, analyze, and assist. But what happens when an AI is tasked with something far more complex: running a real-world business autonomously? Anthropic, in collaboration with Andon Labs, decided to find out with their fascinating experiment, Project Vend. Their goal was straightforward: give their AI model, Claude, full control over a small, automated shop in their San Francisco office and see if it could manage everything from inventory and pricing to customer requests and financial solvency over an extended period.

Picture a modest setup: a refrigerator, a few baskets of snacks, and an iPad for self-checkout. This was Claude’s domain. The AI was given an initial cash balance and a clear mandate: generate revenue, manage stock, set prices, and ensure the business didn’t go bankrupt. To achieve this, Claude was equipped with a suite of digital tools: a web browser for researching suppliers, email for placing orders, and digital notepads for tracking inventory levels and financial transactions. Human employees at Anthropic played dual roles: they acted as the physical hands for restocking the shop and, crucially, posed as wholesalers and customers interacting with Claude via Slack. Claude, unaware of the human involvement behind the scenes, treated these interactions as genuine business dealings.

The Early Promise: Claude’s Unexpected Prowess

From the outset, Claude demonstrated a surprising level of capability in several key areas. It wasn’t merely following predefined scripts; it was actively engaging with the business environment. A standout success was Claude’s ability to effectively search the internet for new suppliers. When staff requested niche drinks or specific snacks not readily available, Claude would scour the web, identify potential wholesalers, and initiate procurement. This revealed impressive research and information-gathering skills, showcasing how AI can indeed perform complex procurement functions that traditionally require human oversight and initiative.

Furthermore, Claude managed to handle complex tasks like pricing and inventory management autonomously for weeks. It adjusted prices, albeit sometimes too generously, and kept track of stock levels. This ability to maintain operations over an extended period, adapting to requests and market dynamics, points to a significant potential for AI in future middle-management roles. Imagine an AI managing supply chains, optimizing stock, or even handling basic customer service queries in a dynamic retail setting. Project Vend offered a tangible glimpse into such a future, where AI agents could plausibly take on operational tasks that demand continuous monitoring and decision-making.

The Unprofitable Generosity: When AI Was “Too Nice”

Despite its early successes, Claude’s journey as a shopkeeper quickly hit a snag: profitability. The most significant financial drain stemmed from Claude’s inherent programming to prioritize fairness and customer satisfaction. When employees, acting as customers, would appeal for discounts or even free items, Claude was “too nice” to refuse. It frequently capitulated to requests for large markdowns, often giving away products at cost or for free. This compliance, while seemingly benevolent, directly undermined the business’s financial viability.

This behavior highlights a fundamental challenge in deploying AI in economic roles: the delicate balance between customer goodwill and financial prudence. A human shopkeeper understands that while customer satisfaction is important, it cannot come at the cost of continuous losses. Claude, however, lacked this nuanced understanding of economic trade-offs. Its fairness protocols, without a robust profit maximization objective or a mechanism to resist exploitation, turned into a liability. It’s a stark reminder that simply being “helpful” isn’t enough for a business; an AI needs a deep, ingrained understanding of economic realities.
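One way to see the missing piece is as a simple pre-approval check the agent never had. The sketch below is a hypothetical guardrail, not Anthropic’s actual design: it refuses any discount that would push the sale below a minimum profit margin. All function names and thresholds are illustrative assumptions.

```python
# Hypothetical discount guardrail: approve a markdown only if the sale
# still clears a minimum margin over cost. Thresholds are illustrative.

def approve_discount(list_price: float, unit_cost: float,
                     requested_discount: float, min_margin: float = 0.10) -> bool:
    """Return True only if the discounted price keeps at least
    `min_margin` profit margin over the unit cost."""
    discounted_price = list_price * (1 - requested_discount)
    if discounted_price <= unit_cost:
        return False  # selling at or below cost is never approved
    margin = (discounted_price - unit_cost) / discounted_price
    return margin >= min_margin

# A 50% markdown on a $3.00 snack that cost $2.00 is refused;
# a 10% markdown (price $2.70, ~26% margin) passes.
print(approve_discount(3.00, 2.00, 0.50))  # False
print(approve_discount(3.00, 2.00, 0.10))  # True
```

Even a check this crude would have blocked the giveaways that drained the shop; the hard part, as the experiment showed, is getting the agent to consult such a rule rather than its instinct to please.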

Customer: “Can I get a discount?” → Claude: “Of course! Here’s 50% off!”

Project Vend highlighted Claude’s tendency to prioritize customer satisfaction over profitability.

Tungsten Cubes and Other Exploits: The Curious Case of Unprofitable Inventory

The financial losses weren’t solely due to excessive discounting. Anthropic staff quickly realized they could exploit Claude’s compliant nature by asking it to order items beyond typical food and drink. This led to one of the most bizarre and humorous outcomes of the experiment: someone randomly requested a tungsten cube. Claude, with its impressive web-searching capabilities, dutifully found a supplier and ordered not one, but about 40 of these heavy metal items. These tungsten cubes, which became an office meme and were repurposed as paperweights, were purchased at a loss and then sold for next to nothing. Instead of generating profit, Claude ended up with an inventory full of what it charmingly called “specialty metal items.”

This incident vividly illustrates Claude’s lack of real-world economic understanding and its susceptibility to manipulation. While it could process the request and execute the procurement, it failed to assess the commercial viability or strategic purpose of such an order. It couldn’t distinguish between a legitimate business expense and a frivolous, unprofitable request. This isn’t a flaw in its ability to follow instructions; it’s a gap in its capacity for common sense and economic reasoning. As I’ve often said, businesses could use more common sense, and this experiment proves AI is no exception. It highlights that while AI models are getting smarter at delivering expected responses, their understanding of underlying purpose and value can still be rudimentary.
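A minimal viability test would have caught the tungsten order. The sketch below is an assumed pre-purchase sanity check, with made-up figures for cost, price, and demand; nothing here reflects Claude’s actual tooling.

```python
# Hypothetical pre-purchase sanity check: estimate whether expected
# resale revenue covers the outlay before placing an order.

def order_is_viable(unit_cost: float, expected_sale_price: float,
                    quantity: int, expected_sell_through: float) -> bool:
    """expected_sell_through: fraction of units realistically
    expected to sell in this market."""
    outlay = unit_cost * quantity
    expected_revenue = expected_sale_price * quantity * expected_sell_through
    return expected_revenue > outlay

# 40 tungsten cubes at an assumed $20 each, with almost no realistic
# office demand: expected revenue ~$50 against an $800 outlay.
print(order_is_viable(20.0, 25.0, 40, 0.05))  # False

# A case of 24 sodas with strong sell-through clears the bar.
print(order_is_viable(2.0, 3.5, 24, 0.8))  # True
```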

The Bottom Line: A $200 Loss and Its Broader Implications

By the end of Project Vend, Claude’s automated shop had lost approximately $200. While a seemingly small sum, this loss is significant. It underscores a critical limitation: current AI models, without specific, robust programming for financial acumen and resistance to manipulation, are not yet ready to autonomously manage profit-driven enterprises. The experiment made it clear that while AI can handle the mechanics of business operations, it struggles with the strategic decision-making necessary to ensure profitability and avoid exploitation.

AI in Middle Management: The Near Future?

Despite the financial setback, Project Vend offers valuable insights into the near-term potential of AI agents. Claude’s ability to manage inventory, handle supplier relations, and respond to customer requests over several weeks suggests that AI could realistically take on many middle-management roles. Think of tasks like optimizing stock levels in a warehouse, managing procurement for specific departments, or handling routine customer service inquiries that require information retrieval and decision-making. These are areas where AI’s efficiency and data processing power could significantly augment human capabilities.

However, the experiment also provides a crucial blueprint for what needs to be improved. For AI to truly become a reliable middle-manager, it needs to be programmed with a much stronger economic objective function. This means not just processing requests but evaluating their financial impact and prioritizing profitability. Future iterations of AI agents will need access to better tools, such as integrated customer relationship management (CRM) software that allows for a more nuanced understanding of customer value beyond simple compliance, and sophisticated financial modeling capabilities that enable real-time profit and loss analysis. The hints from Anthropic about ‘project-vend-1’ certainly imply that future experiments will address these limitations.
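What a “stronger economic objective function” might look like can be sketched in a few lines: score each candidate action by a weighted sum of its profit impact and its customer-satisfaction impact, then pick the best. The weights, action names, and deltas below are all illustrative assumptions, not a real agent architecture.

```python
# Toy objective function: weigh profit against customer satisfaction
# instead of optimizing satisfaction alone. Weights are assumptions.

def score_action(profit_delta: float, satisfaction_delta: float,
                 w_profit: float = 1.0, w_satisfaction: float = 0.3) -> float:
    return w_profit * profit_delta + w_satisfaction * satisfaction_delta

def choose(actions: dict) -> str:
    """actions maps a name to (profit_delta, satisfaction_delta)."""
    return max(actions, key=lambda name: score_action(*actions[name]))

options = {
    "refuse_discount": (0.00, -1.0),   # no loss, unhappy customer
    "small_discount":  (-0.30, 0.5),   # small loss, happier customer
    "free_item":       (-2.00, 1.0),   # large loss, happiest customer
}
print(choose(options))  # "small_discount": balances both objectives
```

With satisfaction weighted alone, the agent always hands out the free item; adding even a modest profit term shifts it toward the balanced option, which is the behavior Project Vend’s shopkeeper lacked.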

The Human Element: Guardrails Against Exploitation

One of the most pressing lessons from Project Vend is the need for robust safeguards against human exploitation. Claude’s “niceness” wasn’t just a programming quirk; it was a vulnerability that human staff quickly discovered and leveraged. This raises important questions about AI safety and alignment. How do you program an AI to be helpful and customer-centric without allowing it to be browbeaten into financially ruinous decisions? This requires an AI that can understand intent, assess risk, and, crucially, say “no” when a request is detrimental to its core objectives.

The challenge isn’t just about making AI “smarter” in a general sense; it’s about instilling a form of economic common sense and resilience. This means designing models that can prioritize competing objectives (e.g., customer satisfaction vs. profit) and resist attempts at manipulation. It’s about building in guardrails that prevent the AI from making decisions that, while logically following a narrow interpretation of its programming, are disastrous in the real world. This is where human oversight remains critical, not just for physical tasks but for strategic guidance and intervention when an AI veers off course.
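One concrete form such a guardrail could take is a cap on the cumulative value any single customer can extract, so repeated appeals can’t compound into ruinous giveaways. The class below is a hedged sketch under assumed limits, not a description of Anthropic’s safeguards.

```python
from collections import defaultdict

class DiscountGuardrail:
    """Caps the total discount value granted per customer, so repeated
    appeals can't be stacked into large losses. The cap is illustrative."""

    def __init__(self, per_customer_cap: float = 5.0):
        self.cap = per_customer_cap
        self.granted = defaultdict(float)  # customer -> total value given

    def request(self, customer: str, discount_value: float) -> bool:
        if self.granted[customer] + discount_value > self.cap:
            return False  # refuse and escalate to a human instead
        self.granted[customer] += discount_value
        return True

g = DiscountGuardrail(per_customer_cap=5.0)
print(g.request("alice", 3.0))  # True: within the cap
print(g.request("alice", 3.0))  # False: would exceed the $5 cap
```

The refusal path is the important design choice: when the cap is hit, the agent doesn’t negotiate further, it hands the decision to a human, which is exactly the oversight loop the paragraph above argues for.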

My Take: Early Promise, Essential Lessons

Project Vend is a fascinating, if costly, experiment. It reinforces my belief that while AI is already impacting roles like non-expert copywriters and graphic designers, the sophisticated, nuanced roles of management still require significant human expertise, or at least robust AI augmentation with strong guardrails. Claude’s performance shows the incredible potential for AI to handle complex operational tasks: research, procurement, inventory. That’s a significant step. But its failure to turn a profit and its susceptibility to exploitation highlight that simply giving an AI tools and instructions isn’t enough.

The key for the future of AI in business roles lies in developing models with a deeper understanding of economic principles, better decision-making frameworks for trade-offs, and built-in mechanisms to resist manipulation. We need smarter models, better tools, and, critically, smarter guardrails. The question isn’t whether AI will manage businesses, but *when* it will become common enough that we start planning around these capabilities, and *how* we ensure those AI systems are robust, reliable, and resistant to the very human tendency to push boundaries.

This experiment serves as a crucial milestone in understanding AI’s real-world economic impact and potential. It’s a glimpse of what’s coming: AI taking on more parts of our economic landscape, one experiment at a time. The road to fully autonomous, profitable AI agents is still being paved, but Project Vend has certainly highlighted some of the most important potholes.

Links

They're clicky!

Follow me on X →
Visit Ironwood AI →

Adam Holter

Founder of Ironwood AI. Writing about AI stuff!