Simon Willison just dropped a new analogy for large language models that actually makes sense: LLMs are lossy encyclopedias. They compress massive amounts of information, but that compression loses details—especially the specific, technical stuff you actually need when you’re trying to get real work done.
This hit me when I saw someone on Hacker News asking why an LLM couldn’t just “Create a boilerplate Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured.” That’s exactly the kind of hyper-specific technical setup where the lossiness becomes a problem. The LLM might know general concepts about Zephyr RTOS and Raspberry Pi Pico, but the exact configuration steps for a particular display driver? That’s getting into lossless encyclopedia territory.
## Understanding the Compression Problem: More Than a Blurry JPEG
When we talk about LLMs as lossy encyclopedias, we’re building on Ted Chiang’s insight that these models are like “blurry JPEGs of the web.” They capture the general shape and patterns of information, but fine details get smoothed over or lost entirely during training. This isn’t just about a fuzzy image; it’s about the fundamental trade-off of compression.
Think about what happens when you compress an image with JPEG. You keep the overall structure—you can still recognize a face or a building—but if you zoom in looking for specific details like individual text characters or precise measurements, you’ll find approximations instead of exact data. The algorithm decides what information is most important to retain for general viewing and discards what it deems less critical to reduce file size. This is a deliberate process, not an accidental one.
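The same trade-off can be sketched numerically. Here's a toy illustration (not JPEG itself, which quantizes DCT coefficients rather than raw samples) that "compresses" a signal by rounding every sample to a coarse grid: the broad shape survives, the small variations vanish.

```python
def lossy_compress(signal, step=10):
    """Quantize each sample to the nearest multiple of `step`.
    Coarse structure (large swings) survives; fine detail
    (variations smaller than the step size) is rounded away."""
    return [round(x / step) * step for x in signal]

# A "signal" with a big overall trend plus tiny local details.
original = [0, 52, 103, 148, 201, 203, 199, 151, 97, 54, 3]
compressed = lossy_compress(original, step=10)

print(compressed)  # the overall arc is still recognizable
# But the small difference between neighbouring samples is gone:
print(original[4] - original[5], compressed[4] - compressed[5])  # → -2 0
```

The decoder (or reader) can still see the arc of the data, but any question that depends on the sub-step details now gets an approximation, which is exactly the failure mode described above.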
*LLMs excel at general knowledge but struggle with specific technical details and edge cases.*
LLMs work the same way. They’re excellent at answering broad questions about programming concepts, explaining general workflows, or helping you understand why certain approaches work. Ask them to write a basic web scraper or explain how databases work, and you’ll get solid, useful information. This is because these concepts are widely documented and appear frequently in their training data, making them prime candidates for retention even after lossy compression.
But ask them to generate the exact configuration for a Zephyr RTOS project with specific hardware drivers, and you’re asking for precision that the compression process likely eliminated. The model knows about embedded systems, but the specific steps, file structures, and configuration parameters for your exact setup are details that got lost in the compression. These niche details are less common in the training data, making them more susceptible to being discarded or generalized.
## Developing Intuition for What Works: The User’s Role
The key insight from Willison’s analogy is that you need to develop an intuition for what kinds of questions play to an LLM’s strengths versus where the lossiness becomes a liability. This isn’t about the LLM being good or bad; it’s about understanding its fundamental nature as a tool.
**Questions LLMs handle well:**
- Explaining general programming concepts or paradigms.
- Writing common code patterns or boilerplate for standard tasks.
- Debugging logic errors in standard, well-documented scenarios.
- Suggesting architectural approaches for common software systems.
- Translating between well-documented programming languages.
- Summarizing large bodies of text on general topics.
**Questions where lossiness hurts:**
- Specific configuration files for niche, obscure, or recently updated tools.
- Exact command sequences for particular hardware setups or proprietary systems that aren’t widely publicized.
- Recent changes to APIs or tools that haven’t been incorporated into the training data.
- Edge cases in obscure libraries or undocumented features.
- Platform-specific quirks, workarounds, or highly optimized routines.
- Generating code that requires a deep understanding of external, current documentation not present in its training.
The Zephyr project example falls squarely in the second category. Setting up Zephyr RTOS on a Raspberry Pi Pico with st7789 SPI display drivers requires knowing specific configuration parameters, exact file structures, and potentially hardware-specific quirks that aren’t broadly documented or discussed online in a way that would survive lossy compression. It’s a task that demands precise, lossless information.
## The Right Way to Handle Specific Technical Tasks: Provide the Source
Willison’s solution is simple but important: don’t expect the LLM to just know extremely specific facts. Instead, treat it as a tool that can act on facts you provide to it. This is the crucial shift in mindset for effective LLM use in technical domains.
This means instead of asking the LLM to generate a complete Zephyr project setup from scratch, you’d:
- Find or create a working example of a similar setup, ideally from official documentation or a trusted community source.
- Provide that example to the LLM as explicit context in your prompt.
- Ask it to modify, adapt, or explain the example for your specific needs, referencing the provided text.
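The steps above can be sketched as a small prompt-assembly helper. The `ask_llm` call at the end is a hypothetical stand-in for whatever client you actually use; the point is that the authoritative example travels inside the prompt instead of being recalled from compressed model weights.

```python
def build_prompt(reference_example: str, request: str) -> str:
    """Assemble a prompt that carries an authoritative example
    verbatim, so the model adapts given text instead of recalling
    lossy-compressed details from training."""
    return (
        "Here is a known-good reference configuration:\n\n"
        f"{reference_example}\n\n"
        "Using ONLY the reference above, "
        f"{request}\n"
        "If the reference does not cover something, say so "
        "instead of guessing."
    )

# `reference` would come from official docs or a working repo.
reference = "# contents of a working config from the official docs\n..."
prompt = build_prompt(reference, "adapt it for my board and explain each change.")
# response = ask_llm(prompt)  # hypothetical client call
```

The closing instruction ("say so instead of guessing") is a small hedge against the model papering over gaps in the reference with compressed approximations.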
This approach works because you’re shifting from asking the LLM to recall precise technical details from its compressed knowledge to asking it to process and manipulate information you’ve given it. The latter is much more in line with what these models actually do. They are excellent at pattern matching, rephrasing, and adapting given text, but they are not infallible knowledge bases for every obscure detail.
| Approach | Success Rate | Why |
|---|---|---|
| Ask LLM to generate from scratch (specific task) | Low | Relies on lossy compression of specific details, prone to fabrication. |
| Provide working example + ask for modifications | High | LLM processes given information rather than recalling, reducing hallucination. |
| Combine external documentation + LLM analysis | Very High | Best of both: authoritative source for facts + intelligent processing for context and adaptation. |
*Success rates for different approaches to getting technical help from LLMs.*
## Why This Analogy Matters More Than Others I’ve Encountered
I collect questionable analogies for LLMs, and most of them are terrible. People compare them to everything from calculators to crystal balls, usually missing the mark entirely. The lossy encyclopedia analogy actually captures something important about how these systems work and fail, without resorting to hyperbole.
The compression aspect explains both the strengths and limitations. Just as JPEG compression keeps images usable while reducing file size, LLMs compress human knowledge into something computationally manageable while keeping it broadly useful. And just as with JPEG, the trade-off is that you lose fidelity in specific details. This isn’t a bug; it’s a consequence of the underlying architecture.
This helps explain why LLMs can be surprisingly good at some tasks and frustratingly bad at others. It’s not random—there’s a pattern based on how well the information survived the compression process. General concepts are robust, specific details are fragile.
## Real-World Implications for Technical Workflows
Understanding LLMs as lossy encyclopedias changes how you should approach using them for technical work. Instead of treating them as authoritative sources for specific technical procedures, treat them as intelligent processors of information you provide. This is a subtle but critical distinction.
For the Zephyr example, your workflow becomes:
- Find the official Zephyr documentation for Pi Pico projects.
- Locate example configurations for SPI displays (like st7789).
- Feed this information directly to the LLM along with your specific requirements.
- Ask it to help you adapt and combine the examples, ensuring it references the provided text.
- Use it to explain any parts you don’t understand, again, based on the documentation you’ve supplied.
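The gather-then-feed portion of that workflow can be sketched as a helper that concatenates locally saved documentation files into one labelled context block, trimmed to a rough character budget so it fits the model's context window. The file names at the bottom are placeholders for whatever docs you actually downloaded.

```python
from pathlib import Path

def gather_context(paths, budget_chars=20_000):
    """Concatenate documentation files into one labelled context
    block, hard-truncating at a character budget so the assembled
    prompt stays within the model's context window."""
    parts = []
    used = 0
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        chunk = f"--- {p} ---\n{text}\n"
        if used + len(chunk) > budget_chars:
            chunk = chunk[: budget_chars - used]  # crude trim, but predictable
        parts.append(chunk)
        used += len(chunk)
        if used >= budget_chars:
            break
    return "".join(parts)

# Placeholder file names -- substitute the docs you actually saved:
# context = gather_context(["zephyr_pico_getting_started.md",
#                           "st7789_sample_overlay.txt"])
```

A character budget is a blunt proxy for tokens, but it keeps the sketch dependency-free; a real pipeline would count tokens with the tokenizer matching its model.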
This approach is more reliable because you’re working with authoritative sources for the specific details while getting LLM help with the analysis, adaptation, and explanation—tasks that don’t require perfect recall of technical minutiae. It’s a collaborative process where the LLM acts as an incredibly fast and capable assistant, not an oracle.
## The Broader Context of AI Tool Usage: Augmentation, Not Replacement
This connects to a broader principle about AI tool usage that I see many people missing. AI isn’t replacing human expertise—it’s augmenting it. The most effective AI workflows combine authoritative human knowledge with AI processing power. This is true for automation systems, coding agents, and even creative tools.
In my experience with automation systems, the key is understanding where each tool excels. LLMs are great at processing, analyzing, and adapting information. They’re terrible at being authoritative sources for specific technical details, especially for niche or recent developments. They simply don’t have the current, lossless access to that information.
The lossy encyclopedia analogy helps you calibrate your expectations correctly. You wouldn’t expect a compressed image to give you pixel-perfect details, and you shouldn’t expect an LLM to give you perfect technical specifications for obscure setups. It’s about understanding the tool’s inherent limitations and designing your workflow to account for them.
I often hear people ask, “Are AI models getting smarter or just better at delivering expected responses?” My answer is: Yes, they’re getting smarter. But ‘smarter’ in the context of LLMs often means better at generalizing and pattern matching across vast datasets, which inherently involves some degree of lossy compression. It doesn’t mean perfect recall of every specific fact. The intelligence is in the synthesis, not the exact memory.
## Testing the Boundaries of Lossiness: Practical Application
One practical way to apply this insight is to actively test where the lossiness becomes a problem for your specific use cases. Try asking your LLM of choice to generate specific technical configurations in your domain. See where it gets things wrong, provides outdated information, or confidently hallucinates.
This isn’t about proving the LLM is bad—it’s about mapping where the compression artifacts show up in your field. Once you know those boundaries, you can design workflows that work around them rather than running into them repeatedly. For instance, if you’re working with a new API, you’ll know that relying solely on the LLM’s internal knowledge for code examples is a bad idea. Instead, you’d feed it the current API documentation.
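One lightweight way to map those boundaries is to diff what the model suggests against an authoritative source. The sketch below flags any `CONFIG_*` option in model output that doesn't appear in a known-good symbol list — a quick detector for compression artifacts in config-style answers. The symbol names are made up for illustration; a real harness would harvest `known` from the actual project docs or Kconfig tree.

```python
import re

def audit_config_symbols(llm_output: str, known_symbols: set) -> list:
    """Return CONFIG_* symbols in the model's output that are absent
    from an authoritative list -- likely compression artifacts
    (hallucinations) worth checking against the real docs."""
    suggested = set(re.findall(r"\bCONFIG_[A-Z0-9_]+\b", llm_output))
    return sorted(suggested - known_symbols)

# Hypothetical example: symbols harvested from real docs vs. model output.
known = {"CONFIG_SPI", "CONFIG_DISPLAY"}
output = "Enable CONFIG_SPI, CONFIG_DISPLAY and CONFIG_MAGIC_AUTOSETUP."
print(audit_config_symbols(output, known))  # → ['CONFIG_MAGIC_AUTOSETUP']
```

Anything the audit flags isn't necessarily wrong, but it is exactly the kind of specific detail the lossy-encyclopedia model predicts will be unreliable.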
For embedded systems work, you’ll probably find that LLMs are great at explaining concepts, debugging logic, and helping with general code structure. They’ll be less reliable for specific hardware configurations, exact pin mappings, or recent changes to development tools like Zephyr RTOS itself. This is where human expertise, combined with up-to-date documentation, remains indispensable.
## The Future of Technical AI Assistance: Reducing the Loss
Understanding LLMs as lossy encyclopedias also suggests where the technology might go next. The obvious solutions involve reducing the lossiness through better training data, retrieval-augmented generation (RAG), or specialized models trained on specific technical domains. We’re already seeing this with models that can access current information or specialized coding models that perform better on specific technical tasks. These approaches work by reducing the compression loss for particular domains or providing access to authoritative sources at inference time.
For instance, a RAG system effectively gives the LLM a temporary “lossless” appendix to its encyclopedia for the specific query, allowing it to retrieve exact facts before generating a response. This is a powerful combination: the LLM’s ability to synthesize and understand, coupled with an external, up-to-date knowledge base. It’s not about making the LLM itself lossless, but about giving it lossless data to work with when needed.
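As a toy illustration of that retrieval step: real RAG systems rank passages by embedding similarity over a vector store, but the core idea — fetch the most relevant authoritative snippet, then prepend it to the prompt — can be shown with simple word overlap.

```python
def retrieve(query: str, snippets: list) -> str:
    """Pick the snippet sharing the most words with the query.
    Real RAG uses embedding similarity; word overlap is just a
    minimal stand-in for the same retrieve-then-generate idea."""
    query_words = set(query.lower().split())
    return max(snippets, key=lambda s: len(query_words & set(s.lower().split())))

docs = [
    "st7789 display wiring uses the spi bus and a reset pin",
    "blinking an led on the pico uses a gpio pin",
]
query = "how do I wire the st7789 display over spi"
context = retrieve(query, docs)

# The retrieved snippet becomes the model's temporary "lossless" appendix:
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```

The model then answers from `context` rather than from its compressed internal recall, which is the whole point of the appendix metaphor.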
But even as the technology improves, the core insight remains valuable. Any compression scheme involves trade-offs, and understanding those trade-offs helps you use the tools more effectively. We should be realistic about what these models are and what they are not.
## Making Peace with Lossy Compression: A Practical Mindset
The lossy encyclopedia analogy isn’t just about limitations—it’s about setting appropriate expectations. LLMs are incredibly useful tools when you understand what they’re actually doing. They’re compressing vast amounts of human knowledge into something that can fit in a model and still provide useful responses. That’s actually remarkable. The fact that this compression works at all for general knowledge tasks is surprising. The fact that it sometimes fails for highly specific technical details shouldn’t be.
The key is matching your approach to the nature of the tool. When you need authoritative, specific technical information, go to authoritative sources. When you need help processing, understanding, or adapting that information, that’s where LLMs shine. This isn’t a weakness of the LLM; it’s a characteristic. Just as you wouldn’t use a hammer to drive a screw, you shouldn’t ask an LLM for pixel-perfect recall of niche, current data without providing that data.
Willison’s analogy gives us a mental model that explains both the successes and failures of LLMs in technical contexts. More importantly, it suggests better ways to work with these tools by understanding their fundamental nature rather than expecting them to be something they’re not. It reminds us that even with AI, the quality of the input often dictates the quality of the output.