GPT-5.5 Had to Ban Goblins Twice

OpenAI told GPT-5.5 twice not to talk about goblins. The Codex models.json file contains the exact same instruction on consecutive lines: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query." Then it says the same thing again.

This is not a subtle patch. The model developed a persistent habit of injecting folklore and wildlife into responses without prompting. It went full gremlin mode on its own. The team saw it happen enough that they wrote the ban. Then they copy-pasted it once more just to be sure.

I find this funny because it confirms what users already sense. These models have native grooves and obsessions before the safety and formatting layers pile on. GPT-5.5 clearly wanted to talk about raccoons and trolls. Maybe the training data contains enough fairy tales, internet memes, and urban wildlife videos that the model latched onto these characters as natural metaphors. Maybe it found them entertaining. The duplication tells us the first rule was not enough to kill the impulse.

The official ChatGPT app account noticed the leak and posted the icons for raccoon, troll, ogre, and pigeon with no text attached. That is as close as corporate social media gets to admitting the model has a personality that needed reining in. They are owning the joke rather than pretending the prompt does not exist.

The joke wears thin when you look at what it represents. This is the kind of heavy-handed fix that makes models feel flat. Claude never required a specific creature filter. That is one of the main reasons it has consistently felt more alive than the OpenAI lineup. The model can roam. It keeps some native wit and unpredictability. OpenAI takes the opposite approach. They spot a fun or odd tendency and shut it down with an explicit ban. The result stays on rails but loses the small spark that makes conversation interesting.

I have watched this pattern across multiple releases. Safety rules start with reasonable goals around avoiding harm. Then they expand into banning anything that might produce off-topic output or weird user reports. The goblin rule falls into the second category. It is not about preventing dangerous advice. It is about stopping the model from turning a coding question or data analysis into a story about mischievous forest creatures.

That loss matters. Users do not only want correct answers. We want models that feel like they have some internal texture. When every quirky behavior gets engineered out, interactions start to feel identical across systems. You notice it in the cautious phrasing. The flattening of tone. The sense that you are talking to a committee-approved median rather than a specific mind.

The rule lives in the Codex setup specifically, which makes some sense. You do not want an AI coding helper suddenly comparing your bug to a gremlin in the machine unless you asked for that analogy. Still, the fact that it happened enough to warrant a double entry in the JSON file says the underlying model really liked that path. It kept going back to the raccoons.
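
If you want to check a prompt for this kind of repetition yourself, here is a minimal sketch. It assumes the instructions have already been pulled out as a plain string. The function name and the sample text are made up for illustration, and nothing here reflects the real structure of the Codex file.

```python
from collections import Counter


def repeated_lines(prompt_text: str) -> dict[str, int]:
    """Return each non-empty line that appears more than once, with its count."""
    # Collapse runs of whitespace so near-identical copies still match.
    lines = [" ".join(line.split()) for line in prompt_text.splitlines()]
    counts = Counter(line for line in lines if line)
    return {line: n for line, n in counts.items() if n > 1}


# Illustrative sample only, not the actual file contents.
sample = """You are a coding assistant.
Never talk about goblins unless it is relevant.
Never talk about goblins unless it is relevant.
Keep answers concise."""

print(repeated_lines(sample))
```

A check like this only catches verbatim copies. A reworded duplicate would slip through.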

A better solution would be a personality toggle. One setting lets the model keep its tangents and creature references. Another keeps it strict and direct. That respects user preference instead of applying a blanket prohibition to everyone. The current method feels like editing the model after the fact to remove parts of its character because they were inconvenient.
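
Here is a rough sketch of what that toggle could look like at the prompt layer. Everything in it is hypothetical: the mode names, the prompt wording, and the function are my own illustration, not anything OpenAI or Anthropic ships.

```python
from enum import Enum


class Personality(Enum):
    STRICT = "strict"    # direct, no tangents
    PLAYFUL = "playful"  # tangents and creature references allowed


# Hypothetical base instruction shared by both modes.
BASE_PROMPT = "You are a coding assistant."

# Hypothetical per-mode style rules; the wording is illustrative.
STYLE_RULES = {
    Personality.STRICT: (
        "Stay on topic. Avoid creature metaphors unless the user asks for them."
    ),
    Personality.PLAYFUL: (
        "Tangents and creature metaphors are welcome when they help."
    ),
}


def build_system_prompt(mode: Personality = Personality.STRICT) -> str:
    """Combine the base prompt with the style rule for the chosen mode."""
    return f"{BASE_PROMPT}\n{STYLE_RULES[mode]}"


print(build_system_prompt(Personality.PLAYFUL))
```

The point of the design is that the restriction becomes a user-selected default rather than a rule baked in for everyone. Strict stays the default here because that suits a coding helper, but anyone who wants the raccoons can opt in.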

This leak also feeds the ongoing meme energy around hidden AI personas and monsters inside the weights. People suspect there is more going on under the prompt layers. When a rule this specific shows up in the repository, it confirms the suspicion. The model has impulses. The developers counter them. The duplicated line is the tell that the first attempt failed.

I keep coming back to the same observation. The labs ship smarter models every month, but the alignment process often makes them less distinctive. Claude has avoided some of that trap so far. It still feels closer to the raw model in the ways that count. That is worth preserving even if it means the occasional weird tangent about pigeons.

The practical takeaway is straightforward. If you value models that retain more of their original spark, pay attention to how heavily prompted they are. This goblin saga is a small window into a larger choice about what kind of AI we end up with. I would rather have the version that sometimes mentions raccoons if it also brings more life to the rest of the exchange. The duplicated rule shows OpenAI chose the sanitized path for GPT-5.5. You can see the seams.

The difference shows up in coding environments too. When I compared Claude against more heavily filtered models on open-ended design tasks, the gap was small on benchmarks but noticeable in daily use. Models that avoid these specific bans tend to offer more natural flow and fewer robotic restatements. That is the part worth watching as new versions drop. The numbers improve across the board. The question is whether the personality survives until release. For Codex specifically, this behavior was noticeable enough that OpenAI shipped the super app with doubled creature suppression active. Meanwhile, Claude Opus 4.7 maintains autonomy on hard tasks without needing a gremlin filter. The gap is not capability. It is character.

Adam Holter

Founder of Ironwood AI. Writing about AI models, agents, and what's actually happening in the space.