Created using Ideogram 2.0 Turbo with the prompt, "Professional DSLR photo of a modern testing lab environment. Rows of high-end computer workstations with multiple monitors displaying data visualizations and maps. Shot on Canon EOS R5, 24mm f/1.4 lens, shallow depth of field, dramatic lighting from overhead LED panels, cool blue color grading."

AI Model Geolocation Testing on LMArena: Comparing Accuracy Across Platforms

A member of the MattVidPro Discord community recently ran an interesting test comparing how well different AI models could determine a photo’s location. Using a photo with no obvious location markers or metadata, they found notable differences in performance across models.

The Gremlin model achieved perfect accuracy on its third attempt, identifying the location with 0km error by analyzing a statue in the photo. Claude 3.5 Sonnet followed at 58km off, with GPT-4o mini at 111km.

Here’s the full ranking by accuracy:
– Gremlin: 0km
– Claude 3.5 Sonnet: 58km
– GPT-4o mini: 111km
– Picarta AI: 114km
– Qwen2 & Llama 3.2: 141km
– ChatGPT 4o: 143km
– Goblin: 156km
– GeoSpy Plus, Gemini Flash & Grok: 173km
– GPT-4o (older): 179km
– Anonymous Chatbot: 189km
– Mistral: 225km
– Gemini Exp: 284km
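The km figures above are presumably great-circle distances between each model's guessed coordinates and the photo's true location. As a minimal sketch (the function name and the placeholder coordinates are my own, not from the test), the standard haversine formula computes this:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    R = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

# Hypothetical example: score one model's guess against the true location
true_lat, true_lon = 40.0, -75.0      # placeholder "ground truth"
guess_lat, guess_lon = 40.5, -75.5    # placeholder model guess
error_km = haversine_km(true_lat, true_lon, guess_lat, guess_lon)
```

A guess matching the true coordinates scores 0km, which is how a model like Gremlin could rank first with zero error.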

The tester used the same prompt across all models: analyze the photo thoroughly and state where you think it was taken, being as specific as possible down to the city level.

Gremlin demonstrated exceptional capabilities in visual analysis. While other models performed basic object detection, Gremlin identified subtle details that even humans might overlook. The Enigma model has yet to be tested, which could add another interesting data point to this comparison.

This test, conducted on LMArena (formerly known as LMSYS Chatbot Arena), provides valuable data about the current state of AI visual analysis capabilities. For tasks requiring detailed image analysis, newer models are showing impressive improvements in accuracy and detail recognition.

If you’re interested in more AI model comparisons, check out my recent piece on Claude vs ChatGPT [https://adam.holter.com/chatgpt-vs-claude-part-6-chatgpt-turns-two/] or my analysis of Google’s Gemini [https://adam.holter.com/googles-gemini-ai].

PS: Special thanks to binger from the MattVidPro Discord for running these tests and sharing the results. You can check out the sources on X here: @legit_rumors and @idontexist_nn.