The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
Shin-nosuke Ishikawa, Masato Todo, Taiki Ogihara, Hirotsugu Ohba
TL;DR
This study investigates whether large language models (LLMs) exhibit Idola Tribus, a bias toward perceiving order where none exists, when identifying regularities in number sequences. The authors design eight sequence categories totaling 724 series and evaluate five state-of-the-art LLMs under two prompting configurations to elicit regularity descriptions, using an LLM-as-a-judge to assess 3,620 descriptions. The results show strong performance on clearly ordered sequences but persistent over-interpretation of quasi-ordered and random sequences, including evidence that even thinking-enabled models can validate false patterns. The findings highlight a practical risk in applying LLMs to inductive reasoning tasks and suggest mitigation strategies such as explicit uncertainty prompts and targeted fine-tuning, while calling for broader evaluation and bias-reduction efforts in future work.
Abstract
We report a tendency of large language models (LLMs) to propose absurd, clearly inappropriate patterns in a simple task: identifying regularities in number series. Several approaches have been proposed to apply LLMs to complex real-world tasks, such as providing knowledge through retrieval-augmented generation and executing multi-step tasks using AI agent frameworks. However, these approaches rely on the logical consistency and self-coherence of LLMs, making it crucial to evaluate these aspects and consider potential countermeasures. To identify cases where LLMs fail to maintain logical consistency, we conducted an experiment in which LLMs were asked to explain the patterns in various integer sequences, ranging from arithmetic sequences to randomly generated integer series. While the models successfully identified correct patterns in arithmetic and geometric sequences, they frequently over-recognized patterns that were inconsistent with the given numbers when analyzing randomly generated series. This issue was observed even in multi-step reasoning models, including OpenAI o3, o4-mini, and Google Gemini 2.5 Flash Preview Thinking. This tendency to perceive non-existent patterns can be interpreted as the AI-model equivalent of Idola Tribus and highlights potential limitations in the models' capability for applied tasks requiring logical reasoning, even when employing chain-of-thought reasoning mechanisms.
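To make the setup concrete, the following is a minimal sketch of how ordered and random integer series might be generated, and how a claimed rule can be checked against the numbers it is supposed to explain. It is not the authors' code: the function names (`arithmetic`, `geometric`, `random_series`, `is_consistent`), the value ranges, and the seed are all assumptions chosen for illustration, and the deterministic check stands in for the LLM-as-a-judge used in the study.

```python
import random


def arithmetic(start, step, n):
    """Hypothetical generator: n terms with a constant difference."""
    return [start + step * i for i in range(n)]


def geometric(start, ratio, n):
    """Hypothetical generator: n terms with a constant ratio."""
    return [start * ratio ** i for i in range(n)]


def random_series(low, high, n, seed):
    """Hypothetical generator: n independent uniform integers in [low, high]."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]


def is_consistent(series, next_term, warmup=1):
    """Return True only if the claimed rule reproduces every term.

    next_term takes the list of preceding terms and returns the
    predicted next term. The first `warmup` terms are treated as given.
    """
    return all(
        next_term(series[:i]) == series[i]
        for i in range(warmup, len(series))
    )


def add_four(prev):
    """A claimed rule: each term is the previous term plus 4."""
    return prev[-1] + 4


ordered = arithmetic(3, 4, 6)        # [3, 7, 11, 15, 19, 23]
other = geometric(2, 3, 5)           # [2, 6, 18, 54, 162]
noise = random_series(0, 99, 8, seed=0)

print(is_consistent(ordered, add_four))  # True: the rule fits every term
print(is_consistent(other, add_four))    # False: 6 + 4 != 18
print(is_consistent(noise, add_four))
```

A rule that fits only the first few terms, as `add_four` does for the geometric series, is the kind of over-recognized pattern the abstract describes: plausible at a glance but inconsistent with the given numbers.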
