Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models
Mateusz Bystroński, Doheon Han, Nitesh V. Chawla, Tomasz Kajdanowicz
TL;DR
The paper tackles the problem of limited generative diversity in large language models by introducing a plug-in, fine-tuning-free approach: continuous semantic conditioning along a structured embedding manifold. It constructs a latent variable $z$ by interpolating between a small set of semantic anchors derived from diverse anchor generations, then maps $z$ into the LLM's embedding space via an xRAG-style multimodal projector to condition generation. This latent conditioning expands the semantic variance of outputs without sacrificing quality, as demonstrated on NoveltyBench and the AUT divergent-thinking task; the analyses show robust gains, with a trade-off that depends on anchor choice and interpolation strength. The approach reframes diversity as geometric exploration in semantic space, which enables metaheuristic search and offers a scalable path to enhanced creativity in language models without any parameter updates to the base model.
Abstract
Starting from the hypothesis that knowledge in semantic space is organized along structured manifolds, we argue that this geometric structure renders the space explorable. By traversing it and using the resulting continuous representations to condition an LLM's generation distribution, we can systematically expand the model's reachable semantic range. We introduce a framework that requires no modification of LLM parameters and operationalizes this idea by constructing a conditioning distribution from a small set of diverse anchor generations. This distribution conditions the LLM's generation via an xRAG-style projector. Our experiments demonstrate that this manifold-based conditioning substantially increases generative diversity, with direct benefits for enhancing divergent thinking, a core facet of creativity, in language models.
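To make the idea of a conditioning distribution built from anchors more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each anchor generation has already been embedded as a fixed-length vector, that interpolation is a random convex combination with Dirichlet weights, and that the projector can be stood in for by a plain linear map. The names `sample_latent`, `project`, `strength`, and `alpha` are hypothetical and do not come from the paper.

```python
import random


def sample_latent(anchors, strength=1.0, alpha=1.0, rng=None):
    """Sample a latent z inside the convex hull of the anchor embeddings.

    Assumption: interpolation is a convex combination of anchors.
    strength=0 returns one anchor unchanged; strength=1 returns a full
    Dirichlet-weighted mixture of all anchors.
    """
    rng = rng or random.Random()
    dim = len(anchors[0])
    base = rng.choice(anchors)
    # Dirichlet(alpha, ..., alpha) weights via normalized gamma draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in anchors]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    mix = [sum(w * a[d] for w, a in zip(weights, anchors)) for d in range(dim)]
    # Blend the chosen anchor with the mixture according to strength.
    return [(1.0 - strength) * b + strength * m for b, m in zip(base, mix)]


def project(z, matrix):
    """Hypothetical stand-in for the projector: a linear map into the
    LLM's embedding space, with one matrix row per output dimension."""
    return [sum(w * x for w, x in zip(row, z)) for row in matrix]


# Toy usage: three 2-d anchors, projected into a 3-d "embedding space".
anchors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z = sample_latent(anchors, strength=0.5, rng=random.Random(0))
soft_token = project(z, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In this sketch, `strength` plays the role of the interpolation strength mentioned in the TL;DR: smaller values keep `z` close to a single anchor, while larger values move it toward the interior of the region spanned by the anchors. The resulting `soft_token` would be supplied to the frozen LLM as a conditioning embedding.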
