Morphological Addressing of Identity Basins in Text-to-Image Diffusion Models

Andrew Fraser

TL;DR

Morphological structure -- whether in feature descriptors or prompt-level phonological form -- creates systematic navigational gradients through diffusion model latent spaces.

Abstract

We demonstrate that morphological pressure creates navigable gradients at multiple levels of the text-to-image generative pipeline. In Study~1, identity basins in Stable Diffusion 1.5 can be navigated using morphological descriptors -- constituent features like ``platinum blonde,'' ``beauty mark,'' and ``1950s glamour'' -- without the target's name or photographs. A self-distillation loop (generating synthetic images from descriptor prompts, then training a LoRA on those outputs) achieves consistent convergence toward a specific identity as measured by ArcFace similarity. The trained LoRA creates a local coordinate system shaping not only the target identity but also its inverse: maximal away-conditioning produces ``eldritch'' structural breakdown in base SD1.5, while the LoRA-equipped model produces ``uncanny valley'' outputs -- coherent but precisely wrong. In Study~2, we extend this to prompt-level morphology. Drawing on phonestheme theory, we generate 200 novel nonsense words from English sound-symbolic clusters (e.g., \emph{cr-}, \emph{sn-}, \emph{-oid}, \emph{-ax}) and find that phonestheme-bearing candidates produce significantly more visually coherent outputs than random controls (mean Purity@1 = 0.371 vs.\ 0.209, $p < 0.00001$, Cohen's $d = 0.55$). Three candidates -- \emph{snudgeoid}, \emph{crashax}, and \emph{broomix} -- achieve perfect visual consistency (Purity@1 = 1.0) with zero training data contamination, each generating a distinct, coherent visual identity from phonesthetic structure alone. Together, these studies establish that morphological structure -- whether in feature descriptors or prompt-level phonological form -- creates systematic navigational gradients through diffusion model latent spaces. We document phase transitions in identity basins, CFG-invariant identity stability, and novel visual concepts emerging from sub-lexical sound patterns.
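The abstract reports an effect size of Cohen's d = 0.55 between phonestheme-bearing candidates and random controls, but does not say which variant of d was computed. As a rough illustration only, the sketch below assumes the common pooled-standard-deviation form. The function name `cohens_d` and the toy scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d with a pooled standard deviation (assumed variant).

    d = (mean_a - mean_b) / s_pooled, where
    s_pooled^2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    and var_* are sample variances (n - 1 denominator).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy per-candidate scores, invented for illustration (not the paper's data).
toy_phonestheme_scores = [2.0, 4.0, 6.0]
toy_control_scores = [1.0, 3.0, 5.0]
effect = cohens_d(toy_phonestheme_scores, toy_control_scores)
```

In a setting like Study~2, the two inputs would plausibly be the per-candidate Purity@1 values for each group. By the usual rule of thumb, a d near 0.5 counts as a medium effect.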

Paper Structure (48 sections, 11 figures, 10 tables)

Introduction
Related Work
Memorization in Diffusion Models
Personalization Methods
Latent Space Structure in Diffusion Models
Negative Prompting and Inverse Conditioning
Compositional Generalization and Sound Symbolism
Study 1: Identity Basin Navigation via Training-Level Morphology
Morphological Descriptor Design
Self-Distillation Training Loop
Push-Pull Conditioning Protocol
Full prompt specifications.
Evaluation Metrics
Study 1 Results
Morphological Addressing Achieves Identity Convergence
...and 33 more sections

Figures (11)

Figure 1: Morphological addressing via descriptor intersection. Each natural-language descriptor (e.g., "platinum blonde," "beauty mark," "1950s glamour") defines a region in latent space. Their intersection addresses a specific identity basin without requiring the target's name or reference photographs.
Figure 2: Grok name-based generation. Direct prompting with "Marilyn Monroe" produces outputs closely resembling archival photographs.
Figure 3: Morphological descriptors---"platinum blonde curled hair, beauty mark, 1950s glamour, white halter dress"---navigate to the same identity basin but generate synthetic outputs within that aesthetic space rather than reproducing training images.
Figure 4: The self-distillation training loop. Starting from morphological descriptors alone, the model iteratively generates images, curates outputs, trains a LoRA adapter on its own successful outputs, and refines prompts. No target name or reference photographs are used at any stage. Hit rate improved from 8% to 70% across four rounds.
Figure 5: Training progression across four rounds of self-distillation. Round 1 (top) shows high variance with only 8.1% of outputs approximating the target. By Round 4 (bottom), outputs exhibit binary behavior---landing clearly in the target basin or ejecting entirely.
...and 6 more figures
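The self-distillation loop described in the Figure 4 caption (generate, curate, train a LoRA on the kept outputs, refine the prompt, repeat) can be sketched as a small control-flow skeleton. This is a minimal sketch and not the authors' implementation. The callables `generate`, `is_hit`, `train_lora`, and `refine_prompt` are hypothetical placeholders for the image generator, the curation step, the LoRA trainer, and the prompt refinement step, and the default `images_per_round` is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class RoundResult:
    round_index: int
    n_generated: int
    n_kept: int

    @property
    def hit_rate(self) -> float:
        # Fraction of generated images that passed curation this round.
        return self.n_kept / self.n_generated if self.n_generated else 0.0


def self_distill(
    prompt: str,
    generate: Callable[[str, Optional[Any], int], Sequence[Any]],
    is_hit: Callable[[Any], bool],
    train_lora: Callable[[Sequence[Any], Optional[Any]], Any],
    refine_prompt: Callable[[str, RoundResult], str] = lambda p, _: p,
    rounds: int = 4,
    images_per_round: int = 100,  # assumed value, not from the paper
) -> Tuple[Optional[Any], List[RoundResult]]:
    """Run the generate -> curate -> train -> refine loop.

    All callables are hypothetical stand-ins supplied by the caller;
    no target name or reference photograph enters the loop.
    """
    adapter: Optional[Any] = None  # round 1 uses the base model only
    history: List[RoundResult] = []

    for r in range(1, rounds + 1):
        images = generate(prompt, adapter, images_per_round)
        kept = [img for img in images if is_hit(img)]
        result = RoundResult(r, len(images), len(kept))
        history.append(result)

        if kept:
            # Train (or continue training) the adapter on the model's own hits.
            adapter = train_lora(kept, adapter)
        prompt = refine_prompt(prompt, result)

    return adapter, history
```

The returned history gives a per-round hit rate, which is the quantity the caption tracks as it rises across four rounds. How curation is actually done (manual selection, an ArcFace threshold, or something else) is left to `is_hit` here, since the caption does not specify it.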
