Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment

Anh Bui; Trang Vu; Trung Le; Junae Kim; Tamas Abraham; Rollin Omari; Amar Kaur; Dinh Phung

TL;DR

This paper proposes a simple, training-free method that adjusts the magnitude and direction of the pre-trained embedding at inference time, mitigating the semantic collapsing problem in generative personalization.

Abstract

We investigate the semantic collapsing problem in generative personalization, an under-explored issue in which the learned visual concept ($V$) gradually shifts away from its original textual meaning and comes to dominate the other concepts in multi-concept input prompts. This collapse reduces the semantic richness of a complex prompt such as "a photo of $V$ wearing glasses and playing guitar" to a simpler, less contextually rich form such as "a photo of $V$", and it leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimization, which allows the learned embedding $V$ to drift arbitrarily in the embedding space, in both direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of the pre-trained embedding at inference time, mitigating the semantic collapsing problem. Our method is broadly applicable across different personalization methods and demonstrates significant improvements in text-image alignment in diverse use cases. Our code is anonymously published at https://github.com/tuananhbui89/Embedding-Adjustment
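The abstract does not spell out the adjustment rule, so the following is only an illustrative sketch of what a test-time adjustment of magnitude and direction could look like. It assumes, purely for illustration, that a reference embedding is available (for example, the embedding of the concept's original textual class), that the magnitude is corrected by rescaling to the reference norm, and that the direction is corrected by spherical interpolation toward the reference. The function name `adjust_embedding`, the reference vector, and the interpolation weight `alpha` are hypothetical and are not taken from the paper or its repository.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def adjust_embedding(learned, reference, alpha=0.5):
    """Hypothetical test-time adjustment of a learned concept embedding.

    Assumptions (not from the paper):
      - `reference` is an embedding the learned concept should stay close to.
      - Magnitude is fixed by rescaling the result to the reference norm.
      - Direction is fixed by spherical interpolation (slerp) from the
        learned direction toward the reference direction; alpha=0 keeps
        the learned direction, alpha=1 uses the reference direction.
    """
    n_l, n_r = norm(learned), norm(reference)
    if n_l == 0.0 or n_r == 0.0:
        raise ValueError("embeddings must be non-zero")

    # Unit directions of the two embeddings.
    u = [x / n_l for x in learned]
    w = [x / n_r for x in reference]

    # Angle between the two directions, clamped for numerical safety.
    cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, w))))
    theta = math.acos(cos)
    s = math.sin(theta)

    if s < 1e-8:
        # Parallel or antipodal directions: slerp is ill-defined,
        # so the learned direction is kept unchanged.
        direction = u
    else:
        a = math.sin((1.0 - alpha) * theta) / s
        b = math.sin(alpha * theta) / s
        direction = [a * x + b * y for x, y in zip(u, w)]

    # Rescale the unit direction to the reference magnitude.
    return [n_r * d for d in direction]


# Toy 2-D example: the learned vector has norm 5, the reference has norm 1.
adjusted = adjust_embedding([3.0, 4.0], [1.0, 0.0], alpha=0.5)
```

Because no parameters are updated, a rule of this kind can be applied to an already-trained embedding at inference time; how the reference and the interpolation weight are actually chosen is left to the paper and its code.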
