Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating

Jiamin Luo, Xuqian Gu, Jingjing Wang, Jiahong Lu

TL;DR

Comprehensive experimental evaluations demonstrate the advantage of the proposed EPEM approach to the L-AVC task over several state-of-the-art baselines.

Abstract

Previous studies on visual customization primarily rely on the objective alignment between various control signals (e.g., language, layout and Canny edges) and the edited images. They largely ignore subjective emotional content and, more importantly, lack general-purpose foundation models for affective visual customization. With this in mind, this paper proposes an LLM-centric Affective Visual Customization (L-AVC) task, which focuses on generating images while modifying their subjective emotions via a Multimodal LLM. Further, this paper contends that two challenges are both important and difficult in the L-AVC task: how to make the model efficiently align emotion conversion in semantics (named inter-emotion semantic conversion) and how to precisely retain emotion-agnostic contents (named exter-emotion semantic retaining). To this end, this paper proposes an Efficient and Precise Emotion Manipulating (EPEM) approach for editing subjective emotions in images. Specifically, an Efficient Inter-emotion Converting (EIC) module is tailored to make the LLM efficiently align emotion conversion in semantics before and after editing, followed by a Precise Exter-emotion Retaining (PER) module to precisely retain the emotion-agnostic contents. Comprehensive experimental evaluations on our constructed L-AVC dataset demonstrate the advantage of the proposed EPEM approach over several state-of-the-art baselines. This justifies the importance of emotion information for L-AVC and the effectiveness of EPEM in efficiently and precisely manipulating such information.

Paper Structure (19 sections, 7 equations, 4 figures, 2 tables)

This paper contains 19 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The overall architecture of our proposed EPEM approach (a). Here, (b) shows the process of model editing in the EIC module (see the EIC section), while (c) and (d) show the process of interaction between the MLLM and diffusion in the PER module (see the PER section). FC denotes the fully-connected layer.
  • Figure 2: A histogram (a) illustrating statistics of our L-AVC dataset before and after editing, and two fan charts showing the emotional distribution of different visual elements, with (b) and (c) representing pre-editing and post-editing, respectively.
  • Figure 3: Two samples illustrating the two challenges: precise inter-emotion conversion manipulation ((a), (b), (c)) and exter-emotion content retaining ((d), (e), (f)).
  • Figure 4: Qualitative comparison of our EPEM approach and several advanced visual customization models. The boxes and texts in blue and red indicate the visual elements before and after editing via our EPEM approach.