Correlation-Weighted Multi-Reward Optimization for Compositional Generation

Jungmyung Wi, Hyunsoo Kim, Donghyun Kim

Abstract

Text-to-image models produce images that align well with natural language prompts, but compositional generation has long been a central challenge. Models often struggle to satisfy multiple concepts within a single prompt, frequently omitting some concepts and achieving only partial success. Such failures highlight the difficulty of jointly optimizing multiple concepts during reward optimization, where competing concepts can interfere with one another. To address this limitation, we propose Correlation-Weighted Multi-Reward Optimization (CMO), a framework that leverages the correlation structure among concept rewards to adaptively weight each attribute concept in optimization. By accounting for interactions among concepts, CMO balances competing reward signals and emphasizes concepts that are partially satisfied yet inconsistently generated across samples, improving compositional generation. Specifically, we decompose multi-concept prompts into pre-defined concept groups (e.g., objects, attributes, and relations) and obtain reward signals from dedicated reward models for each concept. We then adaptively reweight these rewards, assigning higher weights to conflicting or hard-to-satisfy concepts using correlation-based difficulty estimation. By focusing optimization on the most challenging concepts within each group, CMO encourages the model to consistently satisfy all requested attributes simultaneously. We apply our approach to train state-of-the-art diffusion models, SD3.5 and FLUX.1-dev, and demonstrate consistent improvements on challenging multi-concept benchmarks, including ConceptMix, GenEval 2, and T2I-CompBench.

Paper Structure (26 sections, 24 equations, 9 figures, 9 tables, 1 algorithm)

Figures (9)

  • Figure 1: Motivation of Correlation-based Reweighting. Multi-concept compositional generation often yields different partial successes distributed across generated images. Thus, naively aggregating rewards from each concept tends to focus on concepts that are consistently satisfied across images, while failing to highlight more difficult concepts. Correlation analysis captures concept interactions among generated instances to identify difficult concepts, which are then assigned higher weights during normalized advantage reweighting (e.g., Q3, Q4, and Q5).
  • Figure 2: Overview of Correlation-Weighted Multi-Reward Optimization (CMO). Given a multi-concept prompt, the text-to-image (T2I) model generates a group of images, which are evaluated using a set of concept-specific reward functions covering objects, attributes, and spatial relations. The resulting concept rewards form a Multi-Reward Matrix across generated instances. CMO then computes correlations among reward signals to estimate concept difficulty based on their interactions, assigning higher weights to concepts that are harder to consistently satisfy. The reweighted rewards are normalized and aggregated to guide policy optimization, encouraging the model to jointly satisfy all requested concepts and improving compositional generation.
  • Figure 3: Quantitative Analysis on Multi-Concept Generation Conflicts. Left: The graph compares the average ratio of negative correlations and the overall Full Mark Score on ConceptMix [wu2024conceptmix], showing that reducing negative correlations between concepts directly improves multi-concept generation. Right: The graph tracks the percentage of concept pairs exhibiting negative correlation as the number of concepts (task complexity level $K$) increases. Baselines exhibit increasing negative correlations as complexity grows, whereas our method maintains a low negative correlation ratio with correlation-based reweighting.
  • Figure 4: Qualitative Comparison across Varying Prompt Complexity ($K=1 \sim 7$). Baseline models frequently exhibit concept omission or attribute leakage as complexity increases. In contrast, our method consistently maintains high faithfulness and accurate attribute binding even under extreme constraints ($K=7$).
  • Figure 5: Ablation on Hyperparameters $\tau$ and $\beta$. We evaluate the impact of Softmax Temperature (Left) and KL Penalty Coefficient (Right) on compositional generation performance. Both hyperparameters exhibit a clear trade-off. Extreme values lead to failure to follow the fine-grained reward, while $\tau=0.5$ and $\beta=0.015$ strike the optimal balance for maximizing both the Full Mark and Concept Fraction scores.
  • ...and 4 more figures
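
The pipeline described in the captions of Figures 2 and 3 (Multi-Reward Matrix, concept correlations, difficulty-based weights, normalized and aggregated rewards) can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation, which is not given in this excerpt. In particular, the difficulty proxy (negative mean correlation with the other concepts), the use of the softmax temperature $\tau$ to turn difficulty into weights, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def pairwise_correlation(rewards):
    """Pearson correlation between concept columns of a (G, K) reward matrix.

    G = images generated for one prompt, K = concepts in the prompt.
    Concepts with constant reward get zero correlation with everything.
    """
    centered = rewards - rewards.mean(axis=0, keepdims=True)
    std = rewards.std(axis=0, keepdims=True)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    z = centered / safe_std
    return (z.T @ z) / rewards.shape[0]

def negative_correlation_ratio(corr):
    """Fraction of distinct concept pairs with negative correlation (cf. Figure 3)."""
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    if rows.size == 0:
        return 0.0
    return float(np.mean(corr[rows, cols] < 0))

def concept_weights(corr, tau=0.5):
    """ASSUMED weighting rule: softmax over a correlation-based difficulty score.

    Difficulty is taken here as the negative mean correlation with the other
    concepts, so concepts that conflict with the rest receive larger weights.
    """
    k = corr.shape[0]
    off_diag_sum = corr.sum(axis=1) - np.diag(corr)
    difficulty = -off_diag_sum / max(k - 1, 1)
    logits = difficulty / tau
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def weighted_advantages(rewards, tau=0.5, eps=1e-8):
    """Normalize each concept's reward within the group, then aggregate with weights."""
    corr = pairwise_correlation(rewards)
    w = concept_weights(corr, tau)
    adv = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return adv @ w, w, corr

# Toy group: 4 images, 3 concepts. Concept 0 is always satisfied,
# while concepts 1 and 2 are never satisfied in the same image.
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
advantages, weights, corr = weighted_advantages(R, tau=0.5)
print("weights:", weights)
print("negative-correlation ratio:", negative_correlation_ratio(corr))
```

In this toy group, the always-satisfied concept receives the smallest weight, while the two mutually exclusive concepts share the larger weights, which mirrors the motivation stated in Figure 1. A lower `tau` would concentrate the weights more sharply on the conflicting concepts.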