SYNTHIA: Novel Concept Design with Affordance Composition

Hyeonjeong Ha, Xiaomeng Jin, Jeonghwan Kim, Jiateng Liu, Zhenhailong Wang, Khanh Duy Nguyen, Ansel Blume, Nanyun Peng, Kai-Wei Chang, Heng Ji

TL;DR

Synthia addresses the challenge of generating visually novel yet functionally coherent concepts by introducing a hierarchical concept ontology and an affordance-based curriculum that gradually teaches the composition of multiple affordances. It fine-tunes diffusion-based T2I models with a triplet-contrastive objective and uses pseudo-novel concepts to enforce novelty while preserving functionality. Across automatic and human evaluations, Synthia outperforms strong baselines in faithfulness, novelty, practicality, and coherence, with absolute gains in human evaluation of 25.1% in novelty and 14.7% in functional coherence. The approach enables direct affordance-based prompting and could improve AI-driven design by grounding generation in functional structure rather than purely visual aesthetics.
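As a rough illustration of what a triplet-contrastive objective looks like, the sketch below implements the standard triplet margin loss on plain Python lists. It is not the paper's exact objective, which this summary does not spell out. The function names, the Euclidean distance, and the `margin` default are assumptions made for illustration: the anchor stands for a representation of the generated design, the positive for the affordance-conditioned target, and the negative for an existing concept.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumed stand-in, not the paper's loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, for illustration only.
already_separated = triplet_margin_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
too_close_to_concept = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In this toy setup the first call incurs no loss because the anchor already sits on the positive and far from the negative, while the second is penalized because the anchor coincides with the existing concept.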

Abstract

Text-to-image (T2I) models enable rapid concept design, making them widely used in AI-driven design. While recent studies focus on generating semantic and stylistic variations of given design concepts, functional coherence--the integration of multiple affordances into a single coherent concept--remains largely overlooked. In this paper, we introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. Our approach leverages a hierarchical concept ontology that decomposes concepts into parts and affordances, serving as a crucial building block for functionally coherent design. We also develop a curriculum learning scheme based on our ontology that contrastively fine-tunes T2I models to progressively learn affordance composition while maintaining visual novelty. To elaborate, we (i) gradually increase affordance distance, guiding models from basic concept-affordance association to complex affordance compositions that integrate parts of distinct affordances into a single, coherent form, and (ii) enforce visual novelty by employing contrastive objectives to push learned representations away from existing concepts. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models, demonstrating absolute gains of 25.1% and 14.7% for novelty and functional coherence in human evaluation, respectively.
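The curriculum idea in (i) can be sketched as sorting affordance pairs from close to distant and splitting them into training stages. Everything below is a hypothetical toy: the ontology contents, the Jaccard-based `affordance_distance`, and the equal-sized staging are assumptions made for illustration, since the abstract does not define how affordance distance is computed.

```python
from itertools import combinations

# Hypothetical toy ontology mapping each affordance to concepts that offer it.
AFFORDANCE_TO_CONCEPTS = {
    "sit": {"chair", "sofa", "bench"},
    "recline": {"sofa", "bed"},
    "cut": {"knife", "scissors"},
    "illuminate": {"lamp"},
}


def affordance_distance(a, b, table):
    """Assumed distance: 1 - Jaccard overlap of the concepts sharing a and b."""
    ca, cb = table[a], table[b]
    return 1.0 - len(ca & cb) / len(ca | cb)


def build_curriculum(table, num_stages=2):
    """Order affordance pairs from close (easy) to distant (hard), then stage them."""
    pairs = sorted(
        combinations(sorted(table), 2),
        key=lambda p: (affordance_distance(p[0], p[1], table), p),
    )
    stage_size = -(-len(pairs) // num_stages)  # ceiling division
    return [pairs[i:i + stage_size] for i in range(0, len(pairs), stage_size)]


curriculum = build_curriculum(AFFORDANCE_TO_CONCEPTS)
for stage, stage_pairs in enumerate(curriculum):
    print(f"stage {stage}: {stage_pairs}")
```

With this toy ontology, the pair ("recline", "sit") shares the concept "sofa" and therefore appears first, while pairs with no shared concept, such as ("cut", "illuminate"), come later as harder compositions.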

Paper Structure

This paper contains 48 sections, 5 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Effect of Affordance Sampling on Novel Concept Generation. Our affordance sampling strategy selects disparate affordance pairs within our ontology, promoting novel functional coherence rather than redundant combinations. Baseline models tend to generate existing concepts for close affordances (close-pair subfigure) but struggle with distant pairs, often introducing multiple objects or omitting functions (distant-pair subfigure). In contrast, our models consistently generate functionally coherent novel concepts, achieving higher novelty scores for distant affordance pairs.
  • Figure 2: Synthia: Novel Concept Design with Affordance Composition. Synthia comprises three stages: (1) affordance composition curriculum construction, (2) affordance-based curriculum learning, and (3) evaluation. In the first stage, we build a training curriculum by sampling affordance pairs from our ontology while gradually increasing the affordance distance. Using this curriculum, we fine-tune T2I models so that they first learn concept-affordance associations from easy data, then learn to integrate multiple affordances into a single functional form from hard data. We employ a contrastive objective with positive (affordance) and negative (concept) constraints and their corresponding images, enforcing visual novelty relative to existing concepts. Finally, we evaluate models through automatic and human evaluation with four metrics: faithfulness, novelty, practicality, and coherence.
  • Figure 3: Results of the relative automatic evaluation. We compare the quality of concepts generated by our models and the baselines with those generated by our data generation pipeline (see the data generation section). Numbers indicate the percentage (%) of baseline model wins, ties, and DALL-E model wins.
  • Figure 4: Ablation over the amount of training data. We show the absolute automatic evaluation results of Synthia trained with different amounts of data.
  • Figure 5: Effectiveness of curriculum learning. We show learning curves of Synthia with different training methods. The X-axis represents training steps.
  • ...and 4 more figures