CoCoG-2: Controllable generation of visual stimuli for understanding human concept representation

Chen Wei; Jiachen Zou; Dietmar Heinke; Quanying Liu

TL;DR

CoCoG-2 addresses how to controllably generate visual stimuli in the space of human concepts to study how concept representations influence behavior. It extends the prior CoCoG framework by introducing training-free guidance that decomposes generation into prior distributions and likelihood constraints, enabling flexible manipulation of concepts, semantics, and judgments without retraining. The method combines a concept encoder/decoder with CLIP embeddings in a two-stage diffusion process and supports multiple guidance strategies (concept, smoothness, semantics, judgment, uncertainty, pixel) and improvements (adaptive gradient scheduling, resampling). Experiments demonstrate diverse generation, smooth concept transitions, robust image editing, behavioral manipulation of similarity judgments, and information-maximizing designs for individual preferences, validating causal links between concepts and behavior. The approach offers a versatile toolkit for cognitive science experiments and AI-driven stimulus design.
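The decomposition into a prior and a likelihood constraint can be illustrated with a toy sketch. This is not the CoCoG-2 implementation: it assumes a standard Gaussian prior in place of a pretrained diffusion model, a hypothetical linear readout `w @ x` in place of the concept encoder, and a noise-free gradient update in place of the reverse diffusion process. All function and parameter names are illustrative.

```python
import numpy as np

def prior_score(x):
    # Score (gradient of the log-density) of a standard Gaussian prior.
    # Assumption: this stands in for a pretrained diffusion prior.
    return -x

def concept_loss_grad(x, w, target):
    # Gradient of the constraint loss 0.5 * (w @ x - target)**2.
    # Assumption: the linear readout w @ x plays the role of a concept encoder.
    return (w @ x - target) * w

def guided_sample(w, target, guidance_scale, steps=1000, step_size=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(w.shape[0])
    for _ in range(steps):
        # Guided direction = prior score minus the scaled gradient of the loss.
        # No network is retrained; only the update direction changes.
        direction = prior_score(x) - guidance_scale * concept_loss_grad(x, w, target)
        x = x + step_size * direction
    return x

w = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical concept direction
guided = guided_sample(w, target=0.9, guidance_scale=50.0)
unguided = guided_sample(w, target=0.9, guidance_scale=0.0)
print("guided concept value:", float(w @ guided))
print("unguided concept value:", float(w @ unguided))
```

In this toy setting the guided sample settles near, but slightly below, the target value because the prior keeps pulling the sample toward zero. A larger `guidance_scale` moves the result closer to the target at the cost of departing further from the prior.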

Abstract

Humans interpret complex visual stimuli using abstract concepts that facilitate decision-making tasks such as food selection and risk avoidance. Similarity judgment tasks are effective for exploring these concepts. However, methods for controllable image generation in concept space are underdeveloped. In this study, we present a novel framework called CoCoG-2, which integrates generated visual stimuli into similarity judgment tasks. CoCoG-2 utilizes a training-free guidance algorithm to enhance generation flexibility. The CoCoG-2 framework is versatile for creating experimental stimuli based on human concepts, supporting various strategies for guiding visual stimuli generation, and demonstrating how these stimuli can validate various experimental hypotheses. CoCoG-2 will advance our understanding of the causal relationship between concept representations and behaviors by generating visual stimuli. The code is available at https://github.com/ncclab-sustech/CoCoG-2.

Paper Structure (28 sections, 17 equations, 6 figures, 1 algorithm)

Introduction
Preliminaries
Training-free Guidance Diffusion Models
Diffusion Models.
Training-Free Guidance.
Concept based Controllable Generation
Concept Encoder
Concept Decoder
CLIP Embedding as an Intermediate Variable
Method
Training-free guidance for controlling visual stimuli
Guidance set and corresponding loss functions
Concept guidance:
Smoothness guidance:
Semantics guidance:
...and 13 more sections

Figures (6)

Figure 1: Guided by CoCoG-2, we can validate a specific hypothesis by generating visual stimuli in the controllable concept space. For instance, researchers propose an experimental hypothesis (Does "clothing" influence human judgment?), then construct a loss function using a guidance set (aimed at preserving pixel and semantic features while modifying the concept of "clothing"), and finally generate alternative visual stimuli through training-free guidance (visual stimuli where the concept of "clothing" exceeds 0.9 lead to changes in human judgment). These synthetic images can be used to test the hypothesis with behavioral experiments.
Figure 2: Generate diverse visual stimuli for given target concepts. (a) Concept guidance used in this experiment. (b) Visual stimuli generated under the guidance of one or more target concepts.
Figure 3: Smooth change of visual stimuli generated based on concepts. (a) Concept guidance and Smoothness guidance used in this experiment. (b) The average CLIP similarity matrix of 100 groups of stimuli. (c) Two trials of images generated under the guidance of one or more target concepts.
Figure 4: Visual stimuli generated by image editing in concept space. (a) Concept, Smoothness, Semantics, and Pixel guidance used in this experiment. (b) A trial of images generated under the guidance of a single concept (row 1); visual stimuli generated by CoCoG-2 and CoCoG under the guidance of multiple concepts (rows 2-4). When guided only by "ground animals" and "cotton clothing," the images generated by CoCoG-2 align well with the target concepts. However, CoCoG cannot produce real animals because the target concepts conflict with the concept "baby toys" in the original image, which necessitates manual adjustment of the "baby toys" value.
Figure 5: Behavioral manipulation of similarity judgments with/without Pixel guidance. (a) Smoothness, Pixel, and Judgment guidance used in this experiment. (b) Images generated under the guidance of probability interpolation of two pairs of references. The top row of images for each set is generated without using the Pixel guidance, while the bottom row of images is generated with the Pixel guidance.
...and 1 more figure
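The loss construction described in the Figure 1 caption (modify one concept while preserving pixel and semantic features) can be sketched as a weighted sum of guidance terms. The specific loss forms, the weights, and every name below are assumptions made for illustration; the paper's actual loss functions are defined in its Method section and may differ.

```python
import numpy as np

def concept_loss(concepts, index, target):
    # Squared distance between one concept activation and its target value.
    return float((concepts[index] - target) ** 2)

def semantics_loss(clip_emb, ref_clip_emb):
    # One minus cosine similarity between two CLIP-like embedding vectors.
    cos = clip_emb @ ref_clip_emb / (
        np.linalg.norm(clip_emb) * np.linalg.norm(ref_clip_emb)
    )
    return float(1.0 - cos)

def pixel_loss(image, ref_image):
    # Mean squared error between a candidate image and the reference image.
    return float(np.mean((image - ref_image) ** 2))

def total_loss(sample, reference, concept_index, concept_target, weights):
    # Hypothetical guidance set: a weighted sum of the three terms above.
    return (
        weights["concept"] * concept_loss(sample["concepts"], concept_index, concept_target)
        + weights["semantics"] * semantics_loss(sample["clip"], reference["clip"])
        + weights["pixel"] * pixel_loss(sample["image"], reference["image"])
    )

# Toy reference stimulus: concept 0 (say, "clothing") is low at 0.2.
reference = {
    "concepts": np.array([0.2, 0.1]),
    "clip": np.array([1.0, 0.0]),
    "image": np.zeros((2, 2)),
}
weights = {"concept": 1.0, "semantics": 0.5, "pixel": 0.5}

# An unedited copy is penalized only by the concept term.
unedited = total_loss(reference, reference, 0, 0.9, weights)

# A candidate that reaches the target concept while keeping pixels and semantics.
edited = dict(reference, concepts=np.array([0.9, 0.1]))
edited_loss = total_loss(edited, reference, 0, 0.9, weights)
print(unedited, edited_loss)
```

In a guided sampler, the gradient of such a combined loss with respect to the sample would supply the likelihood-constraint direction, and the weights would control how strongly each property is preserved or changed.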
