Patch-wise Graph Contrastive Learning for Image Translation
Chanyong Jung, Gihyun Kwon, Jong Chul Ye
TL;DR
This paper addresses the challenge of semantically faithful image translation by introducing a patch-wise graph contrastive learning framework. It constructs patch graphs from a pretrained encoder, uses a shared adjacency matrix to couple the input and translated output graphs, and applies graph pooling to capture hierarchical semantics, all while maximizing mutual information between corresponding patch nodes via an InfoNCE loss. The approach yields state-of-the-art results on five unpaired translation benchmarks and shows qualitative improvements in preserving structure and spatial coherence, including in single-image high-resolution translation. By explicitly modeling patch topology and focusing on task-relevant regions, the method gives a principled way to use topology-aware representations for semantically consistent image translation.
Abstract
Recently, patch-wise contrastive learning has drawn attention in image translation for exploiting the semantic correspondence between input and output images. To further exploit patch-wise topology for high-level semantic understanding, we use a graph neural network to capture topology-aware features. Specifically, we construct a graph from the patch-wise similarity computed by a pretrained encoder, and share its adjacency matrix between the input and the output to enforce consistent patch-wise relations. We then obtain node features from the graph neural network and strengthen the correspondence between nodes by increasing their mutual information with a contrastive loss. To capture hierarchical semantic structure, we further propose graph pooling. Experimental results demonstrate state-of-the-art image translation performance, which we attribute to the semantic encoding provided by the constructed graphs.
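
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The k-nearest-neighbor rule for building the adjacency matrix, the single GCN-style layer, the temperature value, and all function names (`knn_adjacency`, `gcn_layer`, `info_nce`) are assumptions made for illustration, and random vectors stand in for the patch features of a pretrained encoder. Graph pooling is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def knn_adjacency(feats, k=4):
    """Binary adjacency from cosine similarity (assumed rule: link each patch to its k most similar patches)."""
    z = l2_normalize(feats)
    sim = z @ z.T
    idx = np.argsort(-sim, axis=1)[:, :k]      # top-k neighbors per row
    adj = np.zeros_like(sim)
    np.put_along_axis(adj, idx, 1.0, axis=1)
    adj = np.maximum(adj, adj.T)               # symmetrize
    np.fill_diagonal(adj, 1.0)                 # self-loops keep every degree >= 1
    return adj

def gcn_layer(feats, adj, weight):
    """One GCN-style layer (assumed): ReLU(D^-1/2 A D^-1/2 X W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE loss: node i of `queries` is the positive for node i of `keys`; other nodes are negatives."""
    logits = l2_normalize(queries) @ l2_normalize(keys).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Stand-in patch features (hypothetical): 16 patches, 32 channels.
rng = np.random.default_rng(0)
n, d = 16, 32
feat_in = rng.normal(size=(n, d))                   # input-image patches
feat_out = feat_in + 0.1 * rng.normal(size=(n, d))  # translated-image patches

adj = knn_adjacency(feat_in, k=4)                   # built once, from the input
w = rng.normal(size=(d, d)) / np.sqrt(d)            # shared layer weights
node_in = gcn_layer(feat_in, adj, w)
node_out = gcn_layer(feat_out, adj, w)              # same adjacency reused for the output
loss = info_nce(node_out, node_in)
```

Reusing `adj` for both images is what couples the two graphs in this sketch: the output patches are aggregated over the neighborhoods found in the input, so minimizing the loss pushes the output toward the same patch-wise relations.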
