TPIE: Topology-Preserved Image Editing With Text Instructions

Nivetha Jayakumar, Srivardhan Reddy Gadila, Tonmoy Hossain, Yangfeng Ji, Miaomiao Zhang

TL;DR

This paper introduces Topology-Preserved Image Editing with text instructions (TPIE), a novel method that for the first time ensures that topology and geometry remain intact in images edited with text-guided generative diffusion models.

Abstract

Preserving topological structures is important in real-world applications, particularly in sensitive domains such as healthcare and medicine, where the correctness of human anatomy is critical. However, most existing image editing models focus on manipulating intensity and texture features, often overlooking object geometry within images. To address this issue, this paper introduces a novel method, Topology-Preserved Image Editing with text instructions (TPIE), that for the first time ensures that topology and geometry remain intact in images edited with text-guided generative diffusion models. More specifically, our method treats newly generated samples as deformable variations of a given input template, allowing for controllable and structure-preserving edits. Our proposed TPIE framework consists of two key modules: (i) an autoencoder-based registration network that learns latent representations of object transformations, parameterized by velocity fields, from pairwise training images; and (ii) a novel latent conditional geometric diffusion (LCGD) model that efficiently captures the data distribution of learned transformation features conditioned on custom-defined text instructions. We validate TPIE on a diverse set of 2D and 3D images and compare it with state-of-the-art image editing approaches. Experimental results show that our method outperforms the baselines in generating more realistic images with well-preserved topology. Our code will be made publicly available on GitHub.
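To make the idea of "deformable variations of a template, parameterized by velocity fields" concrete, the following is a minimal 2D sketch and not the authors' implementation. It assumes a stationary velocity field integrated by scaling and squaring, which may differ from the integration scheme used in the paper. All function names (`bilinear_sample`, `integrate_velocity`, `jacobian_determinant`, `warp`) are hypothetical. A positive Jacobian determinant everywhere is a standard indicator that the deformation has no folds, i.e. that topology is preserved.

```python
import numpy as np

def bilinear_sample(field, coords):
    """Sample a 2D array `field` (H, W) at float `coords` (2, H, W), clamping at edges."""
    h, w = field.shape
    y = np.clip(coords[0], 0, h - 1)
    x = np.clip(coords[1], 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def identity_grid(h, w):
    """Pixel coordinates of an H x W grid, shape (2, H, W), ordered (row, col)."""
    return np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij")).astype(float)

def integrate_velocity(v, steps=6):
    """Scaling and squaring (assumed integrator): returns displacement u with
    phi = id + u approximating the flow of the stationary velocity v (2, H, W)."""
    grid = identity_grid(*v.shape[1:])
    u = v / (2 ** steps)  # small initial step
    for _ in range(steps):
        # Compose phi with itself: u(x) <- u(x) + u(x + u(x)).
        moved = grid + u
        u = u + np.stack([bilinear_sample(u[c], moved) for c in range(2)])
    return u

def jacobian_determinant(u):
    """Finite-difference determinant of the Jacobian of phi = id + u."""
    duy_dy, duy_dx = np.gradient(u[0])
    dux_dy, dux_dx = np.gradient(u[1])
    return (1 + duy_dy) * (1 + dux_dx) - duy_dx * dux_dy

def warp(image, u):
    """Deform a template image (H, W) with the displacement field u (2, H, W)."""
    return bilinear_sample(image, identity_grid(*image.shape) + u)
```

In this sketch, a generative model would only need to produce the velocity field `v`; the edited image is then `warp(template, integrate_velocity(v))`, and `jacobian_determinant(u).min() > 0` can serve as a simple check that no folding occurred. By contrast, a displacement field predicted directly, without integration, carries no such guarantee.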

Paper Structure

This paper contains 10 sections, 10 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: An illustration of image samples generated by a state-of-the-art (SOTA) model, InstructPix2Pix (IP2P) (Brooks et al., 2023), and by our proposed model, TPIE. The samples generated by IP2P fail to preserve the object topology of individual structures. Violations of topology are highlighted by pink arrows and red boxes.
  • Figure 2: Examples of generated images deformed with sampled transformations from topology-preserved (T-P) distributions (top panel) vs not preserved (bottom panel).
  • Figure 3: An overview of our proposed TPIE framework. TPIE has two key components: (i) a network that learns latent representations of velocity fields; and (ii) a latent conditional geometric diffusion (LCGD) model guided by text instructions.
  • Figure 4: A comparison of our method TPIE with all fine-tuned baselines. Left to right: given template images, target images, and generated samples. Top to bottom: examples of images generated from datasets of brain MRI, Komatsuna plants, and hippocampus shape.
  • Figure 5: (a) A confidence map showing the lower/upper bound and confidence interval (regions with 95% of ideal growth patterns) for plants generated by our model. (b) Examples of predicted time trajectories of growth patterns generated by our method. Top to bottom: axial view of brain MRIs (with brain ventricles outlined in red boxes); and 3D hippocampus shape in anterior (A) and posterior (P) views.