Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets

Qin Lei, Jiang Zhong, Qizhu Dai

TL;DR

This work tackles the data bottleneck in curvilinear object segmentation. It introduces COSTG, a dataset built from six public sources that pairs semantic maps with descriptive captions, which makes it suitable for training text-conditioned diffusion models. It then presents SCP ControlNet, an extension of ControlNet that adds spatially-adaptive normalization (SPADE) so that semantic information injected at multiple scales is preserved during text-to-image synthesis. By training diffusion-based generators on COSTG and fine-tuning with SCP ControlNet, the approach produces synthetic data that extends beyond the original data distribution and yields significant segmentation gains on angiography, crack, and retina datasets with modest expansion factors. The framework enables more informative data augmentation for curvilinear segmentation, and the code and COSTG are openly available, facilitating broader adoption of semantic-map-guided dataset expansion for medical and industrial imagery.

Abstract

Curvilinear object segmentation plays a crucial role in many applications, yet datasets in this domain are often small because data acquisition and annotation are costly. To address this, this paper introduces a novel approach for expanding curvilinear object segmentation datasets that pursues two goals: making the generated data more informative and keeping the generated images consistent with their semantic maps. Our method enriches the information in synthetic data by generating curvilinear objects from multiple textual features. By combining textual features from different samples in the original dataset, we obtain synthetic images that extend beyond the original dataset's distribution. This required creating the Curvilinear Object Segmentation based on Text Generation (COSTG) dataset. Designed to overcome the limitations of conventional datasets, COSTG provides not only standard semantic maps but also textual descriptions of curvilinear object features. To ensure consistency between synthetic semantic maps and images, we introduce the Semantic Consistency Preserving ControlNet (SCP ControlNet), which adapts ControlNet with Spatially-Adaptive Normalization (SPADE) to preserve semantic information that would otherwise be washed away in normalization layers. This modification enables more accurate semantic image synthesis. Experiments demonstrate the efficacy of our approach on three types of curvilinear objects (angiography, crack, and retina) and six public datasets (CHUAC, XCAD, DCA1, DRIVE, CHASEDB1, and Crack500). The synthetic data generated by our method not only expand the dataset but also improve the performance of other curvilinear object segmentation models. Source code and dataset are available at https://github.com/tanlei0/COSTG.
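To illustrate why SPADE keeps semantic information that plain normalization discards, here is a minimal single-channel sketch in Python. It follows the general SPADE recipe: normalize the features without learned affine parameters, then modulate them with a per-pixel scale (gamma) and shift (beta) predicted from the semantic map, which is first resized to the feature resolution, as happens when the map is injected into encoder blocks at several scales. Several parts are simplifying assumptions rather than the SCP ControlNet implementation: the function names, the per-map normalization (SPADE is usually paired with batch or group normalization over tensors), the 1x1 linear maps standing in for SPADE's convolutional gamma/beta predictors, and the omission of the time embedding $t_{emb}$ shown in Figure 4(b).

```python
import math
from typing import List

# A single-channel H x W feature map (real models use N x C x H x W tensors).
Grid = List[List[float]]


def resize_nearest(segmap: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize of the semantic map to the feature resolution,
    so the same map can modulate encoder blocks at different scales."""
    src_h, src_w = len(segmap), len(segmap[0])
    return [[segmap[i * src_h // h][j * src_w // w] for j in range(w)] for i in range(h)]


def normalize(x: Grid, eps: float = 1e-5) -> Grid:
    """Parameter-free normalization to zero mean and unit variance over the map
    (a simplification; not the normalization layer used in the paper)."""
    vals = [v for row in x for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in x]


def spade(h_in: Grid, segmap: Grid, w_gamma: float, b_gamma: float,
          w_beta: float, b_beta: float) -> Grid:
    """h_out = normalize(h_in) * (1 + gamma(segmap)) + beta(segmap).

    gamma and beta are per-pixel values predicted from the semantic map; here
    they are hypothetical 1x1 linear maps of the label value, whereas SPADE
    predicts them with small convolutional networks."""
    h, w = len(h_in), len(h_in[0])
    seg = resize_nearest(segmap, h, w)
    normed = normalize(h_in)
    h_out = []
    for i in range(h):
        row = []
        for j in range(w):
            gamma = w_gamma * seg[i][j] + b_gamma
            beta = w_beta * seg[i][j] + b_beta
            row.append(normed[i][j] * (1.0 + gamma) + beta)
        h_out.append(row)
    return h_out


if __name__ == "__main__":
    flat = [[1.0, 1.0], [1.0, 1.0]]     # uniform input features
    line = [[1, 0, 0, 0], [0, 1, 0, 0],  # 4x4 semantic map with a
            [0, 0, 1, 0], [0, 0, 0, 1]]  # diagonal curvilinear object
    print(normalize(flat))                        # [[0.0, 0.0], [0.0, 0.0]]
    print(spade(flat, line, 0.5, 0.0, 2.0, 0.0))  # [[2.0, 0.0], [0.0, 2.0]]
```

With a uniform input, normalization alone returns all zeros regardless of what the semantic map contains, whereas the SPADE output reproduces the diagonal structure through the map-dependent shift. This is the "washing away" effect that conditioning the normalization on the semantic map is meant to avoid.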

Paper Structure (34 sections, 1 equation, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Comparison of semantic image synthesis for curvilinear objects by SCP ControlNet, LDM, and ControlNet. Panels (a), (b), and (c) show the three types of curvilinear objects studied in this paper: (a) crack, (b) angiography, and (c) retina. SCP ControlNet reflects the information in the semantic map more precisely in the generated images, which is crucial for expanding semantic segmentation datasets.
  • Figure 2: Two examples of the annotation process: ChatGPT-4-V (left) vs. Gemini 1.5 (right).
  • Figure 3: Overview of SCP ControlNet. Unlike ControlNet, SCP ControlNet uses a distinct Encoder Block and a different way of injecting semantic information: in addition to concatenating the semantic information with the latent noise, it feeds semantic features at multiple scales into the Encoder Blocks.
  • Figure 4: (a) Details of the SCP ControlNet Encoder Block, highlighting that the final block in the Encoder typically lacks the CrossAttnDownBlock2D layers. (b) Details of SPADE (Spatially-Adaptive Normalization). Here, Segmap represents semantic map features, $h_{in}$ denotes input features, $t_{emb}$ indicates time embedding, and $h_{out}$ signifies output features.
  • Figure 5: Overview of the dataset expansion pipeline, illustrated with the Crack500 dataset. The pipeline has three steps: (1) combine features through random sampling to obtain captions for generating semantic maps; (2) use these captions for conditional generation of semantic maps; (3) use the semantic maps and modified captions to generate images.
  • Figures 6 to 10 are not listed here.
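Step (1) of the Figure 5 pipeline, recombining textual features drawn from different samples, can be sketched as below, with steps (2) and (3) left as caller-supplied generator functions. This is a minimal illustration under stated assumptions: the attribute names, the caption template, `combine_features`, `to_caption`, `expand_dataset`, and the `edit_caption` hook (a stand-in for the caption modification in step 3, whose exact form is not described here) are all hypothetical and are not taken from the COSTG code.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical per-sample textual features (attribute -> description).
Features = Dict[str, str]


def combine_features(samples: List[Features], n: int, seed: int = 0) -> List[Features]:
    """Step (1): build n new feature sets by drawing each attribute from a
    randomly chosen source sample, so attributes that never co-occurred in
    the original data can appear together."""
    rng = random.Random(seed)
    keys = sorted(samples[0])
    return [{k: rng.choice(samples)[k] for k in keys} for _ in range(n)]


def to_caption(features: Features) -> str:
    """Render a feature set as a caption using a fixed, illustrative template."""
    return ", ".join(f"{k}: {v}" for k, v in sorted(features.items()))


def expand_dataset(
    samples: List[Features],
    n: int,
    generate_map: Callable[[str], object],             # step (2): caption -> semantic map
    generate_image: Callable[[object, str], object],   # step (3): map + caption -> image
    edit_caption: Callable[[str], str] = lambda c: c,  # hypothetical caption modification
    seed: int = 0,
) -> List[Tuple[object, object]]:
    """Return (image, semantic_map) pairs for n recombined captions."""
    pairs = []
    for feats in combine_features(samples, n, seed):
        caption = to_caption(feats)
        segmap = generate_map(caption)
        image = generate_image(segmap, edit_caption(caption))
        pairs.append((image, segmap))
    return pairs


if __name__ == "__main__":
    originals = [
        {"shape": "thin and straight", "background": "clean asphalt"},
        {"shape": "branching", "background": "noisy concrete"},
    ]
    for feats in combine_features(originals, 3, seed=42):
        print(to_caption(feats))
```

Drawing each attribute independently is one simple way to reach feature combinations absent from the original samples; because it can also pair attributes implausibly, a practical pipeline may need to restrict which attributes are mixed.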