UI-Styler: Ultrasound Image Style Transfer with Class-Aware Prompts for Cross-Device Diagnosis Using a Frozen Black-Box Inference Network

Nhat-Tuong Do-Tran, Ngoc-Hoang-Lam Le, Ching-Chun Huang

TL;DR

The paper tackles cross-device domain shift in ultrasound imaging by reusing a frozen black-box downstream model and unlabeled data. It introduces UI-Styler, a dual-level style transfer framework that couples a pattern-matching domain-level stylization with class-aware prompting guided by pseudo labels to preserve diagnostic semantics. The architecture employs two ViT encoders, a cross-attention-based pattern-matching module, learnable class prompts, and a lightweight decoder, trained with content/style losses and prompt-guided direction and supervision losses. Across 12 cross-device tasks on four ultrasound datasets, UI-Styler achieves state-of-the-art distribution alignment and improves downstream classification and segmentation metrics, while ablations confirm the value of both pattern-matching and class-aware prompting. This approach enables reliable cross-device ultrasound diagnosis in privacy-sensitive, label-scarce settings.

Abstract

The appearance of ultrasound images varies across acquisition devices, causing domain shifts that degrade the performance of fixed black-box downstream inference models when reused. To mitigate this issue, it is practical to develop unpaired image translation (UIT) methods that effectively align the statistical distributions between source and target domains, particularly under the constraint of a reused inference-blackbox setting. However, existing UIT approaches often overlook class-specific semantic alignment during domain adaptation, resulting in misaligned content-class mappings that can impair diagnostic accuracy. To address this limitation, we propose UI-Styler, a novel ultrasound-specific, class-aware image style transfer framework. UI-Styler leverages a pattern-matching mechanism to transfer texture patterns embedded in the target images onto source images while preserving the source structural content. In addition, we introduce a class-aware prompting strategy guided by pseudo labels of the target domain, which enforces accurate semantic alignment with diagnostic categories. Extensive experiments on ultrasound cross-device tasks demonstrate that UI-Styler consistently outperforms existing UIT methods, achieving state-of-the-art performance in distribution distance and downstream tasks, such as classification and segmentation.
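The pattern-matching step can be illustrated with a generic cross-attention operation, in which source tokens query target tokens and receive a weighted mix of target features. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cross_attention_stylize`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the random stand-in features are all hypothetical, and the single-head form is chosen only for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_stylize(f_src, f_tgt, w_q, w_k, w_v):
    """Single-head cross-attention: source tokens attend to target tokens.

    f_src: (N_s, d) source (content) token features
    f_tgt: (N_t, d) target (style) token features
    Returns the stylized source features and the attention map.
    """
    q = f_src @ w_q                       # queries come from the source content
    k = f_tgt @ w_k                       # keys come from the target style
    v = f_tgt @ w_v                       # values come from the target style
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)       # (N_s, N_t); each row sums to 1
    # Assumption: a residual connection keeps the source structure in the output.
    return f_src + attn @ v, attn

# Random stand-ins for encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
n_src, n_tgt, dim = 6, 8, 16
f_src = rng.normal(size=(n_src, dim))
f_tgt = rng.normal(size=(n_tgt, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

stylized, attn = cross_attention_stylize(f_src, f_tgt, w_q, w_k, w_v)
```

Each row of `attn` indicates which target tokens a given source token borrows texture from, which is one way to read "matching" source content against target patterns.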

Paper Structure

This paper contains 14 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison between the typical unpaired image style transfer methods (top) and our proposed class-aware style transfer approach (bottom) for cross-device ultrasound diagnosis. Conventional methods align source and target distributions at the domain level but often neglect class-level alignment, leading to misaligned mappings, especially for unlabeled (class-confused) samples. In contrast, UI-Styler enforces class-aware alignment via class-specific prompting, guiding class-confused samples toward their correct semantic classes. The target class boundary reflects the behavior of the frozen black-box inference network.
  • Figure 2: Top-left: Overview of the proposed UI-Styler framework for ultrasound image translation under an inference-blackbox setting. Given unlabeled source and target images, UI-Styler performs dual-level stylization along with a template prompt set $\mathcal{P}$. The black-box downstream model is frozen and is used only for final predictions. Bottom: Details of the dual-level stylization module. At the domain level, pattern matching is performed via cross-attention to inject target style into source content. At the category level, given the learned prompt set $\mathcal{P}$, a class-specific prompt $\mathcal{P}_c$ is determined and used to refine the stylized features $\widetilde{\mathcal{F}}_{s \rightarrow t}$. The final stylized image is reconstructed by a decoder $D$ and optimized using content and style losses ($\mathcal{L}_c$, $\mathcal{L}_s$). Top-right: The prompt set $\mathcal{P}$ is optimized using $\mathcal{L}_{\text{dir}}$ and $\mathcal{L}_{\text{sup}}$ to capture the distinctive characteristics of each semantic class as defined by the black-box model. Note that the encoder $E_t$ and the cross-attention network (highlighted in pink) share the same weights as those used in the UI-Styler model (bottom part).
  • Figure 3: Qualitative Comparisons. We visualize Grad-CAM attention maps from the black-box downstream model (offline analysis only) on the BUSBRA$\rightarrow$BUSI, UDIAT$\rightarrow$BUSI, and UCLM$\rightarrow$BUSI tasks. The style reference images from the target domain are shown in the first-left row, while the source's ground-truth masks (first-right) serve as the reference for ideal attention. Each row displays the transferred images alongside the corresponding attention maps (highlighted by red squares $\square$) produced by different unpaired style transfer methods. Yellow squares $\square$ indicate regions of interest (tumor) for stylization comparison. Please zoom in to view details more easily.
  • Figure 4: Feature Space. We visualize the feature distributions using t-SNE on the UDIAT$\rightarrow$UCLM task. Each point represents a sample: green for benign and red for malignant. $\star$ indicates target samples (UCLM), while $\circ$ denotes source samples (UDIAT) under three conditions: (a) before translation, (b) after domain-level alignment only, and (c) after full dual-level stylization by UI-Styler.
  • Figure 5: Confidence Scores. We visualize the distribution of confidence scores predicted by the black-box downstream model on stylized-source test samples across $4$ source-to-target adaptation tasks. Each box plot shows the predicted probability assigned to the ground-truth class. (e) In the boxplot, the median indicates central prediction confidence, the box spans the interquartile range, and the min–max lines show the full prediction spread. Outliers highlight irregular cases. Higher medians and tighter boxes indicate more confident predictions.
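
The box-plot quantities named in the Figure 5 caption (median, interquartile range, min and max) can be computed directly from a list of confidence scores. The snippet below is a minimal sketch for reading such plots: the helper `box_stats` and the sample scores are hypothetical and are not taken from the paper, and the "inclusive" quartile convention is an assumption, since plotting libraries differ in how they interpolate quartiles.

```python
import statistics

def box_stats(scores):
    """Summarize ground-truth-class confidence scores as box-plot statistics."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "iqr": q3 - q1,  # height of the box; smaller means tighter predictions
    }

# Hypothetical probabilities assigned to the ground-truth class.
example_scores = [0.55, 0.62, 0.71, 0.80, 0.88, 0.93, 0.97]
stats = box_stats(example_scores)
```

A higher `median` together with a smaller `iqr` corresponds to the "higher medians and tighter boxes" reading described in the caption.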