ImageSI: Semantic Interaction for Deep Learning Image Projections

Jiayue Lin, Rebecca Faust, Chris North

TL;DR

This work addresses a limitation of dimension reduction (DR) for images: when the embedding features do not align with user intent, the projection cannot reflect that intent either. It introduces ImageSI, a semantic interaction framework that updates the image embeddings directly rather than reweighting features in the DR. It presents two loss variants, ImageSI$_{\text{MDS}^{-1}}$ and ImageSI$_{\text{Triplet}}$, which support continuous-ordering and clustering-driven tasks, respectively, so that users can refine the embeddings before projection. A simulation-based evaluation shows that both ImageSI variants outperform the WMDS$^{-1}$ baseline, achieving higher adjusted Silhouette scores and yielding DR layouts that reflect user-specified features and reveal secondary structures. Overall, ImageSI advances human-in-the-loop image sensemaking by coupling interactive feedback with embedding fine-tuning, improving the relevance and interpretability of image DRs.

Abstract

Semantic interaction (SI) in Dimension Reduction (DR) of images allows users to incorporate feedback through direct manipulation of the 2D positions of images. Through interaction, users specify a set of pairwise relationships that the DR should aim to capture. Existing methods for images incorporate feedback into the DR through feature weights on abstract embedding features. However, if the original embedding features do not suitably capture the user's task, then the DR cannot either. We propose ImageSI, an SI method for image DR that incorporates user feedback directly into the image model to update the underlying embeddings, rather than weighting them. In doing so, ImageSI ensures that the embeddings suitably capture the features necessary for the task so that the DR can subsequently organize images using those features. We present two variations of ImageSI that use different loss functions: ImageSI$_{\text{MDS}^{-1}}$, which prioritizes the explicit pairwise relationships from the interaction, and ImageSI$_{\text{Triplet}}$, which prioritizes clustering, using the interaction to define groups of images. Finally, we present a usage scenario and a simulation-based evaluation to demonstrate the utility of ImageSI and compare it to current methods.
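
To make the contrast between the two loss families concrete, the following is a minimal sketch in plain Python. It is not the paper's formulation, which is not reproduced in this section: `pairwise_distance_loss` and `triplet_loss` are hypothetical helper names, the pairwise loss assumes a simple mean squared gap between embedding distances and the distances in the user's 2D layout (ignoring any rescaling between the two spaces), and the triplet loss is the standard margin-based hinge. In a real fine-tuning setup these quantities would be computed on model outputs with an autograd framework.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distance_loss(embeddings, layout_2d, pairs):
    """Assumed pairwise objective: mean squared gap between the distance
    of two images in embedding space and their distance in the user's
    2D layout, taken over the pairs touched by the interaction."""
    total = 0.0
    for i, j in pairs:
        d_high = euclidean(embeddings[i], embeddings[j])
        d_low = euclidean(layout_2d[i], layout_2d[j])
        total += (d_high - d_low) ** 2
    return total / len(pairs)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: the anchor should be closer to
    the positive (same user-defined group) than to the negative by at
    least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy data (made up for illustration): the user drags the two open-mouth
# images together and pushes the closed-mouth shark away.
embeddings = {
    "shark_open": [0.0, 0.0, 0.0],
    "snake_open": [3.0, 4.0, 0.0],
    "shark_closed": [1.0, 0.0, 0.0],
}
layout_2d = {
    "shark_open": [0.0, 0.0],
    "snake_open": [1.0, 0.0],
    "shark_closed": [5.0, 0.0],
}
pairs = [("shark_open", "snake_open"), ("shark_open", "shark_closed")]

pair_loss = pairwise_distance_loss(embeddings, layout_2d, pairs)
trip_loss = triplet_loss(
    embeddings["shark_open"], embeddings["snake_open"], embeddings["shark_closed"]
)
```

In this toy case both losses are positive because the embeddings still group images by animal rather than by mouth state. The pairwise loss penalizes every mismatch in distance, which suits continuous orderings, while the triplet loss only asks that group members end up closer than non-members, which suits clustering.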

Paper Structure (15 sections, 1 equation, 4 figures)

Figures (4)

  • Figure 1: The ImageSI pipeline. First, features are extracted from a pre-trained ResNet-18 model. These features are projected using DR. The user then performs interactive tasks on the DR plot. Their interaction is then used to fine-tune the image model using either MDS$^{-1}$ or triplet loss. Subsequently, ImageSI extracts the updated features from the fine-tuned model and re-projects them. Red dotted arrows represent the methodology from Han et al. [han2023explainable], while blue solid arrows illustrate the ImageSI pipeline, which expands the scope of Han et al.'s exploration.
  • Figure 2: (a) The initial MDS projection of the images, which contain open- and closed-mouthed sharks and snakes. (b) The semantic interaction teaches the DR about the open- vs. closed-mouth feature.
  • Figure 3: Updated DR plots after interaction for (a) WMDS$^{-1}$, (b) ImageSI$_{\text{MDS}^{-1}}$, and (c) ImageSI$_{\text{Triplet}}$. Blue ellipses mark the open-mouth animals; red ellipses mark the closed-mouth animals.
  • Figure 4: Comparison of adjusted Silhouette scores across different frameworks and tasks. Subfigures (a) to (d) depict the performance of WMDS$^{-1}$, ImageSI$_{\text{MDS}^{-1}}$, and ImageSI$_{\text{Triplet}}$. Each subplot shows the adjusted Silhouette scores achieved by each method over a range of interactions.
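
The evaluation metric named in Figure 4 builds on the Silhouette coefficient. As a rough illustration of what such a score measures on a 2D layout, here is a minimal sketch of the standard, unadjusted Silhouette score in plain Python. The adjustment the authors apply is not described in this section, so it is not reproduced; `silhouette_score` below is a hypothetical helper written for this example, and the points and labels are made up.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_score(points, labels):
    """Mean standard Silhouette coefficient over all points.

    For each point, a = mean distance to the other members of its own
    group and b = smallest mean distance to any other group; the point's
    score is (b - a) / max(a, b). Requires at least two distinct labels.
    """
    groups = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for singleton groups
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dists) / len(dists)
            for g in groups if g != labels[i]
            for dists in [[euclidean(p, q) for j, q in enumerate(points)
                           if labels[j] == g]]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2D layout: two tight, well-separated groups.
layout = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
by_mouth = ["open", "open", "closed", "closed"]    # labels match the layout
mixed = ["open", "closed", "open", "closed"]       # labels cut across it

good = silhouette_score(layout, by_mouth)
bad = silhouette_score(layout, mixed)
```

A score near 1 means the layout separates the labeled groups cleanly, while a score at or below 0 means the groups overlap or are interleaved, which is why a higher score after interaction indicates that the projection better reflects the user's intended grouping.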