MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data

Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci

TL;DR

MSEG-VCUQ introduces a hybrid framework for segmenting high-speed video (HSV) phase detection (PD) data that integrates U-Net and VideoSAM to achieve high-precision, cross-modality segmentation across diverse boiling fluids. It pairs the segmentation with an uncertainty quantification pipeline that estimates discretization errors in key metrics such as dry area fraction and contact line density, and it is supported by an open-source multimodal HSV PD dataset. Empirical results show that VideoSAM surpasses traditional CNN baselines and SAM, particularly in dense, complex bubble scenes, while U-Net retains an advantage in simpler, low-contrast settings. The work provides a scalable, reproducible platform for HSV PD analysis, with implications for autonomous experiments and improved boiling heat transfer modeling.

Abstract

High-speed video (HSV) phase detection (PD) segmentation is crucial for monitoring vapor, liquid, and microlayer phases in industrial processes. While CNN-based models like U-Net have shown success in simplified shadowgraphy-based two-phase flow (TPF) analysis, their application to complex HSV PD tasks remains unexplored, and vision foundation models (VFMs) have yet to address the complexities of either shadowgraphy-based or PD TPF video segmentation. Existing uncertainty quantification (UQ) methods lack pixel-level reliability for critical metrics like contact line density and dry area fraction, and the absence of large-scale, multimodal experimental datasets tailored to PD segmentation further impedes progress. To address these gaps, we propose MSEG-VCUQ. This hybrid framework integrates U-Net CNNs with the transformer-based Segment Anything Model (SAM) to achieve enhanced segmentation accuracy and cross-modality generalization. Our approach incorporates systematic UQ for robust error assessment and introduces the first open-source multimodal HSV PD datasets. Empirical results demonstrate that MSEG-VCUQ outperforms baseline CNNs and VFMs, enabling scalable and reliable PD segmentation for real-world boiling dynamics.
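To make the two metrics named above concrete, the following is a minimal sketch of how dry area fraction and contact line density might be read off a binary segmentation mask, and of how pixel discretization can bias them. It is not the authors' implementation. It assumes that dry area fraction is the share of pixels labeled dry, that contact line density is contact line length per unit imaged area, and that contact line length is estimated by counting pixel edges between dry and wet neighbors. The function names, the `pixel_size` parameter, and the synthetic disk used as a stand-in for a single dry patch are all hypothetical.

```python
import numpy as np


def dry_area_fraction(mask: np.ndarray) -> float:
    """Share of pixels labeled dry (True) in a binary mask (assumed definition)."""
    return float(mask.astype(bool).mean())


def contact_line_length(mask: np.ndarray, pixel_size: float = 1.0) -> float:
    """Estimate contact line length by counting dry/wet pixel edges.

    This simple estimator ignores dry pixels on the image border and is
    only one of several possible perimeter estimators.
    """
    m = mask.astype(bool)
    horizontal = np.count_nonzero(m[:, 1:] != m[:, :-1])
    vertical = np.count_nonzero(m[1:, :] != m[:-1, :])
    return (horizontal + vertical) * pixel_size


def contact_line_density(mask: np.ndarray, pixel_size: float = 1.0) -> float:
    """Contact line length per unit imaged area (assumed definition)."""
    imaged_area = mask.size * pixel_size**2
    return contact_line_length(mask, pixel_size) / imaged_area


def disk_mask(n: int, radius_px: float) -> np.ndarray:
    """Synthetic circular dry patch centered in an n x n image."""
    yy, xx = np.mgrid[0:n, 0:n]
    c = (n - 1) / 2.0
    return (xx - c) ** 2 + (yy - c) ** 2 <= radius_px**2


if __name__ == "__main__":
    n, r = 201, 50.0
    mask = disk_mask(n, r)

    # Compare pixel-based estimates with the analytic values for a circle.
    area_rel_err = (mask.sum() - np.pi * r**2) / (np.pi * r**2)
    perim_rel_err = (contact_line_length(mask) - 2 * np.pi * r) / (2 * np.pi * r)

    print("dry area fraction:       ", dry_area_fraction(mask))
    print("contact line density:    ", contact_line_density(mask))
    print("relative area error:     ", area_rel_err)
    print("relative perimeter error:", perim_rel_err)
```

In this sketch the pixel count tracks the analytic area closely, while the edge-counting estimate overshoots the analytic perimeter substantially, because a staircase boundary measures the bounding box of a convex shape rather than its curved outline. This is the kind of discretization effect that an uncertainty quantification step for these metrics would need to account for.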

Paper Structure

This paper contains 80 sections, 21 equations, 18 figures, 9 tables.

Figures (18)

  • Figure 1: (a) Sample front-lit shadowgraphy images of two-phase flow. (b) Sample phase-detection images used in this study.
  • Figure 2: Illustration of the integrated process within the VideoSAM architecture. Initially, fine-tuned U-Net models produce segmentation masks tailored to each fluid modality, capturing primary liquid-vapor boundaries. These masks are then paired with their corresponding images and processed by the VideoSAM transformer. The model refines segmentation outputs through its image encoder and mask decoder, leveraging SAM’s pre-trained components to achieve consistent and high-accuracy HSV segmentation across various experimental conditions.
  • Figure 3: Variation of Dry Area Fraction and Contact Line Density with Increasing Mean Heat Flux (left plots) and 3D Histogram of Heat Flux vs. Bubble Sizes Distribution (right plots) Using Segmentation (U-Net CNN) and Thresholding Techniques.
  • Figure 4: Mean Error (ME) and Percentage Relative Error (PRE) of Perimeter and Area Variations with Bubble Radius and Grid Cell Size.
  • Figure 5: Integrated Segmentation Workflow for Phase Detection Data Using Hybrid U-Net and VideoSAM Models.
  • ...and 13 more figures