MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data

Chika Maduabuchi; Ericmoore Jossou; Matteo Bucci

TL;DR

MSEG-VCUQ is a hybrid framework for segmenting high-speed video (HSV) phase detection (PD) data. It integrates U-Net and VideoSAM to achieve high-precision, cross-modality segmentation across diverse boiling fluids, and pairs them with an uncertainty-quantification pipeline that estimates discretization errors in key metrics such as dry area fraction and contact line density. The framework is supported by an open-source multimodal HSV PD dataset. Empirical results show that VideoSAM outperforms traditional CNN baselines and SAM, particularly in dense, complex bubble scenes, while U-Net retains an advantage in simpler, low-contrast settings. The work provides a scalable, reproducible platform for HSV PD analysis, with implications for autonomous experiments and improved boiling heat transfer modeling.

Abstract

High-speed video (HSV) phase detection (PD) segmentation is crucial for monitoring vapor, liquid, and microlayer phases in industrial processes. While CNN-based models like U-Net have shown success in simplified shadowgraphy-based two-phase flow (TPF) analysis, their application to complex HSV PD tasks remains unexplored, and vision foundation models (VFMs) have yet to address the complexities of either shadowgraphy-based or PD TPF video segmentation. Existing uncertainty quantification (UQ) methods lack pixel-level reliability for critical metrics like contact line density and dry area fraction, and the absence of large-scale, multimodal experimental datasets tailored to PD segmentation further impedes progress. To address these gaps, we propose MSEG-VCUQ. This hybrid framework integrates U-Net CNNs with the transformer-based Segment Anything Model (SAM) to achieve enhanced segmentation accuracy and cross-modality generalization. Our approach incorporates systematic UQ for robust error assessment and introduces the first open-source multimodal HSV PD datasets. Empirical results demonstrate that MSEG-VCUQ outperforms baseline CNNs and VFMs, enabling scalable and reliable PD segmentation for real-world boiling dynamics.
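To make the two metrics named above concrete, the following is a minimal sketch of how dry area fraction and contact line density might be computed from a single binary segmentation mask. It is not the authors' implementation. It assumes that `True` pixels mark the dry (vapor-covered) region, that dry area fraction is the share of dry pixels in the frame, and that contact line density is contact line length per unit imaged area. The function names and the `pixel_size` parameter are hypothetical, and the contact line is approximated by counting pixel edges between dry and wet neighbors.

```python
import numpy as np


def dry_area_fraction(mask):
    """Fraction of pixels labeled dry (True) in a binary mask (assumed definition)."""
    mask = np.asarray(mask, dtype=bool)
    return float(mask.mean())


def contact_line_density(mask, pixel_size=1.0):
    """Approximate contact line length per unit area (assumed definition).

    The contact line is estimated by counting pixel edges that separate
    a dry pixel from a wet one. `pixel_size` is a hypothetical parameter
    giving the physical side length of one pixel.
    """
    mask = np.asarray(mask, dtype=bool)
    # Edges between horizontally adjacent pixels with different labels.
    horiz = np.count_nonzero(mask[:, 1:] != mask[:, :-1])
    # Edges between vertically adjacent pixels with different labels.
    vert = np.count_nonzero(mask[1:, :] != mask[:-1, :])
    length = (horiz + vert) * pixel_size
    area = mask.size * pixel_size ** 2
    return float(length / area)


# Toy frame: a 2x2 dry patch centered in a 4x4 field of view.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

print(dry_area_fraction(mask))
print(contact_line_density(mask))
```

Edge counting follows the pixel grid, so it overestimates the length of curved or diagonal boundaries. That is one simple example of the kind of discretization error a pixel-level uncertainty analysis of these metrics would need to bound.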
