Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation

Liyang Song, Hardik Bishnoi, Sai Kumar Reddy Manne, Sarah Ostadabbas, Briana J. Taylor, Michael Wan

TL;DR

This work tackles the scarcity and reproducibility challenges in video-based infant respiration estimation by introducing AIR-400, a publicly available, annotated dataset of 400 clips from 18 infants. It pairs infant-specific ROI detection with optical-flow augmentation and spatiotemporal networks to produce reproducible respiration waveforms, evaluated with six-fold subject-wise cross-validation and a PSD-based loss. The study shows that larger, carefully curated datasets improve performance but also reveals reproducibility concerns in prior work, especially regarding AIR-125. The contributions lay a robust foundation for benchmarking and advancing contactless infant respiratory monitoring in home and clinical settings.
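
The subject-wise six-fold protocol mentioned above can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the helper names `subject_wise_folds` and `split_by_subject` are hypothetical, the round-robin assignment of subjects to folds is an assumption (the summary does not say how folds were drawn), and the per-subject clip counts in the usage lines are placeholders rather than the AIR-400 counts. The point it shows is that every clip from a given infant lands on one side of the split, so no subject appears in both training and test data.

```python
def subject_wise_folds(subject_ids, n_folds=6):
    """Partition the unique subject IDs into n_folds disjoint groups.

    Round-robin assignment over the sorted IDs is an assumption made
    for this sketch, not the paper's documented procedure.
    """
    subjects = sorted(set(subject_ids))
    if n_folds > len(subjects):
        raise ValueError("more folds than subjects")
    return [subjects[i::n_folds] for i in range(n_folds)]


def split_by_subject(clips, test_subjects):
    """Split (subject_id, clip) pairs so test subjects never appear in train."""
    held_out = set(test_subjects)
    train = [c for c in clips if c[0] not in held_out]
    test = [c for c in clips if c[0] in held_out]
    return train, test


# Usage with the 18 subject labels S01..S18; two placeholder clips each.
subjects = [f"S{i:02d}" for i in range(1, 19)]
clips = [(s, f"{s}_clip{j}") for s in subjects for j in range(2)]
for fold_index, test_subjects in enumerate(subject_wise_folds(subjects)):
    train, test = split_by_subject(clips, test_subjects)
    print(fold_index, test_subjects, len(train), len(test))
```

Splitting by subject rather than by clip matters here because several clips come from the same infant, and a clip-level split would let a model be tested on infants it has already seen.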

Abstract

The development of contactless respiration monitoring for infants could enable advances in the early detection and treatment of breathing irregularities, which are associated with neurodevelopmental impairments and conditions like sudden infant death syndrome (SIDS). But while respiration estimation for adults is supported by a robust ecosystem of computer vision algorithms and video datasets, only one small public video dataset with annotated respiration data for infant subjects exists, and there are no reproducible algorithms which are effective for infants. We introduce the annotated infant respiration dataset of 400 videos (AIR-400), contributing 275 new, carefully annotated videos from 10 recruited subjects to the public corpus. We develop the first reproducible pipelines for infant respiration estimation, based on infant-specific region-of-interest detection and spatiotemporal neural processing enhanced by optical flow inputs. We establish, through comprehensive experiments, the first reproducible benchmarks for the state-of-the-art in vision-based infant respiration estimation. We make our dataset, code repository, and trained models available for public use.

Paper Structure

This paper contains 17 sections, 11 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Our infant respiration estimation pipeline combines a preprocessing module with infant-specific spatial region-of-interest (ROI) detectors, spatiotemporal inference from neural networks, and signal post-processing. We implement two ROI strategies: an infant body ROI detector and an infant chest ROI detector. Clips are cropped to the ROI and processed by optical flow (Coarse2Fine [liu2009beyond], DeepFlow [weinzaepfel2013deepflow], Farnebäck [farneback2003two], PCAFlow [wulff2015efficient], TV-L1 [zach2007aduality, sanchez2013tvl1], RAFT [zachary2020raft]). Spatiotemporal neural networks (DeepPhys [chen2018deepphys], MTTS-CAN [liu2020multi], EfficientPhys [liu2023efficientphys], AIRFlowNet [manne_automatic_2023]) then process appearance and motion inputs, and post-processing via detrending and bandpass filtering is applied to obtain respiration rates and waveforms.
  • Figure 2: Number of videos per subject in AIR-400. The dataset comprises 400 videos from 18 infant subjects. Blue bars (S01–S08) represent the AIR-125 dataset [manne_automatic_2023] (125 videos from 8 subjects), while green bars (S09–S18) show our additions (275 videos from 10 new subjects from the same study as in [manne_automatic_2023]). Each bar indicates the number of 60 s clips available for that subject.
  • Figure 3: Dataset statistics for AIR-125 [manne_automatic_2023], our newly contributed 275 videos, and the combined AIR-400. Each panel shows a respiration-rate histogram, plus ring charts for frame rate, color mode (RGB vs. infrared (IR)), and sleeping position (supine, side, and prone). The expanded dataset provides broader respiratory pattern diversity and varied recording conditions.
  • Figure 4: Infant body region-of-interest (ROI) and chest ROI detections across three subjects.
  • Figure 5: Visualization of optical flow estimation on a sample frame from AIR-400. Motion fields produced by six optical flow algorithms: Coarse2Fine, DeepFlow, Farnebäck, PCAFlow, TV-L1, RAFT.
  • ...and 3 more figures
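
The post-processing stage named in the Figure 1 caption (detrending, then bandpass filtering to obtain a waveform and a rate) can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation. The function names, the 0.3 to 1.0 Hz passband, and the choice of a brick-wall filter applied in the DFT domain are all assumptions made for the sketch; the source does not state the filter type or cutoffs. The rate is read off as the strongest in-band frequency bin.

```python
import math


def linear_detrend(signal):
    """Subtract the least-squares straight line from the signal."""
    n = len(signal)
    t_mean = (n - 1) / 2.0
    x_mean = sum(signal) / n
    cov = sum((i - t_mean) * (x - x_mean) for i, x in enumerate(signal))
    var = sum((i - t_mean) ** 2 for i in range(n))
    slope = cov / var
    return [x - (x_mean + slope * (i - t_mean)) for i, x in enumerate(signal)]


def bandpass_and_rate(signal, fs, low_hz=0.3, high_hz=1.0):
    """Keep only DFT bins inside [low_hz, high_hz] (hypothetical cutoffs).

    Returns the band-limited waveform and the dominant in-band
    frequency expressed in breaths per minute.
    """
    n = len(signal)
    k_lo = max(1, math.ceil(low_hz * n / fs))
    k_hi = min((n - 1) // 2, math.floor(high_hz * n / fs))
    if k_hi < k_lo:
        raise ValueError("signal too short for the requested band")
    waveform = [0.0] * n
    best_k, best_power = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        w = 2.0 * math.pi * k / n
        cos_k = [math.cos(w * i) for i in range(n)]
        sin_k = [math.sin(w * i) for i in range(n)]
        re = sum(x * c for x, c in zip(signal, cos_k))
        im = sum(x * s for x, s in zip(signal, sin_k))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
        # Add this bin (and its mirror image) back into the time domain.
        for i in range(n):
            waveform[i] += (2.0 / n) * (re * cos_k[i] + im * sin_k[i])
    return waveform, 60.0 * best_k * fs / n


# Usage on a synthetic 60 s trace at 30 fps: a 0.7 Hz breathing
# component plus slow drift standing in for a raw network output.
fs = 30.0
raw = [math.sin(2 * math.pi * 0.7 * i / fs) + 0.01 * i for i in range(1800)]
waveform, rate_bpm = bandpass_and_rate(linear_detrend(raw), fs)
print(round(rate_bpm, 1))
```

A production pipeline would more likely use a library filter (for example a Butterworth design) and an FFT, which are faster and better conditioned. The explicit loops here only make each step visible.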