Application of Deep Learning Methods to Processing of Noisy Medical Video Data

Danil Afonchikov, Elena Kornaeva, Irina Makovik, Alexey Kornaev

TL;DR

This paper addresses counting white and red blood cells in noisy medical video data using curriculum learning (CL) and multi-view (MV) post-processing. It introduces a synthetic moving blood cell video dataset (1150 videos, 100 frames each, 3×128×128) to bridge in vitro complete blood count tests with prospective in vivo capillaroscopy data. It formalizes CL with the difficulty $d = \alpha b + \beta l$ and the competence $c(t) = \min\left(1, \sqrt[p]{t \frac{1 - c_0^p}{T} + c_0^p}\right)$, and applies MV schemes (MVM and MVWCo-S) over multiple augmented crops. Experiments across WBC and RBC tasks show that MV improves accuracy (best ~90.3% for WBC with 9-frame inputs; RBC ~65% with 100-frame sequences) and that MV robustness extends to CIFAR-10 under label noise, where clean-label accuracy reaches ~92% and noisy-label accuracy ~87–88%. The results indicate that CL and MV offer practical, scalable improvements for processing small, high-noise medical video datasets and could enable more reliable in vivo CBC predictions.
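The two CL formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the sample dictionaries, and the selection rule "train on samples whose difficulty does not exceed the current competence" are assumptions borrowed from common competence-based curricula. The sketch also assumes that $b$ and $l$ are per-sample factors scaled to $[0, 1]$; $b$ is the blurring level mentioned in Figure 1, while $l$ is not defined in this summary. Default parameter values follow the Figure 2 caption.

```python
def difficulty(b, l, alpha=0.5, beta=0.5):
    """Sample difficulty d = alpha * b + beta * l (b, l assumed to lie in [0, 1])."""
    return alpha * b + beta * l


def competence(t, T=1000, c0=0.05, p=2):
    """Competence c(t) = min(1, (t * (1 - c0**p) / T + c0**p) ** (1 / p))."""
    return min(1.0, (t * (1.0 - c0 ** p) / T + c0 ** p) ** (1.0 / p))


def available_samples(samples, t, **schedule):
    """Hypothetical selection rule: keep samples with difficulty <= competence at step t."""
    c = competence(t, **schedule)
    return [s for s in samples if difficulty(s["b"], s["l"]) <= c]


if __name__ == "__main__":
    # Illustrative samples only; the values are not taken from the paper.
    samples = [
        {"name": "sharp", "b": 0.0, "l": 0.0},
        {"name": "medium", "b": 0.4, "l": 0.6},
        {"name": "blurred", "b": 1.0, "l": 1.0},
    ]
    for step in (0, 250, 2000):
        names = [s["name"] for s in available_samples(samples, step)]
        print(step, round(competence(step), 3), names)
```

With these defaults the competence starts at $c_0$ for $t = 0$ and saturates at 1 once $t \ge T$, so the pool of admissible samples grows from the easiest ones to the full dataset.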

Abstract

Cell counting becomes a challenging problem when the cells move in a continuous stream and their boundaries are difficult to detect visually. To resolve this problem, we modified the training and decision-making processes using curriculum learning and multi-view prediction techniques, respectively.

Paper Structure (8 sections, 5 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Blood cell microscopy: moving along the path from in vitro (a) to in vivo (e) blood count tests using the synthetic blood count dataset with different values of blurring $b$ in the samples (b)-(d). Some of the white blood cells situated among the red blood cells are marked with circles (b), (e).
  • Figure 2: Curriculum learning intuition. The difficulty and competence curves represent Eq. \ref{eq:difficulty_competence} when $\alpha = \beta = 0.5$, $c_0 = 0.05$, $T = 1000$, and $p = 2$.
  • Figure 3: Multi-view intuition. The network receives a set of fragments of the sample and makes a set of predictions. The final prediction corresponds to the mode value (MVM, see Eq. \ref{eq:MVMode}) or to the bin count weighted with confidences (MVWCo-S, see Eq. \ref{eq:cum_weights}).
  • Figure 4: The validation loss (blue lines) and the validation accuracy (orange lines) functions (see Table \ref{tab:simRes1Frame}). The training loss decreased monotonically in all the experiments.
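The two multi-view decision rules described in the Figure 3 caption can be illustrated with a short Python sketch. It is a simplified reading of the caption, not the paper's exact equations, which are not reproduced in this summary. The function names and the input format (one list of class probabilities per crop) are hypothetical, and the "confidence" of a view is assumed here to be the probability of its top class.

```python
from collections import Counter, defaultdict


def top_class(probs):
    """Index of the most probable class in one view's probability list."""
    return max(range(len(probs)), key=probs.__getitem__)


def mv_mode(view_probs):
    """MVM-style vote: the class predicted by the largest number of views."""
    votes = [top_class(p) for p in view_probs]
    return Counter(votes).most_common(1)[0][0]


def mv_weighted(view_probs):
    """MVWCo-S-style vote (assumed form): per-class vote counts weighted by confidence."""
    scores = defaultdict(float)
    for p in view_probs:
        k = top_class(p)
        scores[k] += p[k]  # each view adds its top-class probability to that class
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Three illustrative views over three classes; values are made up.
    views = [
        [0.40, 0.30, 0.30],  # weak vote for class 0
        [0.40, 0.30, 0.30],  # weak vote for class 0
        [0.05, 0.90, 0.05],  # confident vote for class 1
    ]
    print(mv_mode(views), mv_weighted(views))
```

In this toy input the plain mode picks class 0 (two of three views), while the confidence-weighted count picks class 1, because a single confident view outweighs two uncertain ones. This is the kind of case where the two schemes can disagree.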