Application of Deep Learning Methods to Processing of Noisy Medical Video Data

Danil Afonchikov; Elena Kornaeva; Irina Makovik; Alexey Kornaev

TL;DR

This paper addresses counting white and red blood cells in noisy medical video data using curriculum learning (CL) and multi-view (MV) post-processing. It introduces a synthetic moving blood cell video dataset (1150 videos, 100 frames each, 3×128×128) to bridge in vitro complete blood count tests with prospective in vivo capillaroscopy data, and formalizes CL with $d = \alpha b + \beta l$ and $c(t) = \min\left(1, \sqrt[p]{t \frac{1 - c_0^p}{T} + c_0^p}\right)$, while applying MV schemes (MVM and MVWCo-S) over multiple augmented crops. Experiments across WBC and RBC tasks show that MV improves accuracy (best ~$90.3\%$ for WBC with 9-frame inputs; RBC ~65% with 100-frame sequences) and that MV robustness extends to CIFAR-10 under label noise, where clean-label accuracy reaches ~92% and noisy-label accuracy ~87–88%. The results indicate that CL and MV offer practical, scalable improvements for processing small, high-noise medical video datasets and could enable more reliable in vivo CBC predictions.
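The two CL formulas can be made concrete with a minimal Python sketch. It assumes that $b$ (blurring) and $l$ (the second difficulty factor, not defined in this summary) are both scaled to $[0, 1]$, and that a sample is admitted to training once its difficulty does not exceed the current competence, $d \le c(t)$. That selection rule is the usual competence-based convention and is an assumption here, not a detail confirmed by the text. The function names `difficulty`, `competence`, and `admissible` are hypothetical; the default parameter values are the ones quoted in the caption of Figure 2.

```python
def difficulty(b, l, alpha=0.5, beta=0.5):
    """d = alpha * b + beta * l, with b and l assumed to lie in [0, 1]."""
    return alpha * b + beta * l


def competence(t, T=1000, c0=0.05, p=2):
    """c(t) = min(1, (t * (1 - c0**p) / T + c0**p) ** (1 / p))."""
    return min(1.0, (t * (1 - c0 ** p) / T + c0 ** p) ** (1 / p))


def admissible(samples, t, **kwargs):
    """Keep the (b, l) pairs whose difficulty is at most c(t).

    The rule d <= c(t) is an assumed selection criterion for illustration.
    """
    c = competence(t, **kwargs)
    return [(b, l) for (b, l) in samples if difficulty(b, l) <= c]


if __name__ == "__main__":
    pool = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]  # hypothetical (b, l) pairs
    for step in (0, 250, 2000):
        print(step, competence(step), admissible(pool, step))
```

With these settings the competence starts near $c_0$ at $t = 0$ and saturates at 1 once $t \ge T$, so the pool of admitted samples grows from the easiest ones to the full dataset.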

Abstract

Cell counting becomes a challenging problem when the cells move in a continuous stream and their boundaries are difficult to detect visually. To address this problem, we modified the training and decision-making processes using curriculum learning and multi-view prediction techniques, respectively.

Paper Structure (8 sections, 5 equations, 4 figures, 2 tables)

This paper contains 8 sections, 5 equations, 4 figures, 2 tables.

Introduction
Methodology
Results and discussion
Conclusion
Data collection
Curriculum learning and multi-view predictions intuition
Details of experiments
Additional experiments on CIFAR-10 dataset

Figures (4)

Figure 1: Blood cell microscopy: moving along the path from in vitro (a) to in vivo (e) blood count tests using the synthetic blood count dataset with different values of blurring $b$ in the samples (b)-(d). Some of the white blood cells situated among the red blood cells are marked with circles (b), (e).
Figure 2: Curriculum learning intuition. The difficulty and competence curves represent Eq. \ref{eq:difficulty_competence} when $\alpha = \beta = 0.5$, $c_0 = 0.05$, $T = 1000$, and $p = 2$.
Figure 3: Multi-view intuition. The network receives a set of fragments of the sample and makes a set of predictions. The final prediction corresponds to the mode value (MVM, see Eq. \ref{eq:MVMode}), or to the bin count weighted with confidences (MVWCo-S, see Eq. \ref{eq:cum_weights}).
Figure 4: Validation loss (blue lines) and validation accuracy (orange lines) for the experiments in Table \ref{tab:simRes1Frame}. The training loss decreased monotonically in all experiments.
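
The multi-view decision rule described in the caption of Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each crop is assumed to yield one integer class prediction and one confidence value, ties are broken in favour of the smallest label, and the weighted variant is a plain confidence-weighted vote, since the exact weighting behind MVWCo-S is not reproduced in this summary. The function names `mv_weighted` and `mv_mode` are hypothetical.

```python
from collections import defaultdict


def mv_weighted(preds, confidences):
    """Sum the confidence per predicted class and return the top class.

    Ties go to the smallest label (an arbitrary, assumed convention).
    """
    scores = defaultdict(float)
    for label, weight in zip(preds, confidences):
        scores[label] += weight
    # max() returns the first maximal item, so iterating over sorted
    # labels makes the tie-break deterministic.
    return max(sorted(scores), key=scores.get)


def mv_mode(preds):
    """Mode of the per-crop predictions: a vote with unit weights."""
    return mv_weighted(preds, [1.0] * len(preds))


if __name__ == "__main__":
    preds = [2, 2, 3, 3, 3]                # per-crop class predictions
    confs = [0.9, 0.8, 0.4, 0.3, 0.35]     # per-crop confidences
    print(mv_mode(preds))                  # majority vote
    print(mv_weighted(preds, confs))       # confidence-weighted vote
```

In this toy input the majority vote selects class 3, while the confidence-weighted vote selects class 2, because the two confident crops outweigh the three uncertain ones. This is the kind of case where the two schemes can disagree.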
