HamVision: Hamiltonian Dynamics as Inductive Bias for Medical Image Analysis

Mohamed A Mabrok

Abstract

We present HamVision, a framework for medical image analysis that uses the damped harmonic oscillator, a fundamental building block of signal processing, as a structured inductive bias for both segmentation and classification tasks. The oscillator's phase-space decomposition yields three functionally distinct representations: position~$q$ (feature content), momentum~$p$ (spatial gradients that encode boundary and texture information), and energy $H = \tfrac{1}{2}|z|^2$ (a parameter-free saliency map). These representations emerge from the dynamics, not from supervision, and can be exploited by different task-specific heads without any modification to the oscillator itself. For segmentation, energy gates the skip connections while momentum injects boundary information at every decoder level (HamSeg). For classification, the three representations are globally pooled and concatenated into a phase-space feature vector (HamCls). We evaluate HamVision across ten medical imaging benchmarks spanning five imaging modalities. On segmentation, HamSeg achieves state-of-the-art Dice scores on ISIC\,2018 (89.38\%), ISIC\,2017 (88.40\%), TN3K (87.05\%), and ACDC (92.40\%), outperforming most baselines with only 8.57M parameters. On classification, HamCls achieves state-of-the-art accuracy on BloodMNIST (98.85\%) and PathMNIST (96.65\%), and competitive results on the remaining MedMNIST datasets against MedMamba and MedViT. Diagnostic analysis confirms that the oscillator's momentum consistently encodes an interior$\,{>}\,$boundary$\,{>}\,$exterior gradient for segmentation and that the energy map correlates with discriminative regions for classification, properties that emerge entirely from the Hamiltonian dynamics. Code is available at https://github.com/Minds-R-Lab/hamvision.
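The phase-space decomposition described in the abstract can be illustrated with a minimal sketch. The paper gives the energy as $H = \tfrac{1}{2}|z|^2$ for a complex oscillator state $z$. Everything else here is an assumption made for illustration: the discrete update rule $z_{t+1} = \lambda z_t + u_t$ with $\lambda = e^{(-\alpha + i\omega)\Delta t}$, the reading of $q$ and $p$ as the real and imaginary parts of $z$, and the names `oscillator_scan`, `phase_space`, `alpha`, `omega`, and `dt`. It is not the released HamVision implementation.

```python
import cmath


def oscillator_scan(inputs, alpha=0.1, omega=1.0, dt=1.0):
    """Run a 1-D damped oscillator over a sequence (hypothetical update rule).

    Assumed recurrence: z <- lam * z + u, with lam = exp((-alpha + i*omega) * dt).
    alpha > 0 is the damping rate and omega the angular frequency.
    """
    lam = cmath.exp(complex(-alpha, omega) * dt)
    z = 0j
    states = []
    for u in inputs:
        z = lam * z + u
        states.append(z)
    return states


def phase_space(z):
    """Split a complex state into (q, p, H).

    Assumption: q = Re(z) and p = Im(z). H = 0.5 * |z|^2 follows the paper.
    """
    q = z.real
    p = z.imag
    energy = 0.5 * abs(z) ** 2
    return q, p, energy


if __name__ == "__main__":
    # A single impulse followed by silence: the energy should decay over time.
    for z in oscillator_scan([1.0, 0.0, 0.0, 0.0]):
        q, p, energy = phase_space(z)
        print(f"q={q:+.4f}  p={p:+.4f}  H={energy:.4f}")
```

Under these assumptions the energy needs no learned parameters of its own, since it is computed directly from the state, which is consistent with the abstract's description of it as a parameter-free saliency map.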

Paper Structure (36 sections, 23 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Detailed architecture of the Hamiltonian Bottleneck. The input is processed through two parallel paths: a ConvNeXt block (left) providing reliable gradient flow, and an oscillator bank (right) performing physics-structured processing via four-directional scanning. The oscillator's complex state $z$ is decomposed into position $q$, momentum $p$, and energy $H = \tfrac{1}{2}|z|^2$. A learned gate fuses the ConvNeXt and oscillator outputs into features $f$, while a squeeze-and-excitation channel attention module collapses the per-channel energy tensor into a single-channel saliency map $H_\text{map}$. All three outputs propagate to downstream task-specific heads.
  • Figure 2: Overview of the HamSeg segmentation architecture. A shared ConvNeXt encoder with a Hamiltonian oscillator bottleneck produces position $q$, momentum $p$, and energy $H$ representations, which are injected into a U-Net decoder via energy-gated skip connections and multi-scale momentum concatenation.
  • Figure 3: Architecture of HamCls, the classification head of HamVision. The shared encoder (Stem $\to$ ConvNeXt stages $\to$ Hamiltonian Bottleneck) produces three phase-space quantities: features $q$, momentum magnitude $|p|$, and scalar energy $H$. Each is globally pooled and concatenated into a 784-dimensional phase-space vector, which is classified by a LayerNorm--MLP head.
  • Figure 4: Qualitative segmentation results on three datasets (TN3K thyroid ultrasound, ACDC cardiac MRI, and ISIC 2018 dermoscopy). Left to right: input, ground truth, predicted segmentation, and overlay.
  • Figure 5: Multiscale energy-gated skip connections across four datasets (rows: ACDC, ISIC 2018, ISIC 2017, TN3K). Columns show the gate activation $\sigma(\gamma_l(H_l - \bar{H}_l))$ at decoder levels $d_3$ (coarsest), $d_2$, and $d_1$ (finest). The progressive sharpening from diffuse spatial selection at $d_3$ to precise boundary delineation at $d_1$ demonstrates that the energy map, computed once at the bottleneck, provides complementary information at each decoder resolution. This coarse-to-fine pattern is consistent across all imaging modalities.
  • ...and 1 more figures
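
The gate activation $\sigma(\gamma_l(H_l - \bar{H}_l))$ from the Figure 5 caption can be sketched in a few lines. The formula itself comes from the caption. The sketch assumes that $\bar{H}_l$ is the spatial mean of the energy map at level $l$, that $\gamma_l$ is a scalar sharpness parameter, and that the gate multiplies the skip feature elementwise. The function names `energy_gate` and `gate_skip` are hypothetical, and plain nested lists stand in for tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy_gate(energy_map, gamma=1.0):
    """Compute sigma(gamma * (H - mean(H))) for a 2-D energy map.

    Assumption: the reference level H-bar is the spatial mean of the map.
    """
    flat = [h for row in energy_map for h in row]
    mean_h = sum(flat) / len(flat)
    return [[sigmoid(gamma * (h - mean_h)) for h in row] for row in energy_map]


def gate_skip(skip, gate):
    """Apply the gate to a single-channel skip feature map.

    Assumption: the gate is applied by elementwise multiplication.
    """
    return [[s * g for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(skip, gate)]


if __name__ == "__main__":
    energy = [[0.0, 0.0], [0.0, 4.0]]   # one high-energy location
    skip = [[1.0, 1.0], [1.0, 1.0]]     # toy skip feature
    gate = energy_gate(energy, gamma=1.0)
    for row in gate_skip(skip, gate):
        print([round(v, 3) for v in row])
```

In this reading, locations with above-average energy receive gate values above 0.5 and pass more of the skip feature through, while a larger `gamma` makes the selection sharper.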