Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images

Wei-Lun Chen, Chia-Yeh Hsieh, Yu-Hsiang Kao, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

TL;DR

The study tackles keypoint detection in privacy-preserving, low-resolution thermal images for mobility assessment. It proposes a two-stage transfer-learning pipeline: Faster R-CNN first detects human bounding boxes, then a MobileNetV3-Small encoder, guided by an RGB ViTPose teacher and paired with a shared ViTPose decoder, predicts the keypoints. Training uses a composite loss that balances latent-representation alignment against heatmap accuracy. The model achieves strong OKS-based AP scores (AP 0.861, AP50 0.942, AP75 0.887) while substantially reducing parameter count and FLOPs compared with baselines, with a weighting of $\beta = 0.4$ giving the best balance. The work demonstrates that accurate, efficient keypoint detection in thermal images is feasible for clinical mobility evaluation, enabling privacy-friendly, edge-capable monitoring and paving the way for broader clinical tests and rehabilitation applications.
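The composite loss mentioned above can be illustrated with a minimal sketch. The summary does not give the exact formula, so everything here is an assumption: the function name `composite_loss` is hypothetical, both terms are taken to be mean squared errors, and $\beta$ is assumed to weight the latent term while $1 - \beta$ weights the heatmap term. The paper's actual formulation may differ.

```python
import numpy as np

def composite_loss(student_latent, teacher_latent,
                   pred_heatmaps, target_heatmaps, beta=0.4):
    """Hypothetical composite loss (not the authors' exact formula).

    Assumed form: beta * latent_term + (1 - beta) * heatmap_term,
    with both terms computed as mean squared error.
    """
    # Latent term: how far the thermal (student) encoder output is
    # from the RGB (teacher) encoder output.
    latent_term = float(np.mean((student_latent - teacher_latent) ** 2))
    # Heatmap term: how far the decoded keypoint heatmaps are from
    # the ground-truth heatmaps.
    heatmap_term = float(np.mean((pred_heatmaps - target_heatmaps) ** 2))
    return beta * latent_term + (1.0 - beta) * heatmap_term

# Usage with random stand-in tensors (shapes are illustrative only).
rng = np.random.default_rng(0)
student = rng.normal(size=(2, 16))
teacher = rng.normal(size=(2, 16))
pred = rng.random(size=(2, 17, 8, 8))
target = rng.random(size=(2, 17, 8, 8))
loss = composite_loss(student, teacher, pred, target, beta=0.4)
```

Under this assumed form, $\beta = 0$ reduces to plain supervised heatmap training and $\beta = 1$ to pure feature matching against the teacher, which is one way to read a sweep over $\beta$.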

Abstract

This study presents a novel approach to human keypoint detection in low-resolution thermal images using transfer learning techniques. We introduce the first application of the Timed Up and Go (TUG) test in thermal image computer vision, establishing a new paradigm for mobility assessment. Our method leverages a MobileNetV3-Small encoder and a ViTPose decoder, trained using a composite loss function that balances latent representation alignment and heatmap accuracy. The model was evaluated using the Object Keypoint Similarity (OKS) metric from the COCO Keypoint Detection Challenge. The proposed model achieves AP, AP50, and AP75 scores of 0.861, 0.942, and 0.887, respectively, outperforming traditional supervised learning approaches such as Mask R-CNN and ViTPose-Base. Moreover, our model demonstrates superior computational efficiency in terms of parameter count and FLOPs. This research lays a solid foundation for future clinical applications of thermal imaging in mobility assessment and rehabilitation monitoring.
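For readers unfamiliar with the OKS metric, the following sketch computes it for a single person instance, following the COCO definition: each labeled keypoint contributes $\exp(-d_i^2 / (2 s^2 k_i^2))$, where $d_i$ is the pixel distance to the ground truth, $s^2$ is the object area, and $k_i = 2\sigma_i$ is a per-keypoint constant. The function name `oks` and the sigma values in the usage lines are placeholders chosen for illustration, not values taken from the paper.

```python
import numpy as np

def oks(pred_xy, gt_xy, visibility, area, sigmas):
    """Object Keypoint Similarity for one instance (COCO-style)."""
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    visibility = np.asarray(visibility)
    k = 2.0 * np.asarray(sigmas, dtype=float)

    # Squared Euclidean distance per keypoint.
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)
    # Only keypoints that are labeled (visibility > 0) count.
    labeled = visibility > 0
    if not labeled.any():
        return 0.0
    # Normalize by object scale and per-keypoint tolerance.
    e = d2 / k ** 2 / (area + np.spacing(1)) / 2.0
    return float(np.mean(np.exp(-e[labeled])))

# Usage: three keypoints, the last one unlabeled (illustrative values).
gt = [[10.0, 10.0], [20.0, 15.0], [0.0, 0.0]]
pred = [[11.0, 10.0], [20.0, 17.0], [5.0, 5.0]]
score = oks(pred, gt, visibility=[2, 1, 0], area=400.0,
            sigmas=[0.05, 0.05, 0.05])
```

In the COCO protocol, AP50 and AP75 count a prediction as correct when its OKS is at least 0.50 or 0.75, and AP averages over OKS thresholds from 0.50 to 0.95 in steps of 0.05.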

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: (a) The original thermal image. (b) The thermal image annotated with human keypoints generated by the proposed model.
  • Figure 2: The architecture for keypoint detection via transfer learning. The blocks with a gray background indicate pre-trained components that are frozen during training, while the blocks with a yellow background represent the unfrozen parts that are fine-tuned. The input data consists of human bounding box detections that have been cropped and resized.
  • Figure 3: The effect of different $\beta$ values on the Average Precision (AP), AP50, and AP75 in keypoint detection tasks.