LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark

Avinash Upadhyay, Bhipanshu Dhupar, Manoj Sharma, Ankit Shukla, Ajith Abraham

TL;DR

This paper tackles the lack of large-scale, annotated thermal-domain datasets for 2D human pose estimation by introducing LWIRPOSE, a dataset with over 2,400 LWIR images, 17 MPII-format keypoints, seven subjects, and 12 activities, complemented by near-paired RGB frames. It documents the data-collection and annotation pipeline, including a custom tool that aligns RGB-derived keypoints to the IR images, and evaluates state-of-the-art RGB-based models (e.g., ViTPose, HRNet, and ResNet baselines) on the thermal data, reporting MPJPE and PCKh. ViTPose performs best among the tested models, but thermal-domain challenges, especially occlusion and self-occlusion, limit current RGB-trained approaches. The results establish a strong baseline and point toward domain-specific adaptations. The dataset and benchmarks enable future research in robust LWIR pose estimation, domain fusion, and practical deployments in surveillance, healthcare, and sports analytics, with code and data made available at the provided GitHub link.

Abstract

Human pose estimation faces hurdles in real-world applications due to factors like lighting changes, occlusions, and cluttered environments. We introduce a unique RGB-Thermal Nearly Paired and Annotated 2D Pose Dataset, comprising over 2,400 high-quality LWIR (thermal) images. Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners. This dataset, captured from seven actors performing diverse everyday activities like sitting, eating, and walking, supports pose estimation under occlusion and in other challenging scenarios. We benchmark state-of-the-art pose estimation methods on the dataset to showcase its potential, establishing a strong baseline for future research. Our results demonstrate the dataset's effectiveness in promoting advancements in pose estimation for various applications, including surveillance, healthcare, and sports analytics. The dataset and code are available at https://github.com/avinres/LWIRPOSE

Paper Structure

This paper contains 13 sections, 1 equation, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Sample thermal images from the LWIRPose Dataset. The samples belong to one subject performing 12 different activities. The images show that the data includes complexities such as occlusion, self-occlusion, and noise.
  • Figure 2: Samples from the dataset. The nearly paired RGB-IR images were captured using the camera. The annotated pose points are shown on the LWIR images. Four different subjects perform different activities.
  • Figure 3: Visual results of different deep learning models on thermal images from the dataset. Blue points are ground truth, and red points are predicted pose points. ViTPose performs much better than the other models. However, the performance of almost all existing RGB-based models deteriorates on thermal images.
  • Figure 4: Failure cases of ViTPose. For complex poses, ViTPose fails to extract and decode the features properly, reflecting the complexity involved with LWIR images.
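
The two metrics named in the TL;DR, MPJPE and PCKh, compare predicted keypoints against ground-truth keypoints such as the blue and red points in Figure 3. The sketch below shows their commonly used 2D definitions; it is not taken from the paper's code. The function names, the `head_size` argument, and the default threshold factor `alpha=0.5` are assumptions made for illustration, and the paper's exact normalization may differ.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance (pixels)."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of joints whose error is within alpha * head_size.

    head_size is the reference head-segment length in pixels (assumed to be
    supplied by the annotation; how it is measured is dataset-specific).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    threshold = alpha * head_size
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)


# Toy example with three joints (hypothetical coordinates, not dataset values).
gt = [(10.0, 10.0), (20.0, 10.0), (30.0, 40.0)]
pred = [(13.0, 14.0), (20.0, 10.0), (30.0, 50.0)]

print(mpjpe(pred, gt))                 # per-joint errors are 5, 0, and 10 pixels
print(pckh(pred, gt, head_size=12.0))  # threshold is 6 pixels
```

Lower MPJPE is better, while higher PCKh is better. Because PCKh normalizes by head size, it is less sensitive than MPJPE to how large the subject appears in the frame.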