Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation

Feng Zhang, Ze Li, Xiatian Zhu, Lei Chen

TL;DR

Low-light pose estimation suffers from poor visibility and noise, which make robust feature learning difficult. The paper introduces a frequency-decoupled, divide-and-conquer framework that applies dynamic illumination correction to low-frequency content and multi-scale low-rank denoising to high-frequency details, trained end-to-end with the task loss. Key contributions include a plug-and-play enhancement module, global and local illumination correction via Taylor-series approximations, and a multi-scale low-rank denoising mechanism with cross-scale fusion. All are validated on the ExLPose dataset, with strong improvements over state-of-the-art methods across challenging lighting conditions. The approach offers robust pose estimation under extreme low light, with modest computational overhead and without heavy preprocessing or strict data-pairing requirements, which supports practical deployment in real-world scenarios.
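
The frequency decoupling described above can be illustrated with a minimal sketch. It assumes a single-channel image and a fixed Gaussian low-pass mask in the Fourier domain; the summary does not say how the paper performs the split, so the function name `split_frequencies` and the parameter `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def split_frequencies(image, sigma=0.05):
    """Split a 2-D image into low- and high-frequency parts.

    `sigma` (hypothetical parameter) is the width of a Gaussian low-pass
    mask, in normalized frequency units (cycles per pixel).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies
    mask = np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))
    # Low-pass in the Fourier domain; the mask is real and symmetric,
    # so the inverse transform is real up to rounding error.
    low = np.fft.ifft2(np.fft.fft2(image) * mask).real
    high = image - low                # residual holds edges, texture, noise
    return low, high

rng = np.random.default_rng(0)
image = rng.random((64, 64)) * 0.1    # synthetic dark image, values in [0, 0.1)
low, high = split_frequencies(image)
```

Because the high-frequency part is defined as the residual, `low + high` reproduces the input, so each branch can be processed separately and the results summed afterwards.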

Abstract

As critical visual details become obscured, the low visibility and high ISO noise in extremely low-light images pose a significant challenge to human pose estimation. Current methods fail to provide high-quality representations because they rely on pixel-level enhancements that compromise semantics and cannot effectively handle extreme low-light conditions for robust feature learning. In this work, we propose a frequency-based framework for low-light human pose estimation, rooted in the "divide-and-conquer" principle. Instead of uniformly enhancing the entire image, our method focuses on task-relevant information. By applying dynamic illumination correction to the low-frequency components and low-rank denoising to the high-frequency components, we effectively enhance both the semantic and texture information essential for accurate pose estimation. This targeted enhancement yields robust, high-quality representations and significantly improves pose estimation performance. Extensive experiments demonstrate its superiority over state-of-the-art methods in various challenging low-light scenarios.
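
To make the two operations concrete, here is a rough sketch of each on synthetic data. It is not the paper's method: a fixed gamma curve stands in for the dynamic illumination correction, and a single-scale truncated SVD stands in for the multi-scale low-rank denoising. The names `gamma_correct` and `low_rank_denoise` and the parameters `gamma` and `rank` are assumptions made for illustration.

```python
import numpy as np

def gamma_correct(low, gamma=0.4):
    """Brighten a component in [0, 1] with a fixed gamma curve (assumed stand-in)."""
    return np.clip(low, 0.0, 1.0) ** gamma

def low_rank_denoise(high, rank=4):
    """Keep only the `rank` largest singular components of a 2-D array."""
    u, s, vt = np.linalg.svd(high, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)

# A synthetic rank-2 "texture" plus Gaussian noise stands in for a
# noisy high-frequency band.
clean = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
noisy = clean + 0.5 * rng.standard_normal((64, 64))
denoised = low_rank_denoise(noisy, rank=2)

# A constant dark field stands in for an under-exposed low-frequency band.
dark = np.full((64, 64), 0.05)
bright = gamma_correct(dark)

err_before = np.linalg.norm(noisy - clean)    # Frobenius error before denoising
err_after = np.linalg.norm(denoised - clean)  # Frobenius error after denoising
```

The truncation discards the noise energy that lies outside the dominant singular subspace, which is why `err_after` comes out smaller than `err_before` in this toy setting. Real image textures are only approximately low-rank, so the choice of `rank` trades detail against noise.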

Paper Structure (11 sections, 10 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Framework of our proposed method. We offer seamless integration of a plug-and-play low-light enhancement module into human pose estimation networks. (a) gives an overview of the method. (b) gives the structure of the dynamic illumination correction, where $GC(\cdot)$ denotes the global correction and $LC(\cdot)$ the local correction. (c) presents the structure of the multi-scale low-rank denoising.
  • Figure 2: Visual comparison of human pose estimation results between our method and current state-of-the-art approaches (PENet [yin2023pe], DENet [qin2022denet], FeatEnHancer [hashmi2023featenhancer]).