AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation

Feng Zhang, Jinwei Liu, Xiatian Zhu, Lei Chen

TL;DR

AgentPose tackles the capacity gap in human pose distillation by introducing a diffusion-based feature agent that progressively aligns student features with the teacher's feature distribution. It combines three components: a forward variance-preserving (VP) SDE that perturbs features with noise, a reverse-SDE dynamic modulation step carried out by a lightweight diffusion model, and a compact autoencoder that keeps the computational cost low. Training optimizes a multi-term loss with task, diffusion, and distillation terms; at inference, the agent denoises noisy student features. On COCO, AgentPose reports strong pose AP with minimal overhead and outperforms existing distillation methods, especially when the teacher–student capacity gap is large.
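As an illustration of the forward VP-SDE perturbation mentioned above, here is a minimal sketch using the closed-form marginal of a VP-SDE with a linear noise schedule. The schedule form, the values of `beta_min` and `beta_max`, the function names, and the flattened toy feature vector are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random

def vp_sde_marginal(t, beta_min=0.1, beta_max=20.0):
    """Return (mean_coef, std) of x_t given x_0 for a linear-beta VP-SDE.

    Assumed schedule: beta(t) = beta_min + t * (beta_max - beta_min), t in [0, 1].
    """
    # Integral of beta(s) ds from 0 to t.
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    mean_coef = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    return mean_coef, std

def perturb(features, t, rng):
    """Corrupt a flattened feature vector: x_t = mean_coef * x_0 + std * eps."""
    mean_coef, std = vp_sde_marginal(t)
    noise = [rng.gauss(0.0, 1.0) for _ in features]
    noisy = [mean_coef * x + std * e for x, e in zip(features, noise)]
    return noisy, noise

rng = random.Random(0)
# Hypothetical stand-in for a flattened teacher feature map.
teacher_feat = [rng.gauss(0.0, 1.0) for _ in range(16)]
noisy_feat, eps = perturb(teacher_feat, t=0.5, rng=rng)
```

The squared mean coefficient and the noise variance sum to one at every `t`, which is what "variance preserving" refers to: a unit-variance input keeps unit variance while its signal is gradually replaced by noise.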

Abstract

Pose distillation is widely adopted to reduce model size in human pose estimation. However, existing methods primarily emphasize the transfer of teacher knowledge while often neglecting the performance degradation that results from the capacity gap between teacher and student. To address this issue, we propose AgentPose, a novel pose distillation method that integrates a feature agent to model the distribution of teacher features and progressively aligns the distribution of student features with that of the teacher features, effectively overcoming the capacity gap and improving knowledge transfer. Our comprehensive experiments on the COCO dataset substantiate the effectiveness of our method in knowledge transfer, particularly in scenarios with a large capacity gap.

Paper Structure (16 sections, 9 equations, 1 figure, 5 tables)

Figures (1)

  • Figure 1: Overview of AgentPose. (a) The architecture of AgentPose. (b) The autoencoder and feature agent. The feature agent is trained on corrupted teacher features and uses a specific reverse VP-SDE (variance-preserving stochastic differential equation) to calibrate student features, making pose distillation more effective. In addition, AgentPose includes an autoencoder and two convolution layers to reduce the computational overhead of the feature agent.
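
For readers unfamiliar with the term, the generic VP-SDE pair from the score-based generative modeling literature is given below. This is the textbook form only; the caption refers to a "specific" reverse VP-SDE, so the paper's own formulation may differ. The symbols are the usual ones and are assumptions here rather than the paper's notation: beta(t) is the noise schedule, w and w-bar are forward and reverse-time Wiener processes, and p_t is the marginal density of the perturbed features at time t.

```latex
% Forward VP-SDE: gradually corrupts a feature x with noise.
\mathrm{d}x = -\tfrac{1}{2}\,\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w

% Generic reverse-time VP-SDE: runs from t = 1 back to t = 0,
% guided by the score \nabla_x \log p_t(x).
\mathrm{d}x = \Bigl[-\tfrac{1}{2}\,\beta(t)\,x - \beta(t)\,\nabla_x \log p_t(x)\Bigr]\mathrm{d}t
              + \sqrt{\beta(t)}\,\mathrm{d}\bar{w}
```

In practice the score term is approximated by a learned network; in AgentPose that role would presumably fall to the lightweight diffusion model acting as the feature agent.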