HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, Dacheng Tao
TL;DR
The paper addresses the persistent problem of malformed hands in diffusion-generated images by introducing HandRefiner, a post-processing pipeline that conditions inpainting, via ControlNet, on a depth map derived from a reconstructed hand mesh. The authors observe a phase transition as the ControlNet control strength varies. Exploiting it lets the method train on synthetic hand data while still producing realistic textures, and the method plugs into existing diffusion models without retraining the base model. In experiments on datasets such as HAGRID and FreiHAND, HandRefiner significantly improves hand realism, pose accuracy, and hand detection confidence, according to both objective metrics and human evaluations. The work also offers practical strategies for choosing a fixed or adaptive control strength, demonstrates generalization to other control signals, and releases its code publicly.
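In standard ControlNet implementations, the control strength mentioned above is a scalar that multiplies the residual features ControlNet adds to the frozen UNet's skip connections and middle block (the diffusers library exposes it as `controlnet_conditioning_scale`). The minimal sketch below shows only that scaling step, with small NumPy arrays standing in for real feature maps. The function name `apply_control` is hypothetical, and the sketch does not reproduce HandRefiner's fixed or adaptive strength strategies.

```python
import numpy as np

def apply_control(unet_features, control_residuals, strength):
    """Return UNet features with ControlNet residuals added at a given strength.

    unet_features: list of arrays from the frozen UNet (skip/mid-block features)
    control_residuals: list of arrays produced by ControlNet, with matching shapes
    strength: scalar control strength; 0.0 ignores the control signal entirely
    """
    if len(unet_features) != len(control_residuals):
        raise ValueError("expected one control residual per UNet feature map")
    # Each residual is scaled by the same scalar before being added.
    return [h + strength * c for h, c in zip(unet_features, control_residuals)]

# Toy feature maps standing in for real UNet activations.
features = [np.ones((2, 2)), np.zeros(4)]
residuals = [np.full((2, 2), 2.0), np.ones(4)]

# Sweep a few strengths and report the mean of the second feature map.
for s in (0.0, 0.5, 1.0):
    out = apply_control(features, residuals, s)
    print(s, float(out[1].mean()))
```

The scaling itself is linear in the residuals. Any abrupt change in output behavior, such as the phase transition the paper reports, would therefore arise downstream in the full denoising process rather than in this step.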
Abstract
Diffusion models have achieved remarkable success in generating realistic images but struggle to generate accurate human hands, producing errors such as incorrect finger counts or irregular shapes. This difficulty arises because learning the physical structure and pose of hands from training images is complex: hands undergo extensive deformations and occlusions. To correct hand generation, this paper introduces a lightweight post-processing solution called HandRefiner. HandRefiner employs a conditional inpainting approach to rectify malformed hands while leaving other parts of the image untouched. We leverage a hand mesh reconstruction model that consistently adheres to the correct number of fingers and hand shape, while also being capable of fitting the desired hand pose in the generated image. Given a generated image that fails because of malformed hands, we use ControlNet modules to re-inject this correct hand information. Additionally, we uncover a phase transition phenomenon within ControlNet as we vary the control strength. This phenomenon enables us to take advantage of more readily available synthetic data without suffering from the domain gap between realistic and synthetic hands. Experiments demonstrate that HandRefiner can significantly improve generation quality both quantitatively and qualitatively. The code is available at https://github.com/wenquanlu/HandRefiner.
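Conditional inpainting, as described in the abstract, regenerates only a masked hand region. A common way to guarantee that pixels outside the mask stay untouched is to composite the inpainted output back onto the original image, using the mask as a per-pixel blend weight. The minimal sketch below assumes a rectangular mask built from a hand bounding box. `box_mask` and `composite_inpaint` are hypothetical helpers for illustration, not functions from the HandRefiner repository, and the paper's actual mask construction may differ.

```python
import numpy as np

def box_mask(height, width, box, margin=0):
    """Hypothetical helper: a (height, width) mask that is 1 inside a padded box.

    box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[max(y0 - margin, 0):min(y1 + margin, height),
         max(x0 - margin, 0):min(x1 + margin, width)] = 1.0
    return mask

def composite_inpaint(original, inpainted, mask):
    """Keep original pixels outside the mask and take inpainted pixels inside it.

    original, inpainted: float arrays of shape (H, W, C)
    mask: array of shape (H, W) with values in [0, 1]; 1 marks the hand region
    """
    if original.shape != inpainted.shape:
        raise ValueError("images must have the same shape")
    m = np.clip(mask, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return m * inpainted + (1.0 - m) * original

# Toy example: a black 4x4 image whose central 2x2 "hand" region is regenerated as white.
original = np.zeros((4, 4, 3))
inpainted = np.ones((4, 4, 3))
mask = box_mask(4, 4, (1, 1, 3, 3))
result = composite_inpaint(original, inpainted, mask)
```

A soft (blurred) mask with values between 0 and 1 would blend the boundary instead of producing a hard seam; the same `composite_inpaint` function handles that case unchanged.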
