Conditional Consistency Guided Image Translation and Enhancement
Amil Bhagat, Milind Jain, A. V. Subramanyam
TL;DR
This paper tackles multi-domain image translation and low-light image enhancement by extending consistency models with conditional inputs. It introduces Conditional Consistency Models (CCMs), which take a conditional image that guides the denoising process, enabling translation and enhancement in a single inference step without adversarial training. The authors formulate a conditional consistency function, propose Conditional Consistency Training (CCT), and validate the approach across ten datasets, reporting competitive or superior performance on several benchmarks along with strong generalization. The work offers a fast, robust alternative to diffusion- and GAN-based approaches, with potential practical impact in cross-modal vision and medical imaging workflows.
Abstract
Consistency models have emerged as a promising alternative to diffusion models, offering high-quality generative capabilities through single-step sample generation. However, their application to multi-domain image translation tasks, such as cross-modal translation and low-light image enhancement, remains largely unexplored. In this paper, we introduce Conditional Consistency Models (CCMs) for multi-domain image translation by incorporating additional conditional inputs. We implement these modifications by introducing task-specific conditional inputs that guide the denoising process, ensuring that the generated outputs retain structural and contextual information from the corresponding input domain. We evaluate CCMs on 10 different datasets, demonstrating their effectiveness in producing high-quality translated images across multiple domains. Code is available at https://github.com/amilbhagat/Conditional-Consistency-Models.
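
To make the idea of a conditional input guiding single-step generation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the skip/output parameterization commonly used for unconditional consistency models and simply passes a conditional image to the network as an extra argument. The constants SIGMA_DATA, EPS, and T_MAX, the flat-list image representation, and the toy_network stand-in are all illustrative assumptions and do not come from this abstract.

```python
import math
import random

# Assumed constants, borrowed from common consistency-model setups (not from this abstract).
SIGMA_DATA = 0.5   # assumed standard deviation of the data
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


def c_skip(t):
    """Weight on the noisy input; equals 1 at t = EPS."""
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    """Weight on the network output; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def toy_network(x_noisy, t, cond):
    """Hypothetical stand-in for a learned network F(x, t, c).

    A real model would be a neural network that sees both the noisy
    image and the conditional image; here we just echo the condition.
    """
    return list(cond)


def conditional_consistency(x_noisy, t, cond, network=toy_network):
    """f(x, t, c) = c_skip(t) * x + c_out(t) * F(x, t, c).

    The weights enforce the boundary condition f(x, EPS, c) = x.
    Images are flat lists of floats to keep the sketch dependency-free.
    """
    out = network(x_noisy, t, cond)
    return [c_skip(t) * x + c_out(t) * o for x, o in zip(x_noisy, out)]


def single_step_translate(cond, rng, network=toy_network):
    """Single-step inference: draw noise at T_MAX, apply f once."""
    x_T = [rng.gauss(0.0, T_MAX) for _ in cond]
    return conditional_consistency(x_T, T_MAX, cond, network)


if __name__ == "__main__":
    rng = random.Random(0)
    cond_image = [0.2, -0.1, 0.7, 0.4]  # toy "source-domain" pixels
    print(single_step_translate(cond_image, rng))
```

The design point this illustrates is that conditioning changes only what the network sees, while the boundary condition at the smallest noise level is still guaranteed by the weights c_skip and c_out rather than learned.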
