Contrastive Diffusion Guidance for Spatial Inverse Problems
Sattwik Basu, Chaitanya Amballa, Zhongweiyang Xu, Jorge Vančo Sampedro, Srihari Nelakuditi, Romit Roy Choudhury
TL;DR
This work addresses the ill-posed problem of inferring indoor floorplans from user trajectories by confronting the non-differentiable, non-smooth path-planning forward operator. It introduces CoGuide, a diffusion-based posterior sampler guided not by a brittle likelihood through the forward operator, but by a smooth contrastive embedding space that aligns floorplans with their compatible trajectories. By training encoders with a symmetric contrastive objective and using a squared-distance surrogate in embedding space, CoGuide stabilizes diffusion guidance and yields more consistent floorplans than several differentiable-planner baselines and prior diffusion-based solvers. The method is demonstrated on the HouseExpo dataset across varying trajectory densities, with robust gains in sparse/moderate regimes and competitive performance in dense settings, underscoring its potential for a broad class of spatial inverse problems where the forward operator is difficult to model. The work also outlines avenues for extending contrastive guidance to other non-differentiable operators and for future improvements in realism, uncertainty quantification, and blind inverse problems.
Abstract
We consider the inverse problem of reconstructing the spatial layout of a place, such as a home floorplan, from a user's movements inside that layout. Direct inversion is ill-posed since many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. While generative inverse solvers are an area of active research, we find that the forward operator in our problem poses new challenges. The path-planning process inside a floorplan is a non-invertible, non-differentiable function, and it causes instability when optimizing with the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories, and is more robust than differentiable-planner baselines and guided-diffusion methods.
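To make the two ingredients named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an InfoNCE-style form for the symmetric contrastive loss and a toy linear floorplan encoder `W` so that the gradient of the squared-distance surrogate can be written in closed form. The function names, the `temperature` and `scale` parameters, and the linear encoder are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    # Scale each embedding (row) to unit length.
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(z_floor, z_traj, temperature=0.1):
    # Assumed InfoNCE-style objective: row i of z_floor is the compatible
    # partner of row i of z_traj; all other rows in the batch are mismatches.
    zf, zt = l2_normalize(z_floor), l2_normalize(z_traj)
    logits = zf @ zt.T / temperature
    # Floorplan-to-trajectory direction (softmax over each row).
    loss_f2t = -np.mean(np.diag(log_softmax(logits, axis=1)))
    # Trajectory-to-floorplan direction (softmax over each column).
    loss_t2f = -np.mean(np.diag(log_softmax(logits, axis=0)))
    return 0.5 * (loss_f2t + loss_t2f)


def surrogate_guidance(x, W, z_traj, scale=1.0):
    # Gradient with respect to x of -scale * ||W x - z_traj||^2, where W is a
    # hypothetical linear stand-in for a trained floorplan encoder.
    residual = W @ x - z_traj
    return -2.0 * scale * (W.T @ residual)
```

In a guided sampler, a term like `surrogate_guidance` would be added to the prior score at each denoising step in place of a gradient through the path planner. With a real, nonlinear encoder the gradient would come from automatic differentiation rather than the closed form used here.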
