ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points
Ryota Okumura, Kaede Shiohara, Toshihiko Yamasaki
TL;DR
ControlVP addresses vanishing point inconsistencies in AI-generated architectural images by enabling user-guided refinements. It extends a pre-trained diffusion model with a ControlNet-like conditioning on user-drawn building outlines and introduces a Vanishing Point Loss to enforce edge alignment with perspective cues. The approach uses an inpainting-based VP correction process, a dedicated dataset of VP inconsistencies, and extensive ablations showing improved VP accuracy while preserving perceptual quality. The work demonstrates practical potential for geometry-aware editing and downstream tasks like image-to-3D reconstruction, with an accessible GUI and publicly released code.
Abstract
Recent text-to-image models, such as Stable Diffusion, have achieved impressive visual quality, yet they often suffer from geometric inconsistencies that undermine the structural realism of generated scenes. One prominent issue is vanishing point inconsistency, where projections of parallel lines fail to converge correctly in 2D space. This leads to structurally implausible geometry that degrades spatial realism, especially in architectural scenes. We propose ControlVP, a user-guided framework for correcting vanishing point inconsistencies in generated images. Our approach extends a pre-trained diffusion model by incorporating structural guidance derived from building contours. We also introduce geometric constraints that explicitly encourage alignment between image edges and perspective cues. Our method enhances global geometric consistency while maintaining visual fidelity comparable to the baselines. This capability is particularly valuable for applications that require accurate spatial structure, such as image-to-3D reconstruction. The dataset and source code are available at https://github.com/RyotaOkumura/ControlVP.
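To make the notion of vanishing point inconsistency concrete, the following is a minimal illustrative sketch and not the paper's Vanishing Point Loss. It assumes edges are given as 2D line segments, estimates a vanishing point as the intersection of two segments in homogeneous coordinates, and scores a third edge by the angle between its direction and the direction from its midpoint to that point. The function names (`line_through`, `intersect`, `misalignment_deg`) and the sample coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    v = np.cross(l1, l2)
    if abs(v[2]) < eps:
        return None
    return v[:2] / v[2]

def misalignment_deg(segment, vp):
    """Angle (degrees) between an edge and the ray from its midpoint to the vanishing point."""
    p, q = np.asarray(segment[0], float), np.asarray(segment[1], float)
    d = q - p                                  # edge direction
    t = np.asarray(vp, float) - 0.5 * (p + q)  # midpoint-to-VP direction
    cos = abs(d @ t) / (np.linalg.norm(d) * np.linalg.norm(t))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

# Two edges that are parallel in 3D define the vanishing point in the image.
edge_a = ((0.0, 0.0), (4.0, 2.0))
edge_b = ((0.0, 10.0), (4.0, 8.0))
vp = intersect(line_through(*edge_a), line_through(*edge_b))

# An edge whose extension passes through the VP is consistent; a tilted one is not.
consistent = misalignment_deg(((0.0, 5.0), (4.0, 5.0)), vp)
inconsistent = misalignment_deg(((0.0, 5.0), (4.0, 6.0)), vp)
```

In this toy setup, `consistent` is essentially zero while `inconsistent` is clearly larger, which is the kind of edge-to-perspective misalignment a geometric constraint would penalize.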
