GroundUp: Rapid Sketch-Based 3D City Massing
Gizem Esra Unlu, Mohamed Sayed, Yulia Gryaditskaya, Gabriel Brostow
TL;DR
GroundUp introduces the first sketch-based tool for rapid 3D city massing, enabling designers to iteratively switch between a plan sketch and a perspective sketch to quickly infer a heightfield-based city model. The method combines a top-down occupancy mask predictor, a perspective-depth predictor conditioned on occupancy cues, and a multi-conditional latent diffusion model to complete coherent 3D geometry aligned with both sketches. Key contributions include (i) a novel perspective-depth network that leverages top-down scene cues, (ii) a diffusion-based top-down heightfield completion conditioned on dual-view sketches, and (iii) a tight human-centered interface validated by a user study with architects and novices. The results demonstrate fast initial massing prototyping with plausible, editable 3D outputs suitable for downstream architectural design, while pointing to future work on multi-view integration, editable meshes, and expanded scene elements.
Abstract
We propose GroundUp, the first sketch-based ideation tool for 3D city massing of urban areas. We focus on early-stage urban design, where sketching is a common tool and the design starts from balancing building volumes (masses) and open spaces. With Human-Centered AI in mind, we aim to help architects quickly revise their ideas by easily switching between 2D sketches and 3D models, allowing for smoother iteration and sharing of ideas. Inspired by feedback from architects and existing workflows, our system takes as its first input a user's top-down sketch of multiple buildings. The user then draws a perspective sketch of the envisioned site. Our method is designed to exploit the complementary information in the two sketches and allows users to quickly preview and adjust the inferred 3D shapes. Our model has two main components. First, we propose a novel sketch-to-depth prediction network for perspective sketches that exploits the shapes in the top-down sketch. Second, we use depth cues derived from the perspective sketch to condition our diffusion model, which ultimately completes the geometry in a top-down view. Thus, our final 3D geometry is represented as a heightfield, allowing users to construct the city 'from the ground up'.
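To make the heightfield representation concrete, the following is a minimal sketch of how a top-down grid of heights can be read as a massing model. It is not the authors' implementation: the function names (`occupancy_mask`, `massing_stats`, `extrude_cells`), the `cell_size` parameter, and the toy grid are all hypothetical, and the grid is assumed to store one height in metres per square ground cell, with zero meaning open space.

```python
from typing import List, Tuple

# Assumption: heights[row][col] is the building height in metres above the
# ground plane for one square cell; 0.0 marks open space.
Heightfield = List[List[float]]
Box = Tuple[float, float, float, float, float]  # (xmin, ymin, xmax, ymax, height)


def occupancy_mask(heights: Heightfield) -> List[List[bool]]:
    """Top-down occupancy: a cell is built-up when its height is positive."""
    return [[h > 0.0 for h in row] for row in heights]


def massing_stats(heights: Heightfield, cell_size: float) -> Tuple[float, float]:
    """Return (footprint area, built volume) for square cells of side cell_size."""
    cell_area = cell_size * cell_size
    occupied = [h for row in heights for h in row if h > 0.0]
    footprint = len(occupied) * cell_area
    volume = sum(h * cell_area for h in occupied)
    return footprint, volume


def extrude_cells(heights: Heightfield, cell_size: float) -> List[Box]:
    """Extrude every occupied cell from the ground into an axis-aligned box."""
    boxes: List[Box] = []
    for r, row in enumerate(heights):
        for c, h in enumerate(row):
            if h > 0.0:
                x0, y0 = c * cell_size, r * cell_size
                boxes.append((x0, y0, x0 + cell_size, y0 + cell_size, h))
    return boxes


# Hypothetical 3x3 site: a low block with one taller cell, open space elsewhere.
toy_site: Heightfield = [
    [0.0, 12.0, 12.0],
    [0.0, 12.0, 30.0],
    [0.0, 0.0, 0.0],
]

if __name__ == "__main__":
    footprint, volume = massing_stats(toy_site, cell_size=2.0)
    print(f"footprint: {footprint} m^2, volume: {volume} m^3")
    print(f"boxes: {len(extrude_cells(toy_site, cell_size=2.0))}")
```

Because every cell holds a single height, the representation cannot express overhangs or bridges, but it keeps the balance between built volume and open space easy to read and edit, which is the quantity early-stage massing is concerned with.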
