ArchSeek: Retrieving Architectural Case Studies Using Vision-Language Models
Danrui Li, Yichao Shi, Yaluo Wang, Ziying Shi, Mubbasir Kapadia
TL;DR
ArchSeek tackles architectural case search by fusing visual and textual data through vision-language models and cross-modal embeddings, enabling text and image queries with in-session recommendations. It introduces a survey-informed database and three user modes (text, image, and interactive recommendations), using cosine similarity between embeddings and a Reciprocal Rank Fusion approach to combine modalities. Evaluation includes a 77-query quantitative study with ablations and a four-task user study, showing superior retrieval performance and positive usability feedback while highlighting diversity and interface improvements as future work. The approach promises more efficient, personalized precedent discovery in architecture and could generalize to other visually driven design domains.
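To make the retrieval idea concrete, here is a minimal sketch of ranking cases by cosine similarity within each modality and then merging the two rankings with Reciprocal Rank Fusion. It is an illustration under stated assumptions, not the ArchSeek implementation: the function names, the toy 2-D embeddings, and the case identifiers are hypothetical, and the constant k = 60 is a commonly used RRF default rather than a value taken from this summary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Return case ids sorted from most to least similar to the query.

    `embeddings` maps a case id to its embedding vector.
    """
    return sorted(
        embeddings,
        key=lambda case_id: cosine_similarity(query, embeddings[case_id]),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank).

    k = 60 is an assumed default, not a value stated in the source.
    """
    scores = {}
    for ranking in rankings:
        for rank, case_id in enumerate(ranking, start=1):
            scores[case_id] = scores.get(case_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one embedding per case for each modality.
image_embeddings = {
    "case_a": [1.0, 0.0],
    "case_b": [0.7, 0.7],
    "case_c": [0.0, 1.0],
}
text_embeddings = {
    "case_a": [0.0, 1.0],
    "case_b": [1.0, 0.2],
    "case_c": [0.7, 0.7],
}
query = [1.0, 0.2]  # query embedding in the shared cross-modal space

image_ranking = rank_by_similarity(query, image_embeddings)
text_ranking = rank_by_similarity(query, text_embeddings)
fused_ranking = reciprocal_rank_fusion([image_ranking, text_ranking])
print(fused_ranking)
```

Because RRF works on ranks rather than raw similarity scores, the two modalities do not need to produce scores on a comparable scale before they are combined.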
Abstract
Efficiently searching for relevant case studies is critical in architectural design, as designers rely on precedent examples to guide or inspire their ongoing projects. However, traditional text-based search tools struggle to capture the inherently visual and complex nature of architectural knowledge, often leading to time-consuming and imprecise exploration. This paper introduces ArchSeek, a case study search system with recommendation capability, tailored for architectural design professionals. Powered by the visual understanding capabilities of vision-language models and cross-modal embeddings, it supports text and image queries with fine-grained control, as well as interaction-based design case recommendations. It offers architects a more efficient, personalized way to discover design inspiration, with potential applications across other visually driven design fields. The source code is available at https://github.com/danruili/ArchSeek.
