LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
Joohyung Yun, Doyup Lee, Wook-Shin Han
TL;DR
LILaC advances open-domain multimodal retrieval by representing documents as a layered component graph whose two layers support coarse and fine-grained reasoning, and by employing late-interaction-based subgraph retrieval guided by LLM-driven query decomposition. The method enables efficient multihop reasoning within and across documents, scoring edges on the fly from fine-grained subcomponents to reduce noise from irrelevant content. On all five benchmarks evaluated, LILaC achieves state-of-the-art retrieval and end-to-end QA performance without additional fine-tuning, demonstrating the effectiveness of combining dual-granularity graphs with late interaction and modality-aware query decomposition. The approach highlights the value of pretrained multimodal encoders and LLMs for scalable, tuning-free improvement in open-domain multimodal retrieval.
Abstract
Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever must handle two main challenges: (1) mitigating the effect of irrelevant content caused by fixed, single-granularity retrieval units, and (2) supporting multihop reasoning by capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework featuring two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, facilitating efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method, an edge-based approach that first identifies coarse-grained nodes for efficient candidate generation and then performs fine-grained reasoning via late interaction. Extensive experiments demonstrate that LILaC achieves state-of-the-art retrieval performance on all five benchmarks, notably without additional fine-tuning. We make the artifacts publicly available at github.com/joohyung00/lilac.
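
As an illustration of the two-stage idea described above, the following is a minimal sketch and not the released implementation. It assumes each coarse-grained component carries one embedding plus a list of fine-grained subcomponent embeddings, that the query has already been decomposed into subquery vectors, and that an edge is scored by the best pairing of subqueries with its two endpoints using a max-similarity (late interaction) match. The names `Component`, `max_sim`, `score_edge`, and `retrieve`, the toy vectors, and the exact scoring formula are all hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Component:
    """Coarse-layer node (hypothetical structure, not from the paper's code)."""
    name: str
    embedding: list        # one vector for the whole component
    subcomponents: list    # fine-layer vectors (e.g. sentences, rows, objects)
    neighbors: list = field(default_factory=list)


def max_sim(query_vec, component):
    """Late interaction: best match between a query vector and any subcomponent."""
    return max((cosine(query_vec, s) for s in component.subcomponents), default=0.0)


def score_edge(subqueries, u, v):
    """Assumed edge score: best pairing of subqueries with the two endpoints."""
    return max(max_sim(qi, u) + max_sim(qj, v)
               for qi in subqueries for qj in subqueries)


def retrieve(graph, query_vec, subqueries, top_k=2):
    # Stage 1: cheap candidate generation on the coarse layer.
    ranked = sorted(graph.values(),
                    key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2: score edges incident to the candidates on the fly.
    scored = {}
    for u in candidates:
        for name in u.neighbors:
            edge = tuple(sorted((u.name, name)))  # treat edges as undirected
            if edge not in scored:
                scored[edge] = score_edge(subqueries, graph[edge[0]], graph[edge[1]])
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Toy graph: a paragraph linked to a table, plus an unrelated image.
graph = {
    "paragraph": Component("paragraph", [0.7, 0.7, 0.0],
                           [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], ["table"]),
    "table": Component("table", [0.0, 0.6, 0.8],
                       [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]], ["paragraph"]),
    "image": Component("image", [0.0, 1.0, 0.0], [[0.0, 1.0, 0.0]]),
}
results = retrieve(graph, [1.0, 0.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(results)
```

In this toy setup neither the paragraph nor the table matches the full query well on its own, but the edge between them scores highly because each endpoint contains a subcomponent that matches one subquery. Scoring against subcomponents rather than the whole-component embedding is what keeps the unrelated content inside each component from diluting the match.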
