AIC CTU@AVerImaTeC: dual-retriever RAG for image-text fact checking
Herbert Ullrich, Jan Drchal
TL;DR
This work presents a modular, low-cost dual-retriever RAG system for image-text fact-checking in the AVerImaTeC task by combining a text-based vector search with a reverse image search (RIS) module and a single GPT-5.1 generation per claim. The approach achieves competitive performance (3rd place) and emphasizes reproducibility, including release of code, prompts, and vector stores, while detailing cost structure and practical limitations. Key findings highlight a strong question-focused signal but identify bottlenecks in evidence formatting and RIS reliability, guiding future improvements. Overall, the paper contributes a practical baseline and a cost-aware blueprint for multimodal fact-checking systems.
Abstract
In this paper, we present our 3rd-place system in the AVerImaTeC shared task, which combines our retrieval-augmented generation (RAG) pipeline from last year with a reverse image search (RIS) module. Despite its simplicity, our system delivers competitive performance with a single multimodal LLM call per fact-check, at an average cost of just $0.013 using GPT-5.1 via the OpenAI Batch API. Our system is also easy to reproduce and tweak, as it consists of only three decoupled modules: a textual retrieval module based on similarity search, an image retrieval module based on API-accessed RIS, and a generation module using GPT-5.1. We therefore suggest it as an accessible starting point for further experimentation. We publish its code and prompts, as well as our vector stores and insights into the scheme's running costs and directions for further improvement.
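
The three decoupled modules described above can be pictured as interchangeable callables wired together by a thin orchestration function. The following is only a minimal sketch under stated assumptions, not the released implementation: the names `Claim`, `Evidence`, `fact_check`, `estimate_cost`, and the stub retrievers and generator are hypothetical and introduced purely for illustration. The only figure taken from the text is the average cost of $0.013 per fact-check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Average cost per fact-check reported in the abstract (USD).
AVG_COST_PER_CLAIM_USD = 0.013


@dataclass
class Claim:
    """Hypothetical container for an image-text claim."""
    text: str
    image_paths: List[str] = field(default_factory=list)


@dataclass
class Evidence:
    """Hypothetical evidence item; `source` is 'text' or 'ris'."""
    source: str
    content: str


def fact_check(
    claim: Claim,
    text_retriever: Callable[[str], List[Evidence]],
    image_retriever: Callable[[str], List[Evidence]],
    generator: Callable[[Claim, List[Evidence]], Dict],
) -> Dict:
    """Run both retrievers, then call the generator exactly once."""
    # Textual retrieval: similarity search over the claim text.
    evidence = list(text_retriever(claim.text))
    # Image retrieval: one reverse image search per claim image.
    for path in claim.image_paths:
        evidence.extend(image_retriever(path))
    # Generation: a single multimodal LLM call per fact-check.
    return generator(claim, evidence)


def estimate_cost(n_claims: int) -> float:
    """Rough total cost in USD, assuming the reported average holds."""
    return n_claims * AVG_COST_PER_CLAIM_USD


# Stub modules standing in for the real vector store, RIS API and LLM.
def stub_text_retriever(query: str) -> List[Evidence]:
    return [Evidence("text", f"passage about: {query}")]


def stub_image_retriever(path: str) -> List[Evidence]:
    return [Evidence("ris", f"page containing image: {path}")]


def stub_generator(claim: Claim, evidence: List[Evidence]) -> Dict:
    # Placeholder output; a real generator would return a verdict and justification.
    return {"verdict": "placeholder", "n_evidence": len(evidence)}


if __name__ == "__main__":
    claim = Claim("Photo shows a flooded street", ["a.jpg", "b.jpg"])
    result = fact_check(
        claim, stub_text_retriever, stub_image_retriever, stub_generator
    )
    print(result)
    print(f"Estimated cost for 1000 claims: ${estimate_cost(1000):.2f}")
```

Because each module is passed in as a plain callable, any one of them can be swapped (for example, a different RIS provider or a different LLM) without touching the other two, which is the property the abstract highlights as making the system easy to tweak.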
