Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery
Bowen Wei
TL;DR
This paper addresses semantic task discovery in enterprise software, where vocabulary misalignment and terminology drift hinder retrieval. It introduces a training-free framework that combines a rationale-augmented hybrid pre-filter with a constrained LLM re-ranker whose outputs are drawn only from the actual task catalog. The key contributions are a rationale lexicon derived from developer test cases, a constraint that keeps the LLM from returning tasks outside the catalog, and an evaluation framework showing production-grade top-K quality without model training. The approach reports robust performance, low latency, and rapid adaptation to terminology changes, offering a practical path for enterprise search in dynamic, jargon-heavy domains.
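One way to picture the catalog constraint is as a post-filter on the re-ranker's answer. The sketch below is illustrative only and is not taken from the paper: the function name `constrain_to_catalog`, the task IDs, and the backfill behavior are all assumptions. It keeps only IDs that appear in the pre-filter shortlist, drops duplicates, and fills any remaining slots from the shortlist's original order.

```python
def constrain_to_catalog(llm_ranked_ids, shortlist_ids, k=5):
    """Return at most k task IDs, all guaranteed to come from the shortlist.

    llm_ranked_ids: IDs as returned by the re-ranker (may contain invented
                    or repeated IDs).
    shortlist_ids:  IDs produced by the pre-filter, in pre-filter order.
    Both the name and the backfill policy are hypothetical.
    """
    allowed = set(shortlist_ids)
    seen = set()
    result = []

    # Pass 1: keep the re-ranker's order, discarding unknown or repeated IDs.
    for task_id in llm_ranked_ids:
        if task_id in allowed and task_id not in seen:
            seen.add(task_id)
            result.append(task_id)

    # Pass 2: backfill from the pre-filter order so the list is never
    # shorter than it has to be when the re-ranker returns too little.
    for task_id in shortlist_ids:
        if task_id not in seen:
            seen.add(task_id)
            result.append(task_id)

    return result[:k]
```

With a shortlist of `["t1", "t2", "t3"]` and a re-ranker answer of `["t3", "bogus", "t1", "t3"]`, the invented ID and the repeat are discarded and the remaining slot is filled from the shortlist.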
Abstract
Head Start programs using GoEngage face significant challenges when new or rotating staff try to locate the appropriate Tasks (modules) on the platform homepage. These difficulties arise from domain-specific jargon (e.g., IFPA, DRDP), system-specific nomenclature (e.g., Application Pool), and the limitations of lexical search in handling typos and varied word order. We propose a pragmatic hybrid semantic search system that combines lightweight typo-tolerant lexical retrieval, embedding-based vector similarity, and constrained large language model (LLM) re-ranking. Our approach builds on the organization's existing Task Repository and Knowledge Base infrastructure and targets three properties: trustworthiness through low false-positive rates, evolvability to accommodate terminology changes, and economic efficiency through caching, shortlist generation, and graceful degradation. We provide a framework detailing the required resources, a phased implementation strategy with concrete milestones, an offline evaluation protocol based on curated test cases (Hit@K, Precision@K, Recall@K, and Mean Reciprocal Rank, MRR), and an online measurement methodology covering query success metrics, zero-result rates, and dwell-time proxies.
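The offline metrics named in the abstract have standard definitions, which the following minimal sketch computes for a single query and then averages over a set of test cases. The function names, the `(ranked, relevant)` test-case shape, and the task IDs are assumptions made for illustration; the paper's own evaluation code may differ.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant task appears in the top k, else 0.0."""
    return 1.0 if any(t in relevant for t in ranked[:k]) else 0.0


def precision_at_k(ranked, relevant, k):
    """Fraction of the k result slots occupied by relevant tasks."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant tasks that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for t in ranked[:k] if t in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant task, or 0.0 if none is retrieved."""
    for rank, task_id in enumerate(ranked, start=1):
        if task_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases, k):
    """Average each metric over (ranked_list, relevant_set) test cases."""
    n = len(cases)
    return {
        "hit@k": sum(hit_at_k(r, rel, k) for r, rel in cases) / n,
        "precision@k": sum(precision_at_k(r, rel, k) for r, rel in cases) / n,
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in cases) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in cases) / n,
    }


# Hypothetical curated test cases: (system ranking, set of correct tasks).
cases = [
    (["a", "b", "c"], {"b", "d"}),
    (["x", "y", "z"], {"x"}),
]
scores = evaluate(cases, k=2)
```

Here MRR is the mean of the per-query reciprocal ranks, so a system that always places a correct task first scores 1.0.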
