ConceptCarve: Dynamic Realization of Evidence
Eylon Caplan, Dan Goldwasser
TL;DR
ConceptCarve retrieves evidence for complex human trends across communities by dynamically carving out an interpretable representation of how that evidence is realized. The framework couples traditional retrievers with an LLM-powered Characterizer to iteratively grow a concept tree that captures domain-sensitive realizations of a trend. It introduces dataset-informed reranking (DIR), which lets ranking draw on full-dataset context, and it outperforms baselines on a large Reddit-based corpus without any training or fine-tuning. The approach yields interpretable insights into how evidence of moral foundations is realized differently across communities. Overall, ConceptCarve combines adaptive search-space carving, domain adaptation, and interpretability in a single, scalable framework.
Abstract
Finding evidence for human opinion and behavior at scale is a challenging task, often requiring an understanding of sophisticated thought patterns among vast online communities found on social media. For example, studying how gun ownership is related to the perception of Freedom requires a retrieval system that can operate at scale over social media posts while dealing with two key challenges: (1) identifying instances of abstract concepts, and (2) handling the fact that these concepts can be instantiated differently across communities. To address these, we introduce ConceptCarve, an evidence retrieval framework that uses traditional retrievers and LLMs to dynamically characterize the search space during retrieval. Our experiments show that ConceptCarve surpasses traditional retrieval systems in finding evidence within a social media community. It also produces an interpretable representation of the evidence for that community, which we use to qualitatively analyze complex thought patterns that manifest differently across communities.
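To make the idea of iteratively growing a concept tree more concrete, the following is a minimal toy sketch, not the authors' implementation. Every name in it (`Concept`, `retrieve`, `characterize`, `grow_tree`, `score_with_tree`) is hypothetical. The retriever is a simple word-overlap ranker standing in for a traditional retriever, and the Characterizer is a hard-coded lookup standing in for an LLM call; a real Characterizer would presumably read the retrieved posts, whereas this stub ignores them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a toy concept tree (hypothetical structure)."""
    text: str
    children: list[Concept] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; enough for a toy example."""
    return set(text.lower().split())


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Stand-in for a traditional retriever: rank posts by word overlap."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    scored = [(s, d) for s, d in scored if s > 0]
    # Stable sort: ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def characterize(concept: Concept, retrieved: list[str]) -> list[str]:
    """Stub for the LLM step (assumption): propose more specific concepts.

    This lookup ignores `retrieved`; an LLM-based version would use it.
    """
    lookup = {"freedom": ["rights", "liberty"]}
    return lookup.get(concept.text, [])


def grow_tree(root: Concept, corpus: list[str], depth: int, k: int = 3) -> Concept:
    """Alternate retrieval and characterization for `depth` rounds."""
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            docs = retrieve(node.text, corpus, k)
            for text in characterize(node, docs):
                child = Concept(text)
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root


def flatten(node: Concept) -> list[str]:
    """Collect all concept texts in the tree, parent before children."""
    return [node.text] + [t for c in node.children for t in flatten(c)]


def score_with_tree(doc: str, root: Concept) -> int:
    """Toy evidence score: number of tree concepts mentioned in the post."""
    return len(tokenize(doc) & set(flatten(root)))
```

In this sketch, a post that never uses the word "freedom" can still be scored as evidence once the tree has grown child concepts such as "rights", which is the kind of community-specific realization the abstract describes. The tree itself doubles as a readable summary of what counted as evidence.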
