ConceptCarve: Dynamic Realization of Evidence

Eylon Caplan, Dan Goldwasser

TL;DR

ConceptCarve addresses the retrieval of evidence for complex human trends across communities by dynamically building an interpretable representation of how that evidence is realized. The framework couples traditional retrievers with an LLM-powered Characterizer to iteratively grow a concept tree that captures domain-sensitive realizations of a trend. It introduces dataset-informed reranking (DIR), which gives ranking access to full-dataset context, and reports better performance than baselines on a large Reddit-based corpus without any training or fine-tuning. The approach yields interpretable insights into how evidence of moral foundations is realized differently across communities, alongside scalable retrieval and qualitative analysis. Overall, ConceptCarve combines adaptive search space carving, domain adaptation, and interpretability in a single, scalable framework.
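The alternation between retrieving documents and growing a concept tree can be illustrated with a toy sketch. This is not the authors' implementation: the `Concept` class, the substring-matching scorer, the one-level tree, and the `stub_characterizer` (which stands in for the LLM call and returns fixed concepts) are all assumptions made for illustration, loosely following the 'Bank' example and the promoted/demoted concepts described in the Figure 2 caption.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Node in a toy concept tree (hypothetical structure, not the paper's)."""
    name: str
    groundings: list[str]      # example phrases that realize the concept
    promoted: bool = True      # True: counts as evidence; False: demoted
    children: list["Concept"] = field(default_factory=list)


def flatten(node):
    """Yield every concept in the tree, depth first."""
    yield node
    for child in node.children:
        yield from flatten(child)


def score(doc, tree):
    """Toy lexical score: +1 per promoted grounding found, -1 per demoted one."""
    text = doc.lower()
    total = 0
    for concept in flatten(tree):
        hits = sum(g.lower() in text for g in concept.groundings)
        total += hits if concept.promoted else -hits
    return total


def retrieve(corpus, tree, k):
    """Return the k highest-scoring documents (ties keep corpus order)."""
    return sorted(corpus, key=lambda d: score(d, tree), reverse=True)[:k]


def carve(trend, corpus, characterize, steps, k):
    """Alternate retrieval and characterization for a fixed number of steps.

    Simplification: new concepts are attached directly under the root,
    so the tree stays one level deep.
    """
    root = Concept(name=trend, groundings=[trend])
    seen = {root.name}
    for _ in range(steps):
        docs = retrieve(corpus, root, k)            # retriever uses current tree
        for concept in characterize(trend, docs):   # characterizer proposes concepts
            if concept.name not in seen:
                root.children.append(concept)
                seen.add(concept.name)
    return root


def stub_characterizer(trend, docs):
    """Stand-in for an LLM call; returns fixed concepts regardless of input."""
    return [
        Concept("river bank", ["river", "shore"], promoted=True),
        Concept("financial bank", ["loan", "account"], promoted=False),
    ]


corpus = [
    "We walked along the bank of the river to the shore.",
    "The bank approved my loan and opened an account.",
    "Nothing relevant here.",
]
tree = carve("bank", corpus, stub_characterizer, steps=2, k=2)
top = retrieve(corpus, tree, 1)
```

In this toy run, the demoted "financial bank" concept pushes the loan document below the river document, which mirrors the idea that the tree narrows the search space toward the intended realization of the trend.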

Abstract

Finding evidence for human opinion and behavior at scale is a challenging task, often requiring an understanding of sophisticated thought patterns among vast online communities found on social media. For example, studying how gun ownership is related to the perception of Freedom requires a retrieval system that can operate at scale over social media posts while dealing with two key challenges: (1) identifying instances of abstract concepts, and (2) handling the fact that these concepts can be instantiated differently across communities. To address these, we introduce ConceptCarve, an evidence retrieval framework that utilizes traditional retrievers and LLMs to dynamically characterize the search space during retrieval. Our experiments show that ConceptCarve surpasses traditional retrieval systems in finding evidence within a social media community. It also produces an interpretable representation of the evidence for that community, which we use to qualitatively analyze complex thought patterns that manifest differently across communities.

Paper Structure

This paper contains 24 sections, 3 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Example of no lexical gap, shallow gap, inferential gap, and domain sensitivity in a retrieval task.
  • Figure 2: (Left): Concept tree: promoted (green) and demoted (red) concepts. Note that this tree sets the root to the ambiguous 'Bank'; our Characterizer would have used the full intent. (Right): Concept of "Clear Water", displayed with its set of groundings.
  • Figure 3: ConceptCarve: a concept tree is recursively constructed in steps $1\ldots t$ by alternating between Characterizer (generates new concepts from intermediate docs) and Retriever (retrieves docs using intermediate trees). Output tree represents the realization of the input trend within the input community and can be used for evidence retrieval or analyzed directly.
  • Figure 4: MAP@k on DIR task.
  • Figure 5: Trend: "Increase in frustration with family members who seem to prioritize personal ambitions over traditional family values."
  • ...and 10 more figures
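
Figure 4 reports MAP@k, which this summary does not define. The sketch below shows one common definition of mean average precision at cutoff k, as a reading aid. The normalization by `min(k, number of relevant documents)` is an assumption (other variants divide by the number of relevant documents found in the top k), and the function names are illustrative rather than taken from the paper.

```python
def average_precision_at_k(ranked, relevant, k):
    """Average precision over the top-k of one ranked list.

    ranked:   list of document ids, best first
    relevant: set of ids judged relevant for this query
    Assumed normalization: min(k, len(relevant)).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / min(k, len(relevant))


def map_at_k(rankings, relevance, k):
    """Mean of average_precision_at_k over all queries."""
    scores = [
        average_precision_at_k(ranked, rel, k)
        for ranked, rel in zip(rankings, relevance)
    ]
    return sum(scores) / len(scores)


# Two hypothetical queries with made-up document ids.
rankings = [["a", "b", "c", "d"], ["x", "y", "z"]]
relevance = [{"a", "c"}, {"y"}]
result = map_at_k(rankings, relevance, k=3)
```

For the first query the relevant documents sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; for the second the single relevant document sits at rank 2, giving 1/2. The mean of the two is 2/3.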