Complex QA and language models hybrid architectures, Survey

Xavier Daull, Patrice Bellot, Emmanuel Bruno, Vincent Martin, Elisabeth Murisasco

TL;DR

This survey analyzes how large language models can be augmented with hybrid architectures to tackle complex, non-factoid questions that require decomposition, multi-source knowledge, and deep reasoning. It covers core concepts, reviews evaluation metrics and datasets, and organizes a taxonomy of training, prompting, and agentic strategies, ranging from mid- and post-training alignment to PEFT and RAG-based approaches. The authors synthesize architectural patterns (decomposition, external tools, memory, verifiers) and prompting paradigms (context engineering, KPE, ICL) into a cohesive framework for designing robust complex QA systems. They emphasize the shift from single-model QA to agentic, retrieval-grounded pipelines with explicit attribution, verification, and governance to address hallucinations, safety, and data sensitivity, and they outline key research challenges and practical guidelines for deployment. Overall, the paper provides a comprehensive resource that connects scalar model improvements with structured reasoning workflows, tool use, and evaluation paradigms to advance practical complex QA systems.

Abstract

This paper reviews the state of the art of large language model (LLM) architectures and strategies for "complex" question answering, with a focus on hybrid architectures. LLM-based chatbot services have allowed anyone to grasp the potential of LLMs to solve many common problems, but users soon discovered their limitations on complex questions. Addressing more specific, complex questions (e.g., "What is the best mix of power-generation methods to reduce climate change?") often requires specialized architectures, domain knowledge, new skills, decomposition and multi-step resolution, deep reasoning, sensitive data protection, explainability, and human-in-the-loop processes. Therefore, we review: (1) the skills and tasks necessary for handling complex questions, and the common LLM limits to overcome; (2) datasets, cost functions, and evaluation metrics for measurement and improvement (e.g., accuracy, explainability, fairness, robustness, groundedness, faithfulness, toxicity); (3) families of solutions for overcoming LLM limitations through (a) training and reinforcement, (b) hybridization, (c) prompting, and (d) agentic architectures (agents, tools) and extended reasoning.

Paper Structure (109 sections, 4 equations, 2 figures, 8 tables)

Figures (2)

  • Figure 1: IBM DeepQA architecture (2010) (ferrucciBuildingWatsonOverview2010), with CQA pipeline steps "patched" to reflect new practice: verifier/critic loops, retrieval gating, and programmatic research actions.
  • Figure 2: From reinforcement learning from human feedback to AI feedback, in order to scale and to maximize the helpfulness vs. harmlessness trade-off (ouyangTrainingLanguageModels2022; baiConstitutionalAIHarmlessness2022)