Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners

Santosh Kumar Radha, Oktay Goktas

TL;DR

This work introduces Composite Learning Units (CLUs), designed to transform reasoners, such as Large Language Models (LLMs), into learners capable of generalized, continuous learning without conventional parameter updates while enhancing their reasoning abilities through continual interaction and feedback.

Abstract

Human learning thrives on the ability to learn from mistakes, adapt through feedback, and refine understanding, processes often missing in static machine learning models. In this work, we introduce Composite Learning Units (CLUs) designed to transform reasoners, such as Large Language Models (LLMs), into learners capable of generalized, continuous learning without conventional parameter updates while enhancing their reasoning abilities through continual interaction and feedback. CLUs are built on an architecture that allows a reasoning model to maintain and evolve a dynamic knowledge repository: a General Knowledge Space for broad, reusable insights and a Prompt-Specific Knowledge Space for task-specific learning. Through goal-driven interactions, CLUs iteratively refine these knowledge spaces, enabling the system to adapt dynamically to complex tasks, extract nuanced insights, and build upon past experiences autonomously. We demonstrate CLUs' effectiveness through a cryptographic reasoning task, where they continuously evolve their understanding through feedback to uncover hidden transformation rules. While conventional models struggle to grasp underlying logic, CLUs excel by engaging in an iterative, goal-oriented process. Specialized components (handling knowledge retrieval, prompt generation, and feedback analysis) work together within a reinforcing feedback loop. This approach allows CLUs to retain the memory of past failures and successes, adapt autonomously, and apply sophisticated reasoning effectively, continually learning from mistakes while also building on breakthroughs.
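
The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: `KnowledgeSpaces`, `toy_reasoner`, and `learn` are hypothetical names, the reasoner is a stand-in for an LLM call that answers only from stored notes, and feedback is reduced to an exact-match check. The point it shows is that the reasoner itself never changes; only the two knowledge spaces do.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class KnowledgeSpaces:
    """Hypothetical container for the two knowledge spaces named in the abstract."""
    general: List[str] = field(default_factory=list)          # broad, reusable insights
    prompt_specific: List[str] = field(default_factory=list)  # task-specific notes


Reasoner = Callable[[str, KnowledgeSpaces], str]


def toy_reasoner(x: str, spaces: KnowledgeSpaces) -> str:
    """Stand-in for an LLM call (assumption): answers only from stored notes."""
    for note in spaces.prompt_specific:
        source, _, target = note.partition(" => ")
        if source == x:
            return target
    return "?"


def learn(reasoner: Reasoner, examples: List[Tuple[str, str]],
          spaces: KnowledgeSpaces, max_iters: int = 3) -> int:
    """Run feedback iterations and return how many were used.

    The reasoner is never modified; only the knowledge spaces are updated.
    """
    for iteration in range(1, max_iters + 1):
        failures = 0
        for x, y in examples:
            if reasoner(x, spaces) != y:
                failures += 1
                # Negative feedback: store a task-specific correction.
                spaces.prompt_specific.append(f"{x} => {y}")
        if failures == 0:
            # Positive feedback: record a reusable insight.
            spaces.general.append("stored notes were sufficient for this task")
            return iteration
    return max_iters


spaces = KnowledgeSpaces()
examples = [("cat", "a"), ("dog", "o")]
iterations_used = learn(toy_reasoner, examples, spaces)
```

In this toy, the first pass fails on both examples and stores corrections, and the second pass succeeds using only the stored notes. A real CLU would replace the lookup with LLM reasoning over retrieved knowledge, which this sketch does not attempt to model.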

Paper Structure

This paper contains 19 sections, 13 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: This illustration contrasts traditional learning methods, which rely on parameter updates and fine-tuning for each task, with Composite Learning Units (CLUs). By decoupling memory from reasoning, CLUs enable a feedback-driven continuous learning system that adapts iteratively, allowing the framework to evolve through reasoning and retain evolving knowledge for future tasks. This figure highlights how CLUs, driven by active inference, enable continuous adaptation and refinement beyond the limitations of static models.
  • Figure 2: Core components and dynamic processes of the Composite Learning Unit (CLU) framework. CLUs behave like intelligent systems that refine themselves through iterative practice and discovery, improving their reasoning abilities over time. (a) CLUs are modular learning units that employ Large Language Models (LLMs) as base reasoners, leveraging feedback to optimize their performance iteratively. Agents, composed of these reasoners, adapt progressively through evolving knowledge bases that respond to tasks based on set goals. (b) The Learning Phase illustrates an example of learning a shape transformation task, where CLUs iteratively refine their understanding of the transformation rules. Given a few examples, the unit learns through positive and negative feedback loops, progressively improving until it accurately understands and executes the transformation. (c) The Reasoning Phase demonstrates how CLUs use accumulated knowledge to solve tasks effectively, focusing on execution without the need for additional training. (d) The knowledge base evolves through Active Learning, where knowledge grows through continuous practice and feedback-driven experience. See \ref{sec:theory-framework} for more details.
  • Figure 3: Process flow within the Composite Learning Unit (CLU) framework. This diagram outlines the interaction between different agents, task inputs, and knowledge spaces, illustrating how feedback is integrated to improve the system's performance over time. The CLU framework operates in two distinct phases: the learning phase, where the system iteratively refines its internal knowledge representations based on feedback from diverse datasets, and the reasoning phase, where the system applies its existing knowledge to solve tasks without altering its internal state. Details about the components and their interactions are discussed in \ref{sec:theory-framework}.
  • Figure 4: This figure shows the Knowledge Management Unit (KMU) and task execution flow, illustrating the interactions between knowledge storage, retrieval, and the operational agents for task-solving. In (a), the KMU dynamically manages knowledge through three specialized agents: the Search Agent, which generates key search terms or tags for retrieving relevant knowledge; the Knowledge Alignment Agent, which processes raw input data $\mathcal{I}$ and aligns it to the main goal before storing it in a vector database; and the Pruning Agent, which refines the stored knowledge based on feedback, ensuring that irrelevant or outdated information is pruned from the system. The two knowledge spaces, General Knowledge Space $\mathcal{K_G}$ and Prompt-Specific Knowledge Space $\mathcal{K_P}$, are dynamically informed and updated by the agents' operations. In (b), the task space $\mathcal{T}$ provides task inputs $x \in \mathcal{T}$, which are processed by the Operational Agent $A_O$. The Operational Agent can operate as a Single Agent or a Multi-Agent system, depending on the task complexity. The agent retrieves knowledge from the general $\mathcal{K_G}$ and prompt-specific $\mathcal{K_P}$ knowledge spaces. A Meta-Prompt Agent $A_{MP}$ generates a prompt $p$ based on the retrieved knowledge from $\mathcal{K_P}$, guiding the Operational Agent in solving the task. The output $\hat{y}$ is the result of the task, combining input $x$, knowledge retrieval, and prompt generation. This combined figure showcases how feedback from task performance is used to iteratively refine the knowledge spaces, continuously improving the system's reasoning and execution capabilities.
  • Figure 5: (a) Baseline Performance with IO/CoT: The performance of the baseline Input-Output (IO) and Chain-of-Thought (CoT) methods is depicted here, showing $0\%$ accuracy for all shots, from 0-shot to 4-shot. This indicates the inability of the underlying GPT-4o-mini model to infer the correct transformation rule, even with increased examples and guided reasoning prompts. (b) Learning Dynamics of Composite Learning Unit (CLU): The CLU's performance over multiple learning iterations is presented. Initially, the accuracy remains low, but after an inflection point, CLU's performance rapidly increases, eventually stabilizing near $100\%$. This improvement is driven by CLU's iterative refinement of its knowledge, facilitated by feedback mechanisms that guide the system to correctly infer the transformation rule, as detailed in \ref{sec:results}. (c) Evolution of the Transformation Rule Understanding: The evolution of CLU's internal understanding is shown here, illustrating the progression of hypotheses across learning iterations for the 1-shot setting. Initially, CLU makes vague or incorrect guesses regarding the transformation rule, but as it iteratively incorporates feedback, it eventually converges on the correct rule of selecting the second letter from each word.
  • ...and 1 more figures
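
The transformation rule named in the Figure 5 caption (selecting the second letter from each word) is simple to state in code, which makes the baseline's $0\%$ accuracy easier to interpret. The sketch below is illustrative only: the function names, the example sentences, and the choice to skip one-letter words are assumptions, and the paper's actual dataset and encoding format are not shown in this excerpt.

```python
from typing import Callable, List, Tuple


def second_letter_rule(text: str) -> str:
    """Concatenate the second letter of each word.

    Assumption: words with fewer than two characters are skipped.
    """
    return "".join(word[1] for word in text.split() if len(word) > 1)


def first_letter_rule(text: str) -> str:
    """A plausible but wrong hypothesis, used here as a contrast."""
    return "".join(word[0] for word in text.split() if word)


def accuracy(predict: Callable[[str], str],
             pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (input, target) pairs that the predictor gets exactly right."""
    if not pairs:
        return 0.0
    correct = sum(1 for x, y in pairs if predict(x) == y)
    return correct / len(pairs)


# Invented examples, not taken from the paper.
pairs = [
    ("apple banana cherry", "pah"),
    ("open source model", "poo"),
]

correct_accuracy = accuracy(second_letter_rule, pairs)
wrong_accuracy = accuracy(first_letter_rule, pairs)
```

Because scoring is exact-match, a hypothesis that is close but wrong (such as taking the first letter) scores zero on these invented pairs, which is consistent with the flat accuracy before the inflection point described in panel (b).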