RIRO: Reshaping Inputs, Refining Outputs Unlocking the Potential of Large Language Models in Data-Scarce Contexts

Ali Hamdi, Hozaifa Kassab, Mohamed Bahaa, Marwa Mohamed

TL;DR

RIRO addresses the poor generalization of LLMs fine-tuned on scarce data by stacking an input reformulation layer and an output reshaping layer around a model fine-tuned efficiently with QLoRA, with Phi-2 as the base model. The approach standardizes inputs to match the training distribution and refines outputs for consistency, and is evaluated with BLEU, ROUGE, Levenshtein distance, and cosine similarity. Ablation studies indicate that the full Reformulation–Fine-tuning–Reshaping pipeline gives the best accuracy and robustness compared to partial variants. The method is promising for automated test-case generation and similar high-stakes, data-scarce tasks, but the authors acknowledge trade-offs in computational cost and a risk of overfitting, and point to future work on efficiency and generalization.

Abstract

Large language models (LLMs) have significantly advanced natural language processing, excelling in areas like text generation, summarization, and question-answering. Despite their capabilities, these models face challenges when fine-tuned on small, domain-specific datasets, often struggling to generalize and deliver accurate results with unfamiliar inputs. To tackle this issue, we introduce RIRO, a novel two-layer architecture designed to improve performance in data-scarce environments. The first layer leverages advanced prompt engineering to reformulate inputs, ensuring better alignment with training data, while the second layer focuses on refining outputs to minimize inconsistencies. We fine-tune models including Phi-2, Falcon 7B, and Falcon 1B, with Phi-2 outperforming the others. Additionally, we introduce a benchmark using evaluation metrics such as cosine similarity, Levenshtein distance, BLEU score, ROUGE-1, ROUGE-2, and ROUGE-L. While these advancements improve performance, challenges like computational demands and overfitting persist, limiting the potential of LLMs in data-scarce, high-stakes environments such as healthcare, legal documentation, and software testing.
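
To make the evaluation metrics named above concrete, here is a minimal standard-library sketch of three of them: Levenshtein distance, cosine similarity, and ROUGE-1. It is an illustration under stated assumptions, not the paper's benchmark code. In particular, the cosine similarity here is computed over whitespace-token counts, which is an assumption (the text representation used by the authors is not stated here), and the function names are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over token counts (assumed representation)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A generated test case would be scored against its reference by calling each function on the pair of strings; lower Levenshtein distance and higher cosine and ROUGE values indicate closer agreement.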

Paper Structure

This paper contains 13 sections, 7 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: The proposed model architectures for the RIRO versions. (a) Refining LLM: This architecture focuses on input normalization, aligning the input user stories with the training data distribution. (b) Reshaping LLM: Here, an output reshaping layer is applied to ensure coherent test cases, adjusting the final output to maintain consistency and accuracy. (c) Stacked LLM: A combined approach that first normalizes the input, passes it through a fine-tuned LLM, and then applies reshaping to the generated output.
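
The three variants in Figure 1 can be sketched as one composition of three stages, where a partial variant replaces a stage with the identity function. This is a hedged illustration only: `make_pipeline`, `identity`, and the `toy_*` functions are hypothetical stand-ins, and in the actual system each stage would be an LLM call (prompt-based reformulation, the fine-tuned model, and a reshaping model) rather than string manipulation.

```python
from typing import Callable

TextFn = Callable[[str], str]


def identity(text: str) -> str:
    """Placeholder for a stage that is switched off in a partial variant."""
    return text


def make_pipeline(reformulate: TextFn, generate: TextFn, reshape: TextFn) -> TextFn:
    """Compose input reformulation, generation, and output reshaping."""
    def run(user_story: str) -> str:
        normalized = reformulate(user_story)  # align input with training data
        draft = generate(normalized)          # fine-tuned LLM (stubbed here)
        return reshape(draft)                 # enforce a consistent output form
    return run


# Toy stand-ins so the sketch runs without any model (assumptions).
def toy_reformulate(story: str) -> str:
    return " ".join(story.split()).rstrip(".") + "."


def toy_generate(story: str) -> str:
    return f"  Test case: verify that {story}  "


def toy_reshape(draft: str) -> str:
    return draft.strip()


input_only = make_pipeline(toy_reformulate, toy_generate, identity)   # variant (a)
output_only = make_pipeline(identity, toy_generate, toy_reshape)      # variant (b)
stacked = make_pipeline(toy_reformulate, toy_generate, toy_reshape)   # variant (c)
```

Structuring the variants this way mirrors the ablation setup described in the TL;DR: the same generator is reused, and only the presence of the surrounding layers changes.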