Alignment For Performance Improvement in Conversation Bots
Raghav Garg, Kapil Sharma, Shrey Singla
TL;DR
Alignment methods achieve superior adherence to predefined guidelines, or 'guardrails', in conversational agents compared to instruction fine-tuning alone.
Abstract
This paper shows that alignment methods achieve superior adherence to predefined guidelines, or 'guardrails', in conversational agents (bots) compared to instruction fine-tuning alone. It examines traditional training approaches such as instruction fine-tuning alongside recent direct alignment methods such as Identity Preference Optimization (IPO) and Kahneman-Tversky Optimization (KTO). The effectiveness of alignment techniques both before and after instruction tuning is highlighted, illustrating their potential to optimize conversational bots in domains that require strict adherence to specified rules, such as customer care.
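To make the IPO objective mentioned above concrete, the following is a minimal sketch of the per-pair IPO loss as commonly formulated (Azar et al.); the function name and scalar-inputs interface are illustrative, not taken from this paper. Given the log-probabilities of a preferred (chosen) and dispreferred (rejected) response under the policy and a frozen reference model, IPO regresses the log-ratio margin toward a fixed target of 1/(2β) rather than pushing it to infinity:

```python
def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair IPO loss (sketch).

    Each argument is the total log-probability of a full response under
    the trainable policy or the frozen reference model; `beta` controls
    regularisation strength. Unlike DPO's sigmoid objective, the squared
    loss bounds the preference margin at 1/(2*beta).
    """
    # Margin of the policy's log-ratio over the reference's log-ratio.
    margin = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    # Squared distance from the fixed target margin 1/(2*beta).
    return (margin - 1.0 / (2.0 * beta)) ** 2
```

For example, with a policy margin of 2 nats over the reference and β = 0.1 (target margin 5), the loss is (2 − 5)² = 9. KTO differs in that it scores chosen and rejected responses independently against a reference point, so it does not require paired preferences.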
