Enhancing Human-Like Responses in Large Language Models

Ethem Yağız Çalık, Talha Rüzgar Akkuş

TL;DR

This work tackles the challenge of making large language models more human-like in conversation by combining synthetic data generation with Direct Preference Optimization (DPO) and Low-Rank Adaptation (LoRA) to fine-tune open-source models. Using a data-generation pipeline that pairs a human-like response with a formal response, the authors train Llama 3, Qwen, and Mistral Nemo variants, improving conversational naturalness while largely preserving benchmark performance. In crowdsourced human-likeness voting, the fine-tuned models are preferred in ~89–90% of cases, and open-leaderboard results show only modest shifts on standard benchmarks, which supports their practical usability. The study contributes openly available models and a synthetic dataset, and discusses ethical considerations and avenues for future work, including dataset diversification and broader evaluation.
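To make the preference-optimization step concrete, the sketch below computes the standard DPO loss for a single preference pair, where the "chosen" response would be the human-like one and the "rejected" response the formal one. It is a minimal illustration of the general DPO objective, not the authors' training code: the function name `dpo_loss`, the value `beta=0.1`, and the log-probabilities in the usage lines are all assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return (loss, reward_margin) for one preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta=0.1 is an illustrative default, not a value taken from the paper.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)

    # Reward margin: positive when the policy favors the chosen response.
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin


# Hypothetical log-probabilities, chosen only to exercise the function.
loss, margin = dpo_loss(-12.0, -15.0, -13.0, -13.5)
print(f"loss={loss:.4f} margin={margin:.4f}")
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2; training lowers the loss by widening the margin in favor of the chosen response. In practice the log-probabilities would come from the fine-tuned model (for example with LoRA adapters attached) and a frozen copy of the base model.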

Abstract

This paper explores advancements in making large language models (LLMs) more human-like. We focus on techniques that enhance natural language understanding, conversational coherence, and emotional intelligence in AI systems. The study evaluates various approaches, including fine-tuning with diverse datasets, incorporating psychological principles, and designing models that better mimic human reasoning patterns. Our findings demonstrate that these enhancements not only improve user interactions but also open new possibilities for AI applications across different domains. Future work will address the ethical implications and potential biases introduced by these human-like attributes.

Paper Structure (30 sections, 3 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: General schema
  • Figure 2: Atlas Nomic Map of the dataset
  • Figure 3: Reward Margins Graph for the fine-tuned models
  • Figure 4: Example generation of Human-Like-Llama-3-8B-Instruct
  • Figure 5: Example generation of Human-Like-Qwen-2.5-7B-Instruct
  • ...and 1 more figures