Harmonic Reasoning in Large Language Models

Anna Kruspe

TL;DR

LLMs handle note intervals well but struggle with more complex tasks such as recognizing chords and scales, which reveals clear limits in current LLM abilities and shows where these models need to improve.

Abstract

Large Language Models (LLMs) are increasingly popular and are used for many purposes, including creative tasks in the arts. However, these models sometimes struggle with specific reasoning tasks, especially those that involve logical thinking and counting. This paper examines how well LLMs understand and reason about musical tasks such as determining notes from intervals and identifying chords and scales. We tested GPT-3.5 and GPT-4o on these tasks. Our results show that while LLMs do well with note intervals, they struggle with more complicated tasks like recognizing chords and scales. This reveals clear limits in current LLM abilities and identifies where improvement is needed, which could benefit their reasoning in both artistic and other complex areas. We also provide an automatically generated benchmark data set for the described tasks.

Paper Structure (12 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: Results of the interval experiments, starting with upward intervals with sharps and within the same octave only, and then increasing the difficulty of the task.
  • Figure 2: Results for the task of recognizing chords and scales from the notes they contain: first in their basic form only, then with shuffled notes, with random enharmonic spellings, and with both combined. "Informed" means that the model was told in advance which chords/scales were possible and how they had been transformed.
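
The tasks named in the abstract and figure captions can be illustrated with a small sketch. This is not the paper's benchmark generator: the function names, the interval and chord tables, and the sharp-only output spelling are all assumptions made for illustration. It shows how a reference answer for "note from interval" and "chord from notes" might be computed, and how the shuffling and enharmonic respelling described for Figure 2 leave the correct answer unchanged.

```python
import random

# Pitch classes indexed by semitones above C, in sharp and flat spellings.
# Simplified assumption: only single sharps/flats, no E#/Cb or double accidentals.
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FLAT_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Illustrative subset of intervals (name -> size in semitones).
INTERVALS = {"minor third": 3, "major third": 4, "perfect fourth": 5, "perfect fifth": 7}

# Illustrative subset of chords (quality -> semitone offsets from the root).
CHORDS = {"major": (0, 4, 7), "minor": (0, 3, 7), "diminished": (0, 3, 6)}


def pitch_class(name):
    """Map a note name such as 'F#' or 'Gb' to its pitch class (0-11)."""
    if name in SHARP_NAMES:
        return SHARP_NAMES.index(name)
    return FLAT_NAMES.index(name)  # raises ValueError for unknown names


def note_from_interval(start, interval, upward=True):
    """Return the note reached from `start` by `interval`, spelled with sharps.

    Octaves are ignored, and the result is not necessarily the musically
    conventional spelling (a flat name may be expected in practice).
    """
    step = INTERVALS[interval] if upward else -INTERVALS[interval]
    return SHARP_NAMES[(pitch_class(start) + step) % 12]


def identify_chord(notes):
    """Return (root, quality) for a triad given in any order or spelling, else None."""
    pitch_classes = frozenset(pitch_class(n) for n in notes)
    for root in range(12):
        for quality, offsets in CHORDS.items():
            if pitch_classes == frozenset((root + o) % 12 for o in offsets):
                return SHARP_NAMES[root], quality
    return None


def shuffled(notes, rng):
    """Return the notes in random order (the 'shuffled notes' condition)."""
    out = list(notes)
    rng.shuffle(out)
    return out


def respelled(notes, rng):
    """Randomly swap each note for a sharp or flat spelling of the same pitch."""
    return [rng.choice([SHARP_NAMES[pitch_class(n)], FLAT_NAMES[pitch_class(n)]])
            for n in notes]


if __name__ == "__main__":
    rng = random.Random(0)
    print(note_from_interval("C", "perfect fifth"))
    hard_version = respelled(shuffled(["F#", "A#", "C#"], rng), rng)
    print(hard_version, identify_chord(hard_version))
```

Because the lookup works on pitch-class sets, reordering and enharmonic respelling do not affect the result, which is what makes these transformations a test of reasoning rather than of surface pattern matching.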