Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions

Daniel de S. Moraes, Pedro T. C. Santos, Polyana B. da Costa, Matheus A. S. Pinto, Ivan de J. P. Pinto, Álvaro M. G. da Veiga, Sergio Colcher, Antonio J. G. Busson, Rafael H. Rocha, Rennan Gaio, Rafael Miceli, Gabriela Tourinho, Marcos Rabaioli, Leandro Santos, Fellipe Marques, David Favaro

TL;DR

The paper tackles the challenge of manually creating and updating topic taxonomies for tagging retail banking transactions by proposing an unsupervised framework that combines keyword extraction, topic modeling, and instruction-based LLM post-processing. It introduces zero-shot prompting for taxonomy expansion and demonstrates the approach on a private dataset, producing 58 Food and 6 Shopping taxonomies with high coherence (>90%) in qualitative evaluation, and strong expansion performance via commercial LLMs (e.g., GPT-4, Gemini) for parent-node prediction. Key contributions include a full pipeline for automatic taxonomy construction, a novel zero-shot expansion workflow, and comparative evidence that LLM-based expansion can outperform traditional baselines in low-resource settings. The work has practical implications for scalable, adaptable tagging of retail transactions and highlights directions for further improving taxonomy enrichment with embeddings and architectural refinements.

Abstract

This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies, and use LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes. To our knowledge, this is the first work to apply such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form that first assessed the quality of the taxonomies created and then the tags assigned to merchants based on those taxonomies. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. Taxonomy expansion with LLMs also showed promising results for parent node prediction, with an F1-score above 70% on our taxonomies.
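
To make the expansion step concrete, the following is a minimal sketch of how a zero-shot prompt for parent-node prediction might be assembled and how a model's reply might be validated against the existing taxonomy. It is not the authors' implementation: the prompt wording, the dictionary representation of the taxonomy, the example terms, and the function names (`build_zero_shot_prompt`, `parse_parent`, `insert_term`) are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from typing import Dict, List, Optional

# Assumed representation: each parent node maps to its list of child nodes.
Taxonomy = Dict[str, List[str]]


def all_nodes(taxonomy: Taxonomy) -> List[str]:
    """Return every node (parents and children) once, in first-seen order."""
    nodes: List[str] = []
    for parent, children in taxonomy.items():
        for node in [parent, *children]:
            if node not in nodes:
                nodes.append(node)
    return nodes


def build_zero_shot_prompt(taxonomy: Taxonomy, new_term: str) -> str:
    """Build a prompt with no worked examples (zero-shot) asking for a parent node."""
    lines = [f"- {parent}: {', '.join(children)}" for parent, children in taxonomy.items()]
    candidates = ", ".join(all_nodes(taxonomy))
    return (
        "You are given a topic taxonomy as 'parent: children' lines.\n"
        + "\n".join(lines)
        + f"\n\nNew term: {new_term}\n"
        + "Answer with exactly one existing node that should be its parent. "
        + f"Candidates: {candidates}"
    )


def parse_parent(reply: str, taxonomy: Taxonomy) -> Optional[str]:
    """Map a free-text reply to an existing node, or None if it names no known node."""
    cleaned = reply.strip().strip(".\"'").lower()
    for node in all_nodes(taxonomy):
        if node.lower() == cleaned:
            return node
    return None


def insert_term(taxonomy: Taxonomy, new_term: str, parent: str) -> None:
    """Attach the new term under the predicted parent node."""
    taxonomy.setdefault(parent, []).append(new_term)


# Hypothetical taxonomy fragment, used only to exercise the functions.
taxonomy: Taxonomy = {
    "Food": ["Fast Food", "Bakery"],
    "Fast Food": ["Burger", "Pizza"],
}
prompt = build_zero_shot_prompt(taxonomy, "Sushi")
```

The string returned by `build_zero_shot_prompt` would be sent to whichever LLM is being evaluated. Rejecting replies that do not name an existing node keeps the expanded taxonomy well-formed, and the accepted predictions can then be compared with reference parents to compute a metric such as the F1-score reported above.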

Paper Structure (15 sections, 1 figure, 2 tables)