Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task

Gabriel Lino Garcia; Pedro Henrique Paiola; Luis Henrique Morelli; Giovani Candido; Arnaldo Cândido Júnior; Danilo Samuel Jodas; Luis C. S. Afonso; Ivan Rizzo Guilherme; Bruno Elias Penteado; João Paulo Papa

TL;DR

This paper presents Bode, a Portuguese instruction-following LLM fine-tuned from LLaMA 2 using LoRA, available in 7B and 13B sizes. It evaluates zero-shot and in-context learning on sentiment analysis, news classification, and fake news detection using three Portuguese datasets, comparing against multiple baseline models. Bode demonstrates strong performance, particularly in the 13B variant with Cabrita-derived fine-tuning, while also highlighting issues such as catastrophic forgetting and task-dependent gains. The work contributes a practical, openly available Portuguese LLM and emphasizes the benefits of monolingual fine-tuning and prompt-based evaluation for low-resource languages, with potential impact on Portuguese NLP research and applications.

Abstract

Large Language Models (LLMs) are driving steady advances in Natural Language Processing. Low-resource languages, meaning those that are underrepresented in datasets for NLP tasks or whose existing datasets are comparatively small, also benefit from LLMs, but not to the same extent; Portuguese is one example. LLMs trained on multilingual datasets often struggle to respond satisfactorily to prompts in Portuguese, for example by code-switching in their responses. This work proposes Bode, a LLaMA 2-based model fine-tuned for Portuguese prompts, in two versions: 7B and 13B. We evaluate the model on classification tasks using the zero-shot approach with in-context learning, and compare it with other LLMs. Our main contributions are an LLM that achieves satisfactory results in Portuguese and a model that is free for research or commercial use.

Paper Structure (20 sections, 3 figures, 1 table)


Introduction
Theoretical Background
LLaMA Models
Mistral 7B
Falcon 7B
Low-Rank Adaptation
Related Works
Sabiá
Cabrita
Proposed Model
Methodology
Experimental Setup
Zero-Shot Learning
In-Context Learning
Prompt Engineering
...and 5 more sections

Figures (3)

Figure 1: Overview of the proposed method.
Figure 2: Accuracy values for each dataset and LLM adopted in the experiments.
Figure 3: F1-score values for each dataset and LLM adopted in the experiments.
