Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Sondos Mahmoud Bsharat, Aidar Myrzakhan, Zhiqiang Shen

TL;DR

Prompt quality critically shapes LLM outputs, and this work proposes 26 principled instructions to guide prompting across model scales and tasks. The authors validate the approach on the ATLAS benchmark using LLaMA-1/2 variants and GPT-3.5/4, reporting significant boosts in response quality and correctness, especially for larger models. The principles emphasize audience-tailored prompts, incremental prompting, and example-driven design to reduce bias and improve accuracy. This work provides actionable guidance for researchers and developers crafting prompts and argues for integrating principled prompts into standard LLM workflows.

Abstract

This paper introduces 26 guiding principles designed to streamline the process of querying and prompting large language models. Our goal is to simplify the underlying concepts of formulating questions for large language models of various scales, to examine their abilities, and to enhance user comprehension of how models of different scales behave when given different prompts. Extensive experiments are conducted on LLaMA-1/2 (7B, 13B and 70B) and GPT-3.5/4 to verify the effectiveness of the proposed principles for instruction and prompt design. We hope that this work can provide a better guide for researchers working on the prompting of large language models. The project page is available at https://github.com/VILA-Lab/ATLAS.

Paper Structure (15 sections, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Example prompts and corresponding responses before and after applying the principles. Left: the original prompts and their responses from GPT-4. Right: the principled prompts and the associated responses. Principles 5 and 6 are used.
  • Figure 2: Example of boosted LLM response quality after applying principle 13 to prompts.
  • Figure 3: Example of improved LLM response correctness after applying principle 7 to prompts.
  • Figure 4: Boosting of LLM response quality after applying the introduced principles to prompts. Small-scale indicates the 7B models, medium-scale the 13B models, and large-scale the 70B and GPT-3.5/4 models.
  • Figure 6: Relative correctness improvement of LLM responses after applying the introduced principles to prompts. Small-scale indicates the 7B models, medium-scale the 13B models, and large-scale the 70B and GPT-3.5/4 models.
  • ...and 9 more figures