ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)

Tosin Adewumi, Lama Alkhaled, Claudia Buck, Sergio Hernandez, Saga Brilioth, Mkpe Kekung, Yelvin Ragimov, Elisa Barney

TL;DR

ProCoT offers a probing chain-of-thought framework that engages students with Large Language Models while mitigating cheating and enhancing critical writing. By requiring students to ground their answers in peer-reviewed references and by leveraging self-regulated learning, the method yields more evidence-based student outputs and exposes the limitations of LLMs in providing verifiable citations. Across two university cases, ProCoT demonstrated the potential to stimulate critical thinking and reduce reliance on unvalidated AI content, while highlighting the need for robust evaluation strategies as LLMs evolve. The work has practical implications for educators seeking to harness AI as a tool for learning rather than a source of cheating, with broader applicability across writing-intensive domains.

Abstract

We introduce a novel writing method called Probing Chain-of-Thought (ProCoT), which can potentially prevent students from using a Large Language Model (LLM), such as ChatGPT, to cheat while enhancing their active learning. LLMs have disrupted education and many other fields. For fear of students cheating, many have resorted to banning their use. LLMs are also known to hallucinate. We conducted studies with ProCoT in two different courses with 65 students. The students in each course were asked to prompt an LLM of their choice with one question from a set of four and to affirm or refute statements in the LLM output using peer-reviewed references. The results show two things: (1) ProCoT stimulates students' creative/critical thinking and writing through engagement with LLMs, as seen when we compare the LLM-only output to the ProCoT output; and (2) ProCoT can prevent cheating because of clear limitations in existing LLMs, particularly ChatGPT, as seen when we compare the students' ProCoT output to the LLMs' ProCoT output. We also find that most students prefer to give answers in fewer words than LLMs, which are typically verbose. The average word counts for students in the first course, ChatGPT (v3.5), and Phind (v8) are 208, 391, and 383, respectively.
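The word-count comparison in the abstract is simple arithmetic, and a minimal sketch of how such a comparison might be reproduced follows. It assumes whitespace tokenization, since the counting procedure used in the study is not described here, and the helper names `word_count`, `average_word_count`, and `relative_length` are hypothetical. Only the three averages (208, 391, and 383) come from the abstract; no per-answer data is reproduced.

```python
from statistics import mean

# Average word counts reported in the abstract for the first course.
REPORTED_AVERAGES = {"students": 208, "ChatGPT (v3.5)": 391, "Phind (v8)": 383}


def word_count(text: str) -> int:
    """Count words by splitting on runs of whitespace (an assumed tokenization)."""
    return len(text.split())


def average_word_count(answers: list[str]) -> float:
    """Hypothetical helper: mean word count over a list of answers."""
    if not answers:
        raise ValueError("need at least one answer")
    return mean(word_count(a) for a in answers)


def relative_length(averages: dict[str, float], baseline: str) -> dict[str, float]:
    """Ratio of each source's average length to the baseline source's average."""
    base = averages[baseline]
    return {name: value / base for name, value in averages.items()}


if __name__ == "__main__":
    ratios = relative_length(REPORTED_AVERAGES, "students")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x the student average")
```

Run as a script, it prints each average relative to the student average: 1.88 for ChatGPT (v3.5) and 1.84 for Phind (v8). In other words, the LLM answers in the first course were, on average, roughly 1.8 to 1.9 times as long as the students' answers.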

Paper Structure

This paper contains 13 sections and 2 figures.

Figures (2)

  • Figure 1: Case 1: Quantitative plots of the number of words in students' ProCoT answers.
  • Figure 2: Case 2: Quantitative plots of the number of words in students' ProCoT answers.