Table of Contents
Fetching ...

Learning SQL from within: integrating database exercises into the database itself

Aristide Grange

TL;DR

The theoretical underpinnings of SQLab's query fingerprinting model, its implementation challenges, and its potential to improve SQL education through game-based learning are described.

Abstract

SQL adventure builder (SQLab) is an open-source framework for creating SQL games that are embedded within the very database they query. Students' answers are evaluated using query fingerprinting, a novel technique that allows for better feedback than traditional SQL online judge systems. Fingerprints act as tokens that are used to unlock messages encrypted in an isolated auxiliary table. These messages may include hints, answer keys, examples, explanations, or narrative elements. They can also contain the problem statement of the next task, which turns them into nodes in a virtual DAG with queries as edges. This makes it possible to design a coherent adventure with a storyline of arbitrary complexity. This paper describes the theoretical underpinnings of SQLab's query fingerprinting model, its implementation challenges, and its potential to improve SQL education through game-based learning. The underlying concepts are fully cross-vendor, and support for SQLite, PostgreSQL and MySQL is already available. As a proof of concept, two games, 30 exercises and one mock exam were tested over a three-year period with about 300 students.

Learning SQL from within: integrating database exercises into the database itself

TL;DR

The theoretical underpinnings of SQLab's query fingerprinting model, its implementation challenges, and its potential to improve SQL education through game-based learning are described.

Abstract

SQL adventure builder (SQLab) is an open-source framework for creating SQL games that are embedded within the very database they query. Students' answers are evaluated using query fingerprinting, a novel technique that allows for better feedback than traditional SQL online judge systems. Fingerprints act as tokens that are used to unlock messages encrypted in an isolated auxiliary table. These messages may include hints, answer keys, examples, explanations, or narrative elements. They can also contain the problem statement of the next task, which turns them into nodes in a virtual DAG with queries as edges. This makes it possible to design a coherent adventure with a storyline of arbitrary complexity. This paper describes the theoretical underpinnings of SQLab's query fingerprinting model, its implementation challenges, and its potential to improve SQL education through game-based learning. The underlying concepts are fully cross-vendor, and support for SQLite, PostgreSQL and MySQL is already available. As a proof of concept, two games, 30 exercises and one mock exam were tested over a three-year period with about 300 students.

Paper Structure

This paper contains 62 sections, 1 theorem, 3 equations, 11 figures, 7 tables.

Key Result

theorem 1

In a perfect setting, two queries $Q_1$ and $Q_2$ with the same formula eqn:formulas:basic or eqn:formulas:agg and no derived table in their FROM clause produce the same token if and only if their starred versions $Q_1^\mathtt{*}$ and $Q_2^\mathtt{*}$ result in the same table (ignoring the null colu

Figures (11)

  • Figure 1: An SQLab exercise with a possible answer and its result table. The given formula is meant to be copied and pasted into the outer SELECT clause of the query. The resulting token can be directly used to unlock the correction (one-pass fingerprinting). https://github.com/laowantong/sqlab/blob/main/pub/2024-10. Learning SQL from Within/code.ipynb
  • Figure 2: A query with two-pass fingerprinting. Students are invited to replace the (0.0) placeholder in the formula by the first number of the employees column (1). https://github.com/laowantong/sqlab/blob/main/pub/2024-10. Learning SQL from Within/code.ipynb
  • Figure 3: Fragment of an activity map on SQLab Sessform (MySQL version). A green node is an entry point, i.e., a task whose access token is part of the statement given to the students: it can be either an independant exercise (e.g., 027, 040, 086) or the first episode of an adventure (292). A red node (e.g., 202, 248) is an intermediate point, i.e., a task whose access token is produced by a correct answer to its predecessor; multiple arcs (e.g., the arcs outgoing from 027 or 273) indicate that a task admits multiple solutions. A small blank node represents a specific hint-message triggered by a wrong answer (arrows omitted for clarity). A yellow star (e.g., 282) is an exit point, i.e., a message that congratulates the student for solving an exercise or completing an adventure. Entire map on https://github.com/laowantong/sqlab_sessform/blob/main/mysql/activity_map.pdf.
  • Figure 4: Overview of the token calculation for a SELECT query joining three tables $A$, $B$, and $C$. It starts with the values present in the hash columns of the tables involved in the query. These values are coalesced (1), combined horizontally (2), aggregated by groups (3), and aggregated vertically into a single value (4); the latter may be combined with a value extracted from the other columns of the result table (5), before being finally salted (6).
  • Figure 6: Density of aggregation function outcomes on full-range hashes (sparser is better). Each jittered strip plot represents the outcome of a function applied to 1000 individuals. https://github.com/laowantong/sqlab/blob/main/pub/2024-10. Learning SQL from Within/code.ipynb
  • ...and 6 more figures

Theorems & Definitions (5)

  • Definition 1: Starring
  • Definition 2: Perfect setting
  • theorem 1: Fingerprinting accuracy
  • proof
  • proof