The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community

Shachar Don-Yehiya, Leshem Choshen, Omri Abend

TL;DR

This paper addresses the need for open, human-model conversation data to improve alignment and research by introducing the ShareLM Collection and its companion ShareLM plugin. The approach unifies existing open datasets into a living artifact with a shared schema, and provides a browser extension that volunteers can use to contribute conversations across multiple models while preserving user control and privacy. Key contributions include a random-id, privacy-conscious data collection pipeline with 24-hour delayed uploads, a PostgreSQL backend for releases, and a user study validating usability. The work advances open science by lowering barriers to data sharing, enabling ongoing growth of open human-model data and informing model development and personalization in the open-source ecosystem.

Abstract

Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and are thus a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models and use it internally to improve those models, the open-source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Because few platforms let users share their chats, the ShareLM plugin adds this functionality, allowing users to share conversations from most platforms. The plugin lets users rate their conversations, at both the conversation and the response level, and delete conversations they prefer to keep private before those conversations ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.

Paper Structure (18 sections, 5 figures)

Figures (5)

  • Figure 1: The popup window. The user can go over their previous conversations from the last 24 hours and rate them or alternatively choose to delete them if they prefer to keep them private.
  • Figure 2: The recording banner is at the top of the window, indicating that the current chat demo (here ChatUI) is supported by the plugin and that the current conversation is recorded. Clicking on the "Click here to stop sharing" button will pause the conversation's recording.
  • Figure 3: The conversation collection is paused. Clicking on the "Click here to start sharing" button will start the conversation's recording.
  • Figure 4: Providing feedback through the chat interface. The user can rate each response separately, at the time of the interaction.
  • Figure 5: The frequently asked questions section (in the popup window). Provides answers to common questions regarding the plugin.
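
The local hold-and-review behavior described in the summary and in the Figure 1 caption (conversations stay on the user's machine for 24 hours, during which they can be rated or deleted) can be illustrated with a minimal sketch. This is not the plugin's actual code: the class and method names (`LocalBuffer`, `record`, `rate`, `delete`, `pop_due`) and the rating scale are hypothetical, and an in-memory dictionary stands in for the browser's local storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional
import uuid

# Delay before a conversation may leave local storage (24 hours per the summary).
HOLD_PERIOD = timedelta(hours=24)


@dataclass
class Conversation:
    messages: List[str]
    recorded_at: datetime
    rating: Optional[int] = None  # hypothetical conversation-level rating
    conversation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class LocalBuffer:
    """Hypothetical stand-in for the plugin's local storage."""

    def __init__(self) -> None:
        # Random identifier, assumed here to be unrelated to any account or name.
        self.user_id = uuid.uuid4().hex
        self._held: Dict[str, Conversation] = {}

    def record(self, messages: List[str], now: datetime) -> str:
        """Store a conversation locally and return its id."""
        conv = Conversation(messages=list(messages), recorded_at=now)
        self._held[conv.conversation_id] = conv
        return conv.conversation_id

    def rate(self, conversation_id: str, rating: int) -> None:
        """Attach a rating to a conversation that is still held locally."""
        self._held[conversation_id].rating = rating

    def delete(self, conversation_id: str) -> None:
        """Drop a conversation so it is never uploaded."""
        self._held.pop(conversation_id, None)

    def pop_due(self, now: datetime) -> List[Conversation]:
        """Remove and return conversations held for at least HOLD_PERIOD."""
        due = [c for c in self._held.values()
               if now - c.recorded_at >= HOLD_PERIOD]
        for conv in due:
            del self._held[conv.conversation_id]
        return due


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    buffer = LocalBuffer()
    kept = buffer.record(["Hi", "Hello! How can I help?"], now=start)
    private = buffer.record(["Something personal", "Noted."], now=start)
    buffer.rate(kept, 5)
    buffer.delete(private)  # deleted before the hold period ends
    ready = buffer.pop_due(start + timedelta(hours=24))
    print([c.conversation_id == kept for c in ready])
```

In this sketch, deleting a conversation during the hold period removes it from the only place it exists, so a later upload step never sees it. Whether the real plugin schedules uploads this way is an assumption.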