The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
Shachar Don-Yehiya, Leshem Choshen, Omri Abend
TL;DR
This paper addresses the need for open, human-model conversation data to improve alignment and research by introducing the ShareLM Collection and its companion ShareLM plugin. The approach unifies existing open datasets into a living artifact with a shared schema, and provides a browser extension that volunteers can use to contribute conversations across multiple models while preserving user control and privacy. Key contributions include a random-id, privacy-conscious data collection pipeline with 24-hour delayed uploads, a PostgreSQL backend for releases, and a user study validating usability. The work advances open science by lowering barriers to data sharing, enabling ongoing growth of open human-model data and informing model development and personalization in the open-source ecosystem.
Abstract
Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus, allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.
