CARLoS: Retrieval via Concise Assessment Representation of LoRAs at Scale

Shahar Sarfaty, Adi Haviv, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H. Bermano

TL;DR

CARLoS introduces a metadata-free framework to characterize Low Rank Adapters (LoRAs) using a concise, CLIP-based three-part representation: Semantic Direction, Strength, and Consistency. By indexing a large LoRA corpus with prompts and seeds, CARLoS enables prompt-independent retrieval that better matches user queries than text-based baselines, while providing insights into stability and legal implications. The approach yields strong qualitative and quantitative improvements in retrieval quality, diversity, and user-perceived relevance, and suggests practical utility for IP attribution and platform moderation. Limitations include reliance on CLIP and fixed-scale experiments, with clear avenues for extension to additional backbones and adapter types. Overall, CARLoS offers a scalable, interpretable, and practically impactful framework for organizing and retrieving community-generated LoRAs at scale.

Abstract

The rapid proliferation of generative components, such as LoRAs, has created a vast but unstructured ecosystem. Existing discovery methods depend on unreliable user descriptions or biased popularity metrics, hindering usability. We present CARLoS, a large-scale framework for characterizing LoRAs without requiring additional metadata. Analyzing over 650 LoRAs, we employ them in image generation over a variety of prompts and seeds, as a credible way to assess their behavior. Using CLIP embeddings and their difference to a base-model generation, we concisely define a three-part representation: Directions, defining semantic shift; Strength, quantifying the significance of the effect; and Consistency, quantifying how stable the effect is. Using these representations, we develop an efficient retrieval framework that semantically matches textual queries to relevant LoRAs while filtering overly strong or unstable ones, outperforming textual baselines in automated and human evaluations. While retrieval is our primary focus, the same representation also supports analyses linking Strength and Consistency to legal notions of substantiality and volition, key considerations in copyright, positioning CARLoS as a practical system with broader relevance for LoRA analysis.
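
The three-part representation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `lora_representation`, the array layout, and the choice to collapse the variance into a single scalar by summing per-dimension variances are all assumptions. The CLIP image embeddings are taken as precomputed inputs.

```python
import numpy as np

def lora_representation(base_emb, lora_emb):
    """Summarize one LoRA from paired CLIP image embeddings.

    base_emb, lora_emb: arrays of shape (num_samples, dim). Row i of each
    array comes from the same prompt and seed, generated without and with
    the LoRA respectively (hypothetical layout).
    """
    base_emb = np.asarray(base_emb, dtype=float)
    lora_emb = np.asarray(lora_emb, dtype=float)

    # Per-sample difference to the base-model generation in CLIP space.
    diffs = lora_emb - base_emb

    # Direction: the average semantic shift.
    direction = diffs.mean(axis=0)

    # Strength: the mean magnitude of the per-sample shifts.
    strength = np.linalg.norm(diffs, axis=1).mean()

    # Spread: assumed scalar summary of the variance of the shifts.
    # A low spread means the effect is stable across prompts and seeds.
    spread = diffs.var(axis=0).sum()

    return direction, strength, spread
```

How the spread is mapped to a Consistency score (for example, whether it is inverted or normalized) is not specified in this summary, so the sketch returns the raw value.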

Paper Structure

This paper contains 49 sections, 7 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Given a large pool of community-generated Low Rank Adapters (LoRAs), our method, CARLoS, concisely represents them according to their influence on generation, and retrieves semantically relevant ones given a retrieval query (left). Since our efficient retrieval (right) is generation-based, it finds the LoRAs that are visually similar to the query, outperforming retrieval methods that rely on the names and textual descriptions provided by LoRA creators (middle).
  • Figure 2: CARLoS framework. Given a set of curated LoRAs operating over the SDXL backbone, we represent each one as a three-part vector, used for efficient retrieval (left). To create our concise representation (top), we generate images for each LoRA and for the vanilla backbone using $N=280$ prompts and $M=16$ seeds. We measure the semantic difference between the vanilla generations and the LoRA generations in CLIP space (CLIP-diff), and store the average of these differences as a representative Direction, their mean magnitude as the effect Strength, and their variance as a measure of Consistency. During retrieval (bottom), we measure the average CLIP-space difference over a set of $N$ different prompts with and without the retrieval query appended. We then simply retrieve the LoRAs with the most similar Direction vectors, and filter out LoRAs demonstrating above-threshold Strength or under-threshold Consistency.
  • Figure 3: Qualitative retrieval results for CARLoS. Various query modifications are presented, depicting different effect types. The image generated by the vanilla backbone is on the top left, and its LoRA-modified counterparts for the top-3 retrieved LoRAs are shown below for each query. Zoomed-in viewing recommended.
  • Figure 4: Qualitative comparison of textual description-based retrieval (bottom rows) to CARLoS (top row). While some effects are sufficiently described in text (e.g., pixel art) and are therefore retrieved well, more elaborate queries (such as celestial beings or futuristic games) are not described well, so text-based retrieval falls back on matching similar wording rather than effects (e.g., clouds, cartoons).
  • Figure 5: Aggregated results of our subjective user study. Participants compared CARLoS against four strong textual retrieval baselines (QWEN3, E5, BGE, GTE) across three criteria. CARLoS was consistently preferred in all categories.
  • ...and 6 more figures
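
The retrieval step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: cosine similarity is assumed as the similarity measure, the function names and the threshold parameters `max_strength` and `min_consistency` are hypothetical, and all CLIP embeddings and per-LoRA statistics are taken as precomputed arrays.

```python
import numpy as np

def query_direction(emb_with_query, emb_without_query):
    """Average CLIP-space difference between prompts with and without
    the retrieval query appended. Both inputs have shape (N, dim)."""
    with_q = np.asarray(emb_with_query, dtype=float)
    without_q = np.asarray(emb_without_query, dtype=float)
    return (with_q - without_q).mean(axis=0)

def retrieve(query_dir, directions, strengths, consistencies,
             max_strength, min_consistency, k=3):
    """Return indices of the top-k LoRAs by Direction similarity,
    skipping LoRAs that are too strong or too inconsistent."""
    directions = np.asarray(directions, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    consistencies = np.asarray(consistencies, dtype=float)

    # Assumed similarity measure: cosine similarity between the query
    # direction and each LoRA's Direction vector (nonzero norms assumed).
    q = query_dir / np.linalg.norm(query_dir)
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    sims = d @ q

    # Filter out above-threshold Strength and under-threshold Consistency.
    keep = (strengths <= max_strength) & (consistencies >= min_consistency)
    sims = np.where(keep, sims, -np.inf)

    # Rank the remaining LoRAs from most to least similar.
    order = np.argsort(-sims)
    return [int(i) for i in order[:k] if keep[i]]
```

Because each LoRA is reduced to one Direction vector and two scalars, a query costs one matrix-vector product over the indexed corpus, with no image generation at query time.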