
Let the Agent Steer: Closed-Loop Ranking Optimization via Influence Exchange

Yin Cheng, Liao Zhou, Xiyu Liang, Dihao Luo, Tewei Lee, Kailun Zheng, Weiwei Zhang, Mingchen Cai, Jian Dong, Andy Zhang

Abstract

Recommendation ranking is fundamentally an influence allocation problem: a sorting formula distributes ranking influence among competing factors, and the business outcome depends on finding the optimal "exchange rates" among them. However, offline proxy metrics systematically misjudge how influence reallocation translates to online impact, with asymmetric bias across metrics that a single calibration factor cannot correct. We present Sortify, the first fully autonomous LLM-driven ranking optimization agent deployed in a large-scale production recommendation system. The agent reframes ranking optimization as continuous influence exchange, closing the full loop from diagnosis to parameter deployment without human intervention. It addresses structural problems through three mechanisms: (1) a dual-channel framework grounded in Savage's Subjective Expected Utility (SEU) that decouples offline-online transfer correction (Belief channel) from constraint penalty adjustment (Preference channel); (2) an LLM meta-controller operating on framework-level parameters rather than low-level search variables; (3) a persistent Memory DB with 7 relational tables for cross-round learning. Its core metric, Influence Share, provides a decomposable measure where all factor contributions sum to exactly 100%. Sortify has been deployed across two markets. In Country A, the agent pushed GMV from -3.6% to +9.2% within 7 rounds with peak orders reaching +12.5%. In Country B, a cold-start deployment achieved +4.15% GMV/UU and +3.58% Ads Revenue in a 7-day A/B test, leading to full production rollout.
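The abstract's core metric, Influence Share, attributes each item pair's score gap to the competing factors (shares summing to 1 per pair) and then aggregates with rank weights so that all factor contributions sum to exactly 100%. A minimal sketch of such a decomposition, assuming illustrative factor scores and a uniform rank weighting (the paper's exact weighting scheme is not given here):

```python
import numpy as np

def influence_share(factor_scores, rank_weights=None):
    """Hypothetical Influence-Share-style decomposition.

    factor_scores: (n_items, n_factors) array, one score term per factor.
    Returns per-factor shares that sum to exactly 1.0 (i.e. 100%).
    """
    scores = np.asarray(factor_scores, dtype=float)
    n_items, n_factors = scores.shape
    if rank_weights is None:
        rank_weights = np.ones(n_items)  # uniform position weighting (assumption)
    total = np.zeros(n_factors)
    weight_sum = 0.0
    # Enumerate item pairs; attribute each pairwise score gap to factors
    # proportionally, so every pair's shares sum to 1.
    for i in range(n_items):
        for j in range(i + 1, n_items):
            diff = np.abs(scores[i] - scores[j])  # per-factor gaps
            denom = diff.sum()
            if denom == 0:
                continue
            share = diff / denom                   # per-pair shares, sum to 1
            w = rank_weights[i] * rank_weights[j]  # rank-weighted aggregation
            total += w * share
            weight_sum += w
    return total / weight_sum

shares = influence_share([[1.0, 0.2, 0.1],
                          [0.6, 0.5, 0.1],
                          [0.3, 0.1, 0.4]])
print(shares, shares.sum())
```

Because every pair's shares sum to 1 and the aggregation is a weighted average, the output is decomposable by construction, which is the property the abstract highlights.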


Paper Structure

This paper contains 97 sections, 17 equations, 12 figures, 13 tables.

Figures (12)

  • Figure 1: Three-panel overview: (a) dual-channel architecture diagram, (b) online GMV/Orders uplift trend across rounds, (c) LLM correction convergence curve
  • Figure 2: Old vs. New Paradigm: Left---manual optimization loop (offline search, deploy, human judgment, discard state, repeat). Right---Sortify closed loop (offline search, deploy, auto-calibrate via dual channels, accumulate in Memory DB, repeat with improved priors). Highlights the structural differences: stateless vs. persistent, single-axis vs. dual-channel, manual vs. LLM-orchestrated.
  • Figure 3: Three-layer architecture diagram. Layer 1 (Human/Config) feeds objectives and constraints to Layer 2 (LLM + Algorithm), which contains the Belief channel, Preference channel, and LLM meta-controller. Layer 2 outputs calibrated target_range and penalty_weight to Layer 3 (Optuna TPE Search, 5000 trials $\times$ 25 workers). Layer 3 produces best parameters $\to$ Redis $\to$ online A/B $\to$ Memory DB $\to$ back to Layer 2.
  • Figure 4: Flow diagram showing: item pairs $\to$ pairwise score differences $\to$ per-factor share (sum=1) $\to$ rank-weighted aggregation $\to$ $I_\text{gmv}$, $I_\text{order}$, $I_\text{ecpm\_term}$. Below: 7 parameters map into the sorting formula, changing factor contributions.
  • Figure 5: Two-axis diagram. Horizontal axis: Belief (target_range = position of constraint boundary). Vertical axis: Preference (penalty_weight = hardness of constraint boundary). Arrows show LMS moving along Belief axis continuously, LLM making discrete jumps along Belief axis, and violation pressure moving along Preference axis. The two axes are orthogonal --- corrections on one do not affect the other.
  • ...and 7 more figures
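Figures 3 and 5 describe how the two channels feed the search layer: the Belief channel sets target_range (the position of a constraint boundary) and the Preference channel sets penalty_weight (its hardness), and Layer 3 optimizes a penalized objective over the sorting parameters. A minimal sketch of that penalized objective, using random search as a lightweight stand-in for the Optuna TPE sampler the paper uses (5000 trials $\times$ 25 workers); all parameter names, metric values, and channel outputs below are illustrative assumptions:

```python
import random

TARGET_RANGE = (0.30, 0.45)  # assumed Belief-channel output (boundary position)
PENALTY_WEIGHT = 5.0         # assumed Preference-channel output (boundary hardness)

def offline_metric(params):
    # Stand-in for the real offline proxy evaluator over sorting weights.
    return 0.6 * params["w_gmv"] + 0.3 * params["w_order"] + 0.1 * params["w_ecpm"]

def penalized_objective(params):
    c = params["w_ecpm"]                  # monitored quantity, e.g. ads influence share
    lo, hi = TARGET_RANGE
    violation = max(0.0, lo - c, c - hi)  # distance outside target_range
    return offline_metric(params) - PENALTY_WEIGHT * violation

# Random search over the parameter box as a stand-in for Optuna TPE.
rng = random.Random(0)
best = max(
    ({k: rng.random() for k in ("w_gmv", "w_order", "w_ecpm")} for _ in range(2000)),
    key=penalized_objective,
)
print(best)
```

The point of the decoupling is visible in the objective: moving target_range shifts where the violation starts (Belief axis), while penalty_weight only scales how hard the violation is punished (Preference axis), so corrections on one axis do not affect the other.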