Achieving Maximum Utilization in Optimal Time for Learning or Convergence in the Kolkata Paise Restaurant Problem

Aniruddha Biswas; Antika Sinha; Bikas K. Chakrabarti

TL;DR

The paper investigates how to maximize resource utilization in the Kolkata Paise Restaurant (KPR) problem under decentralized learning with limited memory. Using Monte Carlo simulations, it compares the Crowd Avoiding (CA) and Greedy Crowd Avoiding (GCA) strategies. CA yields a robust yet imperfect utilization of about $f \approx 0.80$ within a short, $N$-independent convergence time ($\tau = O(10)$), while GCA can achieve full utilization ($f = 1$), but only with a convergence time that scales linearly with system size ($\tau \simeq eN$). The results suggest that, for large $N$, full utilization cannot be reached in finite time with single-step memory, which implies a trade-off between speed of learning and efficiency. This has implications for designing decentralized allocation mechanisms in which rapid convergence may be prioritized over achieving the maximum possible utilization.
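To make the CA numbers concrete, here is a minimal Monte Carlo sketch. The update rule is an assumption borrowed from the earlier KPR literature, not a rule stated on this page: an agent returns to yesterday's restaurant with probability $1/n$, where $n$ is yesterday's crowd size there, and otherwise picks a restaurant uniformly at random. The function name `simulate_ca` and its parameters are hypothetical. The GCA rule is not specified here, so it is not sketched.

```python
import numpy as np

def simulate_ca(n, days, seed=0):
    """Utilization f(t) under an ASSUMED stochastic crowd-avoiding rule.

    Assumption (not taken from this page): stay with probability
    1 / (yesterday's crowd size at the visited restaurant), otherwise
    move to a restaurant drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    choice = rng.integers(0, n, size=n)  # first day: uniform random choice
    history = []
    for _ in range(days):
        crowd = np.bincount(choice, minlength=n)     # agents per restaurant
        history.append(np.count_nonzero(crowd) / n)  # fraction of served restaurants
        # Single-step memory: each agent sees only the crowd where it went.
        stay = rng.random(n) < 1.0 / crowd[choice]
        # Movers draw from all restaurants (may re-pick the same one).
        fresh = rng.integers(0, n, size=n)
        choice = np.where(stay, choice, fresh)
    return np.array(history)

if __name__ == "__main__":
    f = simulate_ca(n=1600, days=60)
    print("f on day 1:", round(float(f[0]), 3))
    print("mean f over last 20 days:", round(float(f[-20:].mean()), 3))
```

Under this assumed rule, the first-day value should sit near the random-choice level and the later values should settle within a few tens of days, which is the qualitative behaviour the summary attributes to CA.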

Abstract

The objective of the KPR agents is to learn, by themselves and in the minimum (learning) time, to achieve the maximum success or utilization probability ($f$). A dictator can easily solve the problem with $f = 1$ in no time, by asking everyone to form a queue and go to their respective restaurants, resulting in no fluctuation and full utilization from the first day (convergence time $\tau = 0$). It has already been shown that if each agent chooses a restaurant at random, $f = 1 - e^{-1} \simeq 0.63$ (where $e \simeq 2.718$ denotes Euler's number) in zero time ($\tau = 0$). When the only available information is yesterday's crowd size in the restaurant visited by the agent (as assumed for the rest of the strategies studied here), the crowd avoiding (CA) strategies can give higher values of $f$, but also of $\tau$. Several numerical studies of modified learning strategies indicated an increased value of $f = 1 - \alpha$ for $\alpha \to 0$, with $\tau \sim 1/\alpha$. We show here, using the Monte Carlo technique, that a modified Greedy Crowd Avoiding (GCA) strategy can assure full utilization ($f = 1$) in convergence time $\tau \simeq eN$, with of course a non-zero probability of an even larger convergence time. All these observations suggest that strategies with single-step memory of the individuals can never collectively achieve full utilization ($f = 1$) in finite convergence time, and that perhaps the maximum possible utilization that can be achieved is about eighty percent ($f \simeq 0.80$) in an optimal time $\tau$ of order ten, even when $N$, the number of customers or of restaurants, goes to infinity.
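The random-choice baseline can be checked directly. With $N$ agents each choosing one of $N$ restaurants uniformly and independently, a given restaurant stays empty with probability $(1 - 1/N)^N \to e^{-1}$, so the expected utilization is $f = 1 - (1 - 1/N)^N \to 1 - e^{-1}$. The sketch below estimates this numerically; the function name `random_choice_utilization` and its parameters are illustrative only.

```python
import math
import numpy as np

def random_choice_utilization(n, trials=20, seed=0):
    """Average fraction of restaurants visited when n agents choose at random."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(trials):
        choice = rng.integers(0, n, size=n)             # one restaurant per agent
        fractions.append(np.unique(choice).size / n)    # distinct restaurants used
    return float(np.mean(fractions))

if __name__ == "__main__":
    n = 10_000
    print("simulated f :", round(random_choice_utilization(n), 4))
    print("finite-N f  :", round(1 - (1 - 1 / n) ** n, 4))
    print("1 - 1/e     :", round(1 - math.exp(-1), 4))
```

No learning is involved, which is why this value is reached at $\tau = 0$.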

Paper Structure (3 sections, 6 figures)

This paper contains 3 sections, 6 figures.

Introduction
Proposed learning algorithm and Monte Carlo results
Summary & Discussion

Figures (6)

Figure 1: Fraction $f$ of people getting food (same as the utilization fraction $f$ of the restaurants, getting a customer) as function of (learning) time or day $t$, for a typical Monte Carlo run of CA case with $N = 1600$, the number of restaurants or of prospective customers. The convergence time $\tau$ is also indicated, where $f$ saturates to an average value $f_s$ and fluctuates around that value. The inset shows the dependence of the saturation value $f_s \simeq 0.80$ of the utilization fraction $f$ as function of $N$ (up to $N = 51200$) and extrapolated to $N \to \infty$.
Figure 2: Convergence time $\tau \simeq 8$ as function of $N$ in the CA case for $N =$ 100, 200, 400, 800, ... up to $N = 51200$. Inset shows that $\tau$ seems to remain finite (less than 10) even when extrapolated up to $N \to \infty$.
Figure 3: Utilization fraction $f$ as function of (learning) time or day $t$, for a typical Monte Carlo run of the GCA case with $N = 1600$. The convergence time $\tau$ is also indicated, where $f$ value saturates to unity. The inset shows that this full utilization ($f_s =1$) occurs for all values of $N$ (data here are up to $N = 6400$).
Figure 4: Convergence time $\tau$ as function of $N$ in the GCA case for $N$ up to 6400. Inset shows that $\tau = eN$, where $e \simeq 2.718$ denotes the Euler number, seems to fit well in the $N \to \infty$ limit.
Figure 5: A typical time evolution of the cumulative success rate of the individual agents, employing GCA learning, starting from the first day ($t=1$) until the convergence time ($t = \tau$) when everyone gets food. The "World Lines" are for 50 agents and 50 restaurants ($N = 50$). As expected, with learning in GCA, the dispersion (among the agents) in the cumulative success rate starts decreasing as the learning time $t$ ($< \tau$) grows. (For example, if an agent does not get food in the first 3 steps and gets food the next two times, then that agent's cumulative success rate is 0 at times $t = 1, 2, 3$, then $25\%$ at time $t = 4$ and $40\%$ at time $t = 5$.) We stop these "World Lines" at $t = \tau$ and find the eventual dispersion in cumulative success to be much less than expected. We show (in red) the "World Line" of a typical agent who, after initial bad luck of successive failures, eventually ends up with about 90% cumulative success. It is interesting to note that these "World Lines" of the cumulative success rates of the agents tend to cross each other before the convergence time $\tau$ (mostly within the time range $0.1 \tau < t < 0.2 \tau$).
...and 1 more figures
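
The cumulative success rate used in the Figure 5 caption is a running average of an agent's daily outcomes. A minimal sketch of that arithmetic follows; the helper name `cumulative_success_percent` is hypothetical and not taken from the paper.

```python
def cumulative_success_percent(outcomes):
    """Running success percentage after each day.

    outcomes: sequence of 0/1 (or False/True) flags, one per day,
    where 1 means the agent got food on that day.
    """
    rates = []
    successes = 0
    for day, got_food in enumerate(outcomes, start=1):
        successes += int(got_food)
        rates.append(100.0 * successes / day)  # successes so far / days so far
    return rates

if __name__ == "__main__":
    # Three failures followed by two successes, as in the caption's example.
    print(cumulative_success_percent([0, 0, 0, 1, 1]))
```

For the caption's example of three failures followed by two successes, the running values are 0, 0, 0, 25 and 40 percent.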
