Decentralized Bandits without Global Clock for Dynamic Matching Market

Abstract

Two-sided matching markets are pervasive in numerous real-world applications, ranging from labor markets to online advertising. A rich line of research has studied the matching bandit problem, where participants learn their preferences through iterative interactions. However, existing works assume a static environment with fixed participants and require synchronized learning, in which all participants start simultaneously and have access to a global clock. In reality, matching markets are inherently dynamic: participants may enter and leave at arbitrary time steps without any global signal, creating coordination challenges. To study the dynamic setting, we first investigate one-sided learning under uncoordinated player ar- rivals, where only the players need to learn their preferences. We propose the Way-SE algorithm, which achieves a regret of $O(\frac{K^2\log T}{\Delta_{min}^2})$, where $K$ is the number of arms, $T$ is the time horizon, and $\Delta_{min}$ is the minimum utility gap. This is done through a distributed exploration mechanism that coordinates exploration implicitly via just local clocks. More importantly, we extend our work to fully decentralized dynamic two-sided learning, where both sides need to learn their preferences, and players arrive or depart arbitrarily. We introduce Way-SE-2S, the first algorithm to achieve sublinear regret $O(\frac{KT^{1-1/K}(\log T)^{2/K}}{\Delta_{min}^2})$ in this challenging environment, without requiring global signals, restrictive preference structures, or observability of the results of competing agents. Our work provides the first theoretical guarantee for stable matching in fully decentralized and uncoordinated bandit markets.

Publication
In ICML
Wentao Zhou 周文韬
Wentao Zhou 周文韬
PhD Student
Xuanzhi Xia 夏暄智
Xuanzhi Xia 夏暄智
PhD Student
Jing Chen 陈婧
Jing Chen 陈婧
Professor