Ainur Zhaikhan 2025

Abstract of PhD Dissertation
Institute of Electrical and Micro Engineering, École Polytechnique Fédérale de Lausanne (EPFL), October 2025
Advisor: Prof. Ali H. Sayed

Reinforcement Learning by Networked Agents

Ainur Zhaikhan, École Polytechnique Fédérale de Lausanne (EPFL)


Multi-agent reinforcement learning (MARL) has emerged as a compelling framework for modeling the collaborative behavior of autonomous agents operating in interconnected systems. The potential for agents to collectively achieve goals that are infeasible for any single unit makes MARL a powerful tool across a wide range of domains. However, the multi-agent setting introduces distinct challenges that require specialized solutions. This thesis addresses two fundamental challenges in MARL: (i) effective deep exploration, and (ii) global state estimation under partial observability. To this end, we leverage the networked structure of agents and their communication capabilities to develop decentralized learning algorithms that facilitate robust and scalable collaboration under uncertain conditions.

First, we propose a novel, count-free deep exploration algorithm for MARL that guarantees every state-action pair is visited infinitely often. Deep exploration is essential for avoiding suboptimal learning in environments with sparse or deceptively structured rewards. Our method distributes an ensemble of value estimates across the network of agents and uses the variance across these estimates to guide exploration. Its count-free design makes it suitable for large or continuous state spaces. We establish theoretical guarantees of sufficient exploration and validate the approach through extensive simulations.
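To make the exploration mechanism concrete, the sketch below illustrates, in a simplified single-learner, tabular setting, how an ensemble of value estimates can drive count-free exploration: actions are selected by adding a bonus proportional to the ensemble's standard deviation to the mean value estimate. The ensemble size, bonus coefficient, and update rule here are illustrative assumptions only, not the algorithm of the thesis; in the dissertation the ensemble members are distributed across the networked agents.

```python
import numpy as np

# Illustrative sketch (not the thesis's algorithm): count-free, variance-guided
# exploration with an ensemble of tabular Q-estimates. K, beta, and the tabular
# setting are assumptions made for brevity; in the dissertation the ensemble
# members are held by different agents in the network.

K, S, A = 5, 10, 4                     # ensemble size, #states, #actions (toy values)
alpha, gamma, beta = 0.1, 0.99, 1.0    # step size, discount factor, bonus weight

rng = np.random.default_rng(0)
Q = rng.normal(scale=0.1, size=(K, S, A))   # randomly initialized ensemble of Q-tables

def act(s):
    """Choose the action maximizing mean value plus a variance-based bonus."""
    mean = Q[:, s, :].mean(axis=0)
    std = Q[:, s, :].std(axis=0)            # ensemble disagreement ~ epistemic uncertainty
    return int(np.argmax(mean + beta * std))

def update(s, a, r, s_next):
    """Independent Q-learning update for each ensemble member."""
    for k in range(K):
        target = r + gamma * Q[k, s_next].max()
        Q[k, s, a] += alpha * (target - Q[k, s, a])
```

Intuitively, the bonus stays large for state-action pairs that have received few updates, which is what steers the learner toward under-explored regions without maintaining visitation counts.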

Second, we address the challenge of global state estimation in partially observable environments, where each agent has access only to local, incomplete observations. Individually, these observations are insufficient to recover the global state; through local communication, however, agents can estimate it collaboratively. We explore two social-learning strategies for this purpose: standard and adaptive social learning. Standard social learning imposes no constraints on the state dynamics but introduces a two-time-scale learning structure. We provide a theoretical analysis showing that MARL combined with this approach achieves ε-optimality with respect to the fully observable baseline. To overcome the limitation of two-time-scale learning, we introduce an adaptive social learning method that enables single-time-scale integration of state estimation and reinforcement learning, under the assumption of slowly evolving state dynamics. For appropriate choices of the adaptation and learning parameters, we show that this method also achieves ε-optimal performance. Both methods are fully decentralized, rely solely on local communication, and are supported by rigorous convergence guarantees. Empirical evaluations further confirm the effectiveness of both approaches.
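As a rough illustration of the adaptive social learning component, the sketch below shows one belief-update step over a finite set of candidate global states: each agent discounts its previous log-belief, incorporates the log-likelihood of its local observation, and then averages the intermediate log-beliefs of its neighbors through a combination matrix. The likelihood models, combination weights, and adaptation parameter are illustrative assumptions; the dissertation couples such an estimator with the reinforcement learning recursion in a single time scale.

```python
import numpy as np

def asl_belief_step(log_beliefs, likelihoods, A, delta):
    """
    One illustrative adaptive-social-learning belief update (sketch only).

    log_beliefs : (N, H) log-beliefs of N agents over H candidate global states
    likelihoods : (N, H) array; likelihoods[i, h] is the likelihood of agent i's
                  local observation under hypothesis h (assumed local model)
    A           : (N, N) combination matrix; A[i, j] > 0 only if j is a
                  neighbor of i, with each row summing to one
    delta       : adaptation parameter in (0, 1); larger values track faster
                  state changes at the cost of steady-state accuracy
    """
    # Adapt: discount the prior and incorporate the new local evidence.
    psi = (1.0 - delta) * log_beliefs + delta * np.log(likelihoods + 1e-300)
    # Combine: average neighbors' intermediate log-beliefs
    # (geometric averaging of the beliefs themselves).
    log_b = A @ psi
    # Normalize each agent's belief back to a probability vector (in log space).
    m = log_b.max(axis=1, keepdims=True)
    log_b -= m + np.log(np.exp(log_b - m).sum(axis=1, keepdims=True))
    return log_b
```

Each agent would then act on the global-state estimate implied by its normalized belief (for instance, the most probable hypothesis), which is what feeds the reinforcement learning update in the single-time-scale scheme.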