The real world is inherently distributed and interconnected: no single agent has full access to the totality of information that is dispersed across a multitude of agents. Distributed learning has emerged as a key paradigm to address this challenge, enabling agents to cooperatively solve global problems of interest through information exchange. While centralized methods rely on a fusion center that aggregates data and manages the entire computation, decentralized methods allow agents to operate collaboratively by relying mainly on neighbor-to-neighbor communication. Nevertheless, distributed learning continues to face important theoretical and practical challenges. One of these is robustness to adversarial attacks, where imperceptible perturbations can degrade performance. This dissertation investigates the performance of distributed learning strategies in both standard and adversarial environments, and reveals several notable properties relating to their stability guarantees and robustness.
The first part of the dissertation examines the standard (clean) setting and provides a systematic comparison of the performance of centralized and decentralized learning strategies. Specifically, it studies the behavior of stochastic gradient algorithms for distributed learning around local minima in nonconvex environments. Previous works on learning by stand-alone agents have observed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work establishes three main results. First, decentralized learning strategies favor convergence toward flatter minima relative to the centralized solution. Second, among decentralized methods, the consensus strategy exhibits worse excess-risk performance than diffusion, which in turn gives consensus a better chance of escaping from local minima and favors flatter minima. Third, and importantly, the ultimate classification accuracy depends not only on the flatness of the local minimum but also on how well a learning algorithm can approach that minimum; in other words, classification accuracy is a function of both flatness and optimization performance. In this regard, since diffusion attains a lower excess risk than consensus, it delivers higher classification accuracy when both algorithms are trained from random initial points. This first part closely examines the interplay between the two measures of flatness and optimization error. One important conclusion is that decentralized strategies deliver, in general, enhanced classification accuracy because they strike a more favorable balance between flatness and optimization performance compared to the centralized solution.
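For concreteness, the two decentralized strategies compared in this part can be contrasted through their standard recursions, written here as a schematic sketch in the generic notation of the distributed-learning literature (the combination weights $a_{\ell k}$ over the neighborhood $\mathcal{N}_k$, the step size $\mu$, and the stochastic gradient approximation $\widehat{\nabla J}_k$ are illustrative symbols, not necessarily those used in the dissertation):
\[
\text{(consensus)} \qquad w_{k,i} \;=\; \sum_{\ell \in \mathcal{N}_k} a_{\ell k}\, w_{\ell,i-1} \;-\; \mu\, \widehat{\nabla J}_k(w_{k,i-1})
\]
\[
\text{(diffusion, adapt-then-combine)} \qquad \psi_{k,i} \;=\; w_{k,i-1} \;-\; \mu\, \widehat{\nabla J}_k(w_{k,i-1}), \qquad w_{k,i} \;=\; \sum_{\ell \in \mathcal{N}_k} a_{\ell k}\, \psi_{\ell,i}
\]
The asymmetry is subtle but consequential: consensus combines the neighbors' previous iterates while taking the gradient step from the local iterate, whereas diffusion adapts first and then combines the updated intermediate estimates, a structural difference that the distributed-learning literature associates with the excess-risk gap between the two strategies.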
The second part of the dissertation focuses on distributed learning in adversarial environments and is divided into two components. First, motivated by the extensive study of adversarial training as a means to enhance model robustness, this part extends the adversarial training framework to distributed settings over graphs, where individual agents are subjected to perturbations of varied strength across space. It is expected that interactions among linked agents, together with the heterogeneity of the attack models that are possible over the graph, can help enhance robustness owing to the coordination power of the group. Using a min-max formulation of distributed learning, the dissertation develops a distributed adversarial training framework for multi-agent systems and establishes convergence guarantees for strongly convex, convex, and nonconvex environments. Next, this second part analyzes the optimization bias of different distributed adversarial training algorithms, including centralized and decentralized strategies. Previous studies have highlighted the importance of model flatness in determining robustness. Motivated by this observation, this work develops a general theoretical framework to study the escaping efficiency of these algorithms from local minima, a quantity closely related to the flatness of the resulting models. It is shown that when the perturbation bound is sufficiently small (i.e., when the attack strength is relatively mild) and a large batch size is used, decentralized adversarial training algorithms are guaranteed to escape from local minima faster than the centralized strategy, thereby favoring flatter minima. These results highlight the potential of decentralized strategies to enhance the robustness of models in distributed settings.
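In schematic form, the min-max formulation referenced above can be stated as follows (generic notation; the scaling weights $p_k$, loss functions $Q_k$, and per-agent perturbation budgets $\epsilon_k$ are illustrative placeholders rather than the dissertation's exact symbols):
\[
\min_{w} \; \sum_{k=1}^{K} p_k\, \mathbb{E}_{(x_k, y_k)} \Big[ \max_{\|\delta_k\| \le \epsilon_k} Q_k\big(w;\, x_k + \delta_k,\, y_k\big) \Big]
\]
Here, each agent $k$ trains against its own perturbation budget $\epsilon_k$, capturing attacks of varied strength across space, while the network cooperates to minimize the aggregate worst-case risk.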