The abundance of data and the proliferation of computational resources are driving a significant paradigm shift toward data-driven engineering design. Whereas past practice often relied on physical models or approximations thereof, it is now common to work with datasets that represent the behavior of a complex system rather than with an explicit mathematical model of it.
These datasets can arise from a multitude of sources. For example, data can be generated by mobile devices, by sensors scattered throughout “smart cities” and “smart grids”, or by vehicles on the road. A common feature of such datasets is their heterogeneity, which stems from variations in the data distributions at the local level. For example, regional dialects within a country give rise to datasets that affect the training and performance of speech recognition models differently. Likewise, user preferences vary across regions of the world and influence the training and performance of recommender systems. In a similar vein, regional differences in weather, power usage, and traffic patterns affect the behavior and performance of many other monitoring systems.
Training a single model on heterogeneous data generally leads to poor sample efficiency and performance: the resulting model performs “optimally” on average but may perform poorly on any individual local dataset. This fact has sparked significant research activity in recent years on the topics of multitask and meta-learning, whose purpose is to extract transferable information from heterogeneous data sources while allowing for local variability. In this tutorial, we present a unifying and up-to-date overview of multitask and meta-learning, with a focus on learning from streaming data in federated and decentralized architectures.