In this course, students master tools, algorithms, and core concepts related to inference from data, data analysis, and adaptation and learning theories. Emphasis is on the theoretical underpinnings and statistical limits of learning theory. In particular, the course covers optimal inference, estimation theory, regularization methods, proximal methods, online and batch methods, stochastic learning, generalization and statistical learning theories, Bayes and naive classifiers, nearest-neighbor rules, self-organizing maps, decision trees, logistic regression, discriminant analysis, the Perceptron, support vector machines, kernel methods, bagging, boosting, random forests, cross-validation, and principal component analysis. Project themes are selected by students in consultation with the instructor.


  1. A. H. Sayed, Adaptation and Learning, lecture notes by the instructor, 2015.
  2. A. H. Sayed, Adaptive Filters, Wiley, NY, 2008.


Part A: Background Material

  • Optimal inference
  • Bayesian inference
  • Maximum likelihood, Expectation-maximization
  • Mixture models
  • Regression analysis, Data fusion
  • Least-squares problems
  • Convex functions, L2 regularization, L1 regularization
  • Subgradients, proximal operators
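To make the last two bullets concrete: the proximal operator of the L1 regularizer has a closed form, elementwise soft-thresholding, which is the building block of proximal methods for sparse problems. A minimal sketch (the function name and test values are illustrative):

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*||x||_1, i.e. argmin_x t*||x||_1 + 0.5*||x - v||^2.

    The solution is elementwise soft-thresholding: shrink each entry of v
    toward zero by t, and set entries with magnitude below t to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

For example, `prox_l1(np.array([3.0, -0.5, 1.0]), 1.0)` shrinks the first entry to 2.0 and zeroes out the other two, which is exactly how L1 regularization induces sparsity.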

Part B: Stochastic Learning

  • Batch learning
  • Stochastic gradient learning
  • Stochastic subgradient learning
  • Stochastic proximal learning
  • Variance-reduced stochastic learning
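A prototypical instance of stochastic-gradient learning from this part is the LMS recursion, which updates the weight estimate from one streaming sample at a time rather than from a full batch. A minimal sketch, where the step size, model, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])   # unknown model to be learned (assumed)
w = np.zeros(2)                  # initial estimate
mu = 0.05                        # step size (illustrative choice)

for _ in range(2000):
    x = rng.standard_normal(2)                      # one streaming regressor
    d = x @ w_true + 0.01 * rng.standard_normal()   # noisy observation
    e = d - x @ w                                   # instantaneous error
    w = w + mu * e * x                              # stochastic-gradient (LMS) update
```

Each iteration uses the instantaneous gradient of the squared error in place of its expectation; the estimate hovers around `w_true` with a residual variance controlled by the step size, a trade-off that the variance-reduced methods in this part are designed to remove.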

Part C: Classification and Clustering

  • Naive Bayes
  • Nearest-neighbor (NN) rule
  • k-means clustering
  • Self-organizing maps
  • Decision trees

Part D: Generalization and Learning

  • Generalization theory
  • Logistic regression
  • Discriminant analysis
  • The Perceptron
  • Support vector machines
  • Kernel-based learning
  • Bagging and boosting
  • Principal component analysis
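As one concrete example from this part, the principal components can be computed from the eigen-decomposition of the sample covariance matrix. A minimal sketch (the function name and synthetic data are illustrative):

```python
import numpy as np

def pca(X, k):
    """Top-k principal directions and variances from the sample covariance."""
    Xc = X - X.mean(axis=0)          # center the data
    C = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    # reverse to descending order and keep the leading k components
    return vecs[:, ::-1][:, :k], vals[::-1][:k]

# synthetic data concentrated along the direction [1, 1] (assumed example)
rng = np.random.default_rng(1)
t = rng.standard_normal(200)
X = np.outer(t, [1.0, 1.0]) + 0.01 * rng.standard_normal((200, 2))
comps, variances = pca(X, 1)
```

Here the leading component aligns (up to sign) with the dominant direction `[1, 1] / sqrt(2)` of the data, and the leading eigenvalue reports the variance captured along it.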


The course deals with information processing over graphs. It covers results and techniques for the analysis and design of networks that solve optimization, adaptation, and learning problems efficiently and in a distributed manner through localized interactions among their agents. The treatment covers three intertwined topics: (a) how to perform distributed optimization over networks; (b) how to perform distributed adaptation over networks; and (c) how to perform distributed learning over networks. In each of these domains, the course examines and compares the advantages and limitations of non-cooperative, centralized, and distributed solutions. There are many good reasons for the piqued interest in distributed implementations, especially in this day and age when the word “network” has become commonplace, whether one is referring to social networks, power networks, transportation networks, biological networks, or other types of networks. Some of these reasons have to do with the benefits of cooperation in terms of improved performance and improved robustness and resilience to failure. Other reasons relate to privacy and secrecy considerations, where agents may not be comfortable sharing their data with remote fusion centers. In other situations, the data may already be available in dispersed locations, as happens with cloud computing. One may also be interested in learning and extracting information through data mining from Big Data sets. The course devotes considerable effort to quantifying the limits of performance of distributed solutions and to discussing design procedures that realize their potential more fully. The presentation adopts a statistical perspective and derives tight performance results that elucidate the mean-square stability, convergence, and steady-state behavior of the learning networks.
The course also illustrates how distributed processing over graphs gives rise to revealing phenomena due to the coupling among agents. The course reviews such phenomena in the context of adaptive networks and considers examples related to distributed sensing, intrusion detection, distributed estimation, online adaptation, clustering, network system theory, and machine learning applications.
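A representative distributed strategy of the kind studied in the course is diffusion adaptation in its adapt-then-combine form: each agent takes a local stochastic-gradient step, then averages its intermediate estimate with those of its neighbors through a left-stochastic combination matrix. A minimal sketch, where the ring topology, uniform combination weights, step size, and data model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 2                        # number of agents and parameter dimension (assumed)
w_true = np.array([1.0, -1.0])     # common model all agents try to estimate

# left-stochastic combination matrix for a ring: each agent averages itself
# and its two neighbors with equal weights (columns sum to one)
A = np.zeros((N, N))
for k in range(N):
    for l in (k - 1, k, k + 1):
        A[l % N, k] = 1.0 / 3.0

W = np.zeros((N, M))               # one row of estimates per agent
mu = 0.05                          # step size (illustrative)

for _ in range(3000):
    psi = np.empty_like(W)
    for k in range(N):
        u = rng.standard_normal(M)                      # agent k's local regressor
        d = u @ w_true + 0.01 * rng.standard_normal()   # agent k's noisy measurement
        psi[k] = W[k] + mu * (d - u @ W[k]) * u         # adapt: local LMS step
    W = A.T @ psi                                       # combine: neighborhood averaging
```

Because every agent fuses information from its neighbors at each step, all rows of `W` converge to a common estimate near `w_true`, illustrating the cooperation benefit the course quantifies.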


  1. A. H. Sayed, “Adaptation, learning, and optimization over networks,” Foundations and Trends in Machine Learning, vol. 7, issue 4-5, pp. 311-801, NOW Publishers, July 2014. [Main Text]
  2. A. H. Sayed, “Adaptive networks,” Proc. IEEE, vol. 102, no. 4, pp. 460-497, April 2014. [Proceedings Article]
  3. A. H. Sayed et al., “Diffusion strategies for adaptation and learning over networks,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 155-171, May 2013. [Magazine Article]
  4. A. H. Sayed, “Diffusion adaptation over networks,” in Academic Press Library in Signal Processing, vol. 3, pp. 323-454, Academic Press, Elsevier, 2014. [Book Chapter]


Part A: Background Material

  • Linear Algebra and Matrix Theory Results
  • Complex Gradients and Complex Hessian Matrices
  • Convexity, Strict Convexity, and Strong Convexity
  • Mean-Value Theorems, Lipschitz Conditions

Part B: Single-Agent Adaptation and Learning

  • Single-Agent Optimization
  • Stochastic-Gradient Optimization
  • Convergence and Stability Properties
  • Mean-Square-Error Performance

Part C: Centralized Adaptation and Learning

  • Batch and Centralized Processing
  • Convergence, Stability, and Performance
  • Comparison to Single-Agent Processing

Part D: Multi-Agent Network Model

  • Graph Properties: Connected and Strongly-Connected Networks
  • Multi-Agent Inference Strategies
  • Limit Point and Pareto Optimality
  • Evolution of Network Dynamics

Part E: Multi-Agent Network Stability and Performance

  • Stability of Network Dynamics
  • Long-Term Error Dynamics
  • Performance of Multi-Agent Networks
  • Benefits of Cooperation
  • Role of Informed Agents
  • Adaptive Combination Strategies
  • Gossip and Asynchronous Strategies
  • Constrained Optimization
  • Proximal Strategies
  • ADMM Strategies
  • Clustering


  • Lecture #1: Motivation and Examples (slides)
  • Lecture #2: Complex Gradient Vectors (slides)
  • Lecture #3: Complex Hessian Matrices (slides)
  • Lecture #4: Convex Functions (slides)
  • Lecture #5: Logistic Regression (slides)
  • Lecture #6: Mean-Value Theorems (slides)
  • Lecture #7: Lipschitz Conditions (slides)
  • Lecture #8: Useful Matrix Results (slides)
  • Lecture #9: Optimization by Single Agents (slides)
  • Lecture #10: Stochastic Optimization by Single Agents (slides)
  • Lecture #11: Stability and Long-Term Dynamics (slides)
  • Lecture #12: Performance by Single Agents (slides)
  • Lecture #13: Centralized Adaptation and Learning (slides)
  • Lecture #14: Multi-Agent Network Model (slides)
  • Lecture #15: Multi-Agent Distributed Strategies (slides)
  • Lecture #16: Evolution of Multi-Agent Networks (slides)
  • Lecture #17: Stability of Multi-Agent Networks (slides)
  • Lecture #18: Mean-Error Network Stability (slides)
  • Lecture #19: Long-Term Network Dynamics (slides)
  • Lecture #20: Performance of Multi-Agent Networks, I (slides)
  • Lecture #21: Performance of Multi-Agent Networks, II (slides)
  • Lecture #22: Benefits of Cooperation (slides)
  • Lecture #23: Role of Informed Agents (slides)
  • Lecture #24: Combination Policies (slides)
  • Lecture #25: Extensions (slides)



In this course, students master tools, algorithms, and core concepts related to the analysis and design of adaptive filters. Emphasis is on the theoretical underpinnings and statistical limits of performance. In particular, the course covers optimal inference, estimation theory, Wiener and Kalman filtering, stochastic-gradient algorithms, mean-square-error performance, transient performance, tracking performance, least-squares methods, recursive least-squares, array algorithms, order-recursive relations, and lattice filters. Applications include channel estimation, linear and decision-feedback channel equalization, echo and noise cancellation, and beamforming designs.


  1. A. H. Sayed, Adaptive Filters, Wiley, NY, 2008.
  2. A. H. Sayed, Fundamentals of Adaptive Filtering, Wiley, NY, 2003.



Part A: Background Material

  • Optimal MSE estimation
  • Linear MSE estimation
  • Normal equations
  • Constrained estimation
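The linear estimation and normal-equations bullets above come together in a short computation: the least-squares estimate of a linear model solves the normal equations built from the data. A minimal sketch (the regression matrix, model, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 3))            # regression (data) matrix
w_true = np.array([0.5, -1.0, 2.0])          # unknown parameter vector (assumed)
d = H @ w_true + 0.01 * rng.standard_normal(100)   # noisy observations

# normal equations: (H^T H) w = H^T d
w_hat = np.linalg.solve(H.T @ H, H.T @ d)
```

With enough data relative to the noise, `w_hat` lands close to `w_true`; the same normal-equations structure underlies the Wiener filter when the deterministic correlations are replaced by their stochastic counterparts.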

Part B: Filters and Recursive Estimation

  • Wiener filters
  • Kalman filters
  • Steepest-descent algorithms
  • Stochastic gradient algorithms

Part C: Performance Analysis

  • MSE performance
  • Transient performance
  • Stability and convergence rates
  • Tracking performance

Part D: Least-Squares Designs

  • Least-squares methods
  • Recursive least-squares
  • Array algorithms
  • Fast fixed-order algorithms
  • Lattice filters
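The recursive least-squares bullet above admits a compact recursion: propagate the inverse correlation matrix and update the weights with a gain vector at each sample. A minimal sketch of the exponentially weighted form (the function signature, forgetting factor, and regularization value are illustrative assumptions):

```python
import numpy as np

def rls(data, M, lam=0.99, delta=1e2):
    """Exponentially weighted RLS over (regressor, measurement) pairs.

    lam is the forgetting factor and delta scales the initial inverse
    correlation matrix P (both are illustrative defaults)."""
    w = np.zeros(M)
    P = delta * np.eye(M)                 # inverse correlation matrix estimate
    for u, d in data:
        Pu = P @ u
        g = Pu / (lam + u @ Pu)           # gain vector
        e = d - u @ w                     # a priori estimation error
        w = w + g * e                     # weight update
        P = (P - np.outer(g, Pu)) / lam   # rank-one update of the inverse
    return w
```

By propagating `P` directly, each update costs O(M^2) instead of re-solving the least-squares problem from scratch; the array and lattice algorithms listed above reorganize this same recursion for better numerical behavior and lower complexity.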


  • Lecture #1: MSE Estimation: Scalar Case (slides)
  • Lecture #2: MSE Estimation: Vector Case (slides)
  • Lecture #3: Linear Estimation (slides)
  • Lecture #4: Normal Equations and Design Problems (slides)
  • Lecture #5: Linear Models and Applications (slides)
  • Lecture #6: Constrained Estimation (slides)
  • Lecture #7: Kalman Filtering  (slides)
  • Lecture #8: Wiener Filtering (slides)
  • Lecture #9: Steepest-Descent Algorithm (slides)
  • Lecture #10: Stochastic Gradient Algorithms (slides)
  • Lecture #11: MSE Performance (slides)
  • Lecture #12: Tracking Performance (slides)
  • Lecture #13: Transient Performance – Part I (slides)
  • Lecture #14: Transient Performance – Part II (slides)
  • Lecture #15: Least-Squares Methods  (slides)
  • Lecture #16: Recursive Least-Squares (slides)
  • Lecture #17: Unitary Transformations (slides)
  • Lecture #18: Array Algorithms (slides)
  • Lecture #19: Order and Time-Update Relations (slides)
  • Lecture #20: Order Recursive Least-Squares (slides)
  • Lecture #21: Lattice Filters  (slides)