Machine Learning in Network Science

Course Overview

General Info:


Instructor: Fragkiskos Malliaros
Email: fragkiskos.me [at] gmail.com
Office hours: Please send me an email or contact me on Teams

TA: Alexandre Duval
Email: alexandre.duval [at] centralesupelec.fr

Edunao: centralesupelec.edunao.com


Networks (or graphs) have become ubiquitous as data from diverse disciplines can naturally be mapped to graph structures. Social networks, such as academic collaboration networks and interaction networks over online social networking applications are used to represent and model the social ties among individuals. Information networks, including the hyperlink structure of the Web and blog networks, have become crucial mediums for information dissemination, offering an effective way to represent content and navigate through it. A plethora of technological networks, including the Internet, power grids, telephone networks and road networks are an important part of everyday life. The problem of extracting meaningful information from large scale graph data in an efficient and effective way has become crucial and challenging with several important applications and towards this end, graph mining and analysis methods constitute prominent tools. The goal of this course is to present recent and state-of-the-art methods and algorithms for analyzing, mining and learning large-scale graph data, as well as their practical applications in various domains (e.g., the web, social networks, recommender systems).





Schedule and Lectures

The topics of the lectures are subject to change (the following schedule outlines the topics that will be covered in the course). The slides for each lecture will be posted on Edunao just before the start of the class. The due dates of the assignments/project are subject to change.

Week Date Topic Material Assignments/Project
1March 1 ○ Introduction to network science and machine learning on graphs
○ Basic network properties; the random graph and small-world models
Lecture 1A Lecture 1B
2March 8 ○ Power-law degree distribution and the Preferential Attachment model
○ Centrality criteria and link analysis algorithms
○ Practical session
Lecture 2A Lecture 2B
3March 15 ○ Link prediction and graph similarity
○ Graph similarity
○ Practical session (preparation for the Kaggle challenge)
Lecture 3A Lecture 3B Data challenge out
4March 19 ○ Representation learning on graphs; node embedding models
○ Practical session
Lecture 4 Project proposal due on March 21
5March 22 ○ Graph neural networks
○ Practical session
Lecture 5
6March 26 ○ Graph clustering and community detection
○ Practical session
Lecture 6
7April 2 ○ Cascading behavior in networks; Influence maximization
○ Practical session
Lecture 7 Data challenge due on April 4
8April 9Project presentationsProject final report due on April 9
Online poster session or presentations



[March 1] Lecture 1: Introduction; basic network properties; generative models

Part A: Introduction to network science and machine learning on graphs

Introduction to graph mining, network analysis, and machine learning on graphs; administrivia, course structure and overview of the topics that will be covered in the course.

Reading:

Part B: Basic network properties; the random graph and small-world models

Presentation of basic concepts in graph theory, linear algebra and spectral graph theory that will be used throughout the course. Basic network properties: degree distribution, clustering coefficient and shortest path length. The Erdos-Renyi random graph model and its basic properties. Comparison to the properties of real networks. The small-world phenomenon and the small-world model.

Reading: Additional:
  • Networks: An Introduction (Chapters 6, 8 and 9)
  • SVD and Low Rank Matrix Approximations, lecture notes by Tim Roughgarden and Gregory Valiant (Stanford University)
  • Lectures on Spectral Graph Theory, by Fan Chung (Chapter 2: Eigenvalues and the Laplacian of a graph)
  • J. Leskovec and E. Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW, 2008
  • Lecture 3, Lecture 4 and Lecture 5 from CSE 258 (background knowledge in data mining and machine learning)
  • Random graphs, lecture notes by Aaron Clauset (CU Boulder)
  • Diameter on d-regular random graphs, lecture notes by Yaron Singer (Harvard University)
  • Networks: An Introduction (Chapter 12)
  • P. Erdos and A. Renyi. On Random Graphs I. Publicationes Mathematicae (6) 290-297, 1959
  • P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Koezl., 1960
  • D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-42, 1998
  • P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301, 2003
  • D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002
  • M. E. J. Newman. Models of the Small World: A Review., J. Stat. Physics 2000
  • J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. ACM Symposium on Theory of Computing, 2000
  • L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012
  • J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. arXiv, 2012

[March 8] Lecture 2: The Preferential Attachment model; centrality criteria

Part A: Power-law degree distribution and the Preferential Attachment model

Power-law degree distribution in real networks. The Preferential Attachment model. Consequences of skewed degree distribution in the robustness of real networks.

Reading: Additional:

Part B: Centrality criteria and link analysis algorithms

Centrality criteria in graphs (degree, closeness, betweenness, eigenvector, Katz). Link analysis ranking algorithms (HITS and PageRank).

Reading: Additional:

[March 15] Lecture 3: Link prediction and graph similarity

Part A: Link prediction

The link prediction task. Node similarity measures. Unsupervised vs. supervised link prediction in networks.

Reading: Additional:

Part B: Graph similarity

Graph similarity. Graph kernels.

Reading: Additional:

[March 19] Lecture 4: Representation learning on graphs

Methods for learning node embeddings on graphs (LINE, DeepWalk and node2vec).

Reading:
Additional:

[March 22] Lecture 5: Graph neural networks

GNN models. Message aggregation strategies. Graph Convolutional Networks. GraphSAGE. Graph Attention Networks. Unsupervised and supervised training. Predictions (node-level, edge-level, graph-level) with GNNs.

Reading: Additional:

[March 26] Lecture 6: Graph clustering and community detection

Community detection in networks. Girvan-Newman algorithm. Modularity and modularity optimization (greedy, spectral, Louvain method). Spectral clustering. Community evaluation criteria. Community detection in directed networks. Overlapping community detection. Community structure of large scale networks.

Reading: Additional:

[April 2] Lecture 7A: Cascading behavior in networks

Cascading behavior. Models of virus and information probagation.

Reading: Additional:

[April 2] Lecture 7B: Influence maximization

Influence maximization in social networks. The Greedy algorithm. Outbreak detection in networks.

Reading: Additional:


Course Structure

Learning objectives

The course aims to introduce students to the field of graph mining and network analysis by:
  • Covering a wide range of topics, methodologies and related applications.
  • Giving the students the opportunity to obtain hands-on experience on dealing with graph data and graph mining tasks.
We expect that by the end of the course, the students will have a thorough understanding of various graph mining and learning tasks, will be able to analyze large-scale graph data as well as to formulate and solve problems that involve graph structures.


Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

  • Have basic knowledge of graph theory and linear algebra.
  • Be familiar with fundamental data mining and machine learning tasks.
  • Be familiar with at least one programming language (e.g., Python or any language of their preference).
During the first lecture, we will review basic concepts in graph theory, linear algebra and machine learning.


Reading material

Most of the material of the course is based on research articles. Some of the topics are also covered by the following books:



Evaluation

The evaluation of the course will be based on the following:

  1. Data challenge this assignment aims to familiarize the students with basic graph mining and analysis tasks through a data challenge.
  2. Project: this will be the main component for the evaluation of the course. The students are expected to form groups of 3-4 people, propose a topic for their project, and submit a final project report (it would also be interesting to organize a poster session at the end of the quarter). Please, read the project section for more details.

The grading will be as follows:

Data challenge (groups of 3-4 students): 40%
Project (groups of 3-4 students): 60%


Academic integrity

All of your work must be your own. Don't copy another student's assignment, in part or in total, and submit it as your own work. Acknowledge and cite source material in your papers or assignments.



Project

Details about the project of the course can be found on piazza.




Resources

Datasets


Software tools

  • NetworkX: Python software package for graph analytics
  • igraph: collection of software packages for graph theory and network analysis (Python, C++ and R)
  • SNAP: high performance system for the analysis of large network (C++ and Python)
  • Gephi: graph visualization and exploration software

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of data mining, social network analysis and the Web). We provide the DBLP website of each venue where you can access the proceedings (papers, tutorials, etc).

Check out the website of each conference (e.g., KDD 2016 ) for more information.