Network Science Analytics

Course Overview

General Info:


Instructor: Fragkiskos Malliaros
Email: fragkiskos.me [at] gmail.com
Office hours: Right after class (or send me an email and we will find a good time to meet)

TA: Abdulkadir Çelikkanat
Email: abdcelikkanat [at] gmail.com

Piazza: piazza.com/centralesupelec/winter2020/ngsa/home


Networks (or graphs) have become ubiquitous as data from diverse disciplines can naturally be mapped to graph structures. Social networks, such as academic collaboration networks and interaction networks over online social networking applications are used to represent and model the social ties among individuals. Information networks, including the hyperlink structure of the Web and blog networks, have become crucial mediums for information dissemination, offering an effective way to represent content and navigate through it. A plethora of technological networks, including the Internet, power grids, telephone networks and road networks are an important part of everyday life. The problem of extracting meaningful information from large scale graph data in an efficient and effective way has become crucial and challenging with several important applications and towards this end, graph mining and analysis methods constitute prominent tools. The goal of this course is to present recent and state-of-the-art methods and algorithms for analyzing, mining and learning large-scale graph data, as well as their practical applications in various domains (e.g., the web, social networks, recommender systems).





Schedule and Lectures

The topics of the lectures are subject to change (the following schedule outlines the topics that will be covered in the course). The slides for each lecture will be posted in piazza just before the start of the class. The due dates of the assignments/project are subject to change.

Week Date Topic Material Assignments/Project
1February 7 ○ Introduction to network science and graph mining
○ Graph theory and linear algebra recap; basic network properties
Lecture 1A Lecture 1B
2February 14 ○ Random graphs and the small-world phenomenon
○ Power-law degree distribution and the Preferential Attachment model
Lecture 2A Lecture 2B
3February 21 ○ Time-evolving graphs and network models
○ Centrality criteria and link analysis algorithms
Lecture 3A Lecture 3B
4February 28 ○ Graph clustering and community detection Lecture 4Project proposal due on March 1
5March 6 ○ Node similarity and link prediction
○ Graph similarity
Lecture 5A Lecture 5B Data challenge out
6March 13 ○ Representation learning in graphs
○ Graph sampling and summarization
Lecture 6A Lecture 6B
7March 20 ○ Epidemic processes and cascading behavior in networks
○ Influence maximization in social networks
Lecture 7A Lecture 7B Data challenge due on March 22
8April 10Project presentationsProject final report due on April 10
Project poster session or presentations



[February 7] Lecture 1A: Introduction

Introduction to graph mining and network analysis, administrivia, course structure and overview of the topics that will be covered in the course.

Reading:

[February 7] Lecture 1B: Graph theory and linear algebra recap; basic network properties

Presentation of basic concepts in graph theory, linear algebra and spectral graph theory that will be used throughout the course. Basic network properties: degree distribution, clustering coefficient and shortest path length.

Reading: Additional:

[February 14] Lecture 2A: Random graphs and the small-world phenomenon

The Erdos-Renyi random graph model and its basic properties. Comparison to the properties of real networks. The small-world phenomenon and the small-world model.

Reading: Additional:
  • Random graphs, lecture notes by Aaron Clauset (CU Boulder)
  • Diameter on d-regular random graphs, lecture notes by Yaron Singer (Harvard University)
  • Networks: An Introduction (Chapter 12)
  • P. Erdos and A. Renyi. On Random Graphs I. Publicationes Mathematicae (6) 290-297, 1959
  • P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Koezl., 1960
  • D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-42, 1998
  • P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301, 2003
  • D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002
  • M. E. J. Newman. Models of the Small World: A Review., J. Stat. Physics 2000
  • J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. ACM Symposium on Theory of Computing, 2000
  • L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012
  • J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. arXiv, 2012

[February 14] Lecture 2B: Power-law degree distribution and the Preferential Attachment model

Power-law degree distribution in real networks. How to analyze and visualize power-law distributions. The Preferential Attachment model. Consequences of skewed degree distribution in the robustness of real networks.

Reading: Additional:
  • A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009
  • Networks, crowds, and markets (Chapter 18)
  • Graph Mining: Laws, Tools, and Case Studies (Chapter 2 and 9)
  • Bela Bollobas, Oliver Riordan, Joel Spencer and Gabor Tusnady. The degree sequence of a scale-free random graph process. Journal Random Structures and Algorithms 18(3), 2001
  • M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics 1(2), pp. 226-251, 2004
  • M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. In SIGCOMM, 1999.
  • R. Albert, H Jeong, and A.-L. Barabasi. The diameter of the world wide web. Nature 401, 130-131, 1999
  • A.L Barabasi, R. Albert. Emergence of scaling in random networks. Science, 286, 1999

[XX] Lecture 3A: Time-evolving graphs and network models

Properties of time-evolving graphs. The Forest-Fire and Kronecker graph models.

Reading: Additional:

[XX] Lecture 3B: Centrality criteria and link analysis algorithms

Centrality criteria in graphs (degree, closeness, betweenness, eigenvector, Katz). Link analysis ranking algorithms (HITS and PageRank).

Reading: Additional:

[XX] Lecture 4: Graph clustering and community detection

Community detection in networks. Girvan-Newman algorithm. Modularity and modularity optimization (greedy, spectral, Louvain method). Spectral clustering. Community evaluation criteria. Community detection in directed networks. Overlapping community detection. Community structure of large scale networks.

Reading: Additional:

[XX] Lecture 5A: Link prediction

Node similarity measures. Link prediction in networks.

Reading: Additional:

[XX] Lecture 5B: Graph similarity

Graph similarity. Graph kernels.

Reading: Additional:

[XX] Lecture 6A: Representation learning in graphs

Methods for learning node embeddings in graphs (LINE, DeepWalk and node2vec).

Reading:
Additional:

[XX] Lecture 6B: Graph sampling and summarization

Graph sampling. Graph sparsification for community detection. Graph summarization.

Reading: Additional:

[XX] Lecture 7A: Cascading behavior in networks

Cascading behavior. Models of virus and information probagation.

Reading: Additional:

[XX] Lecture 7B: Influence maximization

Influence maximization in social networks. The Greedy algorithm. Outbreak detection in networks.

Reading: Additional:


Course Structure

Learning objectives

The course aims to introduce students to the field of graph mining and network analysis by:
  • Covering a wide range of topics, methodologies and related applications.
  • Giving the students the opportunity to obtain hands-on experience on dealing with graph data and graph mining tasks.
We expect that by the end of the course, the students will have a thorough understanding of various graph mining and learning tasks, will be able to analyze large-scale graph data as well as to formulate and solve problems that involve graph structures.


Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

  • Have basic knowledge of graph theory and linear algebra.
  • Be familiar with fundamental data mining and machine learning tasks.
  • Be familiar with at least one programming language (e.g., Python or any language of their preference).
During the first lecture, we will review basic concepts in graph theory, linear algebra and machine learning.


Reading material

Most of the material of the course is based on research articles. Some of the topics are also covered by the following books:



Evaluation

The evaluation of the course will be based on the following:

  1. Data challenge this assignment aims to familiarize the students with basic graph mining and analysis tasks through a data challenge.
  2. Project: this will be the main component for the evaluation of the course. The students are expected to form groups of 3-4 people, propose a topic for their project, and submit a final project report (it would also be interesting to organize a poster session at the end of the quarter). Please, read the project section for more details.

The grading will be as follows:

Data challenge (groups of 3-4 students): 40%
Project (groups of 3-4 students): 60%


Academic integrity

All of your work must be your own. Don't copy another student's assignment, in part or in total, and submit it as your own work. Acknowledge and cite source material in your papers or assignments.



Project

Details about the project of the course can be found on piazza.




Resources

Datasets


Software tools

  • NetworkX: Python software package for graph analytics
  • igraph: collection of software packages for graph theory and network analysis (Python, C++ and R)
  • SNAP: high performance system for the analysis of large network (C++ and Python)
  • Gephi: graph visualization and exploration software

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of data mining, social network analysis and the Web). We provide the DBLP website of each venue where you can access the proceedings (papers, tutorials, etc).

Check out the website of each conference (e.g., KDD 2016 ) for more information.