Network Science Analytics

Course Overview

General Info:

  • Applied Mathematics Option, CentraleSupélec
  • M.Sc. in Data Sciences and Business Analytics, ESSEC Business School and CentraleSupélec
  • January '18 - April '18

Lecture hours: Friday, 8:30 - 11:30
Lecture room: Eiffel Building, CentraleSupélec

Instructor: Fragkiskos Malliaros
Email: fragkiskos.me [at] gmail.com
Office hours: Right after class (or send me an email and we will find a good time to meet)

TA: Abdulkadir Çelikkanat
Email: abdcelikkanat [at] gmail.com

Piazza: piazza.com/centralesupelec/spring2018/ngsa/home


Networks (or graphs) have become ubiquitous, as data from diverse disciplines can naturally be mapped to graph structures. Social networks, such as academic collaboration networks and interaction networks over online social networking applications, are used to represent and model the social ties among individuals. Information networks, including the hyperlink structure of the Web and blog networks, have become crucial media for information dissemination, offering an effective way to represent content and navigate through it. A plethora of technological networks, including the Internet, power grids, telephone networks and road networks, are an important part of everyday life. Extracting meaningful information from large-scale graph data efficiently and effectively has become a crucial and challenging problem with several important applications, and graph mining and analysis methods are prominent tools for this task. The goal of this course is to present recent and state-of-the-art methods and algorithms for analyzing, mining and learning from large-scale graph data, as well as their practical applications in various domains (e.g., the Web, social networks, recommender systems).





Schedule and Lectures

The schedule below outlines the topics that will be covered in the course; the topics of the lectures are subject to change. The slides for each lecture will be posted on Piazza just before the start of the class. The due dates of the assignments/project are also subject to change.

Week | Date | Topic | Material | Assignments/Project
1 | January 19 | ○ Introduction to network science and graph mining ○ Graph theory and linear algebra recap; basic network properties | Lecture 1A, Lecture 1B |
2 | January 26 | ○ Random graphs and the small-world phenomenon ○ Power-law degree distribution and the Preferential Attachment model | Lecture 2A, Lecture 2B |
3 | February 2 | ○ Time-evolving graphs and network models ○ Centrality criteria and link analysis algorithms | Lecture 3A, Lecture 3B | Assignment 1 out
4 | February 9 | ○ Graph clustering and community detection | | Project proposal due on February 10
5 | March 2 | ○ Node similarity and link prediction ○ Graph similarity and graph classification | | Assignment 1 due on February 25; Assignment 2 out
6 | March 9 | ○ Representation learning in graphs ○ Graph sampling and summarization | |
7 | March 16 | ○ Epidemic processes and cascading behavior in networks ○ Influence maximization in social networks | |
8 | March 23 | ○ Core decomposition in networks ○ Graph-based methods in NLP | | Assignment 2 due on March 25
9 | April 6 | Project presentations | | Project final report due; project poster session or presentations



[January 19] Lecture 1A: Introduction

Introduction to graph mining and network analysis, administrivia, course structure and overview of the topics that will be covered in the course.

Reading:

[January 19] Lecture 1B: Graph theory and linear algebra recap; basic network properties

Presentation of basic concepts in graph theory, linear algebra and spectral graph theory that will be used throughout the course. Basic network properties: degree distribution, clustering coefficient and shortest path length.
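The three basic network properties named above can be illustrated with a minimal sketch in plain Python. The adjacency-dictionary representation, the function names and the toy graph are assumptions made for this illustration and are not taken from the course material.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(neighbors) for neighbors in adj.values())


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Hop distances from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical undirected toy graph: a triangle 0-1-2 with a path 2-3-4 attached.
toy_graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}

print(degree_distribution(toy_graph))
print(local_clustering(toy_graph, 2))
print(shortest_path_lengths(toy_graph, 0))
```

The sketch assumes an undirected, unweighted graph in which every edge is stored in both directions.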

Reading: Additional:

[January 26] Lecture 2A: Random graphs and the small-world phenomenon

The Erdős-Rényi random graph model and its basic properties. Comparison to the properties of real networks. The small-world phenomenon and the small-world model.
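As an illustration of the random graph model mentioned above, the following is a minimal sketch of the G(n, p) variant, in which each of the n(n-1)/2 possible edges is included independently with probability p. The function names and parameter values are hypothetical, and the quadratic-time loop is only meant for small graphs.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as an adjacency dictionary."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        # Each possible edge is kept independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


# Illustrative parameters: the expected average degree is (n - 1) * p = 9.98.
sample = erdos_renyi(500, 0.02, seed=42)
print(average_degree(sample))
```

In this model the degree of a node follows a binomial distribution with mean (n - 1)p, which is one of the properties usually contrasted with the skewed degree distributions of real networks.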

Reading: Additional:
  • Random graphs, lecture notes by Aaron Clauset (CU Boulder)
  • Diameter on d-regular random graphs, lecture notes by Yaron Singer (Harvard University)
  • Networks: An Introduction (Chapter 12)
  • P. Erdős and A. Rényi. On Random Graphs I. Publicationes Mathematicae 6, 290-297, 1959
  • P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 1960
  • D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-42, 1998
  • P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301, 2003
  • D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002
  • M. E. J. Newman. Models of the Small World: A Review. J. Stat. Physics, 2000
  • J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. ACM Symposium on Theory of Computing, 2000
  • L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012
  • J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. arXiv, 2012

[January 26] Lecture 2B: Power-law degree distribution and the Preferential Attachment model

Power-law degree distribution in real networks. How to analyze and visualize power-law distributions. The Preferential Attachment model. Consequences of skewed degree distribution in the robustness of real networks.

Reading: Additional:
  • A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009
  • Networks, crowds, and markets (Chapter 18)
  • Graph Mining: Laws, Tools, and Case Studies (Chapter 2 and 9)
  • B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale-free random graph process. Random Structures and Algorithms 18(3), 2001
  • M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics 1(2), pp. 226-251, 2004
  • M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. In SIGCOMM, 1999
  • R. Albert, H. Jeong, and A.-L. Barabási. The diameter of the world wide web. Nature 401, 130-131, 1999
  • A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science 286, 1999

[February 2] Lecture 3A: Time-evolving graphs and network models

Properties of time-evolving graphs. The Forest-Fire and Kronecker graph models.

Reading: Additional:

[February 2] Lecture 3B: Centrality criteria and link analysis algorithms

Centrality criteria in graphs (degree, closeness, betweenness, eigenvector, Katz). Link analysis ranking algorithms (HITS and PageRank).
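To make the link analysis part concrete, here is a minimal sketch of PageRank computed by power iteration. The damping factor of 0.85, the uniform redistribution of rank from dangling nodes, and the toy link structure are assumptions of this sketch rather than details confirmed by the course material. Every link target is assumed to appear as a key of the input dictionary.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores for a directed graph by power iteration.

    `out_links` maps every node to the list of nodes it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
print(sorted(scores, key=scores.get, reverse=True))
```

Page "d" has no incoming links, so its score reduces to the teleportation term (1 - damping) / n, while "c" collects rank from all other pages.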

Reading: Additional:

Lecture XX: Graph clustering and community detection (Part I)

Strength of weak ties. Community detection in networks. Girvan-Newman algorithm. Modularity and modularity optimization (greedy, spectral, Louvain method).

Reading: Additional:

Lecture XX: Graph clustering and community detection (Part II)

Graph partitioning. Spectral clustering. Community evaluation criteria.

Reading: Additional:

Lecture XX: Graph clustering and community detection (Part III)

Community detection in directed networks. Overlapping community detection. Community structure of large scale networks.

Reading: Additional:

Lecture XX: Link prediction

Node similarity measures. Link prediction in networks.
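The source does not list which similarity measures the lecture covers. As an assumption, the sketch below uses three widely used neighborhood-based scores (common neighbors, Jaccard coefficient and Adamic-Adar) and ranks the unlinked node pairs of a hypothetical toy graph by score, which is the basic idea behind similarity-based link prediction.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by the inverse log of their degree.

    A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Sort all unlinked node pairs by decreasing similarity score."""
    nodes = sorted(adj)
    candidates = [
        (u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:] if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical undirected friendship graph.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(rank_candidate_links(friends, common_neighbors))
```

The pair ("a", "d") shares two neighbors and therefore comes first under the common-neighbors score.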

Reading: Additional:

Lecture XX: Graph similarity

Graph similarity. Graph kernels.

Reading: Additional:

Lecture XX: Graph sampling and summarization

Graph sampling. Graph sparsification for community detection. Graph summarization.

Reading: Additional:

Lecture XX: Cascading behavior in networks

Cascading behavior. Models of virus and information propagation.

Reading: Additional:

Lecture XX: Influence maximization

Influence maximization in social networks. The Greedy algorithm. Outbreak detection in networks.

Reading:
Additional:

Lecture XX: Representation learning in graphs

Methods for learning node embeddings in graphs (LINE, DeepWalk and node2vec).

Reading:
Additional:

Lecture XX: Core decomposition in graphs

Core decomposition and algorithms. Applications in dense subgraph detection, community detection, identification of influential spreaders and NLP.
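The idea behind core decomposition can be sketched with a simple peeling procedure: repeatedly remove a node of minimum remaining degree, and record the largest minimum degree seen so far as its core number. The version below is a deliberately simple quadratic-time sketch with hypothetical names; linear-time bucket-based implementations exist and would be preferable for large graphs.

```python
def core_numbers(adj):
    """Core number of every node of an undirected graph, by iterative peeling."""
    degree = {u: len(neighbors) for u, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node whose degree in the remaining subgraph is minimal.
        u = min(remaining, key=degree.get)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


# Hypothetical toy graph: a triangle 0-1-2 with a path 2-3-4 attached.
toy_graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}

print(core_numbers(toy_graph))
```

In the toy graph the triangle nodes form the 2-core, while the nodes on the attached path only belong to the 1-core.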

Reading:
  • M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature Physics 6, 888-893, 2010
  • C. Giatsidis, D. Thilikos, and M. Vazirgiannis. D-cores: Measuring Collaboration of Directed Graphs Based on Degeneracy. In ICDM, 2011

Additional:

Lecture XX: Graph-based methods in NLP

Graph-based methods in information retrieval, text categorization and text summarization.



Course Structure

Learning objectives

The course aims to introduce students to the field of graph mining and network analysis by:
  • Covering a wide range of topics, methodologies and related applications.
  • Giving the students the opportunity to obtain hands-on experience in dealing with graph data and graph mining tasks.
We expect that by the end of the course, the students will have a thorough understanding of various graph mining and learning tasks and will be able to analyze large-scale graph data, as well as formulate and solve problems that involve graph structures.


Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

  • Have basic knowledge of graph theory and linear algebra.
  • Be familiar with fundamental data mining and machine learning tasks.
  • Be familiar with at least one programming language (e.g., Python or any language of their preference).
In the second lecture, we will review basic concepts in graph theory, linear algebra and machine learning.


Reading material

Most of the material of the course is based on research articles. Some of the topics are also covered by the following books:



Evaluation

The evaluation of the course will be based on the following:

  1. Two assignments: the assignments will include theoretical questions as well as hands-on practical questions and will familiarize the students with basic graph mining and analysis tasks.
  2. Project: this will be the main component for the evaluation of the course. The students are expected to form groups of 3-4 people, propose a topic for their project, and submit a final project report (it would also be interesting to organize a poster session at the end of the quarter). Please read the Project section for more details.

The grading will be as follows:

Assignment 1 (individual): 20%
Assignment 2 (groups of 3-4 students): 30%
Project (groups of 3-4 students): 50%


Academic integrity

All of your work must be your own. Don't copy another student's assignment, in part or in total, and submit it as your own work. Acknowledge and cite source material in your papers or assignments.



Project

Details about the course project can be found on Piazza.




Resources

Datasets


Software tools

  • NetworkX: Python software package for graph analytics
  • igraph: collection of software packages for graph theory and network analysis (Python, C++ and R)
  • SNAP: high-performance system for the analysis of large networks (C++ and Python)
  • Gephi: graph visualization and exploration software

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of data mining, social network analysis and the Web). We provide the DBLP website of each venue where you can access the proceedings (papers, tutorials, etc).

Check out the website of each conference (e.g., KDD 2016) for more information.