2EL1730: Machine Learning

Course Overview

General Info: 2nd year course, CentraleSupélec, November 2019 - January 2020

Lecture hours: Tuesday (8:30-12:00), Friday (13:45 - 17:00)
Instructors: Fragkiskos Malliaros and Maria Vakalopoulou
Office hours: Right after class (or send us an email and we will find a good time to meet)

TAs: Mohamed El Amine Seddik, Yunshi Huang, Yoann Pradat, Jun Zhu

Piazza: piazza.com/centralesupelec/winter2020/2el1730/home

Machine learning is the scientific field that provides computers the ability to learn without being explicitly programmed (definition by Wikipedia). Machine learning lies at the heart of many real-world applications, including recommender systems, web search, computer vision, autonomous cars and automatic language translation.

The course will provide an overview of fundamental topics as well as important trends in machine learning, including algorithms for supervised and unsupervised learning, dimensionality reduction methods and their applications. A substantial lab section will involve group projects on a data science competition and will provide the students the ability to apply the course theory to real-world problems.

Schedule and Lectures

The schedule below outlines the topics that will be covered in the course. Both the topics and the due dates of the assignments and the project are subject to change. The slides for each lecture will be posted on Piazza just before the start of the class.

Lecture	Date	Topic	Material	Assignments/Project
1	November 26	Introduction; Model selection and evaluation	Lecture 1
2	November 29	Dimensionality reduction	Lecture 2
3	December 3	Linear and logistic regression	Lecture 3	Assignment 1 out
4	December 6	Probabilistic classifiers and linear discriminant analysis	Lecture 4
5	December 13	Non-parametric learning and nearest neighbor methods	Lecture 5	Project proposal due on December 13; Assignment 2 out
6	December 17	Support Vector Machines	Lecture 6	Assignment 1 due on December 17
7	December 20	Tree-based methods and ensemble learning	Lecture 7
8	January 7	Neural networks	Lecture 8
9	January 10	Introduction to deep learning (guest lecture by Dr. Stergios Christodoulidis, Institut Gustave Roussy)	Lecture 9
10	January 14	Introduction to reinforcement learning (guest lecture by Dr. Nikolaos Tziortziotis, Tradelab R&D)	Lecture 10	Assignment 2 due on January 14
11	January 17	Unsupervised learning: clustering	Lecture 11
12	January 20	Exams		Project final report due on January 26

[November 26] Lecture 1: Introduction; Model selection and evaluation

Introduction to machine learning, administrivia, course structure and overview of the topics that will be covered in the course. Overfitting and generalization. Bias-variance tradeoff. Training, validation and test sets. Cross-validation. Evaluation of supervised learning algorithms. Basic concepts in optimization.
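
As a rough illustration of the cross-validation idea mentioned above, the sketch below splits sample indices into k folds and yields one training/validation pair per fold. The function names are hypothetical and are not taken from the course material; the sketch also assumes the samples have already been shuffled, since it uses contiguous folds.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_samples, k):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, validation in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Each sample appears in exactly one validation fold, so averaging a model's validation score over the k splits uses every sample for evaluation once.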

Reading:

Understanding Machine Learning: From Theory to Algorithms (Chapters 1 and 2)
The Elements of Statistical Learning (Chapter 1)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Sections 7.1, 7.2, 7.3, 7.10)

Additional:

M. Kuhn and K. Johnson. An Introduction to Feature Selection. Applied Predictive Modeling, pages 487-519, 2013. [For the part of the lecture on feature selection].
Concentration of the empirical risk (also here), lecture notes by Dimitris Papailiopoulos (UW-Madison).
Model evaluation, model selection, and algorithm selection in machine learning: Part I (Basics), Part II (Bootstrapping and uncertainties), and Part III (Cross-validation and hyperparameter tuning). Interesting blog post by Sebastian Raschka, 2016.
Convex Optimization: Algorithms and Complexity (Sections 1.1, 1.2, 1.3)
Convex optimization and gradient descent, lecture notes by Nisheeth Vishnoi, EPFL (Sections 1.1, 1.2, 1.3)

[November 29] Lecture 2: Dimensionality reduction

Dimensionality reduction techniques. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). Multidimensional Scaling (MDS) and nonlinear dimensionality reduction.
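
To give a feel for what PCA computes, the following sketch finds the first principal component of a small dataset. Note that it uses power iteration on the sample covariance matrix rather than the SVD, simply to stay dependency-free; the function name is hypothetical, and the sketch assumes the fixed starting vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(data, n_iters=200):
    """Unit direction of maximal variance, via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # deterministic starting vector (an assumption)
    for _ in range(n_iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so the vector keeps unit length
    return v
```

Projecting the centered data onto the returned vector gives a one-dimensional representation that retains as much variance as any single direction can.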

Reading:

Mining of Massive Datasets (Sections 11.1, 11.2, 11.3)
Jonathon Shlens. A Tutorial on Principal Component Analysis. arXiv, 2014.

Additional:

SVD and Low Rank Matrix Approximations, lecture notes by Tim Roughgarden and Gregory Valiant (Stanford University)
Understanding Machine Learning: From Theory to Algorithms (Section 23.1)
J. B. Tenenbaum, V. De Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290:5500, pp. 2319-2323, 2000
S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290:5500, pp. 2323-2326, 2000
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001

[December 3] Lecture 3: Linear and logistic regression

Supervised learning models. Linear regression. Regularization. Linear classification models. Logistic regression. Maximum likelihood estimation.
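
As one possible illustration of logistic regression fitted by maximum likelihood, the sketch below runs batch gradient descent on the mean negative log-likelihood, with an optional L2 penalty on the weights. The function names, learning rate and iteration count are assumptions chosen for the example, not values prescribed by the course.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, n_iters=2000, l2=0.0):
    """Batch gradient descent on the mean negative log-likelihood; labels are 0 or 1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty shrinks the weights but not the bias.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]
```

On linearly separable data the unregularized weights keep growing with more iterations, which is one practical motivation for setting `l2` above zero.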

Reading:

Linear and logistic regression, lecture notes by Andrew Ng (Stanford University)
The Elements of Statistical Learning (Sections 3.1, 3.2 and 3.4)

Additional:

Understanding Machine Learning: From Theory to Algorithms (Sections 9.2 and 9.3)

[December 6] Lecture 4: Probabilistic classifiers and linear discriminant analysis

Bayes rule. Naive Bayes classifier. Maximum a posteriori estimation. Linear discriminant analysis (LDA).
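
The following minimal sketch shows Bayes' rule applied to classification: class priors and class-conditional likelihoods are combined into posterior probabilities. The function names and the dictionary-based interface are hypothetical choices made for this example.

```python
def posterior(prior, likelihood):
    """Posterior P(class | x) from priors P(class) and likelihoods P(x | class)."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(x), the normalizing constant
    return {c: joint[c] / evidence for c in joint}


def most_probable_class(prior, likelihood):
    """Class with the highest posterior probability."""
    post = posterior(prior, likelihood)
    return max(post, key=post.get)
```

For instance, with priors of 0.2 and 0.8 for two classes and likelihoods of 0.9 and 0.1 for an observation, the joint probabilities are 0.18 and 0.08, so the first class wins despite its lower prior.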

Reading:

Generative learning algorithms, lecture notes by Andrew Ng (Stanford University)
Fisher linear discriminant analysis (LDA), lecture notes by Cheng Li and Bingyu Wang (Northeastern University)

Additional:

Understanding Machine Learning: From Theory to Algorithms (Sections 24.1, 24.2 and 24.3)
Linear discriminant analysis (LDA), lecture notes by Duncan Fyfe Gillies (Imperial College)
Fisher linear discriminant analysis, notes by Max Welling (University of Amsterdam)

[December 13] Lecture 5: Non-parametric learning and nearest neighbor methods

Introduction to non-parametric learning methods. Distance and similarity metrics. Nearest neighbor algorithms.
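
A nearest neighbor classifier can be sketched in a few lines. The version below assumes Euclidean distance and a simple majority vote; both the function names and these choices are illustrative assumptions rather than the course's reference implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_train, y_train, query, k=3, distance=euclidean):
    """Return the majority label among the k training points closest to `query`."""
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because the `distance` argument is a plain function, other metrics can be swapped in without changing the rest of the code.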

Reading:

A Course in Machine Learning, by Hal Daumé III (Sections 3.1, 3.2 and 3.3)

Additional:

Understanding Machine Learning: From Theory to Algorithms (Chapter 19)
The Elements of Statistical Learning (Section 13.3)

[December 17] Lecture 6: Support Vector Machines

Maximum margin classifier. Linear SVMs. Primal and dual optimization problems. Non-linearly separable data and the kernel trick. Regularization and the non-separable case.

Reading:

Support Vector Machines, lecture notes by Andrew Ng (Stanford University)

Additional:

Understanding Machine Learning: From Theory to Algorithms (Chapter 15)

[December 20] Lecture 7: Tree-based methods and ensemble learning

Decision trees. Ensemble learning. Bagging and Boosting. The AdaBoost algorithm.

Reading:

Classification: Basic Concepts, Decision Trees, and Model Evaluation, Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, 2006
Introduction to AdaBoost, lecture notes by Balazs Kegl (University of Paris-Saclay)

Additional:

Understanding Machine Learning: From Theory to Algorithms (Chapters 18 and 10)

[January 7] Lecture 8: Neural networks

Introduction to neural networks. The perceptron algorithm. Multilayer perceptron. Backpropagation. Applications.
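
The perceptron algorithm listed above can be sketched as follows. This is a minimal version that assumes labels in {-1, +1} and linearly separable data; the function names and the epoch limit are assumptions made for the example.

```python
def train_perceptron(X, y, n_epochs=100):
    """Perceptron learning rule for labels in {-1, +1}; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # a full pass without updates: training has converged
            break
    return w, b


def perceptron_predict(x, w, b):
    """Sign of the linear activation, mapped to -1 or +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

On data that is not linearly separable the loop simply stops after `n_epochs` passes without reaching zero mistakes.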

Reading:

Understanding Machine Learning: From Theory to Algorithms (Chapter 20)

[January 10] Lecture 9: Introduction to deep learning

Deep learning, CNNs

Reading:

Understanding Machine Learning: From Theory to Algorithms (Chapter 20)

[January 14] Lecture 10: Introduction to reinforcement learning

Intelligent agents, dynamic programming, Monte Carlo methods, temporal difference learning

Reading:

Reinforcement Learning: An Introduction

[January 17] Lecture 11: Unsupervised learning: clustering

Introduction to unsupervised learning methods. Data clustering. Hierarchical clustering. k-means clustering. Spectral clustering.
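
As an illustration of k-means clustering, the sketch below implements Lloyd's algorithm, alternating an assignment step and a centroid update step. It assumes the caller supplies the initial centroids, which keeps the example deterministic; the function name is hypothetical.

```python
def k_means(points, centroids, n_iters=100):
    """Lloyd's algorithm; returns (centroids, labels). Requires n_iters >= 1."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels
```

The result depends on the initial centroids, so in practice the algorithm is often run from several starting points and the best clustering is kept.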

Reading:

Understanding Machine Learning: From Theory to Algorithms (Chapter 22)

Additional:

U. von Luxburg. Tutorial on spectral clustering. Statistics and Computing 17(4), 2007

Course Structure and Objectives

Structure
Each session of the course is divided into a 1h30' lecture and a 1h30' lab. The labs will include hands-on assignments (using Python) and will give the students the opportunity to deal with ML tasks in practice.

Learning objectives

The course aims to introduce students to the field of machine learning by:

Covering a wide range of topics, methodologies and related applications.
Giving the students the opportunity to obtain hands-on experience in dealing with machine learning tasks in practice.

We expect that by the end of the course, the students will be able to:

Identify problems that can be solved using machine learning methodologies.
Given a problem, identify and apply the most appropriate algorithm(s).
Implement some of those algorithms from scratch.
Evaluate and compare machine learning algorithms for a particular task.
Deal with real-world data challenges.

Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

Have basic knowledge of probability theory and linear algebra.
Be familiar with at least one programming language (e.g., Python or any language of their preference).

Reading material

There is no single required textbook for the course. We will recommend specific chapters from the following books:

Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2011.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, Springer, 2017.
Jure Leskovec, Anand Rajaraman, and Jeff Ullman. Mining of Massive Datasets. Cambridge University Press, 2014.

Evaluation

The evaluation of the course will be based on the following:

Two assignments: the assignments will include theoretical questions as well as hands-on practical questions that will familiarize the students with basic machine learning tasks.
Project: The students are expected to form groups of 3-4 people, propose a topic for their project, and submit a final project report. Please read the project section for more details.
Final exam: A final exam on the material covered in the course.

The grading will be as follows:

Assignment 1 (individually):	10%
Assignment 2 (groups of 3-4 students):	20%
Project (groups of 3-4 students):	30%
Final exam:	40%
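
As a small worked example of the weighting above, the sketch below computes a final grade as a weighted sum of the four components. It assumes that all component scores are expressed on the same scale; the names used are hypothetical and the scores in any usage are made up.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "assignment_1": 0.10,
    "assignment_2": 0.20,
    "project": 0.30,
    "final_exam": 0.40,
}


def final_grade(scores):
    """Weighted sum of component scores, all assumed to be on the same scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly: " + ", ".join(WEIGHTS))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, scores of 12, 14, 16 and 10 for the four components give 1.2 + 2.8 + 4.8 + 4.0 = 12.8.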

Academic integrity

All of your work must be your own. Don't copy another student's assignment, in part or in total, and submit it as your own work. Acknowledge and cite source material in your papers or assignments.

Project

Details about the course project have been posted on Piazza.

Resources

Datasets

List of datasets for machine learning research (mainly links to the corresponding research articles).
List of Kaggle datasets.
KDnuggets list of datasets.
archive.org list of datasets.
Paris Data: publicly available datasets from the city of Paris.
KDD Cup of Fresh Air: dataset containing the concentration levels of air pollutants (including PM2.5) in Beijing and London.
CrowdFlower AI list of datasets.
Amazon Web Services (AWS) list of datasets.
Amazon product data by Julian McAuley (UC San Diego).
Amazon question/answer data by Julian McAuley (UC San Diego).
Stanford Network Analysis Project (SNAP).
Social Computing Data Repository at Arizona State University.
BuzzFeedNews datasets.
Awesome public datasets.
Datasets from the Social Computing Research group at MPI-SWS.
Datasets from the AMiner academic social network.
UCR Time Series Archive
Datasets from papers published in the International AAAI Conference on Web and Social Media (ICWSM). Also, check the previous ICWSM conferences.
UCI Machine Learning Repository.
Social media (Instagram and Facebook) datasets by Emilio Ferrara (USC).
KDD Cup archives, including a competition about Tencent Weibo (KDD is one of the premier data mining conferences).
Sina Weibo dataset by Weiboscope (University of Hong Kong). Another Sina Weibo dataset by Tianchi can be found here.
T-Drive trajectory data by Microsoft Research (one-week trajectories of 10,357 taxis).
Foursquare dataset by Dingqi Yang (University of Fribourg).
Datasets from the Spatial and Urban Networks group at CEA (Paris).
Yelp dataset challenge
Quora question pairs competition in Kaggle (currently active).
IlliMine repository from the Data Mining Research Group at UIUC.

Software tools

scikit-learn: Machine Learning library in Python
pandas: Python data analysis library
seaborn: statistical data visualization based on matplotlib
Gallery of interesting Jupyter Notebooks

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of machine learning and data mining). We provide the DBLP website of each venue, where you can access the proceedings (papers, tutorials, etc.).

Check out the website of each conference (e.g., KDD 2016) for more information.