This is a graduate course on deep learning, one of the hottest topics in machine learning and AI at the moment.

In the last two or three years, deep learning has revolutionized speech recognition and image recognition. It is widely deployed by companies such as Google, Facebook, Microsoft, IBM, Baidu, and Apple for audio/speech, image, video, and natural language processing.

**Spring 2014 instructor**: Yann LeCun, 715 Broadway, Room 1220, 212-998-3283, yann [ a t ] cs.nyu.edu

**Teaching Assistant**: Liu Hao, haoliu [ at ] nyu.edu

**Classes**: Mondays, 5:10 to 7:00 PM. Location: Cantor, Room 101

**Lab Sessions**: Wednesdays, 5:10 to 6:00 PM. Location: Warren Weaver Hall, Room 109

**Office Hours for Prof. LeCun**: Wednesdays, 3:00-5:00 and 6:00-7:00 PM. Please send an email to Prof. LeCun prior to an office hour visit.

The course covers a wide variety of topics in deep learning, feature learning, and neural computation. It covers the mathematical methods and theoretical aspects as well as algorithmic and practical issues. Deep learning is at the core of many recent advances in AI, particularly in audio, image, video, and language analysis and understanding.

This course is primarily designed for students in the Data Science programs, but any student who is familiar with the basics of machine learning can take it.

The only formal prerequisite is successful completion of “Intro to Data Science” or any basic course on machine learning. Familiarity with computer programming is assumed. The course relies heavily on mathematical tools such as linear algebra, probability and statistics, multivariate calculus, and function optimization. The basic mathematical concepts will be introduced when needed, but students will be expected to assimilate a non-trivial amount of mathematical material in a fairly short time.

Familiarity with basic ML/stats concepts such as multinomial linear regression, logistic regression, K-means clustering, Principal Components Analysis, and simple regularization is assumed.

The topics studied in the course include:

- learning representations of data
- the energy-based view of model estimation
- basis function expansion
- supervised learning in multilayer architectures; backpropagation
- optimization issues in deep learning
- heterogeneous learning systems; the modular approach to learning
- convolutional nets
- applications to image recognition
- structured prediction, factor graphs, and deep architectures
- applications to speech recognition
- learning embeddings, metric learning
- recurrent nets: learning dynamical systems
- recursive nets: algebra on representations
- the basics of unsupervised learning
- the energy-based view of unsupervised learning
- energy-shaping methods for unsupervised learning
- decoder-only models: K-means, sparse coding, convolutional sparse coding
- encoder-only models: ICA, Product of Experts, Field of Experts
- the encoder-decoder architecture
- sparse auto-encoders
- denoising, contracting, and saturating auto-encoders
- Restricted Boltzmann Machines; contrastive divergence
- learning invariant features: group sparsity
- feature factorization
- the scattering transform
- software implementation issues; GPU implementations
- parallelizing deep learning
- theoretical questions
- open questions

- A quick overview of some of the material contained in the course is available from my ICML 2013 tutorial on Deep Learning:
- Slides: PDF

- Q&A about deep learning (Spring 2013 course on large-scale ML)
- Video (2013)

- 2012 IPAM Summer School on deep learning and representation learning
- 2014 International Conference on Learning Representations (ICLR 2014)

* Intro to Deep Learning

- Topics:
- Reading material:

* Modular Learning, Neural Nets and Backprop

- Topics: backprop, modular models
- Reading Material:
- Additional readings (?): ICML 2013, pp. 34-53

* Clement Farabet's tutorial on the Torch ML library

* Mixture of experts, recurrent nets, intro to ConvNets

- Topics: discussion of some modules: sum/branch, switch, log-sum; RBF nets; MAP/MLE loss; parameter-space transforms; the convolutional module
- Reading Material:

* Unscheduled

- Topics: : Some more modules and architectures
- Reading Material:

* Guest Lecture by Rob Fergus on Convolutional Nets

* Energy-Based Models for Supervised Learning

- Topics: energy for inference, objective functions for learning, loss functionals
- Reading Material:
- Other On-Line Material:

* Optimization Tricks for Deep Learning and Computer Vision

- Topics: aspect ratio, randomization, mean/std normalization, channel decorrelation
- Reading Material:
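The mean/std normalization and channel-decorrelation tricks listed above can be sketched in a few lines. This is a minimal NumPy illustration under my own function names, not the course's reference code:

```python
import numpy as np

def standardize(X):
    """Per-channel mean/std normalization: each column of X gets
    zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def decorrelate(X):
    """Channel decorrelation by a PCA rotation: project the centered
    data onto the eigenvectors of its covariance matrix, so that
    cross-channel correlations vanish."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    _, V = np.linalg.eigh(cov)   # eigenvectors of the symmetric covariance
    return Xc @ V
```

After `decorrelate`, the sample covariance of the result is diagonal, which tends to make gradient descent better conditioned.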

* Energy-Based Models for Unsupervised Learning

- Topics: learning an energy function is hard; different strategies: PCA; NLL (intractable in general); contrastive divergence; shaping the energy surface only around data points; denoising auto-encoders (with drawing on blackboard); sparse coding
- Reading Material:
- 2013 lectures:
- Sparse Coding, Sparse Auto-Encoders 2013 Lectures:
- NIPS 2006 slides

* Optimization for Deep Learning

* Guest Lecture by Jason Weston

- Topics:
- Reading Material
- From Jason Weston: http://ronan.collobert.com/pub/matos/2009_tutorial_nips.pdf (part 2)

* Metric Learning and Optimization / DrLIM

- Topics: NCA; DrLIM
- Reading Material:
- Not relevant

* Latent Factor Graphs

- Topics: latent-variable models, probabilistic LVMs, loss functions, example: handwriting recognition
- Reading Material:
- This is covered in the energy learning tutorial
- Video (2013)

* Optimization for Deep Learning?

- Topics:
- Reading Material:
- Not relevant

* Energy-Based Models for Unsupervised Learning

- Topics: ISTA, FISTA, LISTA, …
- Reading Material:
- 2013 lectures:

* Speech Recognition / Structured Prediction

- Topics: FFT/DFT, time-delay convolutional nets, acoustic modeling
- Reading Material:
- Not relevant

* Discussion of Project Topics

- Topics:
- Reading Material:
- Torch Cheatsheet: https://github.com/torch/torch7/wiki/Cheatsheet

* Backpropagation, History of Deep Learning

- Topics: Lagrangian derivation of backpropagation, development of neural networks and deep learning since the 1940s
- Reading Material:
- Not relevant
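The Lagrangian derivation of backpropagation covered in this session can be sketched as follows: treat the forward equations x_{k+1} = f_k(x_k, w_k) as equality constraints with multipliers λ_k. This is a standard sketch, not necessarily the lecture's exact notation:

```latex
% Forward pass written as constraints, with multipliers \lambda_k:
\mathcal{L} = C(x_N, y)
  + \sum_{k=0}^{N-1} \lambda_{k+1}^\top \bigl( f_k(x_k, w_k) - x_{k+1} \bigr)
% Stationarity in x_N and x_k gives the backward (adjoint) recursion:
\lambda_N = \frac{\partial C}{\partial x_N}, \qquad
\lambda_k = \left( \frac{\partial f_k}{\partial x_k} \right)^{\!\top} \lambda_{k+1}
% and the weight gradient falls out of stationarity in w_k:
\frac{\partial \mathcal{L}}{\partial w_k}
  = \left( \frac{\partial f_k}{\partial w_k} \right)^{\!\top} \lambda_{k+1}
```

The adjoint recursion is exactly the bprop pass: λ_k plays the role of the gradient of the loss with respect to the activations x_k.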

* Final Exam Period May 12 to May 19

- Final Project May 16
- If you are not graduating and need an extension talk to the TA Liu Hao: haoliu [ at ] nyu.edu
- Final Exam May 19

Topics to know for the final exam:

- the reasons for deep learning
- fprop/bprop: given the fprop function for a module, write the bprop
- modules you should know about:
- linear, point-wise non-linearity, max
- Y branch, square distance, log-softmax
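As an illustration of the fprop/bprop exercise above, a linear module might look like this. A minimal sketch, not the course's reference code; the class and method names are mine, though `fprop`/`bprop` follow the convention used in the lectures:

```python
import numpy as np

class Linear:
    """Linear module: fprop computes y = W x + b, bprop applies the
    chain rule to get gradients w.r.t. the input and the parameters."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (n_out, n_in))
        self.b = np.zeros(n_out)

    def fprop(self, x):
        self.x = x                       # cache the input for bprop
        return self.W @ x + self.b

    def bprop(self, dy):
        # dy = dLoss/dy coming from the module above
        self.dW = np.outer(dy, self.x)   # dLoss/dW
        self.db = dy                     # dLoss/db
        return self.W.T @ dy             # dLoss/dx, passed to the module below
```

Any network built from such modules is trained by calling `fprop` bottom-up, then `bprop` top-down, then updating each module's parameters from its stored gradients.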

- loss functions: least square, cross-entropy, hinge
- energy-based supervised learning: energy/inference - objective function/learning
- loss functionals: energy loss, negative log likelihood, perceptron, hinge
- metric learning, siamese nets
- DrLIM, WSABIE criteria
- network architectures:
- shared weights and other weight space transformations
- recurrent nets: basic algorithm for backprop-through-time
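The three loss functions listed above can be written down directly. A minimal NumPy sketch (function names are mine; `log_probs` is assumed to be the output of a log-softmax module):

```python
import numpy as np

def least_square(y, target):
    # energy loss: squared distance between the output and the target
    return 0.5 * np.sum((y - target) ** 2)

def cross_entropy(log_probs, label):
    # negative log-likelihood of the correct class, given log-softmax outputs
    return -log_probs[label]

def hinge(scores, label, margin=1.0):
    # the correct class must beat the best wrong class by at least `margin`
    wrong = np.delete(scores, label)
    return max(0.0, margin - (scores[label] - wrong.max()))
```

The hinge loss is zero as soon as the margin is satisfied, which is what makes it a margin loss rather than a likelihood.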

- mixture of experts
- convolutional nets:
- architecture, usage, for image and speech recognition and detection of objects in images

- optimization:
- SGD
- tricks to make learning efficient: data normalization and such.
- computing 2nd derivatives (diagonal terms)
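A bare-bones version of SGD with per-feature data normalization, as listed above. This is a sketch on a linear least-squares model, not the course's code; the function names are mine:

```python
import numpy as np

def normalize(X):
    # per-feature mean/std normalization: one of the efficiency tricks
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def sgd_linear_regression(X, y, lr=0.1, epochs=50, seed=0):
    """Plain SGD on the least-squares loss 0.5*(x.w - y)^2 for a
    linear model y ~ X w, visiting samples in shuffled order."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):    # re-shuffle each epoch
            grad = (X[i] @ w - y[i]) * X[i]  # per-sample gradient
            w -= lr * grad
    return w
```

Normalizing first keeps the Hessian well conditioned, which is why a single learning rate works across all features here.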

- deep learning + structured prediction
- inference through energy minimization and marginalization
- latent variables E(X,Y,Z) → F(X,Y)
- learning using a loss functional
- applications to sequence processing (e.g. speech and handwriting recognition)
- applications:
- speech and audio (temporal convnets)
- image (spatial convnets)
- text (see Jason Weston's and Antoine Bordes' lectures)
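The latent-variable step E(X,Y,Z) → F(X,Y) above can be illustrated by marginalizing the latent variable through minimization. A toy sketch; the particular energy function and all names are mine:

```python
import numpy as np

def free_energy(E, x, y, latents):
    # F(x, y) = min_z E(x, y, z): eliminate the latent by minimization
    return min(E(x, y, z) for z in latents)

def predict(E, x, labels, latents):
    # inference: pick the label whose free energy is lowest
    return min(labels, key=lambda y: free_energy(E, x, y, latents))

# Toy energy: the latent z is an unknown scale of a class template.
templates = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}

def E(x, y, z):
    return float(np.sum((x - z * templates[y]) ** 2))
```

In a real system (e.g. handwriting recognition) Z would be a segmentation or alignment, and the minimization would be done by dynamic programming rather than enumeration.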

- unsupervised learning:
- basic idea of energy-based unsupervised learning
- the 7 methods to make the energy low on/near the samples and high everywhere else
- sparse coding and sparse auto-encoders
- ISTA/FISTA, LISTA
- group sparsity
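ISTA from the list above fits in a few lines: iterate a gradient step on the quadratic reconstruction term followed by soft-thresholding. A minimal NumPy sketch (function names are mine):

```python
import numpy as np

def shrink(u, t):
    # soft-thresholding: the proximal operator of the L1 norm
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=100):
    """Minimize 0.5*||x - D h||^2 + lam*||h||_1 over the sparse code h,
    for a fixed dictionary D."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    h = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ h - x)         # gradient of the quadratic term
        h = shrink(h - grad / L, lam / L)
    return h
```

FISTA adds a momentum term to this iteration, and LISTA unrolls a few iterations into a learned feed-forward encoder.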

Dates:

- 27 January 2014, 16 weeks
