Getting involved

There is now a mailing list for this reading group. To get on it, just send me (Damien) an email.

Note: this reading group is about deep learning as applied to depth estimation from a single image - one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic, so let me know if you want some idea about which ones you should read.

Proposed Schedule

The schedule below is only proposed and is subject to change.

14 Sept 8.30am: ~~Foley & Matlin Chapter 6: Distance & Size Perception~~
- Location: EEBF 4302
21 Sept 8.30am: ~~Saxena, Min & Ng: Make3D~~
- Location: EEBF 4302
28 Sept 8.30am: ~~Michels, Saxena & Ng: High speed obstacle avoidance~~
- Location: EEBF 4302
05 Octr 8.30am: ~~Karsch, Liu & Kang: Depth Transfer~~
- Location: EEBF 4302
12 Octr 8.30am: ~~LeCun, Bengio & Hinton: Deep Learning Review~~
- Location: EEBF 4302
19 Octr 8.30am: ~~Rumelhart, Hinton & Williams: Backpropagation~~
- Location: EEBF 4302
26 Octr 8.30am: ~~LeCun, Bottou, Bengio & Haffner: CNNs~~
- Location: EEBF 4302
02 Novr 8.30am: ~~LeCun, Bottou, Bengio & Haffner: CNNs~~
- Location: EEBF 4302
09 Novr 8.30am: ~~Simonyan & Zisserman: VGG-16~~
- Location: EEBF 4302
16 Novr 8.30am: ~~Eigen, Puhrsch & Fergus: Depth map prediction~~
- Location: EEBF 4302
23 Novr 8.30am: ~~Luo et al.: Deep Learning + Stereo~~
- Location: EEBF 4302
30 Novr 8.30am: ~~Shelhamer, Long & Darrell: Fully Convolutional Segmentation~~
- Location: EEBF 4302
07 Decr 8.30am: ~~Giusti et al.: Forest trails CNN~~
- Location: EEBF 4302
14 Decr 8.30am: ~~Ioffe & Szegedy: Batch Normalization~~
- Location: EEBF 4302
22 Decr 8.30am: ~~He, Zhang, Ren & Sun: ResNet~~
- Location: EEBF 4302

Week of 25 Decr: Break
Week of 01 Janr: Break
Week of 08 Janr: Break
Week of 15 Janr: Break
Week of 22 Janr: Break
Week of 29 Janr: Break
Week of 05 Febr: Cao, Wu & Shen: Fully convolutional depth 1
Week of 12 Febr: Laina et al.: Fully convolutional depth 2
Week of 19 Febr: Li, Klein & Yao: Fully convolutional depth 3
Week of 26 Febr: Güler et al.: DenseReg
Week of 05 Marc: Godard et al. Unsupervised/train from stereo
Week of 12 Marc: Zhou et al.: SfMLearner
Week of 19 Marc: Yan et al. Superpixel CNN/CRF
Week of 26 Marc: Break
Week of 02 Aprl: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
Week of 09 Aprl: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
Week of 16 Aprl: Goodfellow et al.: Generative Adversarial Nets
Week of 23 Aprl: Oord et al.: Pixel-RNN and Pixel-CNN
Week of 30 Aprl: Isola et al.: Pix2Pix
Week of 07 May : Tatarchenko et al. Multi-view 3D models
Week of 14 May : Girshick, Donahue, Darrell & Malik: R-CNN
Week of 21 May : Redmon et al.: YOLO
???: Hoiem et al. Photo pop-up
???: Heitz et al. Cascaded Classification Models
???: Li et al. Feedback-enabled Cascaded Classification Models
???: Han et al. Bayesian object-level reconstruction
???: Liu et al. Depth from semantics
???: Wu et al. Repetitive scene structure
???: He et al. Haze removal
???: Hassner et al. Example-based Depth

Details

Foley & Matlin Chapter 6 - Distance & Size Perception

Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems) it pays to learn a bit about how humans do it...

https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover

Go to Chapter 6.

If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a google search for content.

Another thing to try that has worked for some is to log out of any google/gmail account before trying to access.

If nothing else works, email me.

Saxena, Min & Ng: Make3D

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4531745

This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully, considering previous attempts. It uses Markov Random Fields, which are a bit advanced and, importantly, quite slow.

Note: because our library has a subscription to IEEE Xplore, you can access the above link from on-campus or via off-campus library access or via VPN.

But, here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf

There are some videos and things available here: http://make3d.cs.cornell.edu/ -- there used to be a live online demo but they've closed that. There is also a list of results on the Make3D dataset up till about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html

After that other datasets started being used also.

Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/

MRFs are more difficult and if anybody has seen a good tutorial for them let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up discussing what MRFs are a lot on Thursday.

Michels, Saxena & Ng: High speed obstacle avoidance

http://dl.acm.org/citation.cfm?id=1102426

Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.

This version of the paper might be of higher quality (thanks to Hossein for finding):

http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf

Karsch, Liu & Kang: Depth Transfer

Here is the target paper: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153

For those who are not on campus, a temporary link: http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf

This is a nonparametric approach to depth from a single image. It searches a database for images similar to the observed one, aligns each retrieved image with the observed one, and then warps the retrieved images and their depth maps to estimate the depth of the current image. The alignment depends on an approach called SIFTFlow.
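To make the retrieve-then-transfer idea concrete, here is a toy sketch of the retrieval step only. It assumes each database entry already has a feature vector and a depth map of the same size as the query; the names `retrieve_candidates` and `transfer_depth` and the tiny database are made up for illustration. The SIFTFlow alignment and warping are left out entirely, and a plain per-pixel average stands in for the paper's fusion step.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_candidates(query_feature, database, k):
    """Return the k database entries whose features are closest to the query."""
    ranked = sorted(
        database,
        key=lambda entry: squared_distance(entry["feature"], query_feature),
    )
    return ranked[:k]


def transfer_depth(candidates):
    """Average the candidate depth maps pixel by pixel.

    Assumption: this stands in for the align, warp and fuse steps,
    which are not implemented here.
    """
    n = len(candidates)
    pixels = len(candidates[0]["depth"])
    return [sum(c["depth"][i] for c in candidates) / n for i in range(pixels)]


# Hypothetical database: 2-D features and flattened 2-pixel depth maps.
database = [
    {"feature": [0.0, 0.0], "depth": [1.0, 1.0]},
    {"feature": [0.1, 0.0], "depth": [3.0, 1.0]},
    {"feature": [5.0, 5.0], "depth": [9.0, 9.0]},
]

candidates = retrieve_candidates([0.05, 0.0], database, k=2)
estimate = transfer_depth(candidates)
```

The far-away third entry is never retrieved, so it does not influence the estimate; in the real method the quality of the alignment matters at least as much as the retrieval.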

Here is a paper describing "SIFTFlow" (if you have the time to go deeper): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109

Free version: http://people.csail.mit.edu/celiu/SIFTflow/

Or a shorter conference version available off-campus: http://people.csail.mit.edu/celiu/ECCV2008/

LeCun, Bengio & Hinton: Deep Learning Review

http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true

Alternative links: http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf https://www.researchgate.net/publication/277411157_Deep_Learning

A whirlwind compressed intro to deep learning and its parts.

For a more gentle introduction to deep learning: http://cs231n.stanford.edu/

Or you can find lots of gentle short intros: https://www.google.com.tr/search?q=intro+to+deep+learning

Rumelhart, Hinton & Williams: Backpropagation

An early paper introducing backpropagation, the main way we train neural networks nowadays: http://www.nature.com/articles/323533a0

Alternative link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

These topics are also addressed in the tutorials shared just above, and you will find plenty more online: most neural network tutorials attempt to explain backpropagation, as it is the main way these networks are trained. When explaining how to train neural networks I usually use a simple genetic algorithm instead, because it's simpler. An accelerated neural network tutorial that skips backpropagation but explains one of the tools is at http://files.djduff.net/nn.zip
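For a feel of what training without backpropagation can look like, here is a minimal sketch of a mutation-only evolutionary search fitting a single linear neuron. This is not the code from the tutorial linked above: the function names, the population size, the mutation scale `sigma` and the toy data are all assumptions made for illustration, and there is no crossover, so it is only the simplest relative of a genetic algorithm.

```python
import random


def predict(weights, x):
    """A single linear neuron: one weight and one bias."""
    w, b = weights
    return w * x + b


def loss(weights, data):
    """Mean squared error over (input, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data, generations=200, population=20, sigma=0.3, seed=0):
    """Repeatedly mutate the best weights and keep whichever candidate fits best."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    for _ in range(generations):
        # Each child is the current best plus Gaussian noise on every weight.
        children = [
            [gene + rng.gauss(0.0, sigma) for gene in best]
            for _ in range(population)
        ]
        # Elitism: the parent stays in the pool, so the loss never increases.
        best = min(children + [best], key=lambda w: loss(w, data))
    return best


# Toy data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]

initial_loss = loss([0.0, 0.0], data)
trained = evolve(data)
final_loss = loss(trained, data)
```

No gradients are computed anywhere, which is what makes this easy to explain; the price is that it scales poorly to networks with many weights, which is where backpropagation comes in.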

LeCun, Bottou, Bengio & Haffner: CNNs

This is the now classic paper describing LeNet architectures applying Convolutional Neural Networks (CNNs) to the problem of optical character recognition. It also embeds the neural network in an architecture for automatically segmenting text, including a system for automatically reading cheques.

http://ieeexplore.ieee.org/abstract/document/726791/

Alternative link: http://www.dengfanxin.cn/wp-content/uploads/2016/03/1998Lecun.pdf

Convolutional neural networks were introduced by the authors in 1990. It may be instructive to read that considerably simpler paper: http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf

Or any tutorial about CNNs. One good place to follow is: http://cs231n.stanford.edu/syllabus.html

Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet

We will not discuss this in the reading group.

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

This is where convolutional neural networks and deep learning really showed what they could do on the problem of image recognition.

But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner". So we have taken it out of the reading list.

Simonyan & Zisserman: VGG-16

http://arxiv.org/abs/1409.1556

A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have as many as one thousand layers.

Feel free to take a look at AlexNet above to get an idea of the space of approaches.

Eigen, Puhrsch & Fergus: Depth map prediction

https://www.cs.nyu.edu/~deigen/depth/

Finally, we apply deep neural convolutional networks to the problem that we are interested in.

Luo et al.: Deep Learning + Stereo

Combining deep learning and stereo.

https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf

Shelhamer, Long & Darrell: Fully Convolutional Segmentation

http://arxiv.org/abs/1605.06211

Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.

Giusti et al.: Forest trails CNN

http://ieeexplore.ieee.org/document/7358076/

Alternative link: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf

Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Saxena et al. above.

Ioffe & Szegedy: Batch Normalization

https://arxiv.org/abs/1502.03167

A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.

Apparently the following video from the Stanford CS231N course contains details of Batch Normalization (37:00 to 59:30)

https://youtu.be/gYpoJMlgyXA?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
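As a rough illustration of what batch normalization computes at training time, here is a minimal sketch in plain Python for a single feature over one mini-batch. The function name `batch_norm` is made up here, and `gamma`, `beta` and `eps` follow the paper's notation for the learned scale, learned shift and small stabilising constant. The running statistics used at inference time and the backward pass are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Activations of one unit for a mini-batch of four examples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

# Check the statistics of the result.
out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
```

With the default `gamma` and `beta` the output has mean close to zero and variance close to one, whatever the scale of the incoming activations.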

He, Zhang, Ren & Sun: ResNets

https://arxiv.org/abs/1512.03385

This work and variations on it have been the basis of recent neural networks with 1000 layers. Important stuff.

The diagram at the top of Page 2 of this paper (which offers an improvement on the ResNet of the above paper) is quite useful in understanding the structure of a residual unit:

http://arxiv.org/abs/1603.05027
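For those who prefer code to diagrams, here is a minimal sketch of the core idea of a residual unit: the output is the input plus a learned correction, y = x + F(x). It is a simplification under stated assumptions: small fully connected layers stand in for the convolutions, batch normalization is omitted, and there is no activation after the addition. The function names are made up for illustration.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; each row of `weights` produces one output."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_unit(x, w1, w2):
    """Compute y = x + F(x), where F is linear -> ReLU -> linear."""
    f = linear(w2, relu(linear(w1, x)))
    return [xi + fi for xi, fi in zip(x, f)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With all-zero weights the unit passes its input through unchanged.
passthrough = residual_unit(x, zeros, zeros)

# With identity weights, F(x) = relu(x), so the output is x + relu(x).
shifted = residual_unit(x, identity, identity)
```

The pass-through case is the point: a unit that has learned nothing yet behaves like the identity, which is part of why stacking very many of them remains trainable.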

Cao, Wu & Shen: Fully convolutional depth 1

http://arxiv.org/abs/1605.02305

Here we start a series of recent papers that take different approaches using deep nets to depth from a single image.

Laina et al.: Fully convolutional depth 2

http://arxiv.org/abs/1606.00373

Here we continue a series of recent papers that take different approaches using deep nets to depth from a single image.

Li, Klein & Yao: Fully convolutional depth 3

http://arxiv.org/abs/1607.00730

Here we finalise a series of recent papers that take different approaches using deep nets to depth from a single image.

Güler et al. DenseReg

https://arxiv.org/abs/1612.01202

A key idea here is how to do regression using categorical prediction, along with an application of Fully Convolutional Networks to regression problems.
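One generic way to turn a regression target into a categorical one is to quantize it into bins, predict the bin as a class, and keep a small within-bin residual. The sketch below shows only that encoding and decoding. It is an illustration of the general idea, not necessarily the exact formulation used in the paper, and the function names, the range and the number of bins are assumptions.

```python
def encode(value, lo, hi, num_bins):
    """Split a continuous target into a class label (bin) and a within-bin residual."""
    width = (hi - lo) / num_bins
    # Clamp so that value == hi falls into the last bin.
    index = min(int((value - lo) / width), num_bins - 1)
    residual = value - (lo + index * width)
    return index, residual


def decode(index, residual, lo, hi, num_bins):
    """Recover the continuous value from a bin index and a residual."""
    width = (hi - lo) / num_bins
    return lo + index * width + residual


# A target of 3.7 in the range [0, 10] with 5 bins of width 2.
index, residual = encode(3.7, 0.0, 10.0, 5)
restored = decode(index, residual, 0.0, 10.0, 5)
```

A network trained this way would have a classification output for `index` and a regression output for `residual`; the coarse decision is made by the classifier and only the fine correction is regressed.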

Godard et al. Unsupervised/train from stereo

https://arxiv.org/abs/1609.03677

http://visual.cs.ucl.ac.uk/pubs/monoDepth/

Zhou et al.: SfMLearner

Super cool stuff.

https://people.eecs.berkeley.edu/%7Etinghuiz/projects/SfMLearner/

https://arxiv.org/abs/1704.07813

Yan et al.: Superpixel CNN/CRF

http://ieeexplore.ieee.org/document/8105853/

Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser

https://arxiv.org/abs/1611.02174

Here we see an interesting depth-from-single-image sensor fusion with robotics applications.

Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images

https://arxiv.org/abs/1411.5928

A non-adversarial approach to generating images.

Goodfellow et al.: Generative Adversarial Nets

https://papers.nips.cc/paper/5423-generative-adversarial-nets

Another important recent development that we may make use of.

Oord et al.: Pixel-RNN & Pixel-CNN

https://arxiv.org/abs/1601.06759

Producing distributions over images. We have always intended to do something like this for depth images.

http://arxiv.org/abs/1606.05328

Isola et al. Pix2Pix

https://arxiv.org/abs/1611.07004

We can use this too. And it's cool.

Tatarchenko et al. Multi-view 3D models

https://arxiv.org/abs/1511.06702

Not just inferring the depth image but also other views of it (related to the SfMLearner paper).

Girshick, Donahue, Darrell & Malik: R-CNN

https://arxiv.org/abs/1311.2524

We take a slight detour to check out how object detection has been done recently with neural networks. Note that Faster R-CNN and more recent alternatives use similar principles but do it faster.

Redmon et al.: YOLO

https://arxiv.org/abs/1506.02640

Hoiem et al. Photo pop-up

A classic.

http://repository.cmu.edu/cgi/viewcontent.cgi?article=1288&context=robotics

Heitz et al. Cascaded Classification Models

An older pre-CNN machine learning paper for depth estimation from a single image.

http://papers.nips.cc/paper/3472-cascaded-classification-models-combining-models-for-holistic-scene-understanding.pdf

Li et al. Feedback-enabled Cascaded Classification Models

An older pre-CNN machine learning paper for depth estimation from a single image.

https://arxiv.org/abs/1110.5102

Han et al. Bayesian object-level reconstruction

http://escholarship.org/uc/item/9tk6935x.pdf

Liu et al. Depth from semantics

An older pre-CNN machine learning paper for depth estimation from a single image.

http://ai.stanford.edu/people/koller/Papers/Liu+al:CVPR10.pdf

Wu et al. Repetitive scene structure

An older pre-CNN machine learning paper for depth estimation from a single image.

http://www.academia.edu/download/30713855/WuCVPR11.pdf

He et al. Haze removal

Might be interesting because of use of single-image cues.

http://mmlab.ie.cuhk.edu.hk/2009/dehaze_cvpr2009.pdf

Hassner et al. Example-based Depth

Seems like an older version of the SIFTFlow-based approach of Karsch et al.

http://www.wisdom.weizmann.ac.il/~ronen/papers/Hassner%20Basri%20-%20Example%20Based%203D%20Reconstruction%20from%20Single%202D%20Images.pdf

List of interested people

(who I will contact with information about the schedule etc.)

Abdulmajeed M. K.
Alican M.
Anas M.
K. Bulut Ö.
Imaduddin A. M.
Tolga C.
Bilge A.
Hatice K.
Hossein P.
Torkan G.
Buse Sibel K.
Oğuzhan C.
M. Alperen Ö.
Hatice K.
Özgür Ö.
Doğay K.
Onur A.
Alper K.
Furkan A.
Elena B. S.
Müjde A.
Emeç E.
Onur A.
Ekrem Alper K.
Utku Ö.
Mert Ş.
Jimmy A.
Hasan K.
B. Uğur T.

Additional Resources

EBook: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf Deep Learning: Methods and Applications by Li Deng and Dong Yu

Online course with slides videos and assignments: http://cs231n.stanford.edu/ CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)

My NN/Keras bootcamp slides: http://files.djduff.net/nn.zip

