Getting involved

There is now a mailing list for this reading group. Send me (Damien) an email to get on it. No problem.

Note: this reading group is about deep learning as applied to depth estimation from a single image - one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic. Let me know if you would like some guidance on which readings to focus on.

Proposed Schedule

The schedule below is only a proposal and is subject to change.

14 Sept 8.30am: Foley & Matlin Chapter 6: Distance & Size Perception
- Location: EEBF 4302
21 Sept 8.30am: Saxena, Min & Ng: Make3D
- Location: EEBF 4302
28 Sept 8.30am: Michels, Saxena & Ng: High speed obstacle avoidance
- Location: EEBF 4302
05 Octr 8.30am: Karsch, Liu & Kang: Depth Transfer
- Location: EEBF 4302
12 Octr 8.30am: LeCun, Bengio & Hinton: Deep Learning Review
19 Octr 8.30am: LeCun, Bottou, Bengio & Haffner: CNNs
26 Octr 8.30am: Simonyan & Zisserman: VGG-16
Week of 02 Novr: Break
09 Novr 8.30am: Eigen, Puhrsch & Fergus: Depth map prediction
16 Novr 8.30am: Shelhamer, Long & Darrell: Fully Convolutional Segmentation
23 Novr 8.30am: He, Zhang, Ren & Sun: ResNet
30 Novr 8.30am: Girshick, Donahue, Darrell & Malik: R-CNN
07 Decr 8.30am: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
14 Decr 8.30am: Giusti et al.: Forest trails CNN
Week of 21 Decr: Break
Week of 25 Decr: Cao, Wu & Shen: Fully convolutional depth 1
Week of 01 Jany: Laina et al.: Fully convolutional depth 2
Week of 08 Jany: Li, Klein & Yao: Fully convolutional depth 3
Week of 15 Jany: Luo et al.: Deep Learning + Stereo
Week of 22 Jany: Break
Week of 29 Jany: Break
Week of 06 Febr: Goodfellow et al.: Generative Adversarial Nets
Week of 13 Febr: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
Week of 20 Febr: Oord et al.: Pixel-RNN and Pixel-CNN
Week of 27 Febr: Isola et al.: Pix2Pix

Details

Foley & Matlin Chapter 6 - Distance & Size Perception

Because our project is about using machine learning (specifically deep learning) to extract depth from a single image, and then applying it to robot problems, it pays to learn a bit about how humans do it.

https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover

Go to Chapter 6.

If that doesn't work (some have reported difficulty accessing Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a Google search for its content.

Another thing that has worked for some is to log out of any Google/Gmail account before trying to access the book.

If nothing else works, email me.

Saxena, Min & Ng: Make3D

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4531745

This is the classic paper that brought machine learning to the problem of depth from a single image, and quite successfully compared with previous attempts. It uses Markov Random Fields (MRFs), which are a bit advanced and, importantly, quite slow.

Note: because our library has a subscription to IEEE Xplore, you can access the above link from on-campus or via off-campus library access or via VPN.

But, here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf

There are some videos and things available here: http://make3d.cs.cornell.edu/ -- there used to be a live online demo but they've closed that. There is also a list of results on the Make3D dataset up till about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html

After that, other datasets also came into use.

Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/

MRFs are more difficult. If anybody has seen a good tutorial on them, let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up spending a lot of Thursday discussing what MRFs are.

Michels, Saxena & Ng: High speed obstacle avoidance

http://dl.acm.org/citation.cfm?id=1102426

Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.

This version of the paper might be of higher quality (thanks to Hossein for finding):

http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf

Karsch, Liu & Kang: Depth Transfer

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109

This is a nonparametric approach to depth from a single image. They search a database for images similar to the observed one, then warp the images retrieved from the database to estimate the depth of the current image.
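The retrieve-then-combine idea can be sketched in a few lines. This is only a toy illustration under heavy assumptions, not the method from the paper: the database entries, the 2-D feature vectors, and the function names `nearest_neighbours` and `transfer_depth` are all hypothetical, and the warping step is left out entirely (the retrieved depth maps are simply combined with a per-pixel median).

```python
import math
from statistics import median


def nearest_neighbours(query_feat, database, k):
    """Return the k database entries whose features are closest to the query."""
    ranked = sorted(database, key=lambda e: math.dist(query_feat, e["feat"]))
    return ranked[:k]


def transfer_depth(query_feat, database, k=3):
    """Estimate depth as the per-pixel median of the k retrieved depth maps.

    Assumption: all depth maps have the same size and are already aligned
    with the query image (no warping is performed in this sketch).
    """
    depth_maps = [e["depth"] for e in nearest_neighbours(query_feat, database, k)]
    rows, cols = len(depth_maps[0]), len(depth_maps[0][0])
    return [
        [median(d[r][c] for d in depth_maps) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical toy database: a feature vector plus a 2x2 depth map per image.
database = [
    {"feat": (0.0, 0.0), "depth": [[1.0, 1.0], [2.0, 2.0]]},
    {"feat": (0.1, 0.0), "depth": [[1.5, 1.0], [2.0, 3.0]]},
    {"feat": (0.0, 0.2), "depth": [[2.0, 1.0], [2.5, 2.5]]},
    {"feat": (5.0, 5.0), "depth": [[9.0, 9.0], [9.0, 9.0]]},
]

estimated = transfer_depth((0.05, 0.05), database, k=3)
print(estimated)
```

The far-away fourth entry is never retrieved, so its depth map has no influence on the estimate. That is the nonparametric flavour of the approach: the "model" is just the database plus a similarity measure.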

For those who are not on campus, a temporary link:

http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf

LeCun, Bengio & Hinton: Deep Learning Review

http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true

A whirlwind compressed intro to deep learning and its parts.

LeCun, Bottou, Bengio & Haffner: CNNs

http://ieeexplore.ieee.org/abstract/document/726791/

Here is the classic paper applying convolutional neural networks to image processing.

Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

This is where convolutional neural networks and deep learning really showed what they could do, on the problem of image recognition.

But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner".

Simonyan & Zisserman: VGG-16

http://arxiv.org/abs/1409.1556

A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have around one thousand layers.

Eigen, Puhrsch & Fergus: Depth map prediction

https://www.cs.nyu.edu/~deigen/depth/

Finally, we apply deep convolutional neural networks to the problem that we are interested in.

Shelhamer, Long & Darrell: Fully Convolutional Segmentation

http://arxiv.org/abs/1605.06211

Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.

Ioffe & Szegedy: Batch Normalization

https://arxiv.org/abs/1502.03167

A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.

He, Zhang, Ren & Sun: ResNets

https://arxiv.org/abs/1512.03385

This work and variations on it have been the basis of recent neural networks with around 1000 layers. Important stuff.

Girshick, Donahue, Darrell & Malik: R-CNN

https://arxiv.org/abs/1311.2524

We take a slight segue to check out how object detection has been done recently with neural networks. Note that Faster R-CNN and more recent alternatives use similar principles but do it faster.

Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser

https://arxiv.org/abs/1611.02174

Here we see an interesting depth-from-single-image sensor fusion with robotics applications.

Giusti et al.: Forest trails CNN

http://ieeexplore.ieee.org/document/7358076/

Cao, Wu & Shen: Fully convolutional depth 1

http://arxiv.org/abs/1605.02305

Here we start a series of recent papers that take different approaches using deep nets to depth from a single image.

Laina et al.: Fully convolutional depth 2

http://arxiv.org/abs/1606.00373

Here we continue a series of recent papers that take different approaches using deep nets to depth from a single image.

Li, Klein & Yao: Fully convolutional depth 3

http://arxiv.org/abs/1607.00730

Here we finalise a series of recent papers that take different approaches using deep nets to depth from a single image.

Luo et al.: Deep Learning + Stereo

Combining deep learning and stereo.

https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf

Goodfellow et al.: Generative Adversarial Nets

https://papers.nips.cc/paper/5423-generative-adversarial-nets

Another important recent development that we may make use of.

Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images

https://arxiv.org/abs/1411.5928

A non-adversarial approach to the same problem.

Oord et al.: Pixel-RNN & Pixel-CNN

https://arxiv.org/abs/1601.06759

Producing distributions over images. We have always intended to do something like this for depth images.

http://arxiv.org/abs/1606.05328

Isola et al.: Pix2Pix

https://arxiv.org/abs/1611.07004

We can use this too. And it's cool.

List of interested people

(who I will contact with information about the schedule etc.)

Abdulmajeed M. K.
Alican M.
Anas M.
K. Bulut Ö.
Imaduddin A. M.
Tolga C.
Bilge A.
Hatice K.
Hossein P.
Torkan G.
Buse Sibel K.
Oğuzhan C.
M. Alperen Ö.
Hatice K.
Özgür Ö.
Doğay K.
Onur A.
Alper K.
Furkan A.
Elena B. S.
Müjde A.
Emeç E.

Additional Resources

EBook: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf Deep Learning: Methods and Applications by Li Deng and Dong Yu
