ReadingGroup
Contents
- 1 Getting involved
- 2 Proposed Schedule
- 3 Details
- 3.1 Foley & Matlin Chapter 6 - Distance & Size Perception
- 3.2 Saxena, Sun & Ng: Make3D
- 3.3 Michels, Saxena & Ng: High speed obstacle avoidance
- 3.4 Karsch, Liu & Kang: Depth Transfer
- 3.5 LeCun, Bengio & Hinton: Deep Learning Review
- 3.6 Rumelhart, Hinton & Williams: Backpropagation
- 3.7 LeCun, Bottou, Bengio & Haffner: CNNs
- 3.8 Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet
- 3.9 Simonyan & Zisserman: VGG-16
- 3.10 Eigen, Puhrsch & Fergus: Depth map prediction
- 3.11 Shelhamer, Long & Darrell: Fully Convolutional Segmentation
- 3.12 Ioffe & Szegedy: Batch Normalization
- 3.13 He, Zhang, Ren & Sun: ResNets
- 3.14 Girshick, Donahue, Darrell & Malik: R-CNN
- 3.15 Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
- 3.16 Giusti et al.: Forest trails CNN
- 3.17 Cao, Wu & Shen: Fully convolutional depth 1
- 3.18 Laina et al.: Fully convolutional depth 2
- 3.19 Li, Klein & Yao: Fully convolutional depth 3
- 3.20 Luo et al.: Deep Learning + Stereo
- 3.21 Goodfellow et al.: Generative Adversarial Nets
- 3.22 Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
- 3.23 Oord et al.: Pixel-RNN & Pixel-CNN
- 3.24 Isola et al.: Pix2Pix
- 4 List of interested people
- 5 Additional Resources
Getting involved
There is now a mailing list for this reading group. Send me (Damien) an email to get on it. No problem.
Note: this reading group is about deep learning as applied to depth estimation from a single image, one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little off-topic, so let me know if you want some idea of which ones you should read.
Proposed Schedule
The schedule below is only a proposal and is subject to change.
- 14 Sept 8.30am: Foley & Matlin Chapter 6: Distance & Size Perception
- Location: EEBF 4302
- 21 Sept 8.30am: Saxena, Sun & Ng: Make3D
- Location: EEBF 4302
- 28 Sept 8.30am: Michels, Saxena & Ng: High speed obstacle avoidance
- Location: EEBF 4302
- 05 Octr 8.30am: Karsch, Liu & Kang: Depth Transfer
- Location: EEBF 4302
- 12 Octr 8.30am: LeCun, Bengio & Hinton: Deep Learning Review
- Location: EEBF 4302
- 19 Octr 8.30am: LeCun, Bottou, Bengio & Haffner: CNNs
- Location: EEBF 4302
- 26 Octr 8.30am: Rumelhart, Hinton & Williams: Backpropagation
- Location: EEBF 4302
- 02 Novr 8.30am: Simonyan & Zisserman: VGG-16
- Location: EEBF 4302
- 09 Novr 8.30am: Eigen, Puhrsch & Fergus: Depth map prediction
- Location: EEBF 4302
- 16 Novr 8.30am: Shelhamer, Long & Darrell: Fully Convolutional Segmentation
- Location: EEBF 4302
- 23 Novr 8.30am: He, Zhang, Ren & Sun: ResNet
- Location: EEBF 4302
- 30 Novr 8.30am: Girshick, Donahue, Darrell & Malik: R-CNN
- Location: EEBF 4302
- 07 Decr 8.30am: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
- Location: EEBF 4302
- 14 Decr 8.30am: Giusti et al.: Forest trails CNN
- Location: EEBF 4302
- 22 Decr 8.30am: Cao, Wu & Shen: Fully convolutional depth 1
- Location: EEBF 4302
- Week of 25 Decr: Laina et al.: Fully convolutional depth 2
- Week of 01 Janr: Break
- Week of 08 Janr: Break
- Week of 15 Janr: Li, Klein & Yao: Fully convolutional depth 3
- Week of 22 Janr: Luo et al.: Deep Learning + Stereo
- Week of 29 Janr: Goodfellow et al.: Generative Adversarial Nets
- Week of 06 Febr: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
- Week of 13 Febr: Oord et al.: Pixel-RNN and Pixel-CNN
- Week of 20 Febr: Isola et al.: Pix2Pix
Details
Foley & Matlin Chapter 6 - Distance & Size Perception
Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems), it pays to learn a bit about how humans do it...
https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover
Go to Chapter 6.
If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a Google search for content.
Another thing to try that has worked for some is to log out of any Google/Gmail account before trying to access it.
If nothing else works, email me.
Saxena, Sun & Ng: Make3D
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4531745
This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully compared with previous attempts. It uses Markov Random Fields (MRFs), which are a bit advanced and, importantly, quite slow.
Note: because our library has a subscription to IEEE Xplore, you can access the above link from on-campus or via off-campus library access or via VPN.
But, here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf
There are some videos and things available here: http://make3d.cs.cornell.edu/ -- there used to be a live online demo but they've closed that. There is also a list of results on the Make3D dataset up till about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html
After that, other datasets came into use as well.
Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/
MRFs are more difficult, and if anybody has seen a good tutorial for them, let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up spending a lot of Thursday discussing what MRFs are.
Michels, Saxena & Ng: High speed obstacle avoidance
http://dl.acm.org/citation.cfm?id=1102426
Here two of the same authors (Saxena and Ng) focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.
This version of the paper might be of higher quality (thanks to Hossein for finding):
http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf
Karsch, Liu & Kang: Depth Transfer
Here is the target paper: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153
For those who are not on campus, a temporary link: http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf
This is a nonparametric approach to depth from a single image. They search a database for images similar to the observed one, then warp the retrieved images, whose depth is known, to estimate the depth of the current image.
Here is a paper describing "SIFTFlow" on which the above paper depends (if you have the time to go deeper): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109
Or a shorter conference version available off-campus: http://people.csail.mit.edu/celiu/ECCV2008/
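To make the retrieve-then-transfer idea concrete, here is a minimal sketch in plain Python. It is only an illustration under strong assumptions, not the method of the paper: the SIFTFlow warping step and the final optimisation are left out entirely, and the feature vectors, the toy database and the names `retrieve` and `transfer_depth` are all made up for this example. It simply finds the k database entries closest to the query and takes the per-pixel median of their depth maps.

```python
import math
from statistics import median


def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_feat, database, k):
    """Return the k database entries whose features are closest to the query."""
    return sorted(database, key=lambda e: l2(query_feat, e["feat"]))[:k]


def transfer_depth(query_feat, database, k=3):
    """Per-pixel median of the depth maps of the k nearest candidates.

    Hypothetical simplification: no warping, so every candidate depth map
    is assumed to be already aligned with the query image.
    """
    candidates = retrieve(query_feat, database, k)
    depth_maps = [c["depth"] for c in candidates]
    return [median(pixel) for pixel in zip(*depth_maps)]


# Toy database: each entry has a global feature and a flattened depth map.
database = [
    {"feat": [0.0, 0.0], "depth": [1.0, 2.0, 3.0]},
    {"feat": [0.1, 0.0], "depth": [1.0, 2.5, 3.0]},
    {"feat": [0.0, 0.2], "depth": [2.0, 2.0, 4.0]},
    {"feat": [5.0, 5.0], "depth": [9.0, 9.0, 9.0]},  # dissimilar scene, ignored
]

estimate = transfer_depth([0.0, 0.1], database, k=3)
print(estimate)
```

The median is used here only because it is robust to a single bad candidate; the paper itself should be consulted for how the warped candidates are actually combined.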
LeCun, Bengio & Hinton: Deep Learning Review
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true
Alternative link: http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf
A whirlwind compressed intro to deep learning and its parts.
For a more gentle introduction to deep learning: http://cs231n.stanford.edu/
Or you can find lots of gentle short intros: https://www.google.com.tr/search?q=intro+to+deep+learning
Rumelhart, Hinton & Williams: Backpropagation
An early paper introducing backpropagation, the main way we train neural networks nowadays: http://www.nature.com/articles/323533a0
Alternative link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf
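As a warm-up for the paper, the following sketch shows the chain-rule bookkeeping behind backpropagation on about the smallest network possible: one input, one sigmoid hidden unit and one linear output, with a squared-error loss. The network shape, the weights and the function names are assumptions chosen for illustration and are not taken from the paper. The gradients from the backward pass are compared with finite-difference estimates as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward pass: x -> sigmoid hidden unit -> linear output."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y


def loss(w1, w2, x, t):
    """Squared error between the output and the target t."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2


def backward(w1, w2, x, t):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(w1, w2, x)
    dy = y - t                 # dL/dy
    dw2 = dy * h               # dL/dw2
    dh = dy * w2               # dL/dh
    dz = dh * h * (1.0 - h)    # back through the sigmoid
    dw1 = dz * x               # dL/dw1
    return dw1, dw2


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, t = 0.5, -1.5, 2.0, 1.0
dw1, dw2 = backward(w1, w2, x, t)
num_dw1 = numeric_grad(lambda w: loss(w, w2, x, t), w1)
num_dw2 = numeric_grad(lambda w: loss(w1, w, x, t), w2)
print(dw1, num_dw1)
print(dw2, num_dw2)
```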
LeCun, Bottou, Bengio & Haffner: CNNs
http://ieeexplore.ieee.org/abstract/document/726791/
Alternative link: http://www.dengfanxin.cn/wp-content/uploads/2016/03/1998Lecun.pdf
Here is the classic paper applying convolutional neural networks to image processing.
Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet
Here is where convolutional neural networks and deep learning really showed what they could do on the problem of image recognition.
But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner". So we have taken it out of the reading list.
Simonyan & Zisserman: VGG-16
http://arxiv.org/abs/1409.1556
A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have around one thousand layers.
Eigen, Puhrsch & Fergus: Depth map prediction
https://www.cs.nyu.edu/~deigen/depth/
Finally, we apply deep convolutional neural networks to the problem that we are interested in.
Shelhamer, Long & Darrell: Fully Convolutional Segmentation
http://arxiv.org/abs/1605.06211
Here a related problem is solved, that of semantic segmentation, but the approach is also applicable to our problem.
Ioffe & Szegedy: Batch Normalization
https://arxiv.org/abs/1502.03167
A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.
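For orientation before reading, here is a rough sketch of the training-time normalisation step for a single feature across a mini-batch, written in plain Python. The function name `batch_norm`, the default values and the example activations are assumptions for illustration. The sketch leaves out the running averages used at test time and does not learn the scale `gamma` or shift `beta`.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift.

    Uses the biased (divide by n) variance of the batch. gamma and beta
    would normally be learned; here they are fixed illustrative values.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]
normalised = batch_norm(activations)

# After normalisation the batch has roughly zero mean and unit variance.
out_mean = sum(normalised) / len(normalised)
out_var = sum((v - out_mean) ** 2 for v in normalised) / len(normalised)
print(normalised, out_mean, out_var)
```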
He, Zhang, Ren & Sun: ResNets
https://arxiv.org/abs/1512.03385
This work and variations on it have been the basis of recent neural networks with around 1000 layers. Important stuff.
Girshick, Donahue, Darrell & Malik: R-CNN
https://arxiv.org/abs/1311.2524
We take a slight segue to check out how object detection has been done recently with neural networks. Note that Faster R-CNN and more recent alternatives use similar principles but do it faster.
Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
https://arxiv.org/abs/1611.02174
Here we see an interesting depth-from-single-image sensor fusion with robotics applications.
Giusti et al.: Forest trails CNN
http://ieeexplore.ieee.org/document/7358076/
Alternative link: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf
See also youtube: https://www.youtube.com/watch?v=umRdt3zGgpU
Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Michels, Saxena & Ng above.
Cao, Wu & Shen: Fully convolutional depth 1
http://arxiv.org/abs/1605.02305
Here we start a series of recent papers that take different deep-net approaches to depth from a single image.
Laina et al.: Fully convolutional depth 2
http://arxiv.org/abs/1606.00373
Here we continue a series of recent papers that take different deep-net approaches to depth from a single image.
Li, Klein & Yao: Fully convolutional depth 3
http://arxiv.org/abs/1607.00730
Here we finish a series of recent papers that take different deep-net approaches to depth from a single image.
Luo et al.: Deep Learning + Stereo
Combining deep learning and stereo.
https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf
Goodfellow et al.: Generative Adversarial Nets
https://papers.nips.cc/paper/5423-generative-adversarial-nets
Another important recent development that we may make use of.
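For reference while reading, the two-player objective from the paper can be written as below, where D is the discriminator, G is the generator, p_data is the data distribution and p_z is the noise prior that G samples from:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is trained to tell real samples from generated ones, while G is trained to make D fail at that.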
Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
https://arxiv.org/abs/1411.5928
A non-adversarial approach to the same problem.
Oord et al.: Pixel-RNN & Pixel-CNN
https://arxiv.org/abs/1601.06759
http://arxiv.org/abs/1606.05328
Producing distributions over images. We have always intended to do something like this for depth images.
Isola et al.: Pix2Pix
https://arxiv.org/abs/1611.07004
We can use this too. And it's cool.
List of interested people
(who I will contact with information about the schedule etc.)
- Abdulmajeed M. K.
- Alican M.
- Anas M.
- K. Bulut Ö.
- Imaduddin A. M.
- Tolga C.
- Bilge A.
- Hatice K.
- Hossein P.
- Torkan G.
- Buse Sibel K.
- Oğuzhan C.
- M. Alperen Ö.
- Hatice K.
- Özgür Ö.
- Doğay K.
- Onur A.
- Alper K.
- Furkan A.
- Elena B. S.
- Müjde A.
- Emeç E.
Additional Resources
EBook: Deep Learning: Methods and Applications by Li Deng and Dong Yu: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf