ReadingGroup
Contents
 1 Getting involved
 2 Proposed Schedule
 3 Details
 3.1 Foley & Matlin Chapter 6: Distance & Size Perception
 3.2 Saxena, Min & Ng: Make3D
 3.3 Michels, Saxena & Ng: High speed obstacle avoidance
 3.4 Karsch, Liu & Kang: Depth Transfer
 3.5 LeCun, Bengio & Hinton: Deep Learning Review
 3.6 Rumelhart, Hinton & Williams: Backpropagation
 3.7 LeCun, Bottou, Bengio & Haffner: CNNs
 3.8 Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet
 3.9 Simonyan & Zisserman: VGG16
 3.10 Eigen, Puhrsch & Fergus: Depth map prediction
 3.11 Luo et al.: Deep Learning + Stereo
 3.12 Shelhamer, Long & Darrell: Fully Convolutional Segmentation
 3.13 Giusti et al.: Forest trails CNN
 3.14 Ioffe & Szegedy: Batch Normalization
 3.15 He, Zhang, Ren & Sun: ResNets
 3.16 Cao, Wu & Shen: Fully convolutional depth 1
 3.17 Laina et al.: Fully convolutional depth 2
 3.18 Li, Klein & Yao: Fully convolutional depth 3
 3.19 Güler et al. DenseReg
 3.20 Godard et al. Unsupervised/train from stereo
 3.21 Zhou et al.: SfMLearner
 3.22 Ma & Karaman: depth from a single image & SLAM
 3.23 Pizzoli et al.: REMODE
 3.24 Yan et al.: Superpixel CNN/CRF
 3.25 Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
 3.26 Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
 3.27 Goodfellow et al.: Generative Adversarial Nets
 3.28 Oord et al.: PixelRNN & PixelCNN
 3.29 Isola et al. Pix2Pix
 3.30 Pagnutti et al. RGBD semantic segmentation with CNN + surface fitting
 3.31 Yang et al.: Full 3D reconstruction from single depth view
 3.32 Girshick, Donahue, Darrell & Malik: RCNN
 3.33 Redmon et al.: YOLO
 3.34 Redmon & Farhadi: YOLO9000
 3.35 Khoreva et al.: Dense tracking/data augmentation
 3.36 Li et al. Fully Convolutional Instance-aware Semantic Segmentation
 3.37 Kim et al. Solving CRF with CNN (depth image)
 3.38 Liu et al. Attribute Grammar Scene Reconstruction
 3.39 Tatarchenko et al. Multiview 3D models
 3.40 Häne et al. Single-view voxel reconstruction
 3.41 Garg et al. Geometry+CNN unsupervised
 3.42 Xie et al. Deep3D
 3.43 Liu et al. Convolutional Neural Field CRFs
 3.44 Hong et al.: Semantic segmentation for robot behaviour
 3.45 Hoiem et al. Photo pop-up
 3.46 Roy & Todorovic: Neural Regression Forest
 3.47 Heitz et al. Cascaded Classification Models
 3.48 Li et al. Feedback-enabled Cascaded Classification Models
 3.49 Li et al. Depth & Normals: CRF/regression
 3.50 Han et al. Bayesian object-level reconstruction
 3.51 Liu et al. Depth from semantics
 3.52 Wu et al. Repetitive scene structure
 3.53 He et al. Haze removal
 3.54 Hassner et al. Example-based Depth
 3.55 Wu et al. Repetition-based Depth
 4 Additional Resources
Getting involved
There is now a mailing list for this reading group. Send me (Damien) an email to get on it. No problem.
Note: this reading group is about deep learning as applied to depth estimation from a single image, one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic. So let me know if you want some idea about which ones you should read.
Proposed Schedule
The schedule below is only a proposal and is subject to change.
 14 Sept 8:30am:
Foley & Matlin Chapter 6: Distance & Size Perception Location: EEBF 4302
 21 Sept 8:30am:
Saxena, Min & Ng: Make3D Location: EEBF 4302
 28 Sept 8:30am:
Michels, Saxena & Ng: High speed obstacle avoidance Location: EEBF 4302
 05 Oct 8:30am:
Karsch, Liu & Kang: Depth Transfer Location: EEBF 4302
 12 Oct 8:30am:
LeCun, Bengio & Hinton: Deep Learning Review Location: EEBF 4302
 19 Oct 8:30am:
Rumelhart, Hinton & Williams: Backpropagation Location: EEBF 4302
 26 Oct 8:30am:
LeCun, Bottou, Bengio & Haffner: CNNs Location: EEBF 4302
 02 Nov 8:30am:
LeCun, Bottou, Bengio & Haffner: CNNs Location: EEBF 4302
 09 Nov 8:30am:
Simonyan & Zisserman: VGG16 Location: EEBF 4302
 16 Nov 8:30am:
Eigen, Puhrsch & Fergus: Depth map prediction Location: EEBF 4302
 23 Nov 8:30am:
Luo et al.: Deep Learning + Stereo Location: EEBF 4302
 30 Nov 8:30am:
Shelhamer, Long & Darrell: Fully Convolutional Segmentation Location: EEBF 4302
 07 Dec 8:30am:
Giusti et al.: Forest trails CNN Location: EEBF 4302
 14 Dec 8:30am:
Ioffe & Szegedy: Batch Normalization Location: EEBF 4302
 22 Dec 8:30am:
He, Zhang, Ren & Sun: ResNets Location: EEBF 4302
 Week of 25 Dec: Break
 Week of 01 Jan: Break
 Week of 08 Jan: Break
 Week of 15 Jan: Break
 Week of 22 Jan: Break
 Week of 29 Jan: Break

 Monday 05 Feb 3pm: Cao, Wu & Shen: Fully convolutional depth 1 Location: EEBF 4302
 Monday 12 Feb 3:30pm: Laina et al.: Fully convolutional depth 2 Location: EEBF 4302
 Monday 19 Feb 3:30pm: Li, Klein & Yao: Fully convolutional depth 3 Location: EEBF 4302
 Monday 26 Feb 3:30pm: Güler et al.: DenseReg Location: EEBF 4302
 Monday 05 Mar 3:30pm: Godard et al.: Unsupervised/train from stereo Location: EEBF 4302
 Monday 12 Mar 3:30pm: Zhou et al.: SfMLearner Location: EEBF 4302
 Monday 19 Mar 3:30pm: Ma & Karaman: depth from a single image & SLAM Location: EEBF 4302
 Monday 26 Mar 3:30pm: Pizzoli et al.: REMODE Location: EEBF 4302
 Monday 02 Apr 3:30pm: Yan et al.: Superpixel CNN/CRF Location: EEBF 4302
 Monday 09 Apr 3:30pm: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser Location: EEBF 4302
 Monday 16 Apr 3:30pm: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images Location: EEBF 4302
 Monday 23 Apr 3:30pm: Goodfellow et al.: Generative Adversarial Nets Location: EEBF 4302
 Monday 30 Apr 3:30pm: Oord et al.: PixelRNN and PixelCNN Location: EEBF 4302
 Monday 07 May 3:30pm: Isola et al.: Pix2Pix Location: EEBF 4302
 Monday 14 May 3:30pm: Pagnutti et al.: RGBD semantic segmentation with CNN + surface fitting Location: EEBF 4302
 Monday 21 May 3:30pm: Yang et al.: Full 3D reconstruction from single depth view
 Monday 28 May 3:30pm: Girshick, Donahue, Darrell & Malik: RCNN
 Monday 05 June 3:30pm: Redmon et al.: YOLO and YOLO9000
 Monday 12 June 3:30pm: Khoreva et al.: Dense tracking/data augmentation
 ???: Li et al.: Fully Convolutional Instance-aware Semantic Segmentation
 ???: Kim et al.: Solving CRF with CNN (depth image)
 ???: Liu et al.: Attribute Grammar Scene Reconstruction
 ???: Tatarchenko et al.: Multiview 3D models
 ???: Häne et al.: Single-view voxel reconstruction
 ???: Garg et al.: Geometry+CNN unsupervised
 ???: Xie et al.: Deep3D
 ???: Liu et al.: Convolutional Neural Field CRFs
 ???: Hong et al.: Semantic segmentation for robot behaviour
 ???: Roy & Todorovic: Neural Regression Forest
 ???: Hoiem et al.: Photo pop-up
 ???: Heitz et al.: Cascaded Classification Models
 ???: Li et al.: Feedback-enabled Cascaded Classification Models
 ???: Li et al.: Depth & Normals: CRF/regression
 ???: Han et al.: Bayesian object-level reconstruction
 ???: Liu et al.: Depth from semantics
 ???: Wu et al.: Repetitive scene structure
 ???: He et al.: Haze removal
 ???: Hassner et al.: Example-based Depth
 ???: Wu et al.: Repetition-based Depth
Details
Foley & Matlin Chapter 6: Distance & Size Perception
Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems) it pays to learn a bit about how humans do it...
https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover
Go to Chapter 6.
If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 (some have reported being able to access the chapter by doing a Google search for its content).
Another thing to try, which has worked for some, is to log out of any Google/Gmail account before trying to access it.
If nothing else works, email me.
Saxena, Min & Ng: Make3D
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4531745
This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully compared with previous attempts. It uses Markov Random Fields (MRFs), which are a bit advanced and, importantly, quite slow.
Note: because our library has a subscription to IEEE Xplore, you can access the above link from on campus, via off-campus library access, or via VPN.
Here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf
There are some videos and other material available here: http://make3d.cs.cornell.edu/ (there used to be a live online demo, but it has been closed). There is also a list of results on the Make3D dataset up until about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html
After that, other datasets started being used as well.
Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/
MRFs are more difficult. If anybody has seen a good tutorial on them, let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up spending a lot of Thursday discussing what MRFs are.
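To make the idea of an MRF a little more concrete, here is a minimal sketch of the kind of energy function one defines. It assumes a 1-D chain of pixels with discrete depth labels, a squared-error data term, an absolute-difference smoothness term, and a simple greedy minimiser (iterated conditional modes). This is not the Make3D model; the function names and the weight `lam` are made up for illustration.

```python
def mrf_energy(labels, observations, lam):
    """Energy of a chain MRF: data term plus pairwise smoothness term."""
    data = sum((l - o) ** 2 for l, o in zip(labels, observations))
    smooth = sum(abs(a - b) for a, b in zip(labels, labels[1:]))
    return data + lam * smooth


def icm(observations, label_set, lam, sweeps=10):
    """Iterated conditional modes: greedily re-label one node at a time."""
    # Illustrative initialisation: the label closest to each observation.
    labels = [min(label_set, key=lambda l: (l - o) ** 2) for o in observations]
    for _ in range(sweeps):
        changed = False
        for i in range(len(labels)):
            def local_energy(l):
                # All energy terms that involve node i.
                e = (l - observations[i]) ** 2
                if i > 0:
                    e += lam * abs(l - labels[i - 1])
                if i + 1 < len(labels):
                    e += lam * abs(l - labels[i + 1])
                return e
            best = min(label_set, key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # no label changed in a full sweep: local minimum reached
    return labels
```

With `lam = 0` each node just takes the label nearest its own observation; increasing `lam` makes isolated outliers give way to their neighbours. ICM only finds a local minimum, which is one reason real systems use heavier inference machinery.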
Michels, Saxena & Ng: High speed obstacle avoidance
http://dl.acm.org/citation.cfm?id=1102426
Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.
This version of the paper might be of higher quality (thanks to Hossein for finding):
http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf
Karsch, Liu & Kang: Depth Transfer
Here is the target paper: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153
For those who are not on campus, a temporary link: http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf
This is a non-parametric approach to depth from a single image. The method searches a database for images similar to the observed one, aligns each retrieved image with the observed one, and then warps the retrieved image (and its depth) to estimate the depth of the current image. It depends on an approach called SIFT Flow to do the alignment.
Here is a paper describing SIFT Flow (if you have the time to go deeper): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109
Free version: http://people.csail.mit.edu/celiu/SIFTflow/
Or a shorter conference version available off campus: http://people.csail.mit.edu/celiu/ECCV2008/
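The retrieve-then-transfer idea can be sketched roughly as follows. This is a heavily simplified illustration, not the authors' pipeline: it assumes each image is already summarised by a small feature vector, it skips the alignment and warping step entirely, and it fuses the retrieved depth maps with a per-pixel median. All names (`transfer_depth`, `database`) are hypothetical.

```python
from statistics import median


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transfer_depth(query_feature, database, k=3):
    """Estimate a depth map from the k most similar database entries.

    database: list of (feature_vector, depth_map) pairs, where every
    depth_map is a list of rows with the same shape.
    """
    # 1. Retrieve: rank database images by feature similarity to the query.
    ranked = sorted(database, key=lambda entry: squared_distance(query_feature, entry[0]))
    candidates = [depth for _, depth in ranked[:k]]
    # 2. (Omitted here: align and warp each candidate onto the query image.)
    # 3. Fuse: take the per-pixel median of the candidate depth maps.
    rows, cols = len(candidates[0]), len(candidates[0][0])
    return [
        [median(c[r][col] for c in candidates) for col in range(cols)]
        for r in range(rows)
    ]
```

The point of the sketch is that nothing is "trained" in the usual sense: the database itself is the model, which is what makes the approach non-parametric.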
LeCun, Bengio & Hinton: Deep Learning Review
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true
Alternative links: http://pages.cs.wisc.edu/~dyer/cs540/handouts/deeplearningnature2015.pdf https://www.researchgate.net/publication/277411157_Deep_Learning
A whirlwind compressed intro to deep learning and its parts.
For a more gentle introduction to deep learning: http://cs231n.stanford.edu/
Or you can find lots of gentle short intros: https://www.google.com.tr/search?q=intro+to+deep+learning
Rumelhart, Hinton & Williams: Backpropagation
An early paper introducing backpropagation, the main way we train neural networks nowadays: http://www.nature.com/articles/323533a0
Alternative link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf
These topics are also addressed in the tutorials shared just above. You will also find plenty online: most neural network tutorials attempt to explain backpropagation, as it is the main way these networks are trained. I usually use a simple genetic algorithm when explaining how to train neural networks because it's simpler; an accelerated tutorial on neural networks that does not explain backpropagation but does explain one of the tools is at http://files.djduff.net/nn.zip
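For those who prefer code to equations, here is a minimal sketch of backpropagation on about the smallest network possible: one input, one tanh hidden unit, one linear output and a squared-error loss. The network shape, the parameter layout `(w1, b1, w2, b2)` and the function names are my own choices for illustration. The finite-difference function is there only to check that the chain-rule gradients are right.

```python
import math


def forward(params, x):
    """Return hidden activation and output for input x."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss w.r.t. (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target            # dL/dy
    dw2 = dy * h               # dL/dw2
    db2 = dy                   # dL/db2
    dh = dy * w2               # error propagated back to the hidden unit
    dz = dh * (1.0 - h * h)    # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    return [dz * x, dz, dw2, db2]


def numerical_gradient(params, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


def train(params, data, lr=0.1, epochs=200):
    """Plain stochastic gradient descent using the backprop gradients."""
    params = list(params)
    for _ in range(epochs):
        for x, t in data:
            grads = backprop(params, x, t)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

Comparing `backprop(...)` with `numerical_gradient(...)` on a few random parameter settings is a cheap and standard way to catch mistakes in a hand-written backward pass.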
LeCun, Bottou, Bengio & Haffner: CNNs
This is the now classic paper describing LeNet architectures applying Convolutional Neural Networks (CNNs) to the problem of optical character recognition. It also embeds the neural network in an architecture for automatically segmenting text, including a system for automatically reading cheques.
http://ieeexplore.ieee.org/abstract/document/726791/
Alternative link: http://www.dengfanxin.cn/wpcontent/uploads/2016/03/1998Lecun.pdf
Convolutional neural networks were introduced by the authors in 1990. It may be instructive to read that considerably simpler paper: http://yann.lecun.com/exdb/publis/pdf/lecun90c.pdf
Or any tutorial about CNNs. One good place to follow is: http://cs231n.stanford.edu/syllabus.html
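As a quick reminder of what the "convolution" in a CNN layer computes, here is a minimal sketch in plain Python, assuming a single-channel image, a single kernel, no padding and stride 1. As in most deep learning libraries, this is strictly speaking cross-correlation (the kernel is not flipped). The function name is just for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            ))
        output.append(row)
    return output


# A 1x2 kernel that responds to dark-to-bright vertical edges.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edges = conv2d_valid(image, [[-1, 1]])  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The same small set of weights is reused at every image position, which is what keeps the parameter count low compared with a fully connected layer.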
Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet
We will not discuss this in the reading group.
This is where convolutional neural networks and deep learning really showed what they could do, on the problem of image recognition.
But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner". So we have taken it out of the reading list.
Simonyan & Zisserman: VGG16
http://arxiv.org/abs/1409.1556
A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have on the order of a thousand layers.
Feel free to take a look at AlexNet above to get an idea of the space of approaches.
Eigen, Puhrsch & Fergus: Depth map prediction
https://www.cs.nyu.edu/~deigen/depth/
Finally, we apply deep convolutional neural networks to the problem that we are interested in.
Luo et al.: Deep Learning + Stereo
Combining deep learning and stereo.
https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf
Shelhamer, Long & Darrell: Fully Convolutional Segmentation
http://arxiv.org/abs/1605.06211
Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.
Giusti et al.: Forest trails CNN
http://ieeexplore.ieee.org/document/7358076/
Alternative link: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf
See also youtube: https://www.youtube.com/watch?v=umRdt3zGgpU
Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Michels, Saxena & Ng above.
Ioffe & Szegedy: Batch Normalization
https://arxiv.org/abs/1502.03167
A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.
Apparently the following video from the Stanford CS231N course contains details of Batch Normalization (37:00 to 59:30)
https://youtu.be/gYpoJMlgyXA?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
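The core computation of batch normalization at training time is small enough to sketch directly. This is a minimal illustration for a single scalar feature over one mini-batch; it leaves out the running statistics used at inference time and the learning of `gamma` and `beta`. The function name and defaults are my own.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance of the mini-batch, as used at training time.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

With the default `gamma` and `beta`, the output has approximately zero mean and unit variance whatever the scale of the input, which is what keeps the inputs to later layers in a stable range.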
He, Zhang, Ren & Sun: ResNets
https://arxiv.org/abs/1512.03385
This work and variations on it have been the basis of recent neural networks with around 1000 layers. Important stuff.
The diagram at the top of Page 2 of this paper (which offers an improvement on the ResNet of the above paper) is quite useful in understanding the structure of a residual unit:
http://arxiv.org/abs/1603.05027
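The structure of a residual unit can also be sketched in a few lines. This is a toy version under loud assumptions: small fully connected layers stand in for the convolutions, there is no batch normalization, and the ReLU is applied after the addition as in the original ResNet paper (the follow-up paper linked above rearranges this). The helper names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_unit(x, w1, w2):
    """y = relu(x + F(x)), where F is two linear layers with a ReLU between them."""
    f = linear(w2, relu(linear(w1, x)))
    # The skip connection: the input is added back onto the learned residual.
    return relu([xi + fi for xi, fi in zip(x, f)])
```

Note that with all-zero weights the unit simply passes (non-negative) inputs through unchanged, so stacking many such units does not by itself make the identity mapping hard to represent.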
Cao, Wu & Shen: Fully convolutional depth 1
http://arxiv.org/abs/1605.02305
Here we start a series of recent papers that take different approaches using deep nets to depth from a single image.
Let me know if you can find some media for this work (I could only find the paper itself).
Laina et al.: Fully convolutional depth 2
http://arxiv.org/abs/1606.00373
Here we continue a series of recent papers that take different approaches using deep nets to depth from a single image.
Li, Klein & Yao: Fully convolutional depth 3
http://arxiv.org/abs/1607.00730
Here we finalise a series of recent papers that take different approaches using deep nets to depth from a single image.
Güler et al. DenseReg
https://arxiv.org/abs/1612.01202
Key ideas here are how to do regression using categorical prediction, and an application of fully convolutional networks to regression problems.
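One way to read "regression using categorical prediction" is: split the target range into bins, predict the bin as a class, and regress only the small residual inside that bin. Here is a minimal sketch of the encoding and decoding, assuming targets in [0, 1) and uniform bins. It is an illustration of the idea, not the DenseReg implementation, and the function names are made up.

```python
def quantize(value, num_bins):
    """Split a target in [0, 1) into a bin index (class label) and a residual in [0, 1)."""
    scaled = value * num_bins
    index = min(int(scaled), num_bins - 1)  # guard against rounding at the top edge
    residual = scaled - index
    return index, residual


def reconstruct(index, residual, num_bins):
    """Invert `quantize`: combine the predicted class and residual into a value."""
    return (index + residual) / num_bins
```

A network trained this way would output a softmax over `num_bins` classes plus a residual regressor, and `reconstruct` would turn the two predictions back into a continuous value.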
Godard et al. Unsupervised/train from stereo
https://arxiv.org/abs/1609.03677
http://visual.cs.ucl.ac.uk/pubs/monoDepth/
https://github.com/mrharicot/monodepth
Zhou et al.: SfMLearner
Super cool stuff.
https://people.eecs.berkeley.edu/%7Etinghuiz/projects/SfMLearner/
https://arxiv.org/abs/1704.07813
A blog entry explaining the main ideas: http://bair.berkeley.edu/blog/2017/07/11/confluenceofgeometryandlearning/
The code: https://github.com/tinghuiz/SfMLearner
Ma & Karaman: depth from a single image & SLAM
https://arxiv.org/pdf/1709.07492.pdf
https://github.com/fangchangma/sparsetodense.git
Pizzoli et al.: REMODE
In case you miss it: this is not a single-image method... but close to it. It is another structure from motion method. But the results are rather good (state of the art in 2014).
M. Pizzoli, C. Forster, and D. Scaramuzza, “REMODE: Probabilistic, monocular dense reconstruction in real time,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, 2014, pp. 2609–2616.
http://rpg.ifi.uzh.ch/docs/ICRA14_Pizzoli.pdf
https://www.youtube.com/watch?v=QTKd5UWCG0Q
Yan et al.: Superpixel CNN/CRF
http://ieeexplore.ieee.org/document/8105853/
Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
https://arxiv.org/abs/1611.02174
Here we see an interesting depth-from-single-image sensor fusion with robotics applications.
Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
https://arxiv.org/abs/1411.5928
A non-adversarial approach to generating images.
Goodfellow et al.: Generative Adversarial Nets
https://papers.nips.cc/paper/5423generativeadversarialnets
Another important recent development that we may make use of.
Oord et al.: PixelRNN & PixelCNN
https://arxiv.org/abs/1601.06759
Producing distributions over images. We have always intended to do something like this for depth images.
http://arxiv.org/abs/1606.05328
Isola et al. Pix2Pix
https://arxiv.org/abs/1611.07004
We can use this too. And it's cool.
Pagnutti et al. RGBD semantic segmentation with CNN + surface fitting
http://ieeexplore.ieee.org/document/8120042/
https://pdfs.semanticscholar.org/7716/9ee225157e77d1632e3bed54c70235b4abf0.pdf
Yang et al.: Full 3D reconstruction from single depth view
https://arxiv.org/abs/1708.07969
Girshick, Donahue, Darrell & Malik: RCNN
https://arxiv.org/abs/1311.2524
We take a slight detour to check out how object detection has been done recently with neural networks. Note that Faster R-CNN and more recent alternatives use similar principles but run faster.
Redmon et al.: YOLO
https://arxiv.org/abs/1506.02640
Redmon & Farhadi: YOLO9000
https://arxiv.org/abs/1612.08242
Khoreva et al.: Dense tracking/data augmentation
An attempt on the DAVIS dataset. The dataset is here: http://davischallenge.org/
This is interesting because of the data augmentation approach used.
The paper: http://arxiv.org/abs/1703.09554
Video: https://www.youtube.com/watch?v=QrsR5wHR14
Li et al. Fully Convolutional Instance-aware Semantic Segmentation
https://arxiv.org/abs/1611.07709
Kim et al. Solving CRF with CNN (depth image)
http://arxiv.org/abs/1603.06359
Liu et al. Attribute Grammar Scene Reconstruction
http://ieeexplore.ieee.org/document/7889053/?source=tocalert&dld=Z21haWwuY29t
Tatarchenko et al. Multiview 3D models
https://arxiv.org/abs/1511.06702
Not just inferring the depth image but also other views of it (related to the SfMLearner paper).
Häne et al. Single-view voxel reconstruction
Blog summary: http://bair.berkeley.edu/blog/2017/08/23/highquality3dobjreconstruction/
Video intro: https://www.youtube.com/watch?v=BjwhMDhbqAs
Full paper: https://arxiv.org/abs/1704.00710
Garg et al. Geometry+CNN unsupervised
We will have already read Godard et al. and Zhou et al. but this is for completeness.
http://arxiv.org/abs/1603.04992
Xie et al. Deep3D
We will have already read Godard et al. and Zhou et al. but this is for completeness.
https://arxiv.org/pdf/1604.03650
Liu et al. Convolutional Neural Field CRFs
https://arxiv.org/abs/1411.6387
Hong et al.: Semantic segmentation for robot behaviour
https://arxiv.org/abs/1802.00285
Hoiem et al. Photo pop-up
A classic.
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1288&context=robotics
Roy & Todorovic: Neural Regression Forest
https://www.cvfoundation.org/openaccess/content_cvpr_2016/app/S2311.pdf
Heitz et al. Cascaded Classification Models
Older pre-CNN machine learning papers for depth estimation from a single image.
Li et al. Feedback-enabled Cascaded Classification Models
Older pre-CNN machine learning papers for depth estimation from a single image.
https://arxiv.org/abs/1110.5102
Li et al. Depth & Normals: CRF/regression
https://www.cvfoundation.org/openaccess/content_cvpr_2015/app/1B_001.pdf
Han et al. Bayesian object-level reconstruction
http://escholarship.org/uc/item/9tk6935x.pdf
Liu et al. Depth from semantics
Older pre-CNN machine learning papers for depth estimation from a single image.
http://ai.stanford.edu/people/koller/Papers/Liu+al:CVPR10.pdf
Wu et al. Repetitive scene structure
Older pre-CNN machine learning papers for depth estimation from a single image.
http://www.academia.edu/download/30713855/WuCVPR11.pdf
He et al. Haze removal
Might be interesting because of its use of single-image cues.
http://mmlab.ie.cuhk.edu.hk/2009/dehaze_cvpr2009.pdf
Hassner et al. Example-based Depth
Seems like an older version of the SIFT Flow-based approach of Karsch et al.
Wu et al. Repetition-based Depth
"Repetition-based dense single-view reconstruction"
http://www.academia.edu/download/30713855/WuCVPR11.pdf
Additional Resources
EBook: https://www.microsoft.com/enus/research/wpcontent/uploads/2016/02/DeepLearningNowPublishingVol7SIG039.pdf Deep Learning: Methods and Applications by Li Deng and Dong Yu
Online course with slides, videos and assignments: http://cs231n.stanford.edu/ CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
My NN/Keras bootcamp slides: http://files.djduff.net/nn.zip
Foley & Matlin's book: https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover