Difference between revisions of "ReadingGroup"

From Deep Depth 116E167 Project Documentation
 
(126 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
= Getting involved =
 
= Getting involved =
  
'''We are trying to find a suitable time each week to hold this. Please enter your details in the following doodle poll and register your interest by sending me an email on [mailto:djduff@itu.edu.tr djduff@itu.edu.tr].'''
+
There is now a mailing list for this reading group. Send me (Damien) an email to get on it. No problem.
  
https://doodle.com/poll/c4k8mpxaxggs7tty
+
'''Note''': this reading group is about deep learning as applied to ''depth estimation from a single image'' - one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic, so let me know if you would like some guidance on which readings to come for.
  
Decision about a time slot during the week will be made as soon as possible after the course registration period.
+
= Proposed Schedule =
  
'''Note''': this reading group is about deep learning as applied to ''depth estimation from a single image'' - one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic, so let me know if you would like some guidance on which readings to come for.
+
'''Warning: This is now out of date. We are arranging via the email list now.'''
  
= Proposed Schedule =
+
<div style="opacity:0.2">
  
 
The below schedule is only '''proposed''', and subject to change.
 
The below schedule is only '''proposed''', and subject to change.
  
* Week of 11 Sept: Foley & Matlin Chapter 6: Distance & Size Perception
+
* 14 Sept 8.30am: <s>Foley & Matlin Chapter 6: Distance & Size Perception</s>
* Week of 18 Sept: Saxena, Min & Ng: Make3D
+
** Location: EEBF 4302
* Week of 25 Sept: Michels, Saxena & Ng: High speed obstacle avoidance
+
* 21 Sept 8.30am: <s>Saxena, Min & Ng: Make3D</s>
* Week of 02 Octr: Karsch, Liu & Kang: Depth Transfer
+
** Location: EEBF 4302
* Week of 09 Octr: LeCun, Bottou, Bengio & Haffner: CNNs
+
* 28 Sept 8.30am: <s>Michels, Saxena & Ng: High speed obstacle avoidance</s>
* Week of 16 Octr: Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet
+
** Location: EEBF 4302
* Week of 23 Octr: Simonyan & Zisserman: VGG-16
+
* 05 Octr 8.30am: <s>Karsch, Liu & Kang: Depth Transfer</s>
* Week of 30 Octr: Break
+
** Location: EEBF 4302
* Week of 06 Novr: Eigen, Puhrsch & Fergus: Depth map prediction
+
* 12 Octr 8.30am: <s>LeCun, Bengio & Hinton: Deep Learning Review</s>
* Week of 13 Novr: Shelhamer, Long & Darrell: Fully Convolutional Segmentation
+
** Location: EEBF 4302
* Week of 20 Novr: He, Zhang, Ren & Sun: ResNet
+
* 19 Octr 8.30am: <s>Rumelhart, Hinton & Williams: Backpropagation</s>
* Week of 27 Novr: Girshick, Donahue, Darrell & Malik: R-CNN
+
** Location: EEBF 4302
* Week of 04 Decr: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
+
* 26 Octr 8.30am: <s>LeCun, Bottou, Bengio & Haffner: CNNs</s>
* Week of 11 Decr: Giusti et al.: Forest trails CNN
+
** Location: EEBF 4302
* Week of 18 Decr: Break
+
* 02 Novr 8:30am: <s>LeCun, Bottou, Bengio & Haffner: CNNs</s>
* Week of 25 Decr: Cao, Wu & Shen: Fully convolutional depth 1
+
** Location: EEBF 4302
* Week of 01 Jany: Laina et al.: Fully convolutional depth 2
+
* 09 Novr 8.30am: <s>Simonyan & Zisserman: VGG-16</s>
* Week of 08 Jany: Li, Klein & Yao: Fully convolutional depth 3
+
** Location: EEBF 4302
* Week of 15 Jany: Luo et al.: Deep Learning + Stereo
+
* 16 Novr 8.30am: <s>Eigen, Puhrsch & Fergus: Depth map prediction</s>
* Week of 22 Jany: Break
+
** Location: EEBF 4302
* Week of 29 Jany: Break
+
* 23 Novr 8.30am: <s>Luo et al.: Deep Learning + Stereo</s>
* Week of 06 Febr: Goodfellow et al.: Generative Adversarial Nets
+
** Location: EEBF 4302
* Week of 13 Febr: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
+
* 30 Novr 8.30am: <s>Shelhamer, Long & Darrell: Fully Convolutional Segmentation</s>
* Week of 20 Febr: Oord et al.: Pixel-RNN and Pixel-CNN
+
** Location: EEBF 4302
* Week of 27 Febr: Isola et al. Pix2Pix
+
* 07 Decr 8.30am: <s>Giusti et al.: Forest trails CNN</s>
 +
** Location: EEBF 4302
 +
* 14 Decr 8.30am: <s>Batch Normalization</s>
 +
** Location: EEBF 4302
 +
* 22 Decr 8:30am: <s>He, Zhang, Ren & Sun: ResNet</s>
 +
** Location: EEBF 4302
  
 +
* <span style="color:#F09090">Week of 25 Decr: Break</span>
 +
* <span style="color:#F09090">Week of 01 Janr: Break</span>
 +
* <span style="color:#F09090">Week of 08 Janr: Break</span>
 +
* <span style="color:#F09090">Week of 15 Janr: Break</span>
 +
* <span style="color:#F09090">Week of 22 Janr: Break</span>
 +
* <span style="color:#F09090">Week of 29 Janr: Break</span>
 +
* <span style="color:#000000"><s>Monday 05 Febr 3pm: Cao, Wu & Shen: Fully convolutional depth 1</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 12 Febr 3:30pm: Laina et al.: Fully convolutional depth 2</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 19 Febr 3:30pm: Li, Klein & Yao: Fully convolutional depth 3</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 26 Febr 3:30pm: Güler et al.: DenseReg</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 05 Marc 3:30pm: Godard et al.: Unsupervised/train from stereo</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 12 Marc 3:30pm: Zhou et al.: SfMLearner</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 19 Marc 3:30pm: Ma & Karaman: depth from a single image & SLAM</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 26 Marc 3:30pm: Pizzoli et al.: REMODE</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000"><s>Monday 02 Aprl 3:30pm: Yan et al.: Superpixel CNN/CRF</s></span>
 +
** Location: EEBF 4302
 +
* <span style="color:#F09090">Monday 09 Aprl 3:30pm: Break</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 16 Aprl 3:30pm: Mnih et al.: Deep reinforcement learning</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 23 Aprl 3:30pm: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 30 Aprl 3:30pm: Goodfellow et al.: Generative Adversarial Nets</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 07 May 3:30pm: Oord et al.: Pixel-RNN and Pixel-CNN</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 14 May 3:30pm: Isola et al.: Pix2Pix</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#000000">Monday 21 May 3:30pm: Girshick, Donahue, Darrell & Malik: R-CNN</span>
 +
** Location: EEBF 4302
 +
* <span style="color:#CAAAAA">???: Redmon et al.: YOLO and YOLO9000</span>
 +
* <span style="color:#CAAAAA">???: Khoreva et al.: Dense tracking/data augmentation</span>
 +
* <span style="color:#CAAAAA">???: Mancini et al.: Obstacle detection</span>
 +
* <span style="color:#CAAAAA">???: Ilg/Fischer et al.: FlowNet/FlowNet 2.0</span>
 +
* <span style="color:#CAAAAA">???: Pagnutti et al.: RGBD semantic segmentation with CNN + surface fitting</span>
 +
* <span style="color:#CAAAAA">???: Zhu et al.: CycleGANs</span>
 +
* <span style="color:#CAAAAA">???: Yang et al.: Full 3D reconstruction from single depth view</span>
 +
* <span style="color:#CAAAAA">???: Li et al.: Fully Convolutional Instance-aware Semantic Segmentation</span>
 +
* <span style="color:#CAAAAA">???: Kim et al.: Solving CRF with CNN (depth image)</span>
 +
* <span style="color:#CAAAAA">???: Liu et al.: Attribute Grammar Scene Reconstruction</span>
 +
* <span style="color:#CAAAAA">???: Tatarchenko et al.: Multi-view 3D models </span>
 +
* <span style="color:#CAAAAA">???: Häne et al.: Single-view voxel reconstruction</span>
 +
* <span style="color:#CAAAAA">???: Garg et al.: Geometry+CNN unsupervised</span>
 +
* <span style="color:#CAAAAA">???: Xie et al.: Deep3D</span>
 +
* <span style="color:#CAAAAA">???: Liu et al.: Convolutional Neural Field CRFs</span>
 +
* <span style="color:#CAAAAA">???: Hong et al.: Semantic segmentation for robot behaviour</span>
 +
* <span style="color:#CAAAAA">???: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser</span>
 +
* <span style="color:#CAAAAA">???: Mirowski et al.: Learning to navigate</span>
 +
* <span style="color:#CAAAAA">???: Finn & Levine: Visual prediction for planning</span>
 +
* <span style="color:#CAAAAA">???: Roy & Todorovic: Neural Regression Forest</span>
 +
* <span style="color:#CAAAAA">???: Hoiem et al.: Photo pop-up</span>
 +
* <span style="color:#CAAAAA">???: Heitz et al.: Cascaded Classification Models</span>
 +
* <span style="color:#CAAAAA">???: Li et al.: Feedback-enabled Cascaded Classification Models</span>
 +
* <span style="color:#CAAAAA">???: Li et al.: Depth & Normals - CRF/regression</span>
 +
* <span style="color:#CAAAAA">???: Han et al.: Bayesian object-level reconstruction</span>
 +
* <span style="color:#CAAAAA">???: Liu et al.: Depth from semantics</span>
 +
* <span style="color:#CAAAAA">???: Wu et al.: Repetitive scene structure</span>
 +
* <span style="color:#CAAAAA">???: He et al.: Haze removal</span>
 +
* <span style="color:#CAAAAA">???: Hassner et al.: Example-based Depth</span>
 +
* <span style="color:#CAAAAA">???: Wu et al.: Repetition-based Depth</span>
 +
 +
</div>
  
 
= Details =
 
= Details =
 +
 +
<div style="opacity:0.8">
  
 
== Foley & Matlin Chapter 6 - Distance & Size Perception ==
 
== Foley & Matlin Chapter 6 - Distance & Size Perception ==
 +
 +
Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems) it pays to learn a bit about how humans do it...
  
 
https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover
 
https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover
Line 49: Line 128:
  
 
If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a google search for content.  
 
If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a google search for content.  
 +
 +
Another thing to try that has worked for some is to log out of any google/gmail account before trying to access.
  
 
If nothing else works, email me.
 
If nothing else works, email me.
 
Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems) it pays to learn a bit about how humans do it.
 
  
 
== Saxena, Min & Ng: Make3D ==
 
== Saxena, Min & Ng: Make3D ==
Line 59: Line 138:
  
 
This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully compared with previous attempts. It uses Markov Random Fields, which are a bit advanced and, importantly, quite slow.
 
This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully compared with previous attempts. It uses Markov Random Fields, which are a bit advanced and, importantly, quite slow.
 +
 +
'''Note''': because our library has a subscription to IEEE Xplore, you can access the above link from on-campus or via off-campus library access or via VPN.
 +
 +
But, here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf
 +
 +
There are some videos and things available here: http://make3d.cs.cornell.edu/ -- there used to be a live online demo but they've closed that. There is also a list of results on the Make3D dataset up till about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html
 +
 +
After that, other datasets started being used as well.
 +
 +
Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/
 +
 +
MRFs are more difficult and if anybody has seen a good tutorial for them let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up discussing what MRFs are a lot on Thursday.
  
 
== Michels, Saxena & Ng: High speed obstacle avoidance ==
 
== Michels, Saxena & Ng: High speed obstacle avoidance ==
Line 65: Line 156:
  
 
Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.
 
Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.
 +
 +
This version of the paper might be of higher quality (thanks to Hossein for finding):
 +
 +
http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf
  
 
== Karsch, Liu & Kang: Depth Transfer ==
 
== Karsch, Liu & Kang: Depth Transfer ==
  
 +
Here is the target paper:
 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153
 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153
  
 +
For those who are not on campus, a temporary link:
 +
http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf
 +
 +
This is a nonparametric approach to depth from a single image. They search a database for images similar to the observed one, align each retrieved image with the observed one, then warp the retrieved images and their depth maps to estimate the depth of the current image. The alignment depends on an approach called SIFTFlow.
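
As a rough illustration of the retrieve-then-transfer idea only, here is a minimal Python sketch. It assumes each database image is already summarised by a global feature vector and paired with a depth map. The function name `transfer_depth`, the Euclidean distance and the per-pixel median fusion are illustrative choices and are not taken from the paper, and the SIFTFlow alignment and warping step is left out entirely.

```python
import numpy as np

def transfer_depth(query_feat, db_feats, db_depths, k=3):
    """Estimate a depth map by fusing the depth maps of the k nearest database images.

    Hypothetical simplification: no alignment or warping (SIFTFlow) is performed.
    """
    # Euclidean distance from the query descriptor to every database descriptor
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    # Indices of the k most similar database images
    nearest = np.argsort(dists)[:k]
    # Per-pixel median of the retrieved depth maps
    return np.median(db_depths[nearest], axis=0)

# Toy database: 4 images with 2-D descriptors and constant 2x2 depth maps
db_feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
db_depths = np.stack([np.full((2, 2), d) for d in (1.0, 2.0, 3.0, 10.0)])
estimate = transfer_depth(np.array([0.0, 0.0]), db_feats, db_depths, k=3)
```

In this toy case the far-away fourth image is never retrieved, so its depth of 10 does not influence the estimate.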
 +
 +
Here is a paper describing "SIFTFlow" (if you have the time to go deeper):
 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109
 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109
  
This is a nonparametric approach to depth from a single image. They search a database of images similar to the observed one then warp the image retrieved from the database to estimate the depth of the current image.
+
Free version:
 +
http://people.csail.mit.edu/celiu/SIFTflow/
 +
 
 +
Or a shorter conference version available off-campus:
 +
http://people.csail.mit.edu/celiu/ECCV2008/
 +
 
 +
== LeCun, Bengio & Hinton: Deep Learning Review ==
 +
 
 +
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true
 +
 
 +
Alternative links: http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf https://www.researchgate.net/publication/277411157_Deep_Learning
 +
 
 +
A whirlwind compressed intro to deep learning and its parts.
 +
 
 +
For a more gentle introduction to deep learning: http://cs231n.stanford.edu/
 +
 
 +
Or you can find lots of gentle short intros: https://www.google.com.tr/search?q=intro+to+deep+learning
 +
 
 +
== Rumelhart, Hinton & Williams: Backpropagation ==
 +
 
 +
An early paper introducing backpropagation, the main way we train neural networks nowadays: http://www.nature.com/articles/323533a0
 +
 
 +
Alternative link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf
 +
 
 +
These topics are also addressed in the tutorials shared just above. You will also find plenty online: most neural network tutorials attempt to explain backpropagation, as it is the main way these networks are trained. I usually use a simple genetic algorithm when explaining how to train neural networks because it's simpler. An accelerated tutorial on neural networks that skips backpropagation but explains one of the tools is at http://files.djduff.net/nn.zip
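
To give a feel for what training without backpropagation can look like, here is a minimal sketch of a genetic-algorithm-style search (selection plus mutation, no crossover) over the weights of a tiny network on XOR. It is not the content of the tutorial linked above; the network size, population size, mutation scale and all names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR toy problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

N_WEIGHTS = 9  # 2x2 hidden weights + 2 hidden biases + 2 output weights + 1 output bias

def predict(w, X):
    """Forward pass of a 2-2-1 network whose parameters are packed in the vector w."""
    W1, b1, W2, b2 = w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]
    hidden = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

def loss(w):
    """Mean squared error on the XOR data."""
    return float(np.mean((predict(w, X) - y) ** 2))

def evolve(pop_size=30, generations=100, sigma=0.5):
    population = rng.normal(size=(pop_size, N_WEIGHTS))
    history = []
    for _ in range(generations):
        losses = np.array([loss(w) for w in population])
        order = np.argsort(losses)
        history.append(float(losses[order[0]]))
        # Selection: the best 20% survive unchanged (elitism)
        parents = population[order[: pop_size // 5]]
        # Mutation: children are noisy copies of randomly chosen parents
        picks = rng.integers(len(parents), size=pop_size - len(parents))
        children = parents[picks] + rng.normal(scale=sigma, size=(len(picks), N_WEIGHTS))
        population = np.vstack([parents, children])
    losses = np.array([loss(w) for w in population])
    history.append(float(losses.min()))
    return population[np.argmin(losses)], history

best_weights, history = evolve()
```

Because the best individuals are carried over unchanged, the best loss in `history` never increases from one generation to the next. Whether it reaches a good XOR solution depends on the seed and settings.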
  
== LeCun, Bottou, Bengio & Haffner: CNNs ==
+
== LeCun, Bottou, Bengio & Haffner: CNNs ==
 +
 
 +
This is the now classic paper describing LeNet architectures applying Convolutional Neural Networks (CNNs) to the problem of optical character recognition. It also embeds the neural network in an architecture for automatically segmenting text, including a system for automatically reading cheques.
  
 
http://ieeexplore.ieee.org/abstract/document/726791/
 
http://ieeexplore.ieee.org/abstract/document/726791/
  
Here is the classic paper applying convolutional neural networks to image processing.  
+
Alternative link: http://www.dengfanxin.cn/wp-content/uploads/2016/03/1998Lecun.pdf
 +
 
 +
Convolutional neural networks were introduced by the authors in 1990. It may be instructive to read that considerably simpler paper: http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf
 +
 
 +
Or any tutorial about CNNs. One good place to follow is: http://cs231n.stanford.edu/syllabus.html
  
 
== Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet ==
 
== Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet ==
 +
 +
<span style="color:#B00000">We will not discuss this in the reading group.</span>
  
 
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
 
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
  
This is where convolutional neural networks and deep learning really showed what they could do, on the problem of image recognition.
+
This is where convolutional neural networks and deep learning really showed what they could do, on the problem of image recognition.
 +
 
 +
But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner". So we have taken it out of the reading list.
  
 
== Simonyan & Zisserman: VGG-16 ==
 
== Simonyan & Zisserman: VGG-16 ==
Line 91: Line 227:
  
 
A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have on the order of one thousand layers.
 
A relatively recent "deep" deep net with 16 layers for image recognition. Note: some successful recent networks have on the order of one thousand layers.
 +
 +
Feel free to take a look at AlexNet above to get an idea of the space of approaches.
  
 
== Eigen, Puhrsch & Fergus: Depth map prediction ==
 
== Eigen, Puhrsch & Fergus: Depth map prediction ==
Line 97: Line 235:
  
 
Finally, we apply deep convolutional neural networks to the problem that we are interested in.
 
Finally, we apply deep convolutional neural networks to the problem that we are interested in.
 +
 +
== Luo et al.: Deep Learning + Stereo ==
 +
 +
Combining deep learning and stereo.
 +
 +
https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf
  
 
== Shelhamer, Long & Darrell: Fully Convolutional Segmentation ==
 
== Shelhamer, Long & Darrell: Fully Convolutional Segmentation ==
Line 104: Line 248:
 
Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.
 
Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.
  
== Ioffe & Szegedy: Batch Normalization ==
+
== Giusti et al.: Forest trails CNN ==
  
https://arxiv.org/abs/1502.03167
+
http://ieeexplore.ieee.org/document/7358076/
  
A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.
+
Alternative link: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf
  
== He, Zhang, Ren & Sun: ResNets ==
+
See also youtube: https://www.youtube.com/watch?v=umRdt3zGgpU
  
https://arxiv.org/abs/1512.03385
+
Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Saxena et al. above.
  
This work and variations on it have been the basis of recent neural networks with on the order of 1000 layers. Important stuff.
+
== Ioffe & Szegedy: Batch Normalization ==
  
== Girshick, Donahue, Darrell & Malik: R-CNN ==
+
https://arxiv.org/abs/1502.03167
  
https://arxiv.org/abs/1311.2524
+
A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.
  
We take a slight segue to check out how object detection has been done recently with neural networks. Note that Faster-RCNN and more recent alternatives use similar principles but do it faster.
+
Apparently the following video from the Stanford CS231N course contains details of Batch Normalization (37:00 to 59:30)
  
== Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser ==
+
https://youtu.be/gYpoJMlgyXA?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
  
https://arxiv.org/abs/1611.02174
+
Best,
 +
Damien
  
Here we see an interesting depth-from-single-image sensor fusion with robotics applications.
+
== He, Zhang, Ren & Sun: ResNets ==
  
== Giusti et al.: Forest trails CNN ==
+
https://arxiv.org/abs/1512.03385
  
http://ieeexplore.ieee.org/document/7358076/
+
This work and variations on it have been the basis of recent neural networks with on the order of 1000 layers. Important stuff.
  
See also youtube.
+
The diagram at the top of Page 2 of this paper (which offers an improvement on the ResNet of the above paper) is quite useful in understanding the structure of a residual unit:
 
 
Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Saxena et al. above.
 
  
 +
http://arxiv.org/abs/1603.05027
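
For orientation, a residual unit adds the output of a small stack of layers back onto its own input, so the unit computes x + F(x). The following is a minimal sketch under simplifying assumptions: dense layers stand in for convolutions, batch normalization is omitted, and the activation is applied before each weight layer. The function names and shapes are illustrative, not taken from either paper's code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, W1, W2):
    """Compute y = x + F(x), with F(x) = relu(relu(x) @ W1) @ W2."""
    branch = relu(x) @ W1        # first activation, then first weight layer
    branch = relu(branch) @ W2   # second activation, then second weight layer
    return x + branch            # identity shortcut added to the residual branch

x = np.array([[1.0, -2.0, 3.0]])
zero = np.zeros((3, 3))
# With zero weights the residual branch vanishes and the unit passes x through unchanged
identity_out = residual_unit(x, zero, zero)

rng = np.random.default_rng(0)
out = residual_unit(x, rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))
```

The zero-weight case shows why stacking many such units is comparatively benign: a unit that has learned nothing behaves like the identity rather than destroying its input.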
  
 
== Cao, Wu & Shen: Fully convolutional depth 1 ==
 
== Cao, Wu & Shen: Fully convolutional depth 1 ==
Line 142: Line 286:
  
 
Here we start a series of recent papers that take different deep-net approaches to depth from a single image.
 
Here we start a series of recent papers that take different deep-net approaches to depth from a single image.
 +
 +
Let me know if you can find some media for this work (I could only find the paper itself).
  
 
== Laina et al.: Fully convolutional depth 2 ==
 
== Laina et al.: Fully convolutional depth 2 ==
Line 155: Line 301:
 
Here we conclude the series of recent papers that take different deep-net approaches to depth from a single image.
 
Here we conclude the series of recent papers that take different deep-net approaches to depth from a single image.
  
== Luo et al.: Deep Learning + Stereo ==
+
== Güler et al. DenseReg ==
 +
 
 +
https://arxiv.org/abs/1612.01202
 +
 
 +
A key idea here is how to do regression using categorical prediction, together with an application of Fully Convolutional Networks to regression problems.
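
One way to read "regression using categorical prediction" is to quantise the continuous target into bins, predict the bin as a class, and regress only the small offset within the bin. The sketch below shows just that encoding and decoding, with no network involved. It assumes targets in [0, 1) and an arbitrary choice of 10 bins; the names `encode` and `decode` are hypothetical and not taken from the paper.

```python
import numpy as np

K = 10  # number of bins (arbitrary choice for illustration)

def encode(t, k=K):
    """Split targets in [0, 1) into a bin index (class label) and a residual in [0, 1)."""
    scaled = np.asarray(t, dtype=float) * k
    bin_index = np.floor(scaled).astype(int)
    residual = scaled - bin_index
    return bin_index, residual

def decode(bin_index, residual, k=K):
    """Recombine a predicted bin and within-bin residual into a continuous value."""
    return (bin_index + residual) / k

targets = np.array([0.03, 0.42, 0.999])
bins, residuals = encode(targets)
recovered = decode(bins, residuals)
```

In a network, the bin would be trained with a classification loss and the residual with a regression loss, so the regression only has to be accurate within one bin.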
 +
 
 +
== Godard et al. Unsupervised/train from stereo ==
 +
 
 +
https://arxiv.org/abs/1609.03677
 +
 
 +
http://visual.cs.ucl.ac.uk/pubs/monoDepth/
 +
 
 +
https://github.com/mrharicot/monodepth
 +
 
 +
== Zhou et al.: SfMLearner ==
 +
 
 +
Super cool stuff.
 +
 
 +
https://people.eecs.berkeley.edu/%7Etinghuiz/projects/SfMLearner/
 +
 
 +
https://arxiv.org/abs/1704.07813
 +
 
 +
A blog entry explaining the main ideas: http://bair.berkeley.edu/blog/2017/07/11/confluence-of-geometry-and-learning/
 +
 
 +
The code: https://github.com/tinghuiz/SfMLearner
 +
 
 +
== Ma & Karaman: depth from a single image & SLAM ==
 +
 
 +
http://www.mit.edu/~fcma/
 +
 
 +
https://youtu.be/vNIIT_M7x7Y
 +
 
 +
https://arxiv.org/pdf/1709.07492.pdf
 +
 
 +
https://github.com/fangchangma/sparse-to-dense.git
 +
 
 +
==  Pizzoli et al.: REMODE ==
 +
 
 +
In case you miss it: this is not a single-image method... but close to it. It is another structure from motion method. But the results are rather good (state of the art in 2014).
 +
 
 +
M. Pizzoli, C. Forster, and D. Scaramuzza, “REMODE: Probabilistic, monocular dense reconstruction in real time,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, 2014, pp. 2609–2616.
 +
 
 +
http://rpg.ifi.uzh.ch/docs/ICRA14_Pizzoli.pdf
 +
 
 +
https://www.youtube.com/watch?v=QTKd5UWCG0Q
 +
 
 +
== Yan et al.: Superpixel CNN/CRF ==
 +
 
 +
http://ieeexplore.ieee.org/document/8105853/
 +
 
 +
== Mnih et al.: Deep reinforcement learning ==
  
Combining deep learning and stereo.
+
A modern classic.
  
https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf
+
http://arxiv.org/abs/1312.5602
  
== Goodfellow et al.: Generative Adversarial Nets ==
+
https://www.nature.com/articles/nature14236 (temporary link: http://web.itu.edu.tr/djduff/2018/nature14236.pdf )
  
https://papers.nips.cc/paper/5423-generative-adversarial-nets
+
https://www.youtube.com/watch?v=iqXKQf2BOSE
  
Another important recent development that we may make use of.
+
Here it is done in Keras: https://keon.io/deep-q-learning/
  
 
== Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images ==
 
== Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images ==
Line 171: Line 367:
 
https://arxiv.org/abs/1411.5928
 
https://arxiv.org/abs/1411.5928
  
A non-adversarial approach to the same problem.
+
https://ieeexplore.ieee.org/document/7469347/media
 +
 
 +
A non-adversarial approach to generating images.
 +
 
 +
Also see:
 +
 
 +
https://www.youtube.com/watch?v=QCSW4isBDL0
 +
 
 +
https://www.youtube.com/watch?v=LAfmJQK4UW0
 +
 
 +
== Goodfellow et al.: Generative Adversarial Nets ==
 +
 
 +
https://papers.nips.cc/paper/5423-generative-adversarial-nets
 +
 
 +
Another important recent development that we may make use of.
  
 
== Oord et al.: Pixel-RNN & Pixel-CNN ==
 
== Oord et al.: Pixel-RNN & Pixel-CNN ==
Line 180: Line 390:
  
 
http://arxiv.org/abs/1606.05328
 
http://arxiv.org/abs/1606.05328
 +
 +
Some background on LSTMs:
 +
 +
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  
 
== Isola et al. Pix2Pix ==
 
== Isola et al. Pix2Pix ==
Line 185: Line 399:
 
https://arxiv.org/abs/1611.07004  
 
https://arxiv.org/abs/1611.07004  
  
We can use this too. And it's cool.  
+
We can use this too. And it's cool.
 +
 
 +
The project page:
 +
 
 +
https://phillipi.github.io/pix2pix/
 +
 
 +
An online demo:
 +
 
 +
https://affinelayer.com/pixsrv/
 +
 
 +
== Girshick, Donahue, Darrell & Malik: R-CNN ==
 +
 
 +
https://arxiv.org/abs/1311.2524
 +
 
 +
We take a slight segue to check out how object detection has been done recently with neural networks. Note that Faster-RCNN and more recent alternatives use similar principles but do it faster. Here we look at the older paper so that we can discuss some of the fundamentals.
 +
 
 +
== Redmon et al.: YOLO and YOLO9000 ==
 +
 
 +
https://arxiv.org/abs/1612.08242
 +
 
 +
https://arxiv.org/abs/1506.02640
 +
 
 +
== Khoreva et al.: Dense tracking/data augmentation ==
 +
 
 +
An attempt on the DAVIS dataset. The dataset is here: http://davischallenge.org/
 +
 
 +
This is interesting because of the data augmentation approach used.
 +
 
 +
The paper: http://arxiv.org/abs/1703.09554
 +
 
 +
Video: https://www.youtube.com/watch?v=QrsR5w-HR14
 +
 
 +
== Mancini et al.: Obstacle detection ==
 +
 
 +
https://arxiv.org/abs/1607.06349
 +
 
 +
== Pagnutti et al. RGBD semantic segmentation with CNN + surface fitting ==
 +
 
 +
http://ieeexplore.ieee.org/document/8120042/
 +
 
 +
https://pdfs.semanticscholar.org/7716/9ee225157e77d1632e3bed54c70235b4abf0.pdf
 +
 
 +
== Zhu et al.: CycleGANs ==
 +
 
 +
https://arxiv.org/abs/1703.10593
 +
 
 +
Here is a nice tutorial:
 +
 
 +
https://hardikbansal.github.io/CycleGANBlog/
 +
 
 +
== Ilg/Fischer et al.: FlowNet/FlowNet 2.0 ==
 +
 
 +
FlowNet:
 +
 
 +
https://www.youtube.com/watch?v=g-peWXaQnQc
 +
 
 +
https://arxiv.org/abs/1504.06852
 +
 
 +
FlowNet 2.0:
 +
 
 +
https://www.youtube.com/watch?v=JSzUdVBmQP4
 +
 
 +
https://arxiv.org/abs/1612.01925
 +
 
 +
== Yang et al.: Full 3D reconstruction from single depth view ==
 +
 
 +
https://arxiv.org/abs/1708.07969
 +
 
 +
== Li et al. Fully Convolutional Instance-aware Semantic Segmentation ==
 +
 
 +
https://arxiv.org/abs/1611.07709
 +
 
 +
== Kim et al. Solving CRF with CNN (depth image) ==
 +
 
 +
http://arxiv.org/abs/1603.06359
 +
 
 +
 
 +
== Liu et al. Attribute Grammar Scene Reconstruction ==
 +
 
 +
http://ieeexplore.ieee.org/document/7889053/?source=tocalert&dld=Z21haWwuY29t
 +
 
 +
== Tatarchenko et al. Multi-view 3D models ==
 +
 
 +
https://arxiv.org/abs/1511.06702
 +
 
 +
Not just inferring the depth image but also other views of it (related to the SfMLearner paper).
 +
 
 +
== Häne et al. Single-view voxel reconstruction ==
 +
 
 +
Blog summary: http://bair.berkeley.edu/blog/2017/08/23/high-quality-3d-obj-reconstruction/
 +
 
 +
Video intro: https://www.youtube.com/watch?v=BjwhMDhbqAs
 +
 
 +
Full paper: https://arxiv.org/abs/1704.00710
 +
 
 +
== Garg et al. Geometry+CNN unsupervised ==
 +
 
 +
We will have already read Godard et al. and Zhou et al. but this is for completeness.
 +
 
 +
http://arxiv.org/abs/1603.04992
 +
 
 +
== Xie et al. Deep3D ==
 +
 
 +
We will have already read Godard et al. and Zhou et al. but this is for completeness.
 +
 
 +
https://arxiv.org/pdf/1604.03650
 +
 
 +
== Liu et al. Convolutional Neural Field CRFs ==
 +
 
 +
https://arxiv.org/abs/1411.6387
 +
 
 +
== Hong et al.: Semantic segmentation for robot behaviour ==
 +
 
 +
https://arxiv.org/abs/1802.00285
 +
 
 +
== Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser ==
 +
 
 +
https://arxiv.org/abs/1611.02174
 +
 
 +
Here we see an interesting depth-from-single-image sensor fusion with robotics applications.
 +
 
 +
== Mirowski et al.: Learning to navigate ==
 +
 
 +
https://arxiv.org/abs/1611.03673
 +
 
 +
== Finn & Levine: Visual prediction for planning ==
 +
 
 +
http://arxiv.org/abs/1610.00696
 +
 
 +
== Hoiem et al. Photo pop-up ==
 +
 
 +
A classic.
 +
 
 +
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1288&context=robotics
 +
 
 +
== Roy & Todorovic: Neural Regression Forest ==
 +
 
 +
https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S23-11.pdf
 +
 
 +
== Heitz et al. Cascaded Classification Models ==
 +
 
 +
An older pre-CNN machine learning paper for depth estimation from a single image.
 +
 
 +
http://papers.nips.cc/paper/3472-cascaded-classification-models-combining-models-for-holistic-scene-understanding.pdf
 +
 
 +
== Li et al. Feedback-enabled Cascaded Classification Models ==
 +
 
 +
An older pre-CNN machine learning paper for depth estimation from a single image.
 +
 
 +
https://arxiv.org/abs/1110.5102
 +
 
 +
== Li et al. Depth & Normals - CRF/regression ==
 +
 
 +
https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1B_001.pdf
 +
 
 +
== Han et al. Bayesian object-level reconstruction ==
 +
 
 +
http://escholarship.org/uc/item/9tk6935x.pdf
 +
 
 +
== Liu et al. Depth from semantics ==
 +
 
 +
An older pre-CNN machine learning paper for depth estimation from a single image.
 +
 
 +
http://ai.stanford.edu/people/koller/Papers/Liu+al:CVPR10.pdf
 +
 
 +
== Wu et al. Repetitive scene structure ==
 +
 
 +
An older pre-CNN machine learning paper for depth estimation from a single image.
 +
 
 +
http://www.academia.edu/download/30713855/WuCVPR11.pdf
 +
 
 +
== He et al. Haze removal ==
 +
 
 +
Might be interesting because of use of single-image cues.
 +
 
 +
http://mmlab.ie.cuhk.edu.hk/2009/dehaze_cvpr2009.pdf
 +
 
 +
== Hassner et al. Example-based Depth ==
 +
 
 +
This seems to be an earlier version of the SIFTFlow-based approach of Karsch et al.
 +
 
 +
https://www.researchgate.net/publication/4245893_Example_Based_3D_Reconstruction_from_Single_2D_Images
 +
 
 +
== Wu et al. Repetition-based Depth ==
 +
 
 +
"Repetition-based dense single-view reconstruction"
 +
 
 +
http://www.academia.edu/download/30713855/WuCVPR11.pdf
 +
 
 +
</div>
 +
 
 +
= Additional Resources =
 +
 
 +
EBook: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf Deep Learning: Methods and Applications by Li Deng and Dong Yu
 +
 
 +
Online course with slides, videos and assignments: http://cs231n.stanford.edu/ CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
  
= List of interested people =
+
My NN/Keras bootcamp slides: http://files.djduff.net/nn.zip
(who I will contact with information about the schedule etc.)
 
  
* Abdulmajeed M. K.
+
Foley & Matlin's book: https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover
* Alican M.
 
* Anas M.
 
* K. Bulut Ö.
 
* Imaduddin A. M.
 
* Tolga C.
 
* Bilge A.
 
* Hatice K.
 
* Hossein P.
 
* Torkan G.
 
* Buse Sibel K.
 
* Oğuzhan C.
 
* M. Alperen Ö.
 
* Hatice K.
 
* Özgür Ö.
 
* Doğay K.
 
* Elena B. S.
 
* Müjde A.
 

Latest revision as of 05:38, 6 September 2018

Getting involved

There is now a mailing list for this reading group. Send me (Damien) an email to get on it. No problem.

Note: this reading group is about deep learning as applied to depth estimation from a single image - one of the super hot topics. If your interest is deep learning in general, you may find some of the readings a little bit off-topic. So let me know if you would like some idea of which readings are worth your time.

Proposed Schedule

Warning: This is now out of date. We are arranging via the email list now.

The below schedule is only proposed, and subject to change.

  • 14 Sept 8.30am: Foley & Matlin Chapter 6: Distance & Size Perception
    • Location: EEBF 4302
  • 21 Sept 8.30am: Saxena, Min & Ng: Make3D
    • Location: EEBF 4302
  • 28 Sept 8.30am: Michels, Saxena & Ng: High speed obstacle avoidance
    • Location: EEBF 4302
  • 05 Octr 8.30am: Karsch, Liu & Kang: Depth Transfer
    • Location: EEBF 4302
  • 12 Octr 8.30am: LeCun, Bengio & Hinton: Deep Learning Review
    • Location: EEBF 4302
  • 19 Octr 8.30am: Rumelhart, Hinton & Williams: Backpropagation
    • Location: EEBF 4302
  • 26 Octr 8.30am: LeCun, Bottou, Bengio & Haffner: CNNs
    • Location: EEBF 4302
  • 02 Novr 8:30am: LeCun, Bottou, Bengio & Haffner: CNNs
    • Location: EEBF 4302
  • 09 Novr 8.30am: Simonyan & Zisserman: VGG-16
    • Location: EEBF 4302
  • 16 Novr 8.30am: Eigen, Puhrsch & Fergus: Depth map prediction
    • Location: EEBF 4302
  • 23 Novr 8.30am: Luo et al.: Deep Learning + Stereo
    • Location: EEBF 4302
  • 30 Novr 8.30am: Shelhamer, Long & Darrell: Fully Convolutional Segmentation
    • Location: EEBF 4302
  • 07 Decr 8.30am: Giusti et al.: Forest trails CNN
    • Location: EEBF 4302
  • 14 Decr 8.30am: Batch Normalization
    • Location: EEBF 4302
  • 22 Decr 8:30am: He, Zhang, Ren & Sun: ResNet
    • Location: EEBF 4302
  • Week of 25 Decr: Break
  • Week of 01 Janr: Break
  • Week of 08 Janr: Break
  • Week of 15 Janr: Break
  • Week of 22 Janr: Break
  • Week of 29 Janr: Break
  • Monday 05 Febr 3pm: Cao, Wu & Shen: Fully convolutional depth 1
    • Location: EEBF 4302
  • Monday 12 Febr 3:30pm: Laina et al.: Fully convolutional depth 2
    • Location: EEBF 4302
  • Monday 19 Febr 3:30pm: Li, Klein & Yao: Fully convolutional depth 3
    • Location: EEBF 4302
  • Monday 26 Febr 3:30pm: Güler et al.: DenseReg
    • Location: EEBF 4302
  • Monday 05 Marc 3:30pm: Godard et al.: Unsupervised/train from stereo
    • Location: EEBF 4302
  • Monday 12 Marc 3:30pm: Zhou et al.: SfMLearner
    • Location: EEBF 4302
  • Monday 19 Marc 3:30pm: Ma & Karaman: depth from a single image & SLAM
    • Location: EEBF 4302
  • Monday 26 Marc 3:30pm: Pizzoli et al.: REMODE
    • Location: EEBF 4302
  • Monday 02 Aprl 3:30pm: Yan et al.: Superpixel CNN/CRF
    • Location: EEBF 4302
  • Monday 09 Aprl 3:30pm: Break
    • Location: EEBF 4302
  • Monday 16 Aprl 3:30pm: Mnih et al.: Deep reinforcement learning
    • Location: EEBF 4302
  • Monday 23 Aprl 3:30pm: Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images
    • Location: EEBF 4302
  • Monday 30 Aprl 3:30pm: Goodfellow et al.: Generative Adversarial Nets
    • Location: EEBF 4302
  • Monday 07 May 3:30pm: Oord et al.: Pixel-RNN and Pixel-CNN
    • Location: EEBF 4302
  • Monday 14 May 3:30pm: Isola et al.: Pix2Pix
    • Location: EEBF 4302
  • Monday 21 May 3:30pm: Girshick, Donahue, Darrell & Malik: R-CNN
    • Location: EEBF 4302
  • ???: Redmon et al.: YOLO and YOLO9000
  • ???: Khoreva et al.: Dense tracking/data augmentation
  • ???: Mancini et al.: Obstacle detection
  • ???: Ilg/Fischer et al.: FlowNet/FlowNet 2.0
  • ???: Pagnutti et al.: RGBD semantic segmentation with CNN + surface fitting
  • ???: Zhu et al.: CycleGANs
  • ???: Yang et al.: Full 3D reconstruction from single depth view
  • ???: Li et al.: Fully Convolutional Instance-aware Semantic Segmentation
  • ???: Kim et al.: Solving CRF with CNN (depth image)
  • ???: Liu et al.: Attribute Grammar Scene Reconstruction
  • ???: Tatarchenko et al.: Multi-view 3D models
  • ???: Häne et al.: Single-view voxel reconstruction
  • ???: Garg et al.: Geometry+CNN unsupervised
  • ???: Xie et al.: Deep3D
  • ???: Liu et al.: Convolutional Neural Field CRFs
  • ???: Hong et al.: Semantic segmentation for robot behaviour
  • ???: Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser
  • ???: Mirowski et al.: Learning to navigate
  • ???: Finn & Levine: Visual prediction for planning
  • ???: Roy & Todorovic: Neural Regression Forest
  • ???: Hoiem et al.: Photo pop-up
  • ???: Heitz et al.: Cascaded Classification Models
  • ???: Li et al.: Feedback-enabled Cascaded Classification Models
  • ???: Li et al.: Depth & Normals - CRF/regression
  • ???: Han et al.: Bayesian object-level reconstruction
  • ???: Liu et al.: Depth from semantics
  • ???: Wu et al.: Repetitive scene structure
  • ???: He et al.: Haze removal
  • ???: Hassner et al.: Example-based Depth
  • ???: Wu et al.: Repetition-based Depth

Details

Foley & Matlin Chapter 6 - Distance & Size Perception

Because our project is about using machine learning to extract depth from a single image (with deep learning, then applying it to robot problems) it pays to learn a bit about how humans do it...

https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover

Go to Chapter 6.

If that doesn't work (some have reported finding it difficult to access Chapter 6), try the following link: http://tinyurl.com/yalnnwp9 - some have reported being able to access the chapter by doing a google search for content.

Another thing to try that has worked for some is to log out of any google/gmail account before trying to access.

If nothing else works, email me.

Saxena, Min & Ng: Make3D

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4531745

This is the classic paper that brought machine learning to the problem of depth from a single image, quite successfully, considering previous attempts. It uses Markov Random Fields (MRFs), which are a somewhat advanced topic and, importantly, make the method quite slow.

Note: because our library has a subscription to IEEE Xplore, you can access the above link from on-campus or via off-campus library access or via VPN.

But, here is an alternative link: http://www.cs.cornell.edu/~asaxena/reconstruction3d/saxena_make3d_learning3dstructure.pdf

There are some videos and things available here: http://make3d.cs.cornell.edu/ -- there used to be a live online demo but they've closed that. There is also a list of results on the Make3D dataset up till about 2012: http://make3d.cs.cornell.edu/results_stateoftheart.html

After that other datasets started being used also.

Superpixels are used in the study. Here is a quick intro to them: http://ttic.uchicago.edu/~xren/research/superpixel/

MRFs are more difficult and if anybody has seen a good tutorial for them let me know so that I can link to it here. The best I could find is https://mitpress.mit.edu/sites/default/files/titles/content/9780262015776_sch_0001.pdf but it is still a bit difficult. We will probably end up discussing what MRFs are a lot on Thursday.

Michels, Saxena & Ng: High speed obstacle avoidance

http://dl.acm.org/citation.cfm?id=1102426

Here the same authors focus on a related problem, that of determining open spaces for guiding a vehicle, again using machine learning techniques.

This version of the paper might be of higher quality (thanks to Hossein for finding):

http://ai.stanford.edu/~asaxena/rccar/ICML_ObstacleAvoidance.pdf

Karsch, Liu & Kang: Depth Transfer

Here is the target paper: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5551153

For those who are not on campus, a temporary link: http://web.itu.edu.tr/djduff/Share/KarschEtAl2014.pdf

This is a nonparametric approach to depth from a single image. They search a database for images similar to the observed one, align each retrieved image with the observed one, and then warp the retrieved images to estimate the depth of the current image. The alignment depends on an approach called SIFTFlow.

Here is a paper describing "SIFTFlow" (if you have the time to go deeper): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6787109

Free version: http://people.csail.mit.edu/celiu/SIFTflow/

Or a shorter conference version available off-campus: http://people.csail.mit.edu/celiu/ECCV2008/
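
The retrieve-then-fuse idea behind Depth Transfer can be illustrated with a minimal sketch. It is not the authors' implementation: the SIFTFlow alignment and warping step is left out entirely, the per-pixel median is only a stand-in for the fusion used in the paper, and the names `feature_distance` and `transfer_depth` and the toy descriptors are assumptions made up for this example.

```python
from statistics import median


def feature_distance(a, b):
    """Squared Euclidean distance between two global image descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transfer_depth(query_feature, database, k=3):
    """Estimate a depth map for the query image by fusing the depth maps of
    the k database images whose descriptors are closest to the query's.

    `database` is a list of (descriptor, depth_map) pairs and each depth_map
    is a list of rows. ASSUMPTION: the candidates are fused without any
    alignment, whereas the paper first warps them with SIFTFlow.
    """
    nearest = sorted(
        database, key=lambda item: feature_distance(query_feature, item[0])
    )[:k]
    depth_maps = [depth for _, depth in nearest]
    rows, cols = len(depth_maps[0]), len(depth_maps[0][0])
    # Per-pixel median over the retrieved candidates.
    return [
        [median(d[r][c] for d in depth_maps) for c in range(cols)]
        for r in range(rows)
    ]
```

Because no model is fitted, all the "knowledge" lives in the database, which is what makes the approach nonparametric.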

LeCun, Bengio & Hinton: Deep Learning Review

http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html?foxtrotcallback=true

Alternative links: http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf https://www.researchgate.net/publication/277411157_Deep_Learning

A whirlwind compressed intro to deep learning and its parts.

For a more gentle introduction to deep learning: http://cs231n.stanford.edu/

Or you can find lots of gentle short intros: https://www.google.com.tr/search?q=intro+to+deep+learning

Rumelhart, Hinton & Williams: Backpropagation

An early paper introducing backpropagation, the main way we train neural networks nowadays: http://www.nature.com/articles/323533a0

Alternative link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

These topics are also addressed in the tutorials shared just above. You will also find plenty online: most neural network tutorials attempt to explain backpropagation, as it is the main way these networks are trained. I usually use a simple genetic algorithm when explaining how to train neural networks because it's simpler. An accelerated tutorial on neural networks that does not explain backpropagation, but does explain one of the tools, is at http://files.djduff.net/nn.zip.
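
For anyone who wants to see backpropagation in code before reading the paper, here is a minimal sketch for a tiny network with one input, one tanh hidden layer and a linear output. The network shape, the squared-error loss and all function names are assumptions chosen for illustration; they are not taken from the paper or from the slides linked above.

```python
import math


def forward(params, x):
    """One-hidden-layer network: h_i = tanh(w1_i * x + b1_i), y = w2 . h + b2."""
    w1, b1, w2, b2 = params
    h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y


def loss(params, x, target):
    """Squared error for a single training example."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss w.r.t. every parameter, via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target                      # dL/dy
    grad_w2 = [dy * hi for hi in h]      # dL/dw2_i = dL/dy * h_i
    grad_b2 = dy
    # Push the error back through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    dz = [dy * w * (1.0 - hi ** 2) for w, hi in zip(w2, h)]
    grad_w1 = [d * x for d in dz]
    grad_b1 = dz
    return grad_w1, grad_b1, grad_w2, grad_b2
```

A useful habit is to compare such gradients against finite differences of `loss`; if the two disagree, the backward pass has a bug.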

LeCun, Bottou, Bengio & Haffner: CNNs

This is the now classic paper describing LeNet architectures applying Convolutional Neural Networks (CNNs) to the problem of optical character recognition. It also embeds the neural network in an architecture for automatically segmenting text, including a system for automatically reading cheques.

http://ieeexplore.ieee.org/abstract/document/726791/

Alternative link: http://www.dengfanxin.cn/wp-content/uploads/2016/03/1998Lecun.pdf

Convolutional neural networks were introduced by the authors in 1990. It may be instructive to read that considerably simpler paper: http://yann.lecun.com/exdb/publis/pdf/lecun-90c.pdf

Or any tutorial about CNNs. One good place to follow is: http://cs231n.stanford.edu/syllabus.html
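
The core operation in these networks, sliding a small shared kernel over the image, can be sketched in a few lines of plain Python. This is only an illustration of a single unpadded, stride-1 convolution; the function name `conv2d_valid` is an assumption, and a real LeNet adds multiple feature maps, subsampling, nonlinearities and learned kernels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) without padding and
    return the map of weighted sums.

    As in most CNN libraries the kernel is not flipped, so strictly speaking
    this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

The same few kernel weights are reused at every image position, which is why a convolutional layer needs far fewer parameters than a fully connected one.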

Krizhevsky, Sutskever & Hinton: ImageNet/AlexNet

We will not discuss this in the reading group.

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

This is where convolutional neural networks and deep learning really showed what they could do, on the problem of image recognition.

But we won't use this because a lot of the complexity it introduces turns out not to be necessary. Later methods are "cleaner". So we have taken it out of the reading list.

Simonyan & Zisserman: VGG-16

http://arxiv.org/abs/1409.1556

A relatively recent "deep" deep net with 16 layers for image recognition. Note: some more recent successful networks have as many as a thousand layers.

Feel free to take a look at AlexNet above to get an idea of the space of approaches.

Eigen, Puhrsch & Fergus: Depth map prediction

https://www.cs.nyu.edu/~deigen/depth/

Finally, we apply deep convolutional neural networks to the problem that we are interested in.

Luo et al.: Deep Learning + Stereo

Combining deep learning and stereo.

https://www.cs.toronto.edu/~urtasun/publications/luo_etal_cvpr16.pdf

Shelhamer, Long & Darrell: Fully Convolutional Segmentation

http://arxiv.org/abs/1605.06211

Here a related problem is solved, that of semantic segmentation, but this approach is applicable to our problem.

Giusti et al.: Forest trails CNN

http://ieeexplore.ieee.org/document/7358076/

Alternative link: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf

See also youtube: https://www.youtube.com/watch?v=umRdt3zGgpU

Here we have a CNN-based update to the learn-to-navigate-from-images problem addressed by Saxena et al. above.

Ioffe & Szegedy: Batch Normalization

https://arxiv.org/abs/1502.03167

A recent technique that has enabled powerful new methods and ultimately much deeper neural networks. Important stuff.

Apparently the following video from the Stanford CS231N course contains details of Batch Normalization (37:00 to 59:30)

https://youtu.be/gYpoJMlgyXA?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
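
As a rough illustration of what the technique computes at training time, here is a minimal sketch of batch normalization for a single feature over one mini-batch. The function name and default values are assumptions for this example; the running statistics used at test time and the backward pass are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch (a list of floats) to zero
    mean and unit variance, then apply the learned scale `gamma` and shift
    `beta`."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

With `gamma=1` and `beta=0` the output has mean zero and variance very close to one; the learned `gamma` and `beta` let the network undo the normalization where that helps.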

He, Zhang, Ren & Sun: ResNets

https://arxiv.org/abs/1512.03385

This work and variations on it have been the basis of recent neural networks with around 1000 layers. Important stuff.

The diagram at the top of Page 2 of this paper (which offers an improvement on the ResNet of the above paper) is quite useful in understanding the structure of a residual unit:

http://arxiv.org/abs/1603.05027
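
To make the structure of a residual unit concrete, here is a minimal sketch using small fully connected layers in place of the convolutions. It follows the layout of the original ResNet unit (add the shortcut, then apply ReLU) rather than the improved unit of the second paper; batch normalization is omitted, and the helper names are assumptions for this example.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by the vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_unit(x, w1, w2):
    """Return relu(x + F(x)) with F(x) = w2 . relu(w1 . x).

    The identity shortcut adds the input back in, so the layers only have
    to learn a correction F to the identity mapping.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([a + b for a, b in zip(x, fx)])
```

With all weights at zero the unit simply passes a non-negative input through unchanged, which hints at why very deep stacks of such units remain trainable.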

Cao, Wu & Shen: Fully convolutional depth 1

http://arxiv.org/abs/1605.02305

Here we start a series of recent papers that take different approaches using deep nets to depth from a single image.

Let me know if you can find some media for this work (I could only find the paper itself).

Laina et al.: Fully convolutional depth 2

http://arxiv.org/abs/1606.00373

Here we continue a series of recent papers that take different approaches using deep nets to depth from a single image.

Li, Klein & Yao: Fully convolutional depth 3

http://arxiv.org/abs/1607.00730

Here we finalise a series of recent papers that take different approaches using deep nets to depth from a single image.

Güler et al. DenseReg

https://arxiv.org/abs/1612.01202

A key idea here is how to do regression using categorical prediction & an application of the Fully Convolutional Networks to regression problems.
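
The idea of regression via categorical prediction can be sketched as follows: a continuous target is split into a coarse bin index, which a classifier can predict, and a residual inside that bin, which a regressor refines. This is only an illustration of the general idea under assumed names (`encode`, `decode`) and an assumed target range of [0, 1]; it is not the paper's exact formulation.

```python
def encode(value, num_bins, lo=0.0, hi=1.0):
    """Split a continuous target into a bin index (a classification label)
    and a residual in [0, 1] locating the value inside that bin."""
    width = (hi - lo) / num_bins
    # Clamp so that value == hi falls in the last bin.
    index = min(int((value - lo) / width), num_bins - 1)
    residual = (value - lo) / width - index
    return index, residual


def decode(index, residual, num_bins, lo=0.0, hi=1.0):
    """Recombine a predicted bin index and residual into a continuous value."""
    width = (hi - lo) / num_bins
    return lo + (index + residual) * width
```

In a network, the bin index would come from a softmax output and the residual from a regression output attached to the predicted bin.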

Godard et al. Unsupervised/train from stereo

https://arxiv.org/abs/1609.03677

http://visual.cs.ucl.ac.uk/pubs/monoDepth/

https://github.com/mrharicot/monodepth

Zhou et al.: SfMLearner

Super cool stuff.

https://people.eecs.berkeley.edu/%7Etinghuiz/projects/SfMLearner/

https://arxiv.org/abs/1704.07813

A blog entry explaining the main ideas: http://bair.berkeley.edu/blog/2017/07/11/confluence-of-geometry-and-learning/

The code: https://github.com/tinghuiz/SfMLearner

Ma & Karaman: depth from a single image & SLAM

http://www.mit.edu/~fcma/

https://youtu.be/vNIIT_M7x7Y

https://arxiv.org/pdf/1709.07492.pdf

https://github.com/fangchangma/sparse-to-dense.git

Pizzoli et al.: REMODE

In case you miss it: this is not a single-image method... but close to it. It is another structure from motion method. But the results are rather good (state of the art in 2014).

M. Pizzoli, C. Forster, and D. Scaramuzza, “REMODE: Probabilistic, monocular dense reconstruction in real time,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, 2014, pp. 2609–2616.

http://rpg.ifi.uzh.ch/docs/ICRA14_Pizzoli.pdf

https://www.youtube.com/watch?v=QTKd5UWCG0Q

Yan et al.: Superpixel CNN/CRF

http://ieeexplore.ieee.org/document/8105853/

Mnih et al.: Deep reinforcement learning

A modern classic.

http://arxiv.org/abs/1312.5602

https://www.nature.com/articles/nature14236 (temporary link: http://web.itu.edu.tr/djduff/2018/nature14236.pdf )

https://www.youtube.com/watch?v=iqXKQf2BOSE

Here it is done in Keras: https://keon.io/deep-q-learning/

Dosovitskiy, Springenberg, Tatarchenko & Brox: Generating images

https://arxiv.org/abs/1411.5928

https://ieeexplore.ieee.org/document/7469347/media

A non-adversarial approach to generating images.

Also see:

https://www.youtube.com/watch?v=QCSW4isBDL0

https://www.youtube.com/watch?v=LAfmJQK4UW0

Goodfellow et al.: Generative Adversarial Nets

https://papers.nips.cc/paper/5423-generative-adversarial-nets

Another important recent development that we may make use of.

Oord et al.: Pixel-RNN & Pixel-CNN

https://arxiv.org/abs/1601.06759

Producing distributions over images. We have always intended to do something like this for depth images.

http://arxiv.org/abs/1606.05328

Some background on LSTMs:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Isola et al. Pix2Pix

https://arxiv.org/abs/1611.07004

We can use this too. And it's cool.

The project page:

https://phillipi.github.io/pix2pix/

An online demo:

https://affinelayer.com/pixsrv/

Girshick, Donahue, Darrell & Malik: R-CNN

https://arxiv.org/abs/1311.2524

We take a slight detour to check out how object detection has been done recently with neural networks. Note that Faster R-CNN and more recent alternatives use similar principles but do it faster. Here we look at the older paper so that we can discuss some of the fundamentals.

Redmon et al.: YOLO and YOLO9000

https://arxiv.org/abs/1612.08242

https://arxiv.org/abs/1506.02640

Khoreva et al.: Dense tracking/data augmentation

An attempt on the DAVIS dataset. The dataset is here: http://davischallenge.org/

This is interesting because of the data augmentation approach used.

The paper: http://arxiv.org/abs/1703.09554

Video: https://www.youtube.com/watch?v=QrsR5w-HR14

Mancini et al.: Obstacle detection

https://arxiv.org/abs/1607.06349

Pagnutti et al. RGBD semantic segmentation with CNN + surface fitting

http://ieeexplore.ieee.org/document/8120042/

https://pdfs.semanticscholar.org/7716/9ee225157e77d1632e3bed54c70235b4abf0.pdf

Zhu et al.: CycleGANs

https://arxiv.org/abs/1703.10593

Here is a nice tutorial:

https://hardikbansal.github.io/CycleGANBlog/

Ilg/Fischer et al.: FlowNet/FlowNet 2.0

FlowNet:

https://www.youtube.com/watch?v=g-peWXaQnQc

https://arxiv.org/abs/1504.06852

FlowNet 2.0:

https://www.youtube.com/watch?v=JSzUdVBmQP4

https://arxiv.org/abs/1612.01925

Yang et al.: Full 3D reconstruction from single depth view

https://arxiv.org/abs/1708.07969

Li et al. Fully Convolutional Instance-aware Semantic Segmentation

https://arxiv.org/abs/1611.07709

Kim et al. Solving CRF with CNN (depth image)

http://arxiv.org/abs/1603.06359


Liu et al. Attribute Grammar Scene Reconstruction

http://ieeexplore.ieee.org/document/7889053/?source=tocalert&dld=Z21haWwuY29t

Tatarchenko et al. Multi-view 3D models

https://arxiv.org/abs/1511.06702

Not just inferring the depth image but also other views of it (related to the SfMLearner paper).

Häne et al. Single-view voxel reconstruction

Blog summary: http://bair.berkeley.edu/blog/2017/08/23/high-quality-3d-obj-reconstruction/

Video intro: https://www.youtube.com/watch?v=BjwhMDhbqAs

Full paper: https://arxiv.org/abs/1704.00710

Garg et al. Geometry+CNN unsupervised

We will have already read Godard et al. and Zhou et al. but this is for completeness.

http://arxiv.org/abs/1603.04992

Xie et al. Deep3D

We will have already read Godard et al. and Zhou et al. but this is for completeness.

https://arxiv.org/pdf/1604.03650

Liu et al. Convolutional Neural Field CRFs

https://arxiv.org/abs/1411.6387

Hong et al.: Semantic segmentation for robot behaviour

https://arxiv.org/abs/1802.00285

Liao, Huang, Wang, Kodagoda, Yu & Liu: Fuse with laser

https://arxiv.org/abs/1611.02174

Here we see an interesting depth-from-single-image sensor fusion with robotics applications.

Mirowski et al.: Learning to navigate

https://arxiv.org/abs/1611.03673

Finn & Levine: Visual prediction for planning

http://arxiv.org/abs/1610.00696

Hoiem et al. Photo pop-up

A classic.

http://repository.cmu.edu/cgi/viewcontent.cgi?article=1288&context=robotics

Roy & Todorovic: Neural Regression Forest

https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S23-11.pdf

Heitz et al. Cascaded Classification Models

Older pre-CNN machine learning papers for depth estimation from a single image.

http://papers.nips.cc/paper/3472-cascaded-classification-models-combining-models-for-holistic-scene-understanding.pdf

Li et al. Feedback-enabled Cascaded Classification Models

Older pre-CNN machine learning papers for depth estimation from a single image.

https://arxiv.org/abs/1110.5102

Li et al. Depth & Normals - CRF/regression

https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1B_001.pdf

Han et al. Bayesian object-level reconstruction

http://escholarship.org/uc/item/9tk6935x.pdf

Liu et al. Depth from semantics

Older pre-CNN machine learning papers for depth estimation from a single image.

http://ai.stanford.edu/people/koller/Papers/Liu+al:CVPR10.pdf

Wu et al. Repetitive scene structure

Older pre-CNN machine learning papers for depth estimation from a single image.

http://www.academia.edu/download/30713855/WuCVPR11.pdf

He et al. Haze removal

Might be interesting because of use of single-image cues.

http://mmlab.ie.cuhk.edu.hk/2009/dehaze_cvpr2009.pdf

Hassner et al. Example-based Depth

Seems to be an earlier version of the SIFTFlow-based approach of Karsch et al.

https://www.researchgate.net/publication/4245893_Example_Based_3D_Reconstruction_from_Single_2D_Images

Wu et al. Repetition-based Depth

"Repetition-based dense single-view reconstruction"

http://www.academia.edu/download/30713855/WuCVPR11.pdf

Additional Resources

EBook: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf Deep Learning: Methods and Applications by Li Deng and Dong Yu

Online course with slides, videos and assignments: http://cs231n.stanford.edu/ CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)

My NN/Keras bootcamp slides: http://files.djduff.net/nn.zip

Foley & Matlin's book: https://books.google.com.tr/books?id=jLBmCgAAQBAJ&printsec=frontcover