My Udacity colleague, Cezanne Camacho, is preparing a presentation on capsule networks and gave a draft version in the office today. Cezanne is a terrific engineer and teacher, and she’s already written a great blog post on capsule networks, and she graciously allowed me to share some of that here.
Capsule Networks provide a way to detect parts of objects in an image and represent spatial relationships between those parts. This means that capsule networks are able to recognize the same object in a variety of different poses even if they have not seen that pose in training data.
Cezanne explains that a “capsule” encompasses features that make up a piece of an image. Think of an image of a face, for example, and imagine capsules that capture each eye, and the nose, and the mouth.
These capsules organize into a tree structure. Larger structures, like a face, would be parent nodes in the tree, and smaller structures would be child nodes.
“In the example below, you can see how the parts of a face (eyes, nose, mouth, etc.) might be recognized in leaf nodes and then combined to form a more complete face part in parent nodes.”
“Dynamic routing” plays a role in capsule networks:
“Dynamic routing is a process for finding the best connections between the output of one capsule and the inputs of the next layer of capsules. It allows capsules to communicate with each other and determine how data moves through them, according to real-time changes in the network inputs and outputs!”
Dynamic routing is ultimately implemented via an iterative routing process that Cezanne does a really nice job describing, along with the accompanying math, in her blog post.
Capsule networks seem to do well with image classification on a few datasets, but they haven’t been widely deployed yet because they are slow to train.
The Saturday before Thanksgiving I will be speaking at an event in Philadelphia, discussing artificial intelligence and the future. The event is hosted by the Free Library of Philadelphi and sponsored by the Italian Consulate.
Sujay’s managed his data in a few clever ways for the traffic sign classifier project. First, he converted all of his images to grayscale. Then he skewed and augmented them. Finally, he balanced the data set. The result:
“The validation accuracy attained 98.2% on the validation set and the test accuracy was about 94.7%”
An’s post is a great step-through of how to use OpenCV to find lane lines on the road. It includes lots of code samples!
“Project summary: – Applying calibration on all chessboard images that are taken from the same camera recording the driving to obtain distort coefficients and matrix. – Applying perspective transform and warp image to obtain bird-eyes view on road. – Applying binary threshold by combining derivative x & y, magnitude, direction and S channel. – Reduce noise and locate left & right lanes by histogram data. – Draw line lanes over the image”
Cherkeng’s approach to the Traffic Sign Classification Project was based on an academic paper that uses “dense blocks” of convolutional layers to fit the training data tightly. He also uses several clever data augmentation techniques to prevent overfitting. Here’s how that works out:
“The new network is smaller with test accuracy of 99.40% and MAC (multiply–accumulate operation counts) of 27.0 million.”
Arnaldo has a thorough walk-through of the Udacity Advanced Lane Finding Project. If you want to know how to use computer vision to find lane lines on the road, this is a perfect guide!
“1 Camera calibration 2 Color and gradient threshold 3 Birds eye view 4 Lane detection and fit 5 Curvature of lanes and vehicle position with respect to center 6 Warp back and display information 7 Sanity check 8 Video”
I love this how-to post that lists all the components for a mid-line deep learning rig. Not too cheap, not too expensive. Just right.
Here’s how it does:
“As you can see above, my new machine (labeled “DL Rig”) is the clear winner. It performed this task more than 24 times faster than my MacBook Pro, and almost twice as fast as the AWS p2.large instance. Needless to say, I’m very happy with what I was able to get for the price.”
Companies like Uber and Lyft and Seamless and Fiverr and Upwork facilitate armies of independent contractors who work “gigs” on their own time, for as much money as they want, but without the structure of traditional employment.
Caleb makes the point that, for all the press the gig economy gets, the end might be in sight. Many of these gigs might soon be replaced by computers and robots. He illustrates this point with his colleague, Eric, who works as a safety driver for the autonomous vehicle startup Auro Robotics. Auro’s whole mission is to eliminate Eric’s job!
“Don’t feel too bad for Eric though. He’s become skilled with hardware and robotics. His experience working in cooperation with a robot can enable him to build better systems that don’t need explicit instructions.”
Andrew has lots of images in this blog post, including a spreadsheet of all the different functions he used in building his Traffic Sign Classifier with TensorFlow!
I got to explore TensorFlow and various libraries (see table below), different convolutional neural network models, pre-processing images, manipulating n-dimensional arrays and learning how to display results.
In this post, Param goes step-by-step through his iterative process of finding the right combination of pre-processing, augmentation, and network architecture for classifying traffic signs. 54 neural network architectures in all!
I went crazy by this point, nothing I would do would push me into the 90% range. I wanted to cry. A basic linearly connected model was giving me 85% and here I am using the latest hotness of convolution layers and not able to match.
Backpropagation is the most difficult and mind-bending concept to understand about deep neural networks. After backpropagation, everything else is a piece of cake. In this concise post, Jonathan takes a crack and summarizing backpropagation in a few paragraphs.
When we are training a neural network we need to figure out how to alter a parameter to minimize the cost/loss. The first step is to find out what effect that parameter has on the loss. Then find the total loss up to that parameters point and perform the gradient descent update equation to that parameter.
Arnaldo presents a number of lessons he learned while designing an end-to-end network for driving in the Behavioral Cloning Project. In particular, he came to appreciate the power of GPUSs.
Using GPU is magic. Is like to give a Coke to someone in the desert. Or to buy a new car — the feeling of ‘how I was using that crap old one’. Or to find a shortcut in the route to the office: you’ll never use the long route again. Or to find a secret code in a game that give superpowers…
Esmat presents a well-organized outline of his Finding Lane Lines Porject and the computer vision pipeline that he used. In particular, he has a nice explanation of the Hough transform, which is a tricky concept!
The probabilistic Hough line transform more efficient implementation of Hough transform. It gives as output the extremes of the detected lines (x0, y0, x1, y1). It is difficult to detect straight lines which are part of a curve because they are very very small. For detecting such lines it is important to properly set all the parameters of Hough transform. Two of most important parameters are: Hough votes and maximum distance between points which are to be joined to make a line. Both parameters are set at their minimum value.
Daniel has devoted a section of his blog to the Self-Driving Car projects, including applying his lane-line finder to video he took himself!
The first project for the Udacity Self-Driving Car Nanodegree was to create a software pipeline capable of detecting the lane lines in video feed. The project was done using python with the bulk of work being performed using the OpenCV library. The video to the side shows the software pipeline I developed in action using video footage I took myself.
Jessica’s post discusses the need to normalize image data before feeding it into a neural network, including a bonus explainer on the differences between normalization and standardization.
The same range of values for each of the inputs to the neural network can guarantee stable convergence of weights and biases. (Source: Mahmoud Omid on ResearchGate)
Suppose we have one image that’s really dark (almost all black) and one that’s really bright (almost all white). Our model has to address both cases using the same parameters (weights and biases). It’s hard for our model to be accurate and generalise well if it has to tackle both extreme cases.
Dylan is a student in both the Georgia Tech Online Master’s in Computer Science Program (run by Udacity) and also in CarND. He’s also turning his own Subaru into a self-driving car! (Note: We do not recommend this.)
Below I’ve put together a list of purchases needed for this project. There will definitely be more items coming soon, at least a decent power supply or UPS. Thankfully, this list covers all the big-ticket items.
Jetson TX1 Developement Kit (with .edu discount) NVIDIA $299 ZED Stereo Camera with 6-axis pose Stereolabs $449 CAN(-FD) to USB interface PEAK-System $299 Touch display, 10.1” Toguard $139 Wireless keyboard K400 Logitech $30 Total $1216
Backpropagation is a leaky abstraction; it is a credit assignment scheme with non-trivial consequences. If you try to ignore how it works under the hood because “TensorFlow automagically makes my networks learn”, you will not be ready to wrestle with the dangers it presents, and you will be much less effective at building and debugging neural networks.
I say it’s possible to use deep neural networks quite effectively without truly understanding backprop. But if your goal is to specialize in the field and apply this tool to a range of problems, then “yes you should understand backprop”.
By the way, @karpathy is a prolific Twitter feed with 37,100 followers.
Fun fact 27/120: There are nuclear submarines out there carrying 40 nuclear warheads controlled by a computer running Windows XP.
The easiest way to think of their relationship is to visualize them as concentric circles with AI — the idea that came first — the largest, then machine learning — which blossomed later, and finally deep learning — which is driving today’s AI explosion — fitting inside both.
I guess if I had to explain, I would say that:
“artificial intelligence” refers to techniques that help computers accomplish goals
“machine learning” refers to techniques that help computers accomplish goals by learning from data
“deep learning” refers to techniques that help computers accomplish goals by using deep neural networks to learn from data
But if you’re interested in these topics, then read the NVIDIA post. It’s good.
We’ll be covering behavioral cloning, which is a technique whereby you drive the car (or the simulated car, in this case) yourself and then pass the data to a neural network. The neural network trains on your driving data and auto-magically learns how to drive the car, without any other information. You don’t have to tell it about the color of the road or which way to turn or where the horizon is. You just pass in data of your own driving and it learns.
They also released a research paper that details their efforts to build a simulator that generates future road images from an existing camera shot. Basically, what do you think the road will look like a few milliseconds from now?
Oh, and if you want a job at Comma, they recommend you do something cool with the dataset.