Literature Review: Capsule Networks

My Udacity colleague Cezanne Camacho is preparing a presentation on capsule networks and gave a draft version in the office today. Cezanne is a terrific engineer and teacher. She's already written a great blog post on capsule networks, and she has graciously allowed me to share some of it here.

Capsule networks come from a 2017 paper by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton at Google: “Dynamic Routing Between Capsules”. Hinton, in particular, is one of the world’s foremost authorities on neural networks.

As my colleague Cezanne writes on her blog:

Capsule Networks provide a way to detect parts of objects in an image and represent spatial relationships between those parts. This means that capsule networks are able to recognize the same object in a variety of different poses even if they have not seen that pose in training data.

Love the Pacman GIF. Did I mention Cezanne is also an artist?

Cezanne explains that a “capsule” encompasses features that make up a piece of an image. Think of an image of a face, for example, and imagine capsules that capture each eye, and the nose, and the mouth.

These capsules organize into a tree structure. Larger structures, like a face, would be parent nodes in the tree, and smaller structures would be child nodes.

“In the example below, you can see how the parts of a face (eyes, nose, mouth, etc.) might be recognized in leaf nodes and then combined to form a more complete face part in parent nodes.”

“Dynamic routing” plays a role in capsule networks:

“Dynamic routing is a process for finding the best connections between the output of one capsule and the inputs of the next layer of capsules. It allows capsules to communicate with each other and determine how data moves through them, according to real-time changes in the network inputs and outputs!”

Dynamic routing is ultimately implemented via an iterative routing process that Cezanne does a really nice job describing, along with the accompanying math, in her blog post.
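Her post walks through the math in detail; to give a rough flavor of the algorithm, here is a minimal NumPy sketch of the routing-by-agreement loop from the Sabour, Frosst, and Hinton paper. The shapes, iteration count, and random inputs are illustrative only, not Cezanne's implementation:

```python
import numpy as np

def squash(v, axis=-1):
    # Squash nonlinearity from the paper: shrinks short vectors toward
    # zero and long vectors toward (but never past) unit length.
    sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + 1e-9)

def dynamic_routing(u_hat, num_iterations=3):
    """u_hat: prediction vectors, shape (num_lower, num_upper, dim)."""
    num_lower, num_upper, dim = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start uniform
    for _ in range(num_iterations):
        # Coupling coefficients: softmax over the upper-layer capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)  # weighted sum per upper capsule
        v = squash(s)                           # upper-capsule outputs
        # Agreement: dot product between each prediction and the output.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))  # 6 lower capsules, 3 upper, 4-dim outputs
v, c = dynamic_routing(u_hat)
```

The key idea is visible in the loop: lower capsules whose predictions agree with an upper capsule's output get their coupling to that capsule strengthened on the next iteration.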

Capsule networks seem to do well with image classification on a few datasets, but they haven’t been widely deployed yet because they are slow to train.

In case you’d like to play with capsule networks yourself, Cezanne also published a Jupyter notebook with her PyTorch implementation of the Sabour, Frosst, and Hinton paper!

6 Awesome Projects from Udacity Students (and 1 Awesome Thinkpiece)

Udacity students are constantly impressing us with their skill, ingenuity, and their knowledge of the most obscure features in Slack.

Here are 6 blog posts that will astound you, and 1 think-piece that will blow your mind.

How to identify a Traffic Sign using Machine Learning !!

Sujay Babruwad

Sujay managed his data in a few clever ways for the traffic sign classifier project. First, he converted all of his images to grayscale. Then he skewed and augmented them. Finally, he balanced the data set. The result:

“The validation accuracy attained 98.2% on the validation set and the test accuracy was about 94.7%”
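Sujay's exact code isn't reproduced in this digest, but the three steps can be sketched in NumPy. The shift-based "skew" and the oversampling strategy below are illustrative stand-ins, not his implementation:

```python
import numpy as np

def to_grayscale(images):
    # Luminosity-weighted average over the RGB channels.
    return (images @ np.array([0.299, 0.587, 0.114]))[..., None]

def augment(image, rng):
    # A toy "skew": shift the image a few pixels left or right.
    shift = rng.integers(-2, 3)
    return np.roll(image, shift, axis=1)

def balance(images, labels, rng):
    # Oversample minority classes (with fresh augmentations) until
    # every class has as many examples as the largest one.
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_x, out_y = [images], [labels]
    for cls, count in zip(classes, counts):
        if count == target:
            continue
        idx = rng.choice(np.where(labels == cls)[0], target - count)
        out_x.append(np.stack([augment(images[i], rng) for i in idx]))
        out_y.append(np.full(target - count, cls))
    return np.concatenate(out_x), np.concatenate(out_y)

rng = np.random.default_rng(0)
x = rng.random((10, 32, 32, 3))  # 10 fake 32x32 RGB "signs"
y = np.array([0] * 7 + [1] * 3)  # imbalanced labels
gray = to_grayscale(x)
x_bal, y_bal = balance(gray, y, rng)
```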

Udacity Advance Lane Finding Notes

A Nguyen

An’s post is a great step-through of how to use OpenCV to find lane lines on the road. It includes lots of code samples!

“Project summary:
– Applying calibration on all chessboard images that are taken from the same camera recording the driving to obtain distort coefficients and matrix.
– Applying perspective transform and warp image to obtain bird-eyes view on road.
– Applying binary threshold by combining derivative x & y, magnitude, direction and S channel.
– Reduce noise and locate left & right lanes by histogram data.
– Draw line lanes over the image”
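The histogram step is a particularly neat trick: sum the bottom half of the binary bird's-eye image column by column, and the two peaks mark where the lane lines meet the bottom of the frame. A minimal sketch, assuming a top-down binary image where lane pixels are 1s:

```python
import numpy as np

def find_lane_bases(binary_warped):
    # Column-wise sum over the bottom half of a binary bird's-eye image;
    # the peak on each side of center is a likely lane-line base position.
    h, w = binary_warped.shape
    histogram = binary_warped[h // 2:, :].sum(axis=0)
    midpoint = w // 2
    left_base = int(np.argmax(histogram[:midpoint]))
    right_base = int(midpoint + np.argmax(histogram[midpoint:]))
    return left_base, right_base

# A synthetic warped binary image with "lanes" at columns 20 and 80.
img = np.zeros((60, 100), dtype=np.uint8)
img[:, 20] = 1
img[:, 80] = 1
left, right = find_lane_bases(img)
```

From those two base columns, a sliding-window search typically walks up the image to collect the rest of each lane's pixels.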

P5: Vehicle Detection with Linear SVC classification

Rana Khalil

Rana’s video shows the amazing results that are achievable with Support Vector Classifiers. Look at how well the bounding boxes track the other vehicles on the highway!

Updated! My 99.40% solution to Udacity Nanodegree project P2 (Traffic Sign Classification)

Cherkeng Heng

Cherkeng’s approach to the Traffic Sign Classification Project was based on an academic paper that uses “dense blocks” of convolutional layers to fit the training data tightly. He also uses several clever data augmentation techniques to prevent overfitting. Here’s how that works out:

“The new network is smaller with test accuracy of 99.40% and MAC (multiply–accumulate operation counts) of 27.0 million.”
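As a rough illustration of the "dense block" idea (not Cherkeng's actual network, and with 1-D feature vectors standing in for convolutional feature maps), each layer receives the concatenation of every earlier layer's output:

```python
import numpy as np

def dense_block(x, weights):
    # In a dense block, each "layer" sees the concatenation of the input
    # and all previous feature maps, so features are reused everywhere.
    features = [x]
    for w in weights:
        inp = np.concatenate(features)
        features.append(np.maximum(0, w @ inp))  # ReLU "conv" stand-in
    return np.concatenate(features)

rng = np.random.default_rng(0)
x = rng.random(4)
# Each layer's input grows: 4 features, then 4+3, then 4+3+3.
weights = [rng.normal(size=(3, 4)),
           rng.normal(size=(3, 7)),
           rng.normal(size=(3, 10))]
out = dense_block(x, weights)
```

This feature reuse is part of why dense networks can match larger models with far fewer parameters.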

Advanced Lane Line Project

Arnaldo Gunzi

Arnaldo has a thorough walk-through of the Udacity Advanced Lane Finding Project. If you want to know how to use computer vision to find lane lines on the road, this is a perfect guide!

“1 Camera calibration
2 Color and gradient threshold
3 Birds eye view
4 Lane detection and fit
5 Curvature of lanes and vehicle position with respect to center
6 Warp back and display information
7 Sanity check
8 Video”
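Step 5, the lane curvature, follows from fitting a second-order polynomial to each lane's pixels and applying the standard radius-of-curvature formula. A sketch in NumPy, working in pixel units (a real pipeline, including Arnaldo's, would convert to meters):

```python
import numpy as np

def lane_curvature(xs, ys, y_eval):
    # Fit x = a*y^2 + b*y + c, then apply the curvature formula
    # R = (1 + (2a*y + b)^2)^(3/2) / |2a| at the chosen y.
    a, b, c = np.polyfit(ys, xs, 2)
    return (1 + (2 * a * y_eval + b) ** 2) ** 1.5 / abs(2 * a)

# Points sampled from a known parabola x = 0.001 * y^2.
ys = np.linspace(0, 100, 50)
xs = 0.001 * ys ** 2
radius = lane_curvature(xs, ys, y_eval=0.0)
```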

Build a Deep Learning Rig for $800

Nick Condo

I love this how-to post that lists all the components for a mid-range deep learning rig. Not too cheap, not too expensive. Just right.

Here’s how it does:

“As you can see above, my new machine (labeled “DL Rig”) is the clear winner. It performed this task more than 24 times faster than my MacBook Pro, and almost twice as fast as the AWS p2.large instance. Needless to say, I’m very happy with what I was able to get for the price.”

How Gig Economy Startups Will Replace Jobs with Robots

Caleb Kirksey

Companies like Uber, Lyft, Seamless, Fiverr, and Upwork facilitate armies of independent contractors who work “gigs” on their own time, for as much money as they want, but without the structure of traditional employment.

Caleb makes the point that, for all the press the gig economy gets, the end might be in sight. Many of these gigs might soon be replaced by computers and robots. He illustrates this point with his colleague, Eric, who works as a safety driver for the autonomous vehicle startup Auro Robotics. Auro’s whole mission is to eliminate Eric’s job!

“Don’t feel too bad for Eric though. He’s become skilled with hardware and robotics. His experience working in cooperation with a robot can enable him to build better systems that don’t need explicit instructions.”

Udacity Students Experiment with Neural Networks and Computer Vision

The Udacity Self-Driving Car Engineer Nanodegree Program requires students to complete a number of projects, and each project requires some experimentation from students to figure out a solution that works.

Here are five posts by Udacity students, outlining how they used experimentation to complete their projects.

Self-Driving Car Engineer Diary — 4

Andrew Wilkie

Andrew has lots of images in this blog post, including a spreadsheet of all the different functions he used in building his Traffic Sign Classifier with TensorFlow!

I got to explore TensorFlow and various libraries (see table below), different convolutional neural network models, pre-processing images, manipulating n-dimensional arrays and learning how to display results.

Intricacies of Traffic Sign Classification with TensorFlow

Param Aggarwal

In this post, Param goes step-by-step through his iterative process of finding the right combination of pre-processing, augmentation, and network architecture for classifying traffic signs. 54 neural network architectures in all!

I went crazy by this point, nothing I would do would push me into the 90% range. I wanted to cry. A basic linearly connected model was giving me 85% and here I am using the latest hotness of convolution layers and not able to match.

I took a nap.

Backpropagation Explained

Jonathan Mitchell

Backpropagation is the most difficult and mind-bending concept to understand about deep neural networks. After backpropagation, everything else is a piece of cake. In this concise post, Jonathan takes a crack at summarizing backpropagation in a few paragraphs.

When we are training a neural network we need to figure out how to alter a parameter to minimize the cost/loss. The first step is to find out what effect that parameter has on the loss. Then find the total loss up to that parameters point and perform the gradient descent update equation to that parameter.
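In code, the update Jonathan describes looks something like this toy single-neuron example (a stand-in for a full network, not his code): compute the gradient of the loss with respect to each parameter via the chain rule, then step the parameter against the gradient.

```python
def forward(w, b, x):
    # A single linear neuron: prediction = w*x + b.
    return w * x + b

def loss(w, b, x, y):
    # Squared-error loss against the target y.
    return (forward(w, b, x) - y) ** 2

def grads(w, b, x, y):
    # Backprop via the chain rule: dL/dw = 2*(pred - y)*x, dL/db = 2*(pred - y).
    err = forward(w, b, x) - y
    return 2 * err * x, 2 * err

# One gradient-descent update.
w, b, x, y, lr = 0.5, 0.0, 2.0, 3.0, 0.1
dw, db = grads(w, b, x, y)
w_new, b_new = w - lr * dw, b - lr * db
```

A quick sanity check for any backprop code is to compare the analytic gradient against a numerical finite-difference estimate.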

Teaching a car to drive itself

Arnaldo Gunzi

Arnaldo presents a number of lessons he learned while designing an end-to-end network for driving in the Behavioral Cloning Project. In particular, he came to appreciate the power of GPUs.

Using GPU is magic. Is like to give a Coke to someone in the desert. Or to buy a new car — the feeling of ‘how I was using that crap old one’. Or to find a shortcut in the route to the office: you’ll never use the long route again. Or to find a secret code in a game that give superpowers…

Robust Extrapolation of Lines in Video Using Probabilistic Hough Transform

Esmat Nabil

Esmat presents a well-organized outline of his Finding Lane Lines Project and the computer vision pipeline that he used. In particular, he has a nice explanation of the Hough transform, which is a tricky concept!

The probabilistic Hough line transform more efficient implementation of Hough transform. It gives as output the extremes of the detected lines (x0, y0, x1, y1). It is difficult to detect straight lines which are part of a curve because they are very very small. For detecting such lines it is important to properly set all the parameters of Hough transform. Two of most important parameters are: Hough votes and maximum distance between points which are to be joined to make a line. Both parameters are set at their minimum value.

CarND Students on Preparation, Generalization, and Hacking Cars

Here are five great posts from students in Udacity’s Self-Driving Car Engineer Nanodegree Program, dealing with generalizing machine learning models and hacking cars!


Daniel Stang

Daniel has devoted a section of his blog to the Self-Driving Car projects, including applying his lane-line finder to video he took himself!

The first project for the Udacity Self-Driving Car Nanodegree was to create a software pipeline capable of detecting the lane lines in video feed. The project was done using python with the bulk of work being performed using the OpenCV library. The video to the side shows the software pipeline I developed in action using video footage I took myself.

Traffic Sign Classifier: Normalising Data

Jessica Yung

Jessica’s post discusses the need to normalize image data before feeding it into a neural network, including a bonus explainer on the differences between normalization and standardization.

The same range of values for each of the inputs to the neural network can guarantee stable convergence of weights and biases. (Source: Mahmoud Omid on ResearchGate)

Suppose we have one image that’s really dark (almost all black) and one that’s really bright (almost all white). Our model has to address both cases using the same parameters (weights and biases). It’s hard for our model to be accurate and generalise well if it has to tackle both extreme cases.
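The two techniques Jessica contrasts can be sketched in a few lines of NumPy (with toy three-pixel "images" standing in for real ones):

```python
import numpy as np

def normalize(x):
    # Min-max normalization: rescale values into [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    # Standardization: shift and scale to zero mean, unit std deviation.
    return (x - x.mean()) / x.std()

dark = np.array([5.0, 10.0, 20.0])        # an "almost black" image
bright = np.array([235.0, 245.0, 250.0])  # an "almost white" one
dark_n, bright_n = normalize(dark), normalize(bright)
```

After either transform, the dark and bright images occupy the same numeric range, so the network's weights don't have to stretch to cover both extremes.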

Hardware, tools, and cardboard mockups.

Dylan Brown

Dylan is a student in both the Georgia Tech Online Master’s in Computer Science Program (run by Udacity) and also in CarND. He’s also turning his own Subaru into a self-driving car! (Note: We do not recommend this.)

Below I’ve put together a list of purchases needed for this project. There will definitely be more items coming soon, at least a decent power supply or UPS. Thankfully, this list covers all the big-ticket items.

Jetson TX1 Development Kit (with .edu discount), NVIDIA: $299
ZED Stereo Camera with 6-axis pose, Stereolabs: $449
CAN(-FD) to USB interface, PEAK-System: $299
Touch display, 10.1”, Toguard: $139
Wireless keyboard K400, Logitech: $30
Total: $1216

Self-Driving Car Engineer Diary — 1

Andrew Wilkie

Andrew has a running blog of his experiences in CarND, including his preparation.

I REALLY want a deep understanding of the material so followed Gilad Gressel’s recommendation (course mentor): Essence Of Linear Algebra (for linear classifiers which is step 1 towards CNNs), Gradients & Derivatives (for back propagation understanding) and CS231n: Convolutional Neural Networks for Visual Recognition lectures (for full Neural Networks and Convolutional Deep Neural Networks understanding).

Comparing model performance: Including Max Pooling and Dropout Layers

Jessica Yung

Another post by Jessica Yung! This time, she runs experiments on her model by training with and without different layers, to see which version of the model generalizes best.

Mean of (training accuracy minus validation accuracy) over epochs 80-100 (smallest gap first):

Pooling and dropout (0.0009)
Dropout but no pooling (0.0061)
Pooling but no dropout (0.0069)
No pooling or dropout (0.0094)
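For reference, here is what the two layers under test do, sketched in NumPy (inverted dropout and 2x2 max pooling; illustrative code, not Jessica's):

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    # Inverted dropout: zero random units at train time and scale the
    # survivors by 1/keep_prob so the expected activation is unchanged.
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def max_pool_2x2(x):
    # 2x2 max pooling on a single-channel feature map (h, w even).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
dropped = dropout(np.ones(10000), keep_prob=0.5, rng=rng)
pooled = max_pool_2x2(np.arange(16).reshape(4, 4))
```

Her result matches the intuition: both layers discard information during training, which pushes the model to generalize rather than memorize, and combining them narrows the train-validation gap the most.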

Should You Understand Backpropagation?

Backpropagation is a leaky abstraction; it is a credit assignment scheme with non-trivial consequences. If you try to ignore how it works under the hood because “TensorFlow automagically makes my networks learn”, you will not be ready to wrestle with the dangers it presents, and you will be much less effective at building and debugging neural networks.

That is from the excellent Andrej Karpathy, “Yes you should understand backprop”.

I say it’s possible to use deep neural networks quite effectively without truly understanding backprop. But if your goal is to specialize in the field and apply this tool to a range of problems, then “yes you should understand backprop”.

By the way, @karpathy runs a prolific Twitter feed, with 37,100 followers.

Artificial Intelligence, Machine Learning, Deep Learning

Udacity has separate courses on Artificial Intelligence, Machine Learning (actually we have two), and Deep Learning.

What is the difference between all of these? It can be a little hard to explain.

Fortunately, NVIDIA has a nice blog post up explaining these concepts as concentric circles:

The easiest way to think of their relationship is to visualize them as concentric circles with AI — the idea that came first — the largest, then machine learning — which blossomed later, and finally deep learning — which is driving today’s AI explosion — fitting inside both.

I guess if I had to explain, I would say that:

  1. “artificial intelligence” refers to techniques that help computers accomplish goals
  2. “machine learning” refers to techniques that help computers accomplish goals by learning from data
  3. “deep learning” refers to techniques that help computers accomplish goals by using deep neural networks to learn from data

But if you’re interested in these topics, then read the NVIDIA post. It’s good.

Behavioral Cloning

One of the first modules in our Self-Driving Car Nanodegree program will be Deep Learning. This is such a fun topic!

We’ll be covering behavioral cloning, which is a technique whereby you drive the car (or the simulated car, in this case) yourself and then pass the data to a neural network. The neural network trains on your driving data and auto-magically learns how to drive the car, without any other information. You don’t have to tell it about the color of the road or which way to turn or where the horizon is. You just pass in data of your own driving and it learns.
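As a toy illustration of the idea (the real project trains a convolutional network on camera frames; here a linear model on random stand-in features shows the same training loop), behavioral cloning is just supervised regression from recorded frames to recorded steering angles:

```python
import numpy as np

def train_clone(frames, angles, lr=0.1, epochs=1000):
    # Fit steering = frames @ w by gradient descent on mean squared error.
    # A linear model stands in for the CNN used in the actual project.
    n, d = frames.shape
    w = np.zeros(d)
    for _ in range(epochs):
        err = frames @ w - angles
        w -= lr * (2 / n) * frames.T @ err
    return w

rng = np.random.default_rng(0)
frames = rng.random((200, 8))   # stand-in for flattened camera frames
true_w = rng.normal(size=8)     # the "driver" the network must imitate
angles = frames @ true_w        # recorded steering angles
w = train_clone(frames, angles)
mse = np.mean((frames @ w - angles) ** 2)
```

The network never sees a rule about roads or turns; it only sees (frame, angle) pairs and learns to reproduce the mapping.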

By the end, students will be building their own neural networks to drive cars, just like in this video.

Comma.ai Releases a Dataset

Yesterday George Hotz and the team released a dataset of highway driving. 7.5 hours of camera images, steering angles, and other vehicle data.

Hotz says his goal is for other companies to be able to develop self-driving systems without making the mistakes his team made.

They also released a research paper that details their efforts to build a simulator that generates future road images from an existing camera shot. Basically, what do you think the road will look like a few milliseconds from now?

Oh, and if you want a job at Comma, they recommend you do something cool with the dataset.