Graph Neural Networks

A Waymo blog post caught my eye recently, “VectorNet: Predicting behavior to help the Waymo Driver make better decisions.”

The blog post describes how Waymo uses deep learning to tackle the challenging problem of predicting the future. Specifically, Waymo vehicles need to predict what everyone else on the road is going to do.

As Mercedes-Benz engineers teach in Udacity’s Self-Driving Car Engineer Nanodegree Program, approaches to this problem tend to be either model-based or data-driven.

A model-based approach relies on our knowledge (“model”) of how actors behave. A car turning left through an intersection is likely to continue turning left, rather than come to a complete stop, or reverse, or switch to a right turn.
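To make the model-based idea concrete, here is a minimal sketch of one classic motion model: assume the car keeps its current speed and turn rate. The function name and parameters are my own, chosen for illustration, and not anything from Waymo or the Udacity lesson.

```python
import math

def predict_constant_turn(x, y, heading, speed, turn_rate, dt):
    """Advance a pose by dt seconds, assuming constant speed and turn rate.

    heading is in radians; a positive turn_rate is counter-clockwise (a left turn).
    """
    if abs(turn_rate) < 1e-9:
        # No turning: move in a straight line along the current heading.
        return (x + speed * math.cos(heading) * dt,
                y + speed * math.sin(heading) * dt,
                heading)
    new_heading = heading + turn_rate * dt
    radius = speed / turn_rate
    # Move along a circular arc of the given radius.
    new_x = x + radius * (math.sin(new_heading) - math.sin(heading))
    new_y = y + radius * (math.cos(heading) - math.cos(new_heading))
    return new_x, new_y, new_heading

# A car heading along +x at 5 m/s while turning left at 0.5 rad/s.
turning = predict_constant_turn(0.0, 0.0, 0.0, 5.0, 0.5, 1.0)
straight = predict_constant_turn(0.0, 0.0, 0.0, 5.0, 0.0, 1.0)
```

The model predicts that a car already turning left keeps curving left, which is exactly the kind of built-in assumption a model-based approach relies on.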

A data-driven approach uses machine learning to process data from real-world observations and apply the resulting model to new scenarios.

VectorNet is a data-driven approach that relies heavily on the semantic information from Waymo’s high-definition maps. Waymo converts semantic information (turn lanes, stop lines, intersections) into vectors, and then feeds those vectors into a hierarchical graph neural network.
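As a rough illustration of what “converting map features into vectors” could look like, here is a sketch that breaks a polyline (say, a lane boundary) into short segments. The field names and layout are my assumptions for illustration, not Waymo’s actual representation.

```python
def polyline_to_vectors(points, kind, polyline_id):
    """Turn a list of (x, y) points into one vector per consecutive pair of points.

    Each vector records where the segment starts and ends, what kind of map
    feature it belongs to, and which polyline it came from.
    """
    return [
        {"start": start, "end": end, "kind": kind, "polyline_id": polyline_id}
        for start, end in zip(points, points[1:])
    ]

# Hypothetical lane boundary sampled at three points.
lane = polyline_to_vectors(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],
    kind="lane_boundary",
    polyline_id=7,
)
```

Three points yield two vectors, and the shared `polyline_id` is what lets a later stage group segments that belong to the same map feature.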

I’m a bit out of touch with the state of the art in deep learning, so I followed a link from Waymo down a rabbit hole. First I read “An Illustrated Guide to Graph Neural Networks,” by a Singaporean undergrad named Rishabh Anand.

That article led me to an hour-long lecture on GNNs by Islem Rekik at Istanbul Technical University.

It was a longer rabbit hole than I anticipated, but this talk was just right for me. It has a quick fifteen-minute review of CNNs, followed by a quick fifteen-minute review of graph theory. About thirty minutes in, she does a really nice job covering the fundamentals of graph neural networks and how they allow us to feed structured data from a graph into a neural network.
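Here is a minimal sketch of the core idea as I understand it: each node gathers features from its neighbors, and the combined result goes through an ordinary neural network layer. This is a simplified variant I wrote for illustration (plain averaging, one weight matrix, ReLU), not the specific formulation from the lecture.

```python
def message_passing_step(features, edges, weight):
    """One round of neighbor averaging followed by a linear map and ReLU.

    features: one feature list per node
    edges:    undirected (a, b) pairs of node indices
    weight:   matrix with one row per input feature, one column per output feature
    """
    n = len(features)
    # Every node also listens to itself, so start each group with the node's own index.
    neighbors = {i: [i] for i in range(n)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dim = len(features[0])
    updated = []
    for i in range(n):
        group = neighbors[i]
        mean = [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        out = [
            max(0.0, sum(mean[d] * weight[d][k] for d in range(dim)))
            for k in range(len(weight[0]))
        ]
        updated.append(out)
    return updated

# A three-node path graph: 0 - 1 - 2.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
path_edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
new_features = message_passing_step(node_features, path_edges, identity)
```

Stacking several of these steps lets information travel further across the graph, one hop per step.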

Now that I have a bit of an understanding of GNNs, I’ll need to pop all the way back up to the Waymo blog post and follow it to their academic paper, “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation.”

The Waymo team is scheduled to present that paper at CVPR 2020 next month.

Literature Review: Capsule Networks

My Udacity colleague, Cezanne Camacho, is preparing a presentation on capsule networks and gave a draft version in the office today. Cezanne is a terrific engineer and teacher. She has already written a great blog post on capsule networks, and she graciously allowed me to share some of it here.

Capsule networks come from a 2017 paper by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton at Google: “Dynamic Routing Between Capsules”. Hinton, in particular, is one of the world’s foremost authorities on neural networks.

As my colleague, Cezanne, writes on her blog:

Capsule Networks provide a way to detect parts of objects in an image and represent spatial relationships between those parts. This means that capsule networks are able to recognize the same object in a variety of different poses even if they have not seen that pose in training data.

Love the Pacman GIF. Did I mention Cezanne is also an artist?

Cezanne explains that a “capsule” encompasses features that make up a piece of an image. Think of an image of a face, for example, and imagine capsules that capture each eye, and the nose, and the mouth.

These capsules organize into a tree structure. Larger structures, like a face, would be parent nodes in the tree, and smaller structures would be child nodes.

“In the example below, you can see how the parts of a face (eyes, nose, mouth, etc.) might be recognized in leaf nodes and then combined to form a more complete face part in parent nodes.”

“Dynamic routing” plays a role in capsule networks:

“Dynamic routing is a process for finding the best connections between the output of one capsule and the inputs of the next layer of capsules. It allows capsules to communicate with each other and determine how data moves through them, according to real-time changes in the network inputs and outputs!”

Dynamic routing is ultimately implemented via an iterative routing process that Cezanne does a really nice job describing, along with the accompanying math, in her blog post.
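For a feel of what that iterative process looks like, here is a small pure-Python sketch based on my reading of the routing-by-agreement procedure in the Sabour, Frosst, and Hinton paper. It is not Cezanne’s implementation. In a real capsule network the prediction vectors come from learned weight matrices; here they are simply passed in, and the function names are my own.

```python
import math

def squash(vec):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def route(predictions, iterations=3):
    """predictions[i][j] is child capsule i's predicted output vector for parent j."""
    n_children = len(predictions)
    n_parents = len(predictions[0])
    dim = len(predictions[0][0])
    # Routing logits start at zero, so every child initially splits its vote evenly.
    logits = [[0.0] * n_parents for _ in range(n_children)]
    for _ in range(iterations):
        coupling = [softmax(row) for row in logits]
        parents = []
        for j in range(n_parents):
            total = [
                sum(coupling[i][j] * predictions[i][j][d] for i in range(n_children))
                for d in range(dim)
            ]
            parents.append(squash(total))
        # Children whose predictions agree with a parent's output get routed there more.
        for i in range(n_children):
            for j in range(n_parents):
                logits[i][j] += sum(p * v for p, v in zip(predictions[i][j], parents[j]))
    return parents, coupling

# Two children agree about parent 0 and disagree about parent 1.
example_predictions = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
parent_outputs, coupling_coefficients = route(example_predictions)
```

In this toy example the children end up sending more of their output to the parent they agree on, which is the “agreement” part of the routing.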

Capsule networks seem to do well with image classification on a few datasets, but they haven’t been widely deployed yet because they are slow to train.

In case you’d like to play with capsule networks yourself, Cezanne also published a Jupyter notebook with her PyTorch implementation of the Sabour, Frosst, and Hinton paper!

The “MiniFlow” Lesson

Exploring how to build a Self-Driving Car, step-by-step with Udacity!

Editor’s note: David Silver (Program Lead for Udacity’s Self-Driving Car Engineer Nanodegree program) continues his mission to write a new post for each of the 67 lessons currently in the program. We check in with him today as he introduces us to Lesson 5!

The 5th lesson of the Udacity Self-Driving Car Engineer Nanodegree Program is “MiniFlow.” Over the course of this lesson, students build their own neural network library, which we call MiniFlow.

The lesson starts with a fairly basic, feedforward neural network, with just a few layers. Students learn to build the connections between the artificial neurons and implement forward propagation to move calculations through the network.

A feedforward network.
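Here is a minimal sketch of forward propagation, one neuron at a time. This is my own simplified illustration and not MiniFlow’s actual API. Activation functions are left out to keep the focus on how values flow from layer to layer.

```python
def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of its inputs plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def forward(inputs, layers):
    """Push inputs through the network, layer by layer.

    layers is a list of layers; each layer is a list of (weights, bias) neurons.
    """
    values = inputs
    for layer in layers:
        values = [neuron(values, weights, bias) for weights, bias in layer]
    return values

# Two inputs -> two hidden neurons -> one output neuron.
tiny_network = [
    [([1.0, 2.0], 0.5), ([0.0, -1.0], 0.0)],
    [([1.0, 1.0], -1.0)],
]
network_output = forward([1.0, 3.0], tiny_network)
```

The output of each layer becomes the input of the next, which is all “forward propagation” really means.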

The real mind-bend comes in the “Linear Transform” concept, where we go from working with individual neurons to working with layers of neurons. Working with layers allows us to dramatically accelerate the calculations of the networks, because we can use matrix operations and their associated optimizations to represent the layers. Sometimes this is called vectorization, and it’s a key to why deep learning has become so successful.
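The jump from neurons to layers can be sketched as a single matrix expression, X·W + b. The version below spells it out in plain Python so it runs anywhere; a numerical library would replace the loops with one optimized matrix multiply, which is where the speedup comes from. The function name is my own and not necessarily what the lesson uses.

```python
def linear(X, W, b):
    """Compute X·W + b for a batch of inputs.

    X: one row per example, one column per input feature
    W: one row per input feature, one column per neuron in the layer
    b: one bias per neuron in the layer
    """
    return [
        [sum(x * W[i][j] for i, x in enumerate(row)) + b[j] for j in range(len(b))]
        for row in X
    ]

# Two examples, two input features, a layer of three neurons.
batch = [[1.0, 2.0], [3.0, 4.0]]
layer_weights = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
layer_biases = [0.5, 0.0, 1.0]
layer_output = linear(batch, layer_weights, layer_biases)
```

Each column of `W` plays the role of one neuron’s weights, so the whole layer, for the whole batch, is computed in one operation.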

Once students implement layers in MiniFlow, they learn about a particular activation function: the sigmoid function. Activation functions define the extent to which each neuron is “on” or “off”. With a smooth activation function like the sigmoid, a neuron doesn’t have to be all the way “on” or “off”. Its output can take any value between 0 and 1.

The sigmoid function.
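The sigmoid itself is a one-liner, 1 / (1 + e^(-x)). The sketch below splits it into two branches only to avoid overflow for very negative inputs.

```python
import math

def sigmoid(x):
    """Squash any real number into the range between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) to avoid overflow.
    e = math.exp(x)
    return e / (1.0 + e)
```

Large positive inputs land near 1, large negative inputs land near 0, and an input of 0 lands exactly halfway.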

The next step is to train the network to better classify our data. For example, if we want the network to recognize handwriting, we need to adjust the weight associated with each neuron in order to achieve the correct classification. Students implement an optimization technique called gradient descent to determine how to adjust the weights of the network.

Gradient descent, or finding the lowest point on the curve.
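Here is a minimal sketch of gradient descent on a one-dimensional curve. The curve f(x) = (x - 3)^2 and the helper name are just examples I picked; in a real network the “x” is every weight in the network and the curve is the error.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step downhill, against the gradient, from a starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its lowest point at x = 3.
lowest_point = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

The learning rate controls the step size: too small and progress is slow, too large and the steps overshoot the bottom of the curve.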

Finally, students implement backpropagation to relay those weight adjustments backwards through the networks, from finish to start. If we do this thousands of times, hopefully we’ll wind up with a trained, accurate network.
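To show the “finish to start” direction, here is a sketch of backpropagation through about the smallest network I can think of: one sigmoid hidden neuron feeding one linear output neuron. The structure and names are my own illustration, not MiniFlow’s code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, target, learning_rate=0.5):
    """Run one forward pass, one backward pass, and one gradient descent update."""
    w1, b1, w2, b2 = params

    # Forward pass: input -> hidden neuron -> output neuron -> error.
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    loss = 0.5 * (y - target) ** 2

    # Backward pass: apply the chain rule from the error back toward the input.
    dy = y - target              # d(loss)/d(y)
    dw2, db2 = dy * h, dy        # output neuron's weight and bias
    dh = dy * w2                 # error passed back to the hidden neuron
    dz = dh * h * (1.0 - h)      # through the sigmoid, whose slope is h * (1 - h)
    dw1, db1 = dz * x, dz        # hidden neuron's weight and bias

    new_params = (
        w1 - learning_rate * dw1,
        b1 - learning_rate * db1,
        w2 - learning_rate * dw2,
        b2 - learning_rate * db2,
    )
    return new_params, loss

# Train on a single example and record the error at every step.
params = (0.1, 0.0, 0.1, 0.0)
losses = []
for _ in range(1000):
    params, loss = train_step(params, x=1.0, target=1.0)
    losses.append(loss)
```

Each gradient in the backward pass reuses the one computed just before it, which is why the calculation naturally runs from the output back to the input.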

And once students have finished this lesson, they have their own Python library they can use to build as many neural networks as they want!

If all of that sounds interesting to you, maybe you should apply to join the Udacity Self-Driving Car Engineer Nanodegree Program and learn to become a Self-Driving Car Engineer!

The “Introduction to Neural Networks” Lesson

Editor’s note: On November 1st of this year, David Silver (Program Lead for Udacity’s Self-Driving Car Engineer Nanodegree program) made a pledge to write a new post for each of the 67 lessons currently in the program. We check in with him today as he introduces us to Lesson 4!

The 4th lesson of the Udacity Self-Driving Car Engineer Nanodegree Program introduces students to neural networks, a powerful machine learning tool.

This is a fast lesson that covers the basic mechanics of machine learning and how neural networks operate. We save a lot of the details for later lessons.

My colleague Luis Serrano starts with a quick overview of how regression and gradient descent work. These are foundational machine learning concepts that almost any machine learning tool builds from.

Luis is great at this stuff. I love Mt. Errorest.

Moving on from these lessons, Luis goes deeper into the distinction between linear and logistic regression and then explores how these concepts can reveal the principles behind a basic neural network.

See the slash between the red and green colors there? If you ever meet Luis in person, ask him to sing you the forward-slash-backward-slash alphabet song. It’s amazing.

From here we introduce perceptrons, which historically were the precursor to the “artificial neurons” that make up a neural network.
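A perceptron is small enough to sketch in a few lines. The example below uses the classic perceptron learning rule to learn the logical AND function. The function names and the choice of AND are mine, not taken from the lesson.

```python
def predict(weights, bias, inputs):
    """Fire (output 1) if the weighted sum of the inputs is above zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, learning_rate=1.0, epochs=10):
    """Nudge the weights toward the right answer every time a sample is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Logical AND: the output is 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_bias = train_perceptron(and_samples)
```

The hard on/off threshold is the main thing that separates a perceptron from the smoother artificial neurons used in modern networks.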

As we string together lots of these perceptrons, or “artificial neurons”, my colleague Mat Leonard shows that we can take advantage of a process called backpropagation, which helps train the network to perform a task.

And that’s basically what a neural network is: a machine learning tool built from layers of artificial neurons, which takes an input and produces an output, trained via backpropagation.

This lesson has 23 concepts (pages), so there’s a lot more to it than the 3 videos I posted here. If some of this looks confusing, don’t worry! There’s a lot more detail in the lesson, as well as lots of quizzes to help make sure you get it.

If you find neural networks interesting in their own right, perhaps you should sign up for Udacity’s Deep Learning Nanodegree Foundation Program. And if you find them interesting for how they can help us build a self-driving car, then of course you should apply to join the Udacity Self-Driving Car Nanodegree Program!

Deep Learning

I have been studying a little bit about deep learning recently, and hope to learn more over the next week.

In particular, I have been progressing through NVIDIA’s introductory Deep Learning course, which offers an overview of Deep Neural Networks (DNNs). The course covers three DNN frameworks (Caffe, Theano, and Torch) and one visualization tool (DIGITS).

This type of course is super-helpful, in that it’s geared toward practitioners and problem-solving rather than the theory of DNNs. The Caffe framework, combined with the DIGITS visualization tool, seems particularly well-suited to quickly constructing a DNN and seeing where it leads.

So I’m a big fan of the NVIDIA course.

Next I’d like to take either Coursera’s Neural Networks for Machine Learning, or Udacity’s Deep Learning.

Coursera’s course is taught by the famed neural network researcher Geoffrey Hinton, whereas Udacity’s courses have a great UI and often a more practical (versus theoretical) approach.

I’ll let you know what I choose, and let me know if you have any recommendations!