The “Convolutional Neural Networks” Lesson

The 8th lesson of the Udacity Self-Driving Car Engineer Nanodegree program is “Convolutional Neural Networks.” This is where students learn to apply deep learning to camera images!

Convolutional neural networks (CNNs) are a special category of deep neural networks that are specifically designed to work with images. CNNs have multiple layers, with each layer connected to the next by “convolutions.”

In practice, this means that we slide a patch-like “filter” over the input layer, and the filter applies its weights to the small group of artificial neurons it currently covers. At each position, the filter connects to a single artificial neuron in the output layer, thereby connecting each neuron in the output layer to a small set of neurons from the input layer.
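To see the sliding-filter idea in code, here is a minimal pure-Python sketch. The function name `convolve2d`, the tiny 4x4 "image," and the hand-picked `vertical_edge` filter are all made up for illustration; a real CNN learns its filter weights during training and would use a library such as TensorFlow rather than nested loops.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Only positions where the kernel fits entirely inside the image are used
    (no padding, stride of 1). Like most deep learning libraries, this does
    not flip the kernel, so strictly speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # One output neuron = weighted sum of the patch under the filter.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

Each row of the output is `[0, 2, 0]`: the filter responds only where it straddles the boundary between the dark and bright halves of the image.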

To make this more concrete, consider this photograph of a dog:

When we run this photograph through a CNN, we’ll slide a filter over the image:

This filter will, broadly speaking, identify basic “features.” It might identify one frame as a curve, and another as a hole:


The next layer in the CNN would pass a different filter over a stack of these basic features, and identify more sophisticated features, like a nose:


The final layer of the CNN is responsible for classifying these increasingly sophisticated features as a dog.

This is of course simplified for the sake of explanation, but hopefully it helps to make the process clear.

One of the more vexing aspects of deep learning is that the actual “features” that a network identifies are not necessarily anything humans would think of as a “curve” or a “nose.” The network learns whatever it needs to learn in order to identify the dog most effectively, but that may not be anything humans can really describe well. Nonetheless, this description gets at the broad scope of how a CNN works.

Once students learn about CNNs generally, it’s time to practice building and training them with TensorFlow. As Udacity founder Sebastian Thrun says, “You don’t lose weight by watching other people exercise.” You have to write the code yourself!

The back half of the lesson covers some deep learning topics applicable to CNNs, like dropout and regularization.
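As a rough illustration of those two ideas, here is a minimal pure-Python sketch. It assumes the common "inverted dropout" formulation and an L2 weight penalty as the form of regularization; the lesson may present them differently, and the function names `dropout` and `l2_penalty` are made up for this example.

```python
import random


def dropout(activations, drop_prob, training=True):
    """Randomly zero activations during training (inverted dropout).

    Surviving activations are scaled by 1 / keep_prob so that the expected
    value of each activation stays the same, which means nothing needs to
    change at test time.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if random.random() < keep_prob else 0.0
            for a in activations]


def l2_penalty(weights, strength):
    """L2 regularization term: strength times the sum of squared weights.

    This value is added to the training loss to discourage large weights.
    """
    return strength * sum(w * w for w in weights)


layer_output = [1.0, 2.0, 3.0, 4.0]
print(dropout(layer_output, drop_prob=0.5))                  # some values zeroed, the rest doubled
print(dropout(layer_output, drop_prob=0.5, training=False))  # unchanged at test time
print(l2_penalty([1.0, -2.0], strength=0.1))
```

Both techniques aim at the same problem, overfitting: dropout keeps the network from leaning too heavily on any single neuron, while the penalty keeps individual weights from growing very large.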

The lesson ends with a lab in which students build and train LeNet, the famous network by Yann LeCun, to identify characters. This is a classic exercise for learning convolutional neural networks, and a great way to learn the fundamentals.
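One useful piece of arithmetic when building a network like this is tracking how each layer shrinks the image. The sketch below assumes the classic LeNet-5 layout (a 32x32 input, two 5x5 convolutions with no padding, each followed by 2x2 pooling with stride 2); the lab's exact configuration may differ, and the helper name `output_size` is made up for this example.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of a conv/pool layer's output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Assumed layers: (name, kernel size, stride), following classic LeNet-5.
layers = [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 1),
    ("pool2", 2, 2),
]

size = 32  # assumed 32x32 input image
sizes = [size]
for name, kernel, stride in layers:
    size = output_size(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: {size}x{size}")
```

Under those assumptions the spatial size goes 32, 28, 14, 10, 5, so the final feature maps are 5x5 before they are flattened and handed to the fully connected layers.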

Ready to start learning how to build self-driving cars yourself? Great! If you have some experience already, you can apply to our Self-Driving Car Engineer Nanodegree program here, and if you’re just getting started, then we encourage you to enroll in our Intro to Self-Driving Cars Nanodegree program here!


Thanks to my former colleague, Dhruv Parthasarathy, who built out this intuitive explanation in even greater detail as part of this lesson!

We’re also grateful to Vincent Vanhoucke, Principal Scientist at Google Brain, who teaches the free Udacity Deep Learning course, from which we drew for this lesson.

How Self-Driving Cars Work

Earlier this fall I spoke about how self-driving cars work at TEDxWilmington’s Transportation Salon, which was a lot of fun.

The frame for my talk was a collection of projects students have done as part of the Udacity Self-Driving Car Engineer Nanodegree Program.

So, how do self-driving cars work?

Glad you asked!

Self-driving cars have five core components:

  1. Computer Vision
  2. Sensor Fusion
  3. Localization
  4. Path Planning
  5. Control

Computer vision is how we use cameras to see the road. Humans demonstrate the power of vision by handling a car with basically just two eyes and a brain. For a self-driving car, we can use camera images to find lane lines, or track other vehicles on the road.

Sensor fusion is how we integrate data from other sensors, like radar and lasers, together with camera data, to build a comprehensive understanding of the vehicle’s environment. As good as cameras are, there are certain measurements, like distance or velocity, at which other sensors excel, and other sensors can work better in adverse weather, too. By combining all of our sensor data, we get a richer understanding of the world.
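Here is a minimal sketch of one way two measurements can be combined, assuming each sensor reports a distance with Gaussian noise of known variance. This is the measurement-update step of a one-dimensional Kalman filter, used here as an illustration rather than as the method any particular car uses; the function name `fuse` and all of the numbers are hypothetical.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two noisy estimates of the same quantity.

    Each estimate is weighted by the other's variance, so the more certain
    sensor counts for more. The fused variance is always smaller than
    either input variance.
    """
    fused_mean = (mean_a * var_b + mean_b * var_a) / (var_a + var_b)
    fused_var = (var_a * var_b) / (var_a + var_b)
    return fused_mean, fused_var


# Hypothetical readings of the distance to the car ahead, in meters.
camera_mean, camera_var = 20.0, 4.0  # camera: less certain about distance
radar_mean, radar_var = 22.0, 1.0    # radar: more certain about distance

fused_mean, fused_var = fuse(camera_mean, camera_var, radar_mean, radar_var)
print(f"fused distance: {fused_mean:.1f} m, variance: {fused_var:.1f}")
```

The fused estimate (about 21.6 m) lands closer to the radar reading because the radar is assumed to be the more precise sensor, and the fused variance (about 0.8) is lower than that of either sensor alone.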

Localization is how we figure out where we are in the world, which is the next step after we understand what the world looks like. We all have cellphones with GPS, so it might seem like we know where we are all the time already. But in fact, GPS is only accurate to within about 1–2 meters. Think about how big 1–2 meters is! If a car were wrong by 1–2 meters, it could be off on the sidewalk hitting things. So we have much more sophisticated mathematical algorithms that help the vehicle localize itself to within 1–2 centimeters.
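To give a flavor of how a vehicle can narrow down its position, here is a toy one-dimensional sketch. It assumes a known map of five cells and a sensor that reports what kind of cell the car is in, and it applies Bayes' rule to update the belief about where the car is. The map, the probabilities, and the function name `measurement_update` are all made up for illustration; real localization systems are far more sophisticated.

```python
def measurement_update(belief, world, observation, p_hit=0.8, p_miss=0.2):
    """Update a belief over map cells after one sensor observation.

    Cells whose map label matches the observation are weighted by `p_hit`,
    the rest by `p_miss`; the result is normalized to sum to 1.
    """
    weighted = [
        b * (p_hit if cell == observation else p_miss)
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical map: what the sensor should see in each of five cells.
world = ["sign", "road", "road", "sign", "road"]

# Start with no idea where we are: every cell is equally likely.
belief = [1.0 / len(world)] * len(world)

# The sensor reports a sign, so cells containing a sign become more likely.
posterior = measurement_update(belief, world, "sign")
print([round(p, 3) for p in posterior])
```

After a single observation the belief concentrates on the two cells that contain a sign. Repeating this update as the car moves and senses again is what lets the estimate sharpen over time.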

Path planning is the next step, once we know what the world looks like, and where in it we are. In the path planning phase, we chart a trajectory through the world to get where we want to go. First, we predict what the other vehicles around us will do. Then we decide which maneuver we want to take in response to those vehicles. Finally, we build a trajectory, or path, to execute that maneuver safely and comfortably.
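The first two of those steps, prediction and maneuver selection, can be sketched in a few lines. This toy example assumes other vehicles keep a constant speed and that the only choice is between staying in the lane and changing lanes; the function names, the three-second horizon, and the ten-meter safety gap are all hypothetical, and building the actual trajectory is left out.

```python
def predict_position(position_m, speed_mps, horizon_s):
    """Predict a future position along the road, assuming constant speed."""
    return position_m + speed_mps * horizon_s


def choose_maneuver(ego_pos, ego_speed, lead_pos, lead_speed,
                    horizon_s=3.0, safe_gap_m=10.0):
    """Pick a maneuver based on the predicted gap to the vehicle ahead."""
    ego_future = predict_position(ego_pos, ego_speed, horizon_s)
    lead_future = predict_position(lead_pos, lead_speed, horizon_s)
    gap = lead_future - ego_future
    return "keep_lane" if gap >= safe_gap_m else "change_lane"


# We drive at 25 m/s; a slower car is 30 m ahead, doing 15 m/s.
slow_lead = choose_maneuver(ego_pos=0.0, ego_speed=25.0,
                            lead_pos=30.0, lead_speed=15.0)

# Same situation, but the car ahead matches our speed.
matched_lead = choose_maneuver(ego_pos=0.0, ego_speed=25.0,
                               lead_pos=30.0, lead_speed=25.0)

print(slow_lead, matched_lead)
```

In the first case the predicted gap closes to zero within three seconds, so the sketch picks a lane change; in the second the gap holds at 30 m and the car stays in its lane.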

Control is the final step in the pipeline. Once we have the trajectory from our path planning block, the vehicle needs to turn the steering wheel and hit the throttle or the brake, in order to follow that trajectory. If you’ve ever tried to execute a hard turn at a high speed, you know this can get tricky! Sometimes you have an idea of the path you want the car to follow, but actually getting the car to follow that path requires effort. Race car drivers are phenomenal at this, and computers are getting pretty good at it, too!
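One common way to make a vehicle follow a trajectory is a PID controller, which steers in proportion to how far the car is from the path, how long it has been off, and how quickly the error is changing. The sketch below assumes that approach as an illustration; the class name, the gains, and the very crude vehicle model in the loop are all made up for this example.

```python
class PIDController:
    """Minimal PID controller for a single error signal."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return a control command that pushes `error` toward zero."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return -(self.kp * error
                 + self.ki * self.integral
                 + self.kd * derivative)


# Toy simulation: the car starts 1 m off the planned path. The model below
# (error changes by steering * dt) is a deliberate oversimplification.
pid = PIDController(kp=0.5, ki=0.0, kd=0.1)
cte = 1.0  # cross-track error in meters
dt = 0.1   # time step in seconds
for _ in range(50):
    steering = pid.update(cte, dt)
    cte += steering * dt

print(f"cross-track error after 5 s: {cte:.3f} m")
```

With these hypothetical gains the error shrinks steadily toward zero. Tuning the three gains is where the effort goes in practice: too aggressive and the car oscillates around the path, too gentle and it drifts back too slowly.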

The video at the beginning of this post covers similar territory, and I hope that between it and what I’ve written here, you have a better sense of how self-driving cars work.

Ready to start learning how to do it yourself? Apply for our Self-Driving Car Engineer Nanodegree program, or enroll in our Intro to Self-Driving Cars Nanodegree program, depending on your experience level, and let’s get started!

The Driverless Future of Los Angeles

I landed at Los Angeles International Airport this morning, on the way to a friend’s wedding in Rancho Palos Verdes.

As the LA Basin spread out underneath our approach, I was treated to a wonderful view of the LA sprawl, which really can look beautiful in the morning sun, from thousands of feet up.

The 405 stretched from Beverly Hills in the north to Long Beach in the south, and the 5 shot from Downtown LA to Anaheim and beyond.

Los Angeles is the ultimate car city. Most sprawl, worst traffic, best status cars, largest parking lots. Everything.

What will happen when drivers are no longer needed?

Parking will be a big change. More so than any other city (I think), LA devotes lots of valuable land to parking. That prime real estate will become a lot more productive once the parking demand subsides.

Traffic will be another change. There are mixed views about whether driverless cars will improve traffic (better, faster drivers), or worsen it (more miles traveled).

Sprawl may actually change LA more than other cities. Southern California has already maxed out its sprawl capacity. Hemmed in by mountains to the north and east and the ocean to the west, it has sprawled south into Orange County and San Diego, but even there minimal rural land is available for further expansion, unlike in, say, Houston or Phoenix.

So, in the optimistic case, maybe driverless cars spur infill and bring forth a denser but also less congested Los Angeles.

Originally published on December 5, 2015.