Deep Learning Projects by Udacity Students

Udacity democratizes education by bringing world-class instruction to students around the globe. Often, we’re humbled to see how students build on that education to create their own projects outside of the classroom.

Here are five amazing deep learning projects by students in the Udacity Self-Driving Car Engineer Nanodegree Program.

HomographyNet: Deep Image Homography Estimation

Mez Gebre

Mez starts off with a plain-English explanation of what isomorphism and homography are. Homography is basically the study of how one object can look different when viewed from different places. Think about how your image of a car changes when you take a step to the left and look at it again.

After the conceptual explanation, Mez dives into the mechanics of how to combine computer vision, image processing, and deep learning to train a VGG-style network to perform homography.

I imagine this could be a useful technique for visual localization, as it helps you stitch together different images into a larger map.

“HomographyNet is a VGG style CNN which produces the homography relating two images. The model doesn’t require a two stage process and all the parameters are trained in an end-to-end fashion!”

ConvNets Series. Image Processing: Tools of the Trade

Kirill Danilyuk

Kirill uses the Traffic Sign Classifier Project from the Nanodegree Program as a jumping off point for discussing approaches to image pre-processing. He covers three approaches: visualization, scikit-learn, and data augmentation. Critical topics for any perception engineer!

“Convnets cannot be fed with “any” data at hand, neither they can be viewed as black boxes which extract useful features “automagically”. Bad to no preprocessing can make even a top-notch convolutional network fail to converge or provide a low score. Thus, image preprocessing and augmentation (if available) is highly recommended for all networks.”

Launch a GPU-backed Google Compute Engine instance and setup Tensorflow, Keras and Jupyter

Steve Domin

We teach students in the Nanodegree Program how to use Amazon Web Services to launch a virtual server with a GPU, which accelerates training neural networks. There are alternatives, though, and Steve does a great job explaining how you would accomplish the same thing using Google Cloud Platform.

“Good news: if it’s your first time using Google Cloud you are also eligible for $300 in credits! In order to get this credit, click on the big blue button “Sign up for free trial” in the top bar.”

Yolo-like network for vehicle detection using KITTI dataset

Vivek Yadav

Vivek has written terrific posts on a variety of neural network architectures. In this post, which is the first in a series, he prepares YOLO v2 to classify KITTI data. He goes over six pre-processing steps: learning bounding boxes, preprocessing the ground truth bounding boxes, preprocessing the ground truth labels, overfitting an initial network (a Vivek specialty), data augmentation, and transfer learning.

“ YOLOv2 has become my go-to algorithm because the authors correctly identified majority of short comings of YOLO model, and made specific changes in their model to address these issues. Futher YOLOv2 borrows several ideas from other network designs that makes it more powerful than other models like Single Shot Detection.”

Sachin Abeywardana

Sachin has built an 18-lesson deep learning curriculum, hosted on GitHub. The lessons start with the math of deep learning, take students through building feedforward and convolutional networks, and finish with using LSTMs to classify #FakeNews! Yay, 21st century America.


- Make Deep Learning easier (minimal code).
- Minimise required mathematics.
- Make it practical (runs on laptops).
- Open Source Deep Learning Learning.

Cool Projects from Udacity Students

I have a pretty awesome backlog of blog posts from Udacity Self-Driving Car students, partly because they’re doing awesome things and partly because I fell behind on reviewing them for a bit.

Here are five that look pretty neat.

Visualizing lidar data

Alex Staravoitau

Alex visualizes lidar data from the canonical KITTI dataset with just a few simple Python commands. This is a great blog post if you’re looking to get started with point cloud files.

“A lidar operates by streaming a laser beam at high frequencies, generating a 3D point cloud as an output in realtime. We are going to use a couple of dependencies to work with the point cloud presented in the KITTI dataset: apart from the familiar toolset of numpy and matplotlib we will use pykitti. In order to make tracklets parsing math easier we will use a couple of methods originally implemented by Christian Herdtweck that I have updated for Python 3, you can find them in source/ in the project repo.”
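Alex’s post uses pykitti and matplotlib; as a self-contained taste of the same idea, here is a numpy-only sketch (synthetic points, not real KITTI data) that projects a point cloud into the kind of top-view grid his visualizations are built on:

```python
import numpy as np

# Synthetic stand-in for a KITTI velodyne scan: rows of (x, y, z, reflectance).
# Real scans are flat float32 .bin files, loadable with
# np.fromfile(path, dtype=np.float32).reshape(-1, 4).
points = np.random.RandomState(0).uniform(-10, 10, size=(5000, 4))

def birds_eye_grid(points, x_range=(-10, 10), y_range=(-10, 10), res=0.5):
    """Count lidar points falling into each cell of a 2D top-view grid."""
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    xi = ((x[keep] - x_range[0]) / res).astype(int)
    yi = ((y[keep] - y_range[0]) / res).astype(int)
    shape = (int((x_range[1] - x_range[0]) / res), int((y_range[1] - y_range[0]) / res))
    grid = np.zeros(shape)
    np.add.at(grid, (xi, yi), 1)  # accumulate point counts per cell
    return grid

grid = birds_eye_grid(points)
print(grid.shape)  # (40, 40)
```

From here, `matplotlib.pyplot.imshow(grid)` gives a rough birds-eye view; the post’s prettier renderings add color by height and overlay the tracklet boxes.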

TensorFlow with GPU on your Mac

Darien Martinez

The most popular laptop among Silicon Valley software developers is the MacBook Pro. The current version of the MacBook Pro, however, does not include an NVIDIA GPU, which restricts its ability to use CUDA and cuDNN, NVIDIA’s tools for accelerating deep learning. Older MacBook Pro machines do have NVIDIA GPUs, though, and Darien’s tutorial shows you how to take advantage of one if you have it.

“Nevertheless, I could see great improvements on performance by using GPUs in my experiments. It worth trying to have it done locally if you have the hardware already. This article will describe the process of setting up CUDA and TensorFlow with GPU support on a Conda environment. It doesn’t mean this is the only way to do it, but I just want to let it rest somewhere I could find it if I needed in the future, and also share it to help anybody else with the same objective. And the journey begins!”

(Part 1) Generating Anchor boxes for Yolo-like network for vehicle detection using KITTI dataset.

Vivek Yadav

Vivek is constantly posting super-cool things he’s done with deep neural networks. In this post, he applies YOLOv2 to the KITTI dataset. He does a really nice job going through the process of how he prepares the data and selects his parameters, too.

“In this post, I covered the concept of generating candidate anchor boxes from bounding box data, and then assigning them to the ground truth boxes. The anchor boxes or templates are computed using K-means clustering with intersection over union (IOU) as the distance measure. The anchors thus computed do not ignore smaller boxes, and ensure that the resulting anchors ensure high IOU between ground truth boxes. In generating the target for training, these anchor boxes are assigned or are responsible for predicting one ground truth bounding box. The anchor box that gives highest IOU with the ground truth data when located at its center is responsible for predicting that ground truth label. The location of the anchor box is the center of the grid cell within which the ground truth box falls.”
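Vivek’s clustering step, K-means with 1 − IOU as the distance, run on widths and heights only since every box is compared at a common center, can be sketched roughly like this (the boxes below are made up, and this is a paraphrase rather than his code):

```python
import numpy as np

def iou_wh(box, clusters):
    """IOU between one (w, h) box and k cluster boxes, all centered at the origin."""
    w = np.minimum(box[0], clusters[:, 0])
    h = np.minimum(box[1], clusters[:, 1])
    inter = w * h
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over box shapes, using 1 - IOU as the distance measure."""
    rng = np.random.RandomState(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        dists = np.array([1.0 - iou_wh(b, clusters) for b in boxes])
        assign = dists.argmin(axis=1)  # nearest anchor for each box
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else clusters[i] for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters

# Made-up (width, height) boxes in pixels: small, medium, and large vehicles
boxes = np.array([[10., 14.], [12., 16.], [50., 40.],
                  [55., 42.], [100., 90.], [95., 88.]])
anchors = kmeans_anchors(boxes, k=3)
print(anchors.shape)  # (3, 2): three anchor templates
```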

Building a Bayesian deep learning classifier

Kyle Dorman

“Illustrating the difference between aleatoric and epistemic uncertainty for semantic segmentation. You can notice that aleatoric uncertainty captures object boundaries where labels are noisy. The bottom row shows a failure case of the segmentation model, when the model is unfamiliar with the footpath, and the corresponding increased epistemic uncertainty.”

This post is kind of a tour de force in investigating the links between probability, deep learning, and epistemology. Kyle is basically replicating and summarizing the work of Cambridge researchers who are trying to merge Bayesian probability with deep learning. It’s long, and it will take a few passes to grasp everything here, but I am interested in Kyle’s assertion that this is a path to merging deep learning and Kalman filters.

“Self driving cars use a powerful technique called Kalman filters to track objects. Kalman filters combine a series of measurement data containing statistical noise and produce estimates that tend to be more accurate than any single measurement. Traditional deep learning models are not able to contribute to Kalman filters because they only predict an outcome and do not include an uncertainty term. In theory, Bayesian deep learning models could contribute to Kalman filter tracking.”
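The combine-noisy-measurements idea is easy to see in a minimal 1D Kalman filter (the noise variances below are illustrative, and there is no motion model):

```python
def kalman_1d(measurements, meas_var=4.0, process_var=1.0):
    """Fuse a stream of noisy scalar measurements into a running estimate."""
    x, p = 0.0, 1000.0  # initial estimate and a deliberately huge uncertainty
    estimates = []
    for z in measurements:
        p += process_var              # predict: uncertainty grows between steps
        k = p / (p + meas_var)        # Kalman gain: trust in measurement vs. prediction
        x = x + k * (z - x)           # update: move toward the measurement
        p = (1 - k) * p               # update: uncertainty shrinks
        estimates.append(x)
    return estimates

est = kalman_1d([5.1, 4.9, 5.2, 5.0, 4.8])
print(round(est[-1], 2))  # close to 5.0, steadier than any single reading
```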

Build your own self driving (toy) car

Bogdan Djukic

Bogdan started off with the now-standard Donkey Car instructions, and actually got ROS running!

“I decided to go for Robotic Operating System (ROS) for the setup as middle-ware between Deep learning based auto-pilot and hardware. It was a steep learning curve, but it totally paid off in the end in terms of size of the complete code base for the project.”

Ian Goodfellow on the Future of Deep Learning

Ian Goodfellow on the right, and my Udacity colleague, Mat Leonard, on the left.

Ian Goodfellow recently published a list of the top “areas of expansion” for deep learning in response to a Quora question.

The number one item on the list is:

“Better reinforcement learning / integration of deep learning and reinforcement learning. Reinforcement learning algorithms that can reliably learn how to control robots, etc.”

To a large extent, this depends on how well we can map features and performance from a simulator (where we would perform reinforcement learning) to the real world. So far, this has been a challenge, but I’ve seen several companies recently working on this problem.

The other seven items on the list are all worth a read, too.

And if you’d like to learn more from Ian, he has a book, and also he’s an instructor in Udacity’s Deep Learning Foundations Nanodegree Program.

Literature Review: Fully Convolutional Networks

Here’s what I pulled out of “Fully Convolutional Networks for Semantic Segmentation”, by Long, Shelhamer, and Darrell, all at UC Berkeley. This is a pretty important research result for semantic segmentation, which we’ll be covering in the elective Advanced Deep Learning Module in the Udacity Self-Driving Car Program.


The ultimate goal of FCNs is to produce “semantic segmentation”. This is an output that is the same size as the original input image, and roughly resembles the original input image, but in which each pixel in the image is colored one of C colors, where C is the number of classes we are segmenting.

For a road image, this could be as simple as C=2 (“road”, or “not road”). Or C could capture a much richer class set.

An example of semantic segmentation from the KITTI dataset.

Fully Convolutional

The basic idea behind a fully convolutional network is right there in the name: all of its layers are convolutional layers.

FCNs don’t have any of the fully-connected layers at the end, which are typically used for classification. Instead, FCNs use convolutional layers to classify each pixel in the image.

So the final output layer will be the same height and width as the input image, but the number of channels will be equal to the number of classes. If we’re classifying each pixel as one of fifteen different classes, then the final output layer will be height x width x 15 classes.

Using a softmax probability function, we can find the most likely class for each pixel.
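Concretely, for a toy 2x2 output with three classes, that final step looks like this:

```python
import numpy as np

# Toy FCN output: height x width x classes logits (2x2 image, 3 classes)
logits = np.array([[[2.0, 0.1, 0.1], [0.1, 3.0, 0.1]],
                   [[0.1, 0.1, 1.5], [4.0, 0.2, 0.2]]])

# Softmax over the class channel turns logits into per-pixel probabilities
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# The segmentation mask is the most likely class at each pixel
mask = probs.argmax(axis=-1)
print(mask)  # [[0 1]
             #  [2 0]]
```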

Learnable Upsampling

A logistical hurdle to overcome in FCNs is that the intermediate layers typically get smaller and smaller (although often deeper), as striding and pooling reduce the height and width dimensions of the tensors.

FCNs use “deconvolutions”, or essentially backwards convolutions, to upsample the intermediate tensors so that they match the width and height of the original input image.

Justin Johnson has a pretty good visual explanation of deconvolutions (start at slide 46 here).

Because backward convolution layers are just convolutions, turned around, their weights are learnable, just like normal convolutional layers.
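To make the “backwards convolution” concrete, here is a toy 1D transposed convolution in numpy (a sketch of the mechanism, not the paper’s implementation): each input value is scattered through the kernel, so a 3-element input becomes a 7-element output at stride 2.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Upsample x by scattering each input value through the kernel."""
    k = len(kernel)
    out = np.zeros(stride * (len(x) - 1) + k)
    for i, v in enumerate(x):
        out[i * stride : i * stride + k] += v * kernel  # overlapping regions sum
    return out

x = np.array([1.0, 2.0, 3.0])
kernel = np.array([1.0, 2.0, 1.0])  # in an FCN these weights are learned
y = transposed_conv1d(x, kernel)
print(y)  # [1. 2. 3. 4. 5. 6. 3.]
```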


Combining Layers

The authors had success converting canonical networks like AlexNet, VGG, and GoogLeNet into FCNs by replacing their final layers. But there was a consistent problem, which was that upsampling from the final convolutional tensor seemed to be inaccurate. Too much spatial information had been lost by all the downsampling in the network.

So they combined upsampling from that final intermediate tensor with upsampling from earlier tensors, to get more precise spatial information.

Pretty neat paper.

Literature Review: MultiNet

Today I downloaded the academic paper “MultiNet: Real-Time Joint Semantic Reasoning for Autonomous Driving”, by Teichmann et al, as they say in the academy.

I thought I’d try to summarize it, mostly as an exercise in trying to understand the paper myself.


This paper appears to originate out of the lab of Raquel Urtasun, the University of Toronto professor who just joined Uber ATG. Prior to Uber, Urtasun compiled the KITTI benchmark dataset.

KITTI has a great collection of images for autonomous driving, with corresponding leaderboards for various tasks, like visual odometry and object tracking. The MultiNet paper is an entry on the KITTI Lane Detection leaderboard.

Right now, MultiNet sits at 15th place on the leaderboard, but it’s the top entry that’s been formally written up in an academic paper.


Interestingly, the goal of MultiNet is not to win the KITTI Lane Detection competition. Rather, it’s to train a network that can segment the road quickly, in real-time. Adding complexity, the network also detects and classifies vehicles on the road.

¿Por qué no?


The MultiNet architecture is three-headed. The beginning of the network is just VGG16, without the three fully connected layers at the end. This part of the network is the “encoder” part of the standard encoder-decoder architecture.

Conceptually, the “CNN Encoder” reduces each input image down to a set of features. Specifically, 512 features, since the output tensor (“Encoded Features”) of the encoder is 39x12x512.

For each region of an input image, this Encoded Features tensor captures a measure of how strongly each of 512 features is represented in that region.

Since this is a neural network, we don’t really know what these features are, and they may not even really be things we can explain. It’s just whatever things the network learns to be important.

The three-headed outputs are more complex.

Classification: Actually, I just lied. This output head is pretty straightforward. The network applies a 1×1 convolution to the encoded features (I’m not totally sure why), then adds a fully connected layer and a softmax function. Easy.

(Update: Several commenters have added helpful explanations of 1×1 convolutional layers. My uncertainty was actually more about why MultiNet adds a 1×1 convolutional layer in this precise place. After chewing on it, though, I think I understand. Basically, the precise features encoded by the encoder sub-network may not be the best match for classification. Instead, the classification output may perform best if the shared features are used to build a new set of features that is specifically tuned for classification. The 1×1 convolutional layer transforms the common encoded features into that new set of features specific to classification.)
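That feature-remixing view of a 1×1 convolution is easy to demonstrate in numpy (toy dimensions below, not MultiNet’s actual 39x12x512): it is just one learned matrix applied independently at every spatial location.

```python
import numpy as np

# Encoded features: height x width x channels (toy 4x3 grid with 8 channels)
feats = np.random.RandomState(0).randn(4, 3, 8)

# A 1x1 convolution with 5 output channels is an 8x5 matrix shared by all
# spatial locations: it remixes the 8 shared features into 5 new ones
weights = np.random.RandomState(1).randn(8, 5)
new_feats = feats @ weights  # matmul broadcasts over the spatial grid

print(new_feats.shape)  # (4, 3, 5)
```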

Detection: This output head is complicated. They say it’s inspired by Yolo and Faster-RCNN, and involves a series of 1×1 convolutions that output a tensor that has bounding box coordinates.

Remember, however, the encoded features only have dimensions 39×12, while the original input image is a whopping 1248×384. Apparently 39×12 winds up being too small to produce accurate bounding boxes. So the network has “rezoom layers” that combine the first pass at bounding boxes with some of the less down-sampled VGG convolutional outputs.

The result is more accurate bounding boxes, but I can’t really say I understand how this works, at least on a first readthrough.

Segmentation: The segmentation output head applies fully-convolutional upsampling layers to blow up the encoded features from 39x12x512 back to the original image size of 1248x384x2.

The “2” at the end is because this head actually outputs a mask, not the original image. The mask is binary and just marks each pixel in the image as “road” or “not road”. This is actually how the network is scored for the KITTI leaderboard.


The paper includes a detailed discussion of loss function and training. The main point that jumped out at me is that there are only 289 training images in the KITTI lane detection training set. So the network is basically relying on transfer learning from VGG.

It’s pretty amazing that any network can score at levels of 90%+ road accuracy, given a training set of only 289 images.

I’m also surprised that the 200,000 training steps don’t result in severe overfitting.


MultiNet seems like a really neat network, in that it accomplishes several tasks at once, really fast. The writeup is also pretty easy to follow, so kudos to them for that.

If you’re so inclined, it might be worth downloading the KITTI dataset and trying out some of this on your own.

Udacity Students at Track, in the Didi Challenge, and Building Deep Learning Servers

Udacity Self-Driving Car students have been writing about the Self Racing Cars track day, the Didi Challenge, and building their own deep learning machines!

Self Racing Cars 2017 Photo Gallery — The Day Before

Kunfeng Chen

Udacity students were sponsored by PolySync to compete in the Self-Racing Cars track day at Thunderhill last weekend, and these photos show what it was like!

Self Racing Cars 2017 Photo Gallery — Day 1

Kunfeng Chen

Self Racing Cars 2017 Video Gallery — Shot on iPhone 6

Kunfeng Chen

Deep Learning PC Build

Tim Camber

Here’s how Tim built his own GPU-enabled deep learning machine. He provides helpful instructions, a bill of materials, and links to graphs comparing the value of different NVIDIA GPUs.

“The GPU is the main component of our system, and hopefully comprises a significant fraction of the cost of the system. ServeTheHome has a nice article in which they show the following graph of GPU compute per unit price.”

part.1: Didi Udacity Challenge 2017 — Car and pedestrian Detection using Lidar and RGB

This is one student’s journal of tackling the Udacity-Didi Challenge. Pay attention to the different neural network architectures he uses!

“Just from these 2 simple steps, I observed the following possible issues:

Small object detection. This is a well-known weakness in the original plain faster rcnn net.

Creation of 2d top view image could be slow. There are quite a number of 3d points needs to be processed

Now that I am sure that the implementation is correct, the next step will be to start training with the actual dataset, which contains many images.”

6 Awesome Projects from Udacity Students (and 1 Awesome Thinkpiece)

Udacity students are constantly impressing us with their skill, ingenuity, and their knowledge of the most obscure features in Slack.

Here are 6 blog posts that will astound you, and 1 think-piece that will blow your mind.

How to identify a Traffic Sign using Machine Learning !!

Sujay Babruwad

Sujay managed his data in a few clever ways for the traffic sign classifier project. First, he converted all of his images to grayscale. Then he skewed and augmented them. Finally, he balanced the data set. The result:

“The validation accuracy attained 98.2% on the validation set and the test accuracy was about 94.7%”
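A rough sketch of Sujay’s first and third steps (grayscale conversion, then balancing by oversampling) on synthetic data; the skew/augmentation step is omitted here, and the class counts are made up:

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy dataset: 100 RGB "sign" images over 3 badly unbalanced classes
images = rng.rand(100, 32, 32, 3)
labels = np.array([0] * 70 + [1] * 20 + [2] * 10)

# 1. Grayscale via the usual luminosity weights
gray = images @ np.array([0.299, 0.587, 0.114])

# 2. Balance classes by oversampling the rarer ones up to the largest count
counts = np.bincount(labels)
target = counts.max()
idx = np.concatenate([
    rng.choice(np.where(labels == c)[0], target, replace=True)
    for c in range(len(counts))
])
balanced_x, balanced_y = gray[idx], labels[idx]
print(balanced_x.shape, np.bincount(balanced_y))  # (210, 32, 32) [70 70 70]
```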

Udacity Advance Lane Finding Notes

A Nguyen

An’s post is a great step-through of how to use OpenCV to find lane lines on the road. It includes lots of code samples!

“Project summary:
– Applying calibration on all chessboard images that are taken from the same camera recording the driving to obtain distort coefficients and matrix.
– Applying perspective transform and warp image to obtain bird-eyes view on road.
– Applying binary threshold by combining derivative x & y, magnitude, direction and S channel.
– Reduce noise and locate left & right lanes by histogram data.
– Draw line lanes over the image”

P5: Vehicle Detection with Linear SVC classification

Rana Khalil

Rana’s video shows the amazing results that are achievable with Support Vector Classifiers. Look at how well the bounding boxes track the other vehicles on the highway!

Updated! My 99.40% solution to Udacity Nanodegree project P2 (Traffic Sign Classification)

Cherkeng Heng

Cherkeng’s approach to the Traffic Sign Classification Project was based on an academic paper that uses “dense blocks” of convolutional layers to fit the training data tightly. He also uses several clever data augmentation techniques to prevent overfitting. Here’s how that works out:

“The new network is smaller with test accuracy of 99.40% and MAC (multiply–accumulate operation counts) of 27.0 million.”

Advanced Lane Line Project

Arnaldo Gunzi

Arnaldo has a thorough walk-through of the Udacity Advanced Lane Finding Project. If you want to know how to use computer vision to find lane lines on the road, this is a perfect guide!

“1 Camera calibration
2 Color and gradient threshold
3 Birds eye view
4 Lane detection and fit
5 Curvature of lanes and vehicle position with respect to center
6 Warp back and display information
7 Sanity check
8 Video”

Build a Deep Learning Rig for $800

Nick Condo

I love this how-to post that lists all the components for a mid-range deep learning rig. Not too cheap, not too expensive. Just right.

Here’s how it does:

“As you can see above, my new machine (labeled “DL Rig”) is the clear winner. It performed this task more than 24 times faster than my MacBook Pro, and almost twice as fast as the AWS p2.large instance. Needless to say, I’m very happy with what I was able to get for the price.”

How Gig Economy Startups Will Replace Jobs with Robots

Caleb Kirksey

Companies like Uber, Lyft, Seamless, Fiverr, and Upwork facilitate armies of independent contractors who work “gigs” on their own time, for as much money as they want, but without the structure of traditional employment.

Caleb makes the point that, for all the press the gig economy gets, the end might be in sight. Many of these gigs might soon be replaced by computers and robots. He illustrates this point with his colleague, Eric, who works as a safety driver for the autonomous vehicle startup Auro Robotics. Auro’s whole mission is to eliminate Eric’s job!

“Don’t feel too bad for Eric though. He’s become skilled with hardware and robotics. His experience working in cooperation with a robot can enable him to build better systems that don’t need explicit instructions.”

6 Different End-to-End Neural Networks

One of the highlights of the Udacity Self-Driving Car Engineer Nanodegree Program is the Behavioral Cloning Project.

In this project, each student uses the Udacity Simulator to drive a car around a track and record training data. Students use the data to train a neural network to drive the car autonomously. This is the same problem that world-class autonomous vehicle engineering teams are working on with real cars!

There are so many ways to tackle this problem. Here are six approaches that different Udacity students took.

Self-Driving Car Engineer Diary — 5

Andrew Wilkie

Andrew’s post highlights the differences between the Keras neural network framework and the TensorFlow framework. In particular, Andrew mentions how much he likes Keras:

“We were introduced to Keras and I almost cried tears of joy. This is the official high-level library for TensorFlow and takes much of the pain out of creating neural networks. I quickly added Keras (and Pandas) to my Deep Learning Pipeline.”

Self-Driving Car Simulator — Behavioral Cloning (P3)

Jean-Marc Beaujour

Jean-Marc used extensive data augmentation to improve his model’s performance. In particular, he used images from offset cameras to create “synthetic cross-track error”. He built a small model-predictive controller to correct for this and train the model:

“A synthetic cross-track error is generated by using the images of the left and of the right camera. In the sketch below, s is the steering angle and C and L are the position of the center and left camera respectively. When the image of the left camera is used, it implies that the center of the car is at the position L. In order to recover its position, the car would need to have a steering angle s’ larger than s:

tan(s’) = tan(s) + (LC)/h”
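Jean-Marc’s relation can be wrapped in a tiny helper, interpreting h as the distance ahead over which the car should recover (all of the numbers below are made up for illustration):

```python
import math

def corrected_steering(s_deg, offset_m, recover_dist_m):
    """tan(s') = tan(s) + offset / recovery distance, per the formula above."""
    return math.degrees(math.atan(
        math.tan(math.radians(s_deg)) + offset_m / recover_dist_m))

# Center-camera steering of 5 degrees, camera offset 0.6 m, recover over 20 m
print(round(corrected_steering(5.0, 0.6, 20.0), 2))  # 6.7
```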

Behavioral Cloning — Transfer Learning with Feature Extraction

Alena Kastsiukavets

Alena used transfer learning to build her end-to-end driving model on the shoulders of a famous neural network called VGG. Her approach worked great. Transfer learning is a really advanced technique and it’s exciting to see Alena succeed with it:

“I have chosen VGG16 as a base model for feature extraction. It has good performance and at the same time quite simple. Moreover it has something in common with popular NVidia and models. At the same time use of VGG16 means you have to work with color images and minimal image size is 48×48.”

Introduction to Udacity Self-Driving Car Simulator

Naoki Shibuya

The Behavioral Cloning Project utilizes the open-source Udacity Self-Driving Car Simulator. In this post, Naoki introduces the simulator and dives into the source code. Follow Naoki’s instructions and build a new track for us!

“If you want to modify the scenes in the simulator, you’ll need to deep dive into the Unity projects and rebuild the project to generate a new executable file.”

MainSqueeze: The 52 parameter model that drives in the Udacity simulator

Mez Gebre

In this post, Mez explains the implementation of SqueezeNet for the Behavioral Cloning Project. This is the smallest network I’ve seen yet for this project. Only 52 parameters!

“With a squeeze net you get three additional hyperparameters that are used to generate the fire module:

1: Number of 1×1 kernels to use in the squeeze layer within the fire module

2: Number of 1×1 kernels to use in the expand layer within the fire module

3: Number of 3×3 kernels to use in the expand layer within the fire module”

GTA V Behavioral Cloning 2

Renato Gasoto

Renato ported his behavioral cloning network to Grand Theft Auto V. How cool is that?!

How Udacity Students Learn Computer Vision

The Udacity Self-Driving Car Engineer Nanodegree Program teaches both standard computer vision techniques, and deep learning with convolutional neural networks.

Both of these approaches can be used for working with images, and it’s important to understand standard computer vision techniques, particularly around camera physics. This knowledge improves the performance of almost all image manipulation tools.

Here are some of the skills that Udacity students mastered while using standard computer vision techniques to handle highway perception tasks. Check out how similar these images and videos look to what you might see on cutting edge autonomous driving systems!

Advanced Lane Finding

Milutin N. Nikolic

This is a terrific summary of the mathematics underpinning lane-finding. Milutin covers vanishing points, camera calibration and undistortion, and temporal filtering. If you’re interested in diving into the details of how a camera can find lane lines, this is a great start.

Here’s an example:

“Before we move further on, lets just reflect on what the camera matrix is. The camera matrix encompasses the pinhole camera model in it. It gives the relationship between the coordinates of the points relative to the camera in 3D space and position of that point on the image in pixels. If X, Y and Z are coordinates of the point in 3D space, its position on image (u and v) in pixels is calculated using:

s·[u, v, 1]ᵀ = M·[X, Y, Z]ᵀ

where M is camera matrix and s is scalar different from zero.”

Feature extraction for Vehicle Detection using HOG+

Mohan Karthik

Feature extraction is the key step in building a vehicle detection pipeline. There are a variety of tools that can extract vehicle features that we can use to differentiate vehicles from non-vehicles, including neural networks and gradient thresholds. This post provides a practical guide to using a histogram of oriented gradients (HOG) to extract features. In particular, the examination of different color spaces is of interest:

“Here, we see a decent difference in S and V channel, but not much in the H channel. So maybe in terms of color histogram, RGB and the S & V channel of HSV are looking good.”
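This isn’t Mohan’s code, but the heart of HOG fits in a few numpy lines: a magnitude-weighted histogram of gradient orientations (real pipelines add cell and block structure plus normalization on top of this):

```python
import numpy as np

def orientation_histogram(img, bins=9):
    """The core of HOG: histogram gradient orientations, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientations, 0-180
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (hist.sum() + 1e-9)           # normalize to sum to 1

# A hard vertical edge: all gradient energy points along x (orientation ~0)
img = np.zeros((8, 8))
img[:, 4:] = 1.0
hist = orientation_histogram(img)
print(hist.argmax())  # 0
```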

Advanced Lane detection

Mehdi Sqalli

This is a step-by-step guide to how to identify lane lines using standard computer vision techniques on a variety of highway driving videos.

“1: Camera Calibration and Image Undistortion.

2: Image filtering.

3: Perspective transform

4: Lane detection

5: Displaying the detected lane.”

Term 2: In-Depth on Udacity’s Self-Driving Car Curriculum

Update: Udacity has a new self-driving car curriculum! The post below is now out-of-date, but you can see the new syllabus here.

The very first class of students has finished Term 1 of the Udacity Self-Driving Car Engineer Nanodegree Program! We are so excited by their accomplishments—they have built traffic sign classifiers, end-to-end neural networks for driving, lane-finding algorithms, and vehicle tracking pipelines.

Now it’s time for Term 2 — hardcore robotics.

The focus of Term 1 was applying machine learning to automotive tasks: deep learning, convolutional neural networks, support vector machines, and computer vision.

In Term 2, students will build the core robotic functions of an autonomous vehicle system: sensor fusion, localization, and control. This is the muscle of a self-driving car!

Term 2

Sensor Fusion

Our terms are broken out into modules, which are in turn comprised of a series of focused lessons. This Sensor Fusion module is built with our partners at Mercedes-Benz. The team at Mercedes-Benz is amazing. They are world-class automotive engineers applying autonomous vehicle techniques to some of the finest vehicles in the world. They are also Udacity hiring partners, which means the curriculum we’re developing together is expressly designed to nurture and advance the kind of talent they would like to hire!

Lidar Point Cloud

Below please find descriptions of each of the lessons that together comprise our Sensor Fusion module:

  1. Sensors
    The first lesson of the Sensor Fusion Module covers the physics of two of the most important sensors on an autonomous vehicle — radar and lidar.
  2. Kalman Filters
    Kalman filters are the key mathematical tool for fusing together data. Implement these filters in Python to combine measurements from a single sensor over time.
  3. C++ Primer
    Review the key C++ concepts for implementing the Term 2 projects.
  4. Project: Extended Kalman Filters in C++
    Extended Kalman filters are used by autonomous vehicle engineers to combine measurements from multiple sensors with a non-linear motion model. Building an EKF is an impressive skill to show an employer.
  5. Unscented Kalman Filter
    The Unscented Kalman filter is a mathematically-sophisticated approach for combining sensor data. The UKF performs better than the EKF in many situations. This is the type of project sensor fusion engineers have to build for real self-driving cars.
  6. Project: Pedestrian Tracking
    Fuse noisy lidar and radar data together to track a pedestrian.


Localization

This module is also built with our partners at Mercedes-Benz, who employ cutting-edge localization techniques in their own autonomous vehicles. Together we show students how to implement and use foundational algorithms that every localization engineer needs to know.

Particle Filter

Here are the lessons in our Localization module:

  1. Motion
    Study how motion and probability affect your belief about where you are in the world.
  2. Markov Localization
    Use a Bayesian filter to localize the vehicle in a simplified environment.
  3. Egomotion
    Learn basic models for vehicle movements, including the bicycle model. Estimate the position of the car over time given different sensor data.
  4. Particle Filter
    Use a probabilistic sampling technique known as a particle filter to localize the vehicle in a complex environment.
  5. High-Performance Particle Filter
    Implement a particle filter in C++.
  6. Project: Kidnapped Vehicle
    Implement a particle filter to take real-world data and localize a lost vehicle.
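The bicycle model from the Egomotion lesson can be sketched in a few lines (the wheelbase and timestep values here are illustrative):

```python
import math

def bicycle_step(x, y, theta, v, steer, wheelbase=2.8, dt=0.1):
    """One step of the kinematic bicycle model."""
    x += v * math.cos(theta) * dt                     # advance along the heading
    y += v * math.sin(theta) * dt
    theta += (v / wheelbase) * math.tan(steer) * dt   # heading change from steering
    return x, y, theta

# Drive straight at 10 m/s for 1 second (10 steps): the car moves 10 m along x
state = (0.0, 0.0, 0.0)
for _ in range(10):
    state = bicycle_step(*state, v=10.0, steer=0.0)
print(state)  # (10.0, 0.0, 0.0)
```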


Control

This module is built with our partners at Uber Advanced Technologies Group. Uber is one of the fastest-moving companies in the autonomous vehicle space. They are already testing their self-driving cars in multiple locations in the US, and they’re excited to introduce students to the core control algorithms that autonomous vehicles use. Uber ATG is also a Udacity hiring partner, so pay attention to their lessons if you want to work there!

Here are the lessons:

  1. Control
    Learn how control systems actuate a vehicle to move it on a path.
  2. PID Control
    Implement the classic closed-loop controller — a proportional-integral-derivative control system.
  3. Linear Quadratic Regulator
    Implement a more sophisticated control algorithm for stabilizing the vehicle in a noisy environment.
  4. Project: Lane-Keeping
    Implement a controller to keep a simulated vehicle in its lane. For an extra challenge, use computer vision techniques to identify the lane lines and estimate the cross-track error.
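A proportional-integral-derivative controller like the one in the PID lesson fits in a few lines; the gains and the toy one-line “plant” below are illustrative, not values from the course:

```python
class PID:
    """Minimal PID controller on cross-track error (illustrative gains)."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt=0.1):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # steer against the error; each term fights a different failure mode
        return -(self.kp * error + self.ki * self.integral + self.kd * deriv)

# Steer back toward the lane center from a 1 m cross-track error
pid = PID(kp=0.5, ki=0.001, kd=0.1)
cte = 1.0
for _ in range(100):
    cte += 0.1 * pid.step(cte)  # toy plant: steering directly reduces the error
print(round(cte, 3))  # near zero
```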

I hope this gives you a good sense of what students can expect from Term 2! Things may change along the way of course, as we absorb feedback, incorporate new content, and take advantage of new opportunities that arise, but we’re really excited about the curriculum we’ve developed with our partners, and we can’t wait to see what our students build!

In case you’d like a refresher on what was covered in Term 1, you can read my Term 1 curriculum post here.

In closing, if you haven’t yet applied to join the Udacity Self-Driving Car Engineer Nanodegree Program, please do! We are taking applications for the 2017 terms and would love to have you in the class!