Udacity Students Experiment with Neural Networks and Computer Vision

The Udacity Self-Driving Car Engineer Nanodegree Program requires students to complete a number of projects, and each project requires some experimentation from students to figure out a solution that works.

Here are five posts by Udacity students, outlining how they used experimentation to complete their projects.

Self-Driving Car Engineer Diary — 4

Andrew Wilkie

Andrew has lots of images in this blog post, including a spreadsheet of all the different functions he used in building his Traffic Sign Classifier with TensorFlow!

I got to explore TensorFlow and various libraries (see table below), different convolutional neural network models, pre-processing images, manipulating n-dimensional arrays and learning how to display results.

Intricacies of Traffic Sign Classification with TensorFlow

Param Aggarwal

In this post, Param goes step-by-step through his iterative process of finding the right combination of pre-processing, augmentation, and network architecture for classifying traffic signs. 54 neural network architectures in all!

I went crazy by this point; nothing I did would push me into the 90% range. I wanted to cry. A basic linearly connected model was giving me 85%, and here I am using the latest hotness of convolution layers and not able to match it.

I took a nap.

Backpropagation Explained

Jonathan Mitchell

Backpropagation is the most difficult and mind-bending concept to understand about deep neural networks. After backpropagation, everything else is a piece of cake. In this concise post, Jonathan takes a crack at summarizing backpropagation in a few paragraphs.

When we are training a neural network we need to figure out how to alter a parameter to minimize the cost/loss. The first step is to find out what effect that parameter has on the loss. Then find the total loss up to that parameter's point and apply the gradient descent update equation to that parameter.
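To make that concrete, here is a rough sketch of the update Jonathan describes, using a single made-up parameter and a numerical gradient. It is only an illustration, not his implementation:

```python
# Hypothetical loss: L(w) = (w - 3)^2, which is minimized at w = 3
def loss(w):
    return (w - 3.0) ** 2

w = 0.0              # the parameter we want to alter
learning_rate = 0.1
eps = 1e-6

for step in range(100):
    # Effect of the parameter on the loss: the gradient dL/dw (estimated numerically)
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    # Gradient descent update applied to the parameter
    w = w - learning_rate * grad

print(w)  # approaches 3.0
```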

Teaching a car to drive itself

Arnaldo Gunzi

Arnaldo presents a number of lessons he learned while designing an end-to-end network for driving in the Behavioral Cloning Project. In particular, he came to appreciate the power of GPUs.

Using a GPU is magic. It's like giving a Coke to someone in the desert. Or buying a new car: the feeling of 'how was I using that old piece of crap?'. Or finding a shortcut on the route to the office: you'll never use the long route again. Or finding a secret code in a game that gives you superpowers…

Robust Extrapolation of Lines in Video Using Probabilistic Hough Transform

Esmat Nabil

Esmat presents a well-organized outline of his Finding Lane Lines Project and the computer vision pipeline that he used. In particular, he has a nice explanation of the Hough transform, which is a tricky concept!

The probabilistic Hough line transform is a more efficient implementation of the Hough transform. It gives as output the extremes of the detected lines (x0, y0, x1, y1). It is difficult to detect straight lines that are part of a curve because they are very small. To detect such lines, it is important to properly set all the parameters of the Hough transform. Two of the most important parameters are the number of Hough votes and the maximum distance between points that are to be joined into a line. Both parameters are set at their minimum values.
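For readers who want to see where those two parameters live in code, here is a rough OpenCV sketch; the image path and parameter values are placeholders, not Esmat's:

```python
import numpy as np
import cv2

gray = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image path
edges = cv2.Canny(gray, 50, 150)

lines = cv2.HoughLinesP(
    edges,
    rho=1,              # distance resolution of the accumulator (pixels)
    theta=np.pi / 180,  # angle resolution of the accumulator (radians)
    threshold=10,       # minimum number of Hough votes for a line
    minLineLength=5,    # discard segments shorter than this
    maxLineGap=2,       # maximum gap between points joined into one line
)

# Each detected segment is returned as its two endpoints (x0, y0, x1, y1)
if lines is not None:
    for x0, y0, x1, y1 in lines[:, 0]:
        print(x0, y0, x1, y1)
```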

Udacity Students on Cutting-Edge Autonomous Vehicle Tools

Students in Udacity’s Self-Driving Car Engineer Nanodegree Program go above and beyond to build terrific vehicle detectors, lane line detectors, and neural networks for end-to-end learning, and to share career advice.

Small U-Net for vehicle detection

Vivek Yadav

In the Vehicle Detection Project, students use standard computer vision methods to detect and localize vehicles in images taken from highway driving. Vivek went well beyond standard computer vision methods, and used U-Net, an encoder-decoder architecture that has proven effective for medical imaging. The results are astounding.

Another advantage of using a U-net is that it does not have any fully connected layers, therefore has no restriction on the size of the input image. This feature allows us to extract features from images of different sizes, which is an attractive attribute for applying deep learning to high fidelity biomedical imaging data. The ability of U-net to work with very little data and no specific requirement on input image size make it a strong candidate for image segmentation tasks.
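As a rough idea of what "no fully connected layers" buys you, here is a toy encoder-decoder sketch in Keras with a single skip connection. It only illustrates the shape of a U-Net; it is not Vivek's actual model:

```python
from tensorflow.keras import layers, Model

# Height and width are left as None: with no fully connected layers,
# the same network accepts images of different sizes (even dimensions assumed here).
inputs = layers.Input(shape=(None, None, 3))

# Encoder
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D(2)(c1)
c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)

# Decoder with a skip connection back to the encoder (the "U" shape)
u1 = layers.UpSampling2D(2)(c2)
u1 = layers.concatenate([u1, c1])
c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

# One output channel: a per-pixel "vehicle" probability mask
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```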

My Lane Detection Project in the Self Driving Car Nanodegree by Udacity

Param Aggarwal

Param provides a great walkthrough of his first project — Finding Lane Lines. He also includes a video that shows all of the intermediate steps necessary to find lane lines on the road. Then he applies his computer vision pipeline to a new set of videos!

This is the most important step: we use the Hough Transform to convert the pixel dots that were detected as edges into meaningful lines. It takes a bunch of parameters, including how straight a line should be to be considered a line and what the minimum length of the lines should be. It will also connect consecutive lines for us, if we specify the maximum gap that is allowed. This is a key parameter for us to be able to join a dashed lane into a single detected lane line.

Extrapolate lines with numpy.polyfit

Peteris Nikiforovs

Leading up to the Finding Lane Lines project, we teach students about some important computer vision functions for extracting lines from images. These are tools like Hough transforms and Canny edge detection. However, we leave it to the students to actually identify which lines correspond to the lane lines. Most students find some points and extrapolate y=mx+b. Peteris went beyond this, though, and taught himself how to use the numpy.polyfit() function in order to identify the line equation automatically!

If we return to the original question, how do we extrapolate the lines?

Since we got a straight line, we can simply plug in points that are outside of our data set.
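Here is a minimal sketch of that idea with made-up points (not Peteris's code):

```python
import numpy as np

# Points detected along one lane line (made-up values)
xs = np.array([120, 180, 260, 330])
ys = np.array([540, 480, 400, 330])

# Fit y = m*x + b through the detected points
m, b = np.polyfit(xs, ys, 1)

# Extrapolate: plug in x values outside the detected range
for x in (0, 600):
    print(x, m * x + b)
```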

An augmentation based deep neural network approach to learn human driving behavior

Vivek Yadav

While training his end-to-end driving network for the Behavioral Cloning project, Vivek made use of extensive image augmentation. He flipped his images, resized them, added shadows, changed the brightness, and applied vertical and horizontal shifts. All of this allowed his model to generalize to an entirely new track that it had never seen before.

This was perhaps the weirdest project I did. This project challenged all the previous knowledge I had about deep learning. In general, training for more epochs and with more data results in better performance, but in this case any time I went beyond 10 epochs, the car simply drove off the track. Although all the image augmentation and tweaks seem reasonable now, I did not think of them a priori.

But, Self-Driving Car Engineers don’t need to know C/C++, right?

Miguel Morales

Miguel’s practical post covers some of the different angles from which a self-driving car engineer might need to know C++, ROS, and other autonomous vehicle development tools. It’s a great read if you’re looking for a job in the industry!

Self-Driving Car Engineers use C/C++ to squeeze as much speed out of the machine as possible. Remember, all processing in autonomous vehicles is done in real time, sometimes even on parallel architectures, so you will have to learn to code not only for the CPU but also for the GPU. It is vital for you to deliver software that can process a large number of images (think of the common frame rates: 15, 30, or even 60 fps) every second.

Udacity Students on Neural Networks, AWS, and Why They Enrolled in CarND

Here are five terrific posts by Udacity Self-Driving Car students covering advanced convolutional neural network architectures, how to set up AWS instances, and aspirations for CarND.

Traffic signs classification with a convolutional network

Alex Staravoitau

Alex took the basic convolutional neural network tools we teach in the program, and built on them to create a killer traffic sign classifier. He used extensive data augmentation, and an advanced network architecture with multi-scale feature extraction.

Basically, with multi-scale features it's up to the classifier which level of abstraction to use, as it has access to outputs from all convolutional layers (i.e. features at all abstraction levels).
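Here is a rough Keras sketch of the multi-scale idea, where the classifier sees features from every convolutional stage rather than only the deepest one. It is an illustration, not Alex's architecture:

```python
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(32, 32, 3))  # traffic-sign-sized images

c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D(2)(c1)
c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
p2 = layers.MaxPooling2D(2)(c2)

# Multi-scale features: concatenate outputs from both convolutional stages
merged = layers.concatenate([layers.Flatten()(p1), layers.Flatten()(p2)])
outputs = layers.Dense(43, activation="softmax")(merged)  # 43 traffic sign classes

model = Model(inputs, outputs)
```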

Self Driving Car Nanodegree Experience So Far….

Sridhar Sampath

Sridhar has a fun summary of his experience in the program so far, including great detail about some sophisticated data augmentation and network architectures that he used. I also laughed when he mentioned why he enrolled.

So then why did I choose this course over other available courses? “The main reason was that I have experience in ADAS so this course was a perfect fit for my career passion”. Also, it was like a monopoly.

Detecting lanes

Subhash Gopalakrishnan

Subhash has clear and concise descriptions of the computer vision tools he uses for his Finding Lane Lines Project. A bonus section includes him trying to find lanes on roads in India!

The part remaining is to discover lines in the edge pixels. Before attempting this, we need to rethink a point in terms of all the lines that can possibly run through it. Two points will then have their own sets of possible lines with one common line that runs through both of them. If we could plot the line-possibilities of these two points, both points will “vote” for that line that passes through both of them.
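If you want to see that voting idea in code, here is a tiny NumPy accumulator sketch (mine, not Subhash's):

```python
import numpy as np

def hough_votes(points, max_rho, n_theta=180):
    """Each point votes for every (rho, theta) line that could pass through it."""
    thetas = np.deg2rad(np.arange(n_theta))
    accumulator = np.zeros((2 * max_rho, n_theta), dtype=int)
    for x, y in points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + max_rho, np.arange(n_theta)] += 1
    return accumulator

# Two points on the same line both vote for one (rho, theta) cell
acc = hough_votes([(10, 10), (20, 20)], max_rho=100)
print(acc.max())  # 2: the line that passes through both points
```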

AWS setup for Deep Learning

Himanshu Babal

Himanshu has a great tutorial on how to set up an AWS EC2 instance with a GPU to accelerate deep learning. It includes tips on how to get free AWS credits! (I should note that since Himanshu wrote this we have included our own tutorial within the program, but this is still a great post and more free credits are always welcome!)

I will be helping you out with the following setup:
* AWS Account setup and $150 Student Credits.
* Tensorflow-GPU setup with all other libraries.

Udacity Will Help Me To Achieve My Goals

Mojtaba Valipour

Mojtaba joins us from Iran, which is really inspiring given the backdrop of world events right now. We are excited to have him and he is excited to be in the program!

Maybe Sebastian Thrun has no idea who I am and how much respect I have for him. I made an autonomous vehicle because I saw his course (Artificial Intelligence for Robotics); I learned a lot from him and about the power of ROS (Robot Operating System). I really love this field of study, and I have followed everything related to autonomous vehicles since 2004 (when DARPA started everything). And now I am in the first cohort of the Self Driving Cars Nanodegree (SDCND), thanks to David Silver, Todd Gore, Oliver Cameron, Stuart Frye, and other Udacians.

Udacity Students on Lane Lines, Curvature, and Cutting-Edge Network Architectures

Here is a terrific collection of blog posts from Udacity Self-Driving Car students.

They cover the waterfront: from debugging computer vision algorithms, to measuring the radius of curvature of the road, to using Faster-RCNN, YOLO, and other cutting-edge network architectures.

Bugger! Detecting Lane Lines

Jessica Yung

Jessica has a fun post analyzing some of the bugs she had to fix during her first project — Finding Lane Lines. Click through to see why the lines above are rotated 90 degrees!

Here I want to share what I did to investigate the bug. I printed the coordinates of the points my algorithm used to extrapolate the lines and plotted them separately. This was to check whether the problem was in the points or in the way the points were extrapolated into a line. E.g.: did I just throw away many of the useful points because they didn’t pass my test?

Towards a real-time vehicle detection: SSD multibox approach

Vivek Yadav

Vivek has gone above and beyond the minimum requirements in almost every area of the Self-Driving Car program, including helping students on the forums and in our Slack community, and in terms of his project submissions. He really outdid himself with this post, which compares using several different cutting-edge neural network architectures for vehicle detection.

The final architecture, and the title of this post, is the Single Shot Multibox Detector (SSD). SSD addresses the low-resolution issue in YOLO by making predictions based on feature maps taken at different stages of the convolutional network; because the layers closer to the image have higher resolution, it is as accurate as, and in some cases more accurate than, the state-of-the-art Faster-RCNN. To keep the number of bounding boxes manageable, an atrous convolutional layer was proposed. Atrous convolutional layers are inspired by the “algorithme a trous” in wavelet signal processing, where blank filters are applied to subsample the data for faster calculations.
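For anyone curious what an atrous (dilated) convolution looks like in practice, here is a minimal Keras example; the feature map shape is made up, and this is not the SSD implementation itself:

```python
import numpy as np
from tensorflow.keras import layers

# A made-up intermediate feature map (batch, height, width, channels)
feature_map = np.random.rand(1, 38, 38, 128).astype("float32")

# A 3x3 kernel with dilation_rate=2 samples a 5x5 neighbourhood while keeping
# only 9 weights per filter: a wider receptive field without extra parameters.
atrous = layers.Conv2D(64, kernel_size=3, dilation_rate=2,
                       padding="same", activation="relu")

print(atrous(feature_map).shape)  # (1, 38, 38, 64)
```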

CNN Model Comparison in Udacity’s Driving Simulator

Chris Gundling

This is a fantastic post by Chris comparing and contrasting the performance of two different CNN architectures for end-to-end driving. Chris looked at an average-sized CNN architecture proposed by NVIDIA, and a huge, VGG-style architecture he built himself.

I experimented with various data pre-processing techniques, 7 different data augmentation methods and varied the dropout of each of the two models that I tested. In the end I found that while the VGG style model drove slightly smoother, it took more hyperparameter tuning to get there. NVIDIA’s architecture did a better job generalizing to the test Track (Track2) with less effort.

Hello Lane Lines

Josh Pierro

Josh Pierro took his lane-finding algorithm for a spin in his 1986 Mercedes-Benz!

From the moment I started project 1 (p1 — finding lane lines on the road) all I wanted to do was hook up a web cam and pump a live stream through my pipeline as I was driving down the road.

So, I gave it a shot and it was actually quite easy. With a PyCharm port of P1, cv2 (open cv) and a cheap web cam I was able to pipe a live stream through my model!
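In case you want to try the same trick, the plumbing is only a few lines of OpenCV. The pipeline function below is a placeholder for your own P1 code, not Josh's:

```python
import cv2

def pipeline(frame):
    # Placeholder: run your lane-finding pipeline on a single frame here
    return frame

cap = cv2.VideoCapture(0)  # 0 = first attached webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("lanes", pipeline(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
```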

Udacity SDCND : Advanced Lane Finding Using OpenCV

Paul Heraty

Paul does a great job laying out his computer vision pipeline for detecting lane lines on a curving road. He even compares his findings for radius of curvature to US Department of Transportation standards!

Overall, my pipeline looks like the following:

Let’s look at each stage in some detail.

CarND Students on Preparation, Generalization, and Hacking Cars

Here are five great posts from students in Udacity’s Self-Driving Car Engineer Nanodegree Program, dealing with generalizing machine learning models and hacking cars!

SDC

Daniel Stang

Daniel has devoted a section of his blog to the Self-Driving Car projects, including applying his lane-line finder to video he took himself!

The first project for the Udacity Self-Driving Car Nanodegree was to create a software pipeline capable of detecting the lane lines in a video feed. The project was done using Python, with the bulk of the work being performed using the OpenCV library. The video to the side shows the software pipeline I developed in action, using video footage I took myself.

Traffic Sign Classifier: Normalising Data

Jessica Yung

Jessica’s post discusses the need to normalize image data before feeding it into a neural network, including a bonus explainer on the differences between normalization and standardization.

The same range of values for each of the inputs to the neural network can guarantee stable convergence of weights and biases. (Source: Mahmoud Omid on ResearchGate)

Suppose we have one image that’s really dark (almost all black) and one that’s really bright (almost all white). Our model has to address both cases using the same parameters (weights and biases). It’s hard for our model to be accurate and generalise well if it has to tackle both extreme cases.
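Here is one common way to normalize image data, sketched with made-up "dark" and "bright" images. It is only an illustration, not Jessica's exact scheme:

```python
import numpy as np

def normalize(images):
    """Scale 0-255 pixel values to roughly the range [-1, 1]."""
    return (images.astype(np.float32) - 128.0) / 128.0

dark = np.full((32, 32, 3), 20, dtype=np.uint8)     # almost all black
bright = np.full((32, 32, 3), 235, dtype=np.uint8)  # almost all white

# After normalization, both images live on the same small scale
print(normalize(dark).mean(), normalize(bright).mean())
```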

Hardware, tools, and cardboard mockups.

Dylan Brown

Dylan is a student in both the Georgia Tech Online Master’s in Computer Science Program (run by Udacity) and also in CarND. He’s also turning his own Subaru into a self-driving car! (Note: We do not recommend this.)

Below I’ve put together a list of purchases needed for this project. There will definitely be more items coming soon, at least a decent power supply or UPS. Thankfully, this list covers all the big-ticket items.

Jetson TX1 Development Kit (with .edu discount) | NVIDIA | $299
ZED Stereo Camera with 6-axis pose | Stereolabs | $449
CAN(-FD) to USB interface | PEAK-System | $299
Touch display, 10.1” | Toguard | $139
Wireless keyboard K400 | Logitech | $30
Total: $1,216

Self-Driving Car Engineer Diary — 1

Andrew Wilkie

Andrew has a running blog of his experiences in CarND, including his preparation.

I REALLY want a deep understanding of the material, so I followed the recommendations of Gilad Gressel (course mentor): Essence of Linear Algebra (for linear classifiers, which are step 1 towards CNNs), Gradients & Derivatives (for understanding backpropagation), and the CS231n: Convolutional Neural Networks for Visual Recognition lectures (for understanding full neural networks and convolutional deep neural networks).

Comparing model performance: Including Max Pooling and Dropout Layers

Jessica Yung

Another post by Jessica Yung! This time, she runs experiments on her model by training with and without different layers, to see which version of the model generalizes best.

Mean gap between training accuracy and validation accuracy over epochs 80-100 (smallest gap first):

Pooling and dropout (0.0009)
Dropout but no pooling (0.0061)
Pooling but no dropout (0.0069)
No pooling or dropout (0.0094)
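One simple way to run this kind of experiment is to parameterize the model-building function and train each variant identically. Here is a rough Keras sketch (not Jessica's code):

```python
from tensorflow.keras import layers, models

def build_model(use_pooling=True, use_dropout=True):
    model = models.Sequential()
    model.add(layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)))
    if use_pooling:
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    if use_dropout:
        model.add(layers.Dropout(0.5))
    model.add(layers.Dense(43, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train all four variants the same way, then compare the gap between
# training accuracy and validation accuracy for each.
variants = {(p, d): build_model(p, d) for p in (True, False) for d in (True, False)}
```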

Udacity Students on Computer Vision, Neural Networks, and Careers

Finding the right parameters for your Computer Vision algorithm

maunesh

In this post maunesh discusses the challenges of tuning parameters in computer vision algorithms, specifically using the OpenCV library. maunesh built a GUI for parameter tuning, to help him develop intuition for the effect of each parameter. He published the GUI to GitHub so other students can use it, too!

For the Canny edge detection algorithm to work well, we need to tune 3 main parameters: the kernel size of the Gaussian filter, and the upper and lower bounds for hysteresis thresholding. More info on this can be found here. Using a GUI tool, I am trying to determine the best values of these parameters to use for my input.
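The three parameters maunesh mentions map directly onto two OpenCV calls. Here is a rough sketch with placeholder values and a placeholder image path:

```python
import cv2

img = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder input image

kernel_size = 5       # Gaussian filter kernel size
low_threshold = 50    # lower bound for hysteresis thresholding
high_threshold = 150  # upper bound for hysteresis thresholding

blurred = cv2.GaussianBlur(img, (kernel_size, kernel_size), 0)
edges = cv2.Canny(blurred, low_threshold, high_threshold)
```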

Behavioral Cloning For Self Driving Cars

Mojtaba Valipour

In this post, Mojtaba walks through the development of his behavioral cloning model in detail. I particularly like the graphs he built to visualize the data set and figure out which approaches would be most promising for data augmentation.

The first step in training a model for a specific dataset is always to visualize the dataset itself. There are many visualization techniques that can be used, but I chose the most straightforward option here.

Building a lane detection system using Python 3 and OpenCV

Galen Ballew

Galen explains his image processing pipeline for the first project of the program, Finding Lane Lines, really clearly. In particular, he has an admirably practical explanation of Hough space.

Pixels are considered points in XY space

hough_lines() transforms these points into lines inside of Hough space

Wherever these lines intersect, there is a point of intersection in Hough space

The point of intersection corresponds to a line in XY space

What kind of background do you need to get into Machine Learning?

Chase Schwalbach

This is a great post for anybody interested in learning about self-driving cars, but concerned they might not be up to the challenge.

I’ll put the summary right up top — if I can do it, you can too. I wanted to share this post to show some of the work I’m doing with Udacity’s Self-Driving Car Nanodegree, and I also want to share some of my back story to show you that if I can do it, there’s nothing stopping you. The only thing that got me to this point is consistent, sustained effort.

Self-driving car in a simulator with a tiny neural network

Mengxi Wu

Mengxi wasn’t satisfied with simply training a convolutional neural network that successfully learns end-to-end driving in the Udacity simulator. He systematically removed layers from his network and pre-processed the images until he was able to drive the simulated car with a tiny network of only 63 parameters!

I tried the grayscale converted directly from RGB, but the car had some problems at the first turn after the bridge. In that turn, a large portion of the road has no curb, and the car goes straight through that opening into the dirt. This behavior seems to be related to the fact that the road is almost indistinguishable from the dirt in grayscale. I then looked into other color spaces, and found that the road and the dirt can be separated more clearly in the S channel of the HSV color space.
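Extracting that S channel takes only a couple of OpenCV calls. Here is a minimal sketch with a placeholder file name:

```python
import cv2

img = cv2.imread("frame.jpg")                 # placeholder: a BGR frame from the simulator
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # road and dirt can look nearly identical here

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
saturation = hsv[:, :, 1]                     # S channel separates the paved road from the dirt more clearly
```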

Image Augmentation

Data is the key to deep learning, and machine learning generally.

In fact, Stanford professor and machine learning guru (and Coursera founder, and Baidu Chief Scientist, and…) Andrew Ng says that it’s not the engineer with the best machine learning model that wins; rather, it’s whoever has the most data.

One way to get a lot of data is to painstakingly collect a lot of it. All else equal, this is the best way to compile a huge machine learning dataset.

But all else is rarely equal, and compiling a big dataset is often prohibitively expensive.

Enter data augmentation.

The idea behind data augmentation (or image augmentation, when the data consists of images) is that an engineer can start with a relatively small data set, make lots of copies, and then perform interesting transformations on those copies. The end result will be a really large dataset.
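To give a flavor of what those transformations look like, here is a small sketch of two common ones, brightness changes and horizontal flips, written for a behavioral-cloning-style dataset. It is an illustration only, not any particular student's code:

```python
import cv2
import numpy as np

def augment(image, steering_angle):
    """Return one transformed copy of a training example."""
    # Random brightness: scale the V channel in HSV space
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 2] *= np.random.uniform(0.4, 1.2)
    image = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)

    # Random horizontal flip: mirror the image and negate the steering angle
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
        steering_angle = -steering_angle

    return image, steering_angle
```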

One of the Udacity Self-Driving Car Engineer Nanodegree Program students, Vivek Yadav, has a terrific tutorial on how he used image augmentation to train his network for the Behavioral Cloning Project.

1. Augmentation:
 A. Brightness Augmentation
 B. Perspective Augmentation
 C. Horizontal and Vertical Augmentation
 D. Shadow Augmentation
 E. Flipping

2. Preprocessing

3. Sub-sampling

Read the whole thing!

Should You Understand Backpropagation?

Backpropagation is a leaky abstraction; it is a credit assignment scheme with non-trivial consequences. If you try to ignore how it works under the hood because “TensorFlow automagically makes my networks learn”, you will not be ready to wrestle with the dangers it presents, and you will be much less effective at building and debugging neural networks.

That is from the excellent Andrej Karpathy, “Yes you should understand backprop”.

I say it’s possible to use deep neural networks quite effectively without truly understanding backprop. But if your goal is to specialize in the field and apply this tool to a range of problems, then “yes you should understand backprop”.

By the way, @karpathy is a prolific Twitter feed with 37,100 followers.

Is Deep Learning Overhyped?

One of the questions I get every now and again is whether self-driving cars are a solved problem. Is there any work left to be done in this field?

The answer is that there is so much work left to be done! It only seems like a solved problem from the outside 🙂

So I was interested to read Francois Chollet’s answer to “Is Deep Learning Overhyped?” on Quora.

Chollet is the author of Keras, which is a deep learning library we use in the Udacity Self-Driving Car Program. He explains at length why artificial intelligence generally, much like autonomous driving specifically, is not a solved problem.

Overall: deep learning has made us really good at turning large datasets of perceptual inputs (images, sounds, videos) and simple human-annotated targets (e.g. the list of objects present in a picture) into models that can automatically map the inputs to the targets. That’s great, and it has a ton of transformative practical applications. But it’s still the only thing we can do really well. Let’s not mistake this fairly narrow success in supervised learning for having “solved” machine perception, or machine intelligence in general. The things about intelligence that we don’t understand still massively outnumber the things that we do understand, and while we are standing one step closer to general AI than we were ten years ago, it’s only by a small increment.

There’s still a lot of work left to do!

TensorFlow on Windows

TensorFlow is the main deep learning library we are using in the Udacity Self-Driving Car Engineer Nanodegree Program, and it’s been a little bit painful because of the lack of Windows support.

We’ve had to work with our Windows users to set up Docker containers in which to run TensorFlow, and frankly we haven’t done as good a job with that as we should.

So thank goodness Google announced yesterday that they’re releasing Windows support for TensorFlow.

It looks to be early stages and I’m not sure if we can safely point our students there yet, but hopefully it means we can get there soon.

And it’s also another step toward TensorFlow becoming the library of choice for deep learning.