This version has brand new courses and projects on deep learning, sensor fusion, localization, and planning, with brand new instructors. It’s a fantastic 2021 update to the original program, which dated to 2016.
I helped build the first half of this new version back when I was still at Udacity last year, and then my long-time Udacity colleague, Michael Virgo, led the effort to completion.
A particularly awesome aspect of this new version of the Nanodegree program is that we created it in conjunction with Udacity’s long-time partner, Mercedes-Benz, and with a new partner, Waymo. Several projects in the program teach students how to work with the Waymo Open Dataset, which is a fantastic opportunity for students to gain hands-on skills.
Last week, TechCrunch reported that Aurora intends to become a publicly-traded company, via a merger with a special purpose acquisition company (SPAC), specifically Reinvent Technology Partners Y. To my mind, the most interesting parts of this announcement related to the massive amounts of capital that AV companies need and are frequently able to raise.
Aurora’s valuation will be $13 billion, despite an absence of revenue. After closing the SPAC, Aurora will have about $2.5 billion of cash on-hand.
To figure out how far that $2.5 billion will take them, we can do some back of the envelope math. According to TechCrunch, Aurora has about 1,600 employees. Since Aurora remains in the research and development stage, most of those employees are probably engineers, and many of them are probably well-compensated machine learning and robotics engineers.
For a run-of-the-mill web software company, I might assume a “fully loaded” cost of $150,000 to $200,000 per engineer, per year (including salary, benefits, taxes, and overhead such as rent and equipment). Aurora has additional costs on top of that, like buying trucks, sensors, and data storage, so let’s bump that fully loaded cost to $225,000 per engineer, per year.
1,600 engineers times $225,000 per engineer equals $360 million in costs per year. That’s surely not exactly correct, but it gives a sense of the order of magnitude.
That suggests that Aurora’s $2.5 billion of post-SPAC cash will last around 7 years, although Aurora probably has significant expansion plans that will both increase its expenses and generate revenue within that timeframe.
Also notable is Aurora’s current cash situation. The $2.5 billion in post-SPAC cash includes, according to TechCrunch, approximately $1 billion from the Reinvent SPAC itself, plus another $1 billion in private investment in public equity (PIPE) financing attached to the SPAC merger. That suggests Aurora’s current cash pile is about $500 million, which is a bit over a year of burn.
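The back-of-the-envelope math above, collected in one place (all figures are the rough estimates from this section, not reported financials):

```python
# Rough burn-rate estimates for Aurora (all figures approximate).
EMPLOYEES = 1_600                    # per TechCrunch
FULLY_LOADED_COST = 225_000          # assumed cost per engineer, per year

annual_burn = EMPLOYEES * FULLY_LOADED_COST          # $360M per year
post_spac_cash = 2_500_000_000                       # ~$2.5B after the merger
spac_proceeds = 1_000_000_000                        # ~$1B from the SPAC itself
pipe_financing = 1_000_000_000                       # ~$1B in PIPE financing
current_cash = post_spac_cash - spac_proceeds - pipe_financing  # ~$500M today

print(f"Annual burn:      ${annual_burn / 1e6:.0f}M")
print(f"Post-SPAC runway: {post_spac_cash / annual_burn:.1f} years")
print(f"Current runway:   {current_cash / annual_burn:.1f} years")
```

At these assumptions the post-SPAC runway works out to roughly 7 years, and the pre-SPAC cash to a bit over a year.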
This is a company that probably needs to raise funds soon, one way or another. Looks like they’ve found a very lucrative way to do that.
Interestingly, the NFX episode didn’t really touch on the Google acquisition. They referenced Bardin’s existing write-up and basically referred people to that if they were interested in the details.
Instead, the podcast focused on the early days and hypergrowth phases of Waze.
Waze bootstrapped their own maps, instead of licensing maps from providers. When a user in a new area downloaded Waze, they would get a blank canvas, and they would essentially draw the map themselves by driving. Then they could log onto the website later to polish the map they’d drawn.
Waze’s version of the 1/9/90 rule was that 1% of users would build the map, 9% would report traffic, and 90% would consume.
Waze churned a lot of users for a long time because the product wasn’t good enough. People loved the promise of the product, but Waze couldn’t deliver fast enough. Figuring this out gave them confidence that if they just executed fast enough, users would come.
The biggest competitor for many years wasn’t Google Maps or Apple Maps, but rather FourSquare. However, FourSquare never broke out of the “cool kids” customer segment, whereas Waze started outside of the “cool kids” segment by default, because their users were boring suburbanites driving to work.
Global companies need to succeed in the US. This means companies based in the US are more likely to succeed globally, but so are companies from tiny countries (e.g. Israel, and also the Nordic or Baltic countries). Companies in tiny countries have no home market, so they have to go global from Day 1. The toughest spot is middle-sized countries (e.g. Germany). They can grow initially in their home market, but eventually they are likely to be consumed by local (i.e. not US) product priorities, and never make the leap to become global winners.
Waze’s best information is that the maps market is now 40% Google Maps, 35% Waze, and 25% Apple Maps.
Great program managers are amazingly effective at reducing stress, increasing performance, and especially at hitting timelines.
For years, I did not believe this, mainly because I hadn’t seen many good program managers in action. Mostly, I had seen engineers, or product managers, or executives corralled into program management roles, where they performed adequately but not impressively.
I myself have been corralled into that role a few times. I am not a great program manager.
But then I joined Udacity, which had phenomenal program managers, and I realized how effective they could be. Holding everyone accountable, foreseeing the future and addressing upcoming complications, and reporting out progress are all really important for organizational progress.
This specific role at Cruise will “lead the management, implementation, and reporting of Cruise’s programs. You will be responsible for creating roadmaps, developing and adhering to timelines, and working cross-functionally to ensure alignment and collaboration.”
Waymo announced a new simulation framework recently, both on its own blog and in a feature story with The Verge. The framework is called SimulationCity.
SimulationCity seems awfully reminiscent of CarCraft, the simulation engine that Waymo made famous in 2017. It’s been four years, which is certainly time for a refresh.
The Verge article is a little cagey about the distinction between SimulationCity and CarCraft:
“The company decided it needed a second simulation program after discovering “gaps” in its virtual testing capabilities, said Ben Frankel, senior product manager at the company. Those gaps included using simulation to validate new vehicle platforms, such as the Jaguar I-Pace electric SUV that Waymo has recently begun testing in California, and the company’s semi-trailer trucks outfitted with sensing hardware and the Waymo driver software.”
Waymo is using a new neural network they developed called SurfelGAN (“surface element generative adversarial network”) to better simulate sensor data, especially complex weather conditions like rain, snow, and fog.
Waymo’s blog post features several different videos and GIFs of SimulationCity, and each looks a little different. One video seems focused on behavioral planning, and features an animated Waymo semi-truck on a highway surrounded by moving green rectangular prisms that are meant to represent other vehicles on the road.
Another video seems to be simulating lidar point clouds.
And yet another video shows high-resolution simulated images paired side-by-side with real camera frames. It’s genuinely challenging to figure out which half of the image is simulated and which half is real.
All of that together seems to indicate that SimulationCity is a comprehensive simulation solution, more than a specialized solution for just camera images. I bet they can run perception, localization, prediction, planning, and maybe even control simulations within the framework, at varying speeds. Impressive.
“Self-driving is an all-encompassing AI and engineering challenge. It’s easy to see an AV on the streets and think only about the AI models that power them or the compute and sensor suites built as part of it, but there is a virtual software assembly line built alongside the car itself that enables us to meet the unique scale and safety imperative at play here.
To enable AVs to drive superiorly in any given scenario, and continuously evolve and adapt new paradigms, it requires an ecosystem capable of ingesting petabytes of data and hundreds of years worth of compute every day, training and testing models on a continuous loop for multiple times a week software updates that improve performance and ensure safety. The complex network of new tools, testing infrastructure and development platforms that are behind every seamless handling of a construction zone or double-parked car are themselves significant engineering achievements that stand to have an outsized impact beyond AV as they push the boundaries of ML, robotics and more.”
This was probably the biggest surprise upon joining Cruise, which is embarrassing to admit. Cruise has invested tremendously in developing an entire AV software infrastructure that supports the core AV stack. There are front-end engineers working on visualization tools for machine learning scientists, and site reliability engineers ensuring the performance of cloud services. It’s a little bit like an iceberg: 90% of the activity is below the surface of what we might think of as “core AV engineering.”
The rest of the answers in the article are great, too, including Sterling Anderson (Aurora), Jesse Levinson (Zoox), and Raquel Urtasun (Waabi).
“The crux of the challenge involves making decisions under uncertainty; that is, choosing actions based on often imperfect observations and incomplete knowledge of the world. Autonomous robots have to observe the current state of the world (imperfect observations), understand how this is likely to evolve (incomplete knowledge), and make decisions about the best course of action to pursue in every situation. This cognitive capability is also essential to interpersonal interactions because human communications presuppose an ability to understand the motivations of the participants and subjects of the discussion. As the complexity of human–machine interactions increases and automated systems become more intelligent, we strive to provide computers with comparable communicative and decision-making capabilities. This is what takes robots from machines that humans supervise to machines with which humans can collaborate.”
“They contracted prototyping to Roding, production planning to HÖRMANN Automotive, and series manufacturing to an international Tier 1 automotive supplier. Downstream functions are also handled by partners. A partner manages vehicle leasing, and the digital platform is being developed by Porsche subsidiary MHP.”
Elon Musk is catching a lot of flack for tweeting, “Didn’t expect [generalized self-driving] to be so hard, but the difficulty is obvious in retrospect.” I think every single automotive company (and me!) that predicted self-driving cars by 2020 could write some version of that tweet. I guess the difference is Tesla actually sold Full Self-Driving packages, so they’re on the hook.
Autonomous underwater robots will greatly accelerate the discovery of the more than three million lost shipwrecks around the world, according to the discoverer of The Titanic. “New chapters of human history are to be read.”
UL, a major standards publisher, joins the World Economic Forum’s Safe Drive Initiative. Creating the definitive set of self-driving standards remains a big opportunity.
Ken Washington leaves the CTO post at Ford for a VP role at Amazon. Ken was VP of advanced engineering and research while I was at Ford. I briefly met him a couple times, including when he visited Udacity. Great hire for Amazon.
Andrej Karpathy, Tesla’s Senior Director of AI, presented Tesla’s recent work at CVPR 2021. CVPR is one of the foremost conferences for academic research into computer vision. Karpathy always does a great job explaining cutting-edge work in an intelligible format (he is an AI researcher with over 350,000 Twitter followers!).
Karpathy’s presentation is about 40 minutes, but it comes at the end of an 8.5 hour session recording. Hence the timestamps start at 7:51:26.
[7:52:46] As a way of emphasizing the importance of automated driving, Karpathy describes human drivers as “meat computers.” I saw some people take offense to this on Twitter. I think the shortcomings of human drivers are widely acknowledged and the statement wasn’t necessary, but I wasn’t especially offended either. Human drivers kill a lot of people.
[7:55:14] Karpathy describes Autopilot’s Pedal Misapplication Mitigation (PMM) feature. I’d not heard of this, but I like it. Malcolm Gladwell released a podcast a few years ago hypothesizing that the Toyota recalls of the aughts and early 2010s were largely due to confused drivers flooring the accelerator pedal when they meant to (and thought they were) flooring the brake pedal. Although Consumer Reports disagrees.
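Tesla hasn’t published how PMM works, but as a rough illustration of the idea, a pedal-misapplication check might flag a floored accelerator in a situation where hard acceleration makes no sense. Everything here (function name, fields, thresholds) is invented:

```python
def pedal_misapplication_suspected(throttle_pct: float,
                                   obstacle_distance_m: float,
                                   speed_mps: float) -> bool:
    """Illustrative heuristic only -- Tesla has not published PMM's logic.

    Flag a likely pedal misapplication when the accelerator is floored
    while the car is nearly stopped with an obstacle directly ahead,
    the pattern of a driver confusing the two pedals.
    """
    FLOORED = 90.0  # % throttle treated as "floored" (assumed threshold)
    NEAR = 5.0      # meters to obstacle (assumed threshold)
    SLOW = 2.0      # m/s, i.e. parking-lot speed (assumed threshold)
    return (throttle_pct >= FLOORED
            and obstacle_distance_m <= NEAR
            and speed_mps <= SLOW)

# Parking-lot scenario: floored pedal, wall 2 m ahead, car barely moving.
print(pedal_misapplication_suspected(100.0, 2.0, 0.5))    # True
# Normal highway acceleration: floored pedal, open road, high speed.
print(pedal_misapplication_suspected(100.0, 50.0, 20.0))  # False
```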
[7:57:40] Karpathy notes that Waymo’s approach to self-driving relies on HD maps and lidar, whereas Tesla’s approach relies only on cameras. He claims this makes Tesla’s approach much more scalable, because of the effort required in building and maintaining the HD map. I’m not sure I agree with him about this – a lot of effort goes into automating the mapping process to make it scalable. And even if mapping does prove to be unscalable, lidar has a lot of uses besides localizing to an HD map.
[8:01:20] One reason that Tesla has removed radar from its sensor suite, according to Karpathy, is to liberate engineers to focus on vision. “We prefer to focus all of our infrastructure on this [cameras] and we’re not wasting people working on the radar stack and the sensor fusion stack.” I had not considered the organizational impact of removing the radar sensor.
[8:02:30] Radar signals are really accurate most of the time, but occasionally the radar signal goes haywire, because the radar wave bounces off a bridge or some other irrelevant object. Sorting the signal from the noise is a challenge.
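As a toy illustration of that filtering problem (real radar stacks use far more sophisticated tracking and gating), a neighborhood-median check can reject a single haywire return in an otherwise steady track:

```python
from statistics import median

def reject_radar_spikes(ranges, window=5, tolerance_m=3.0):
    """Replace readings that deviate wildly from their neighborhood median.

    Toy sketch of separating radar signal from noise; not any
    production algorithm.
    """
    half = window // 2
    cleaned = []
    for i, r in enumerate(ranges):
        neighborhood = ranges[max(0, i - half): i + half + 1]
        m = median(neighborhood)
        cleaned.append(m if abs(r - m) > tolerance_m else r)
    return cleaned

# A steady ~50 m target with one haywire return (e.g. a bridge reflection).
track = [50.1, 50.0, 49.9, 12.0, 50.2, 50.1]
print(reject_radar_spikes(track))  # the 12.0 spike is replaced by ~50.0
```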
[8:03:25] A good neural network training pipeline has data that is large, clean, and diverse. With that, “Success is guaranteed.”
[8:04:35] Karpathy explains that Tesla generates such a large dataset by using automated techniques that wouldn’t work for a realtime self-driving system. Because the system is labeling data, rather than processing the data in order to drive, the system can run much slower and use extra sensors, in order to get the labeling correct. Humans even help clean the data.
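One way to see why offline labeling helps: an auto-labeler can use future frames that a realtime system never has. A minimal sketch of the idea (the specific labeling scheme here is my invention, not Tesla’s):

```python
def offline_velocity_labels(positions, dt=0.1):
    """Label each frame's velocity using the *next* frame too.

    A realtime system can only look backward, but an offline auto-labeler
    can peek into the future, giving smoother, more accurate labels.
    Central difference where possible: v[i] = (x[i+1] - x[i-1]) / (2*dt).
    """
    labels = []
    for i in range(len(positions)):
        lo = max(0, i - 1)                     # previous frame (if any)
        hi = min(len(positions) - 1, i + 1)    # future frame (if any)
        labels.append((positions[hi] - positions[lo]) / ((hi - lo) * dt))
    return labels

# Object moving a steady 1 m per frame, i.e. 10 m/s at dt = 0.1 s.
print(offline_velocity_labels([0.0, 1.0, 2.0, 3.0]))  # ~10.0 at every frame
```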
[8:07:10] Karpathy shares a sample of the 221 “triggers” Tesla uses to source interesting data scenarios from the customer fleet. “radar vision mismatch”, “bounding box jitter”, “detection flicker”, “driver enters/exits tunnel”, “objects on the roof (e.g. canoes)”, “brake lights are detected as on but acceleration is positive”, etc.
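These triggers could plausibly be expressed as named predicates evaluated over each frame of fleet data. The trigger names below come from the talk; the frame fields, thresholds, and logic are invented for illustration:

```python
# Hypothetical sketch of fleet-data "triggers" as predicates over a frame.
TRIGGERS = {
    "radar_vision_mismatch":
        lambda f: abs(f["radar_range_m"] - f["vision_range_m"]) > 5.0,
    "detection_flicker":
        lambda f: f["detected_now"] != f["detected_prev"],
    "brake_lights_on_but_accelerating":
        lambda f: f["lead_brake_lights"] and f["lead_accel_mps2"] > 0.5,
}

def fired_triggers(frame):
    """Return the names of all triggers that fire on this frame."""
    return [name for name, predicate in TRIGGERS.items() if predicate(frame)]

frame = {"radar_range_m": 40.0, "vision_range_m": 52.0,
         "detected_now": True, "detected_prev": True,
         "lead_brake_lights": True, "lead_accel_mps2": 1.2}
print(fired_triggers(frame))
# ['radar_vision_mismatch', 'brake_lights_on_but_accelerating']
```

Frames that fire any trigger would be uploaded for labeling; the rest stay in the car.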
[8:08:40] Karpathy outlines the process of training a network, deploying it to customers in “shadow mode”, measuring how accurately the model predicts depth, identifying failure cases, and re-training. He says they’ve done 7 rounds of shadow mode. I’m a little surprised the process is that discrete. I would’ve guessed Tesla had a nearly continuous cycle of re-training and re-deploying models.
[8:10:00] Karpathy shows a very high-level schematic of the neural network architecture. There’s a ResNet-style “backbone” that identifies features and then fuses data across all the sensors on the vehicle and then across time. Then the network branches into heads, then “trunks”, then “terminals.” The combined network shares features but also allows engineers interested in specific features (e.g. velocity for vehicles in front of the car) to tune their branches in isolation.
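A minimal structural sketch of the shared-backbone, multi-branch idea (all names, shapes, and computations here are invented stand-ins, not Tesla’s architecture):

```python
def backbone(camera_frames):
    """Shared feature extractor (stand-in for the ResNet-style backbone)."""
    # Toy "features": one number per frame.
    return [sum(frame) / len(frame) for frame in camera_frames]

def vehicle_head(features):
    """Branch owned by the team that cares about vehicle detections."""
    return {"vehicle_score": max(features)}

def lane_head(features):
    """A separate branch: shares the backbone but is tuned in isolation."""
    return {"lane_offset": features[0] - features[-1]}

# One backbone pass feeds every head, so backbone compute is shared
# while each team can iterate on its own branch independently.
features = backbone([[0.1, 0.3], [0.8, 0.6]])
print(vehicle_head(features), lane_head(features))
```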
[8:11:30] “You have a team of, I would say, 20 people who are tuning networks full-time, but they’re all cooperating. So, what is the architecture by which you do is an interesting question and I would say continues to be a challenge over time.” In a few different cases now, Karpathy has discussed organizational dynamics within the engineering team as a significant factor in development.
[8:11:50] Karpathy flashes an image and specs of Tesla’s new massive computer. That Karpathy knows enough about computer architecture to even describe what’s going on here is impressive. He also plugs recruiting for their super-computing team.
[8:14:20] In the vein of integration, Karpathy shares that the team gets to design everything from the super-computer, to the in-vehicle FSD chip, to the neural networks. Vertical integration!
[8:16:00] Karpathy shows an example of radar tracking a vehicle and reporting a lot of noise. He explains that maybe they could work on the radar to fix this, but kind of shrugs and says it’s not worth it, since radar isn’t that useful anyway.
[8:19:40] Karpathy references both the validation and simulation processes, but at such a high level I can’t really tell what they’re doing. He mentions unit tests, simulations, track tests, QA drives, and shadow modes.
[8:20:20] Tesla reports FSD has run about 1.7M Autopilot miles with no crashes. Karpathy warns that crashes are inevitable, at Tesla’s scale. He reports that the legacy stack has a crash “every 5M miles or so.” For context, in the US, human drivers experience fatal crashes about every 65M miles. (Do note the distinction between “fatal crashes”, which is the available data for human drivers, and “all crashes” which is the reference Karpathy provides. We would expect “all crashes” to occur much more frequently than “fatal crashes.”)
[8:22:40] Karpathy speculates that training for vision alone basically requires a fleet (and a super-computer), in order to gather sufficient data. He seems like such a nice guy that I wouldn’t even consider this a dig at lidar-reliant autonomous vehicle companies; rather, I chalk it up to a defense against all the criticism that Tesla’s vision-only approach has received.