Andrej Karpathy is one of the most impressive and celebrated computer scientists in the world, and has worked for the past several years as Senior Director, AI, at Tesla. Essentially, he leads their Autopilot team.
Reilly Brennan’s Future of Transportation newsletter (you should subscribe) pointed to a talk Karpathy recently gave at a conference called ScaledML. It’s pretty great, so I decided to annotate it, as a way to capture all of the details for myself, as much as anything else.
[00:00] Karpathy’s title is Senior Director. I remember him joining Tesla as a Director, so I think he got a promotion. Congratulations!
[00:19] Karpathy starts by defining what Autopilot is. This seems like good presentation technique. Establish the basics before moving on to advanced topics.
[00:50] Karpathy shows 8 Tesla vehicle models, noting that some of them have “only been announced.” Models S, 3, X, Y, T(ruck), A(TV — joking?), R, and S(emi). Globally Tesla has over 1 million vehicles.
[01:35] Autopilot has 3 billion miles, “which sounds(?) like a lot.”
[01:58] “We think of it (Autopilot) as roughly autonomy on the highway.” Sounds like Level 3 to me.
[02:24] “Smart Summon is quite magical, when it works (audience laughs).” I actually don’t know, is Smart Summon unreliable?
[03:12] Euro NCAP has rated Teslas as the safest vehicles, which isn’t a surprise but also puts the Autopilot lawsuits in perspective.
[03:45] Karpathy shows some examples of Tesla safety features working, even when Autopilot is not turned on. Probably this means that Karpathy’s team is working on the broader array of safety features, not just Autopilot.
[04:43] “The goal of the team is to produce full self-driving.” Karpathy has always struck me as more reliable and realistic than Musk. “Full Self-Driving” means more coming from Karpathy.
[06:30] “We do not build high-definition maps. When we come to an intersection, we encounter it basically for the first time.” This is striking, and I don’t think I’ve heard Tesla put it quite like this before. Tesla is famous for eschewing lidar, but I wonder why they don’t build vision-based maps?
[08:00] Karpathy mentions that the neural networks on the car really have two separate tasks — (a) driving, and (b) showing the humans in the vehicle that the computer perceives the environment, so the humans trust the system.
[09:16] We see a photo of a crossing guard with a handheld stop sign, hanging loose from the guard’s limp arm. Karpathy calls this “an inactive state.” This really highlights to me how hard it is for a computer to know whether a stop sign is real or not.
[10:10] Karpathy mentions Tesla builds maps, “of course, but they’re not high-definition maps.” I wonder what kind of maps they are.
[10:35] The Autopilot team spends at least part of its day-to-day work going through the long tail and sourcing examples of weird stop signs. And presumably other weird scenarios. Man, that sounds like a grind. I would imagine they must automate or outsource a lot of that.
[11:15] Bayesian uncertainty in the neural network seems to play a role.
[12:21] When Tesla needs more data, they just send an extra neural network to their vehicle fleet and ask the cars to run that network in the background, gathering potential training images. I would bet it will take traditional automotive companies a long time to develop this capability.
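To make the idea concrete for myself, here is a minimal sketch of a background detector that flags frames worth uploading. Everything in it is an assumption for illustration: the `Frame` type, the `collect_candidates` helper, the threshold, and the toy detector (mean brightness standing in for a network's confidence) are all hypothetical, and nothing here reflects Tesla's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    frame_id: int
    pixels: List[float]  # stand-in for real image data


def collect_candidates(
    frames: Iterable[Frame],
    detector: Callable[[Frame], float],
    threshold: float = 0.5,
    limit: int = 100,
) -> List[int]:
    """Run a background detector and keep ids of frames scoring at or above threshold."""
    selected: List[int] = []
    for frame in frames:
        if len(selected) >= limit:
            break  # cap how much a single car sends back
        if detector(frame) >= threshold:
            selected.append(frame.frame_id)
    return selected


def toy_stop_sign_detector(frame: Frame) -> float:
    # Hypothetical score: mean brightness stands in for a network's confidence.
    return sum(frame.pixels) / len(frame.pixels)


frames = [Frame(0, [0.1, 0.2]), Frame(1, [0.9, 0.8]), Frame(2, [0.6, 0.5])]
candidates = collect_candidates(frames, toy_stop_sign_detector, threshold=0.5)
print(candidates)
```

The `limit` parameter is my own addition: it seems reasonable that a fleet-scale system would cap uploads per vehicle, but the talk notes above do not say so.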
[13:16] Test-Driven Development! TDD for the win!
[14:37] HydraNet is a collection of 48 neural networks with a “shared backbone” and 1000 distinct predictions. This is a multi-headed neural network on steroids.
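A "shared backbone" with many heads can be sketched in a few lines. This is a toy illustration of the general multi-headed pattern, not HydraNet itself: the `backbone` function, the `make_head` helper, and the three task names are made up, and the "features" are simple statistics rather than learned representations.

```python
from typing import Callable, Dict, List

Features = List[float]


def backbone(image: List[float]) -> Features:
    # Stand-in for an expensive shared feature extractor, run once per image.
    return [sum(image), max(image), min(image)]


def make_head(weights: List[float]) -> Callable[[Features], float]:
    # Each head is a cheap task-specific function of the shared features.
    def head(features: Features) -> float:
        return sum(w * f for w, f in zip(weights, features))

    return head


class MultiHeadNet:
    def __init__(self, heads: Dict[str, Callable[[Features], float]]) -> None:
        self.heads = heads
        self.backbone_calls = 0

    def predict(self, image: List[float]) -> Dict[str, float]:
        self.backbone_calls += 1
        features = backbone(image)  # computed once, reused by every head
        return {name: head(features) for name, head in self.heads.items()}


net = MultiHeadNet(
    {
        "stop_sign": make_head([1.0, 0.0, 0.0]),
        "lane_line": make_head([0.0, 1.0, 0.0]),
        "traffic_light": make_head([0.0, 0.0, 1.0]),
    }
)
outputs = net.predict([0.25, 0.5, 0.75])
print(outputs, net.backbone_calls)
```

The point of the design is that the costly part runs once per image no matter how many predictions hang off it, which is presumably what makes 1000 predictions feasible on a car.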
[14:59] “None of these predictions can ever regress, and all of them must improve over time.” I don’t really understand what he means here. Surely there must be times a network predicts a dog and then later realizes it’s a child, etc.
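One possible reading, and it is only my guess, is that "regress" refers to evaluation scores between software releases rather than to individual predictions changing from frame to frame. Under that assumption, a release gate might look like the sketch below. The `find_regressions` helper, the task names, and the scores are all invented for illustration.

```python
from typing import Dict, List


def find_regressions(
    previous: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return tasks whose score dropped, or went missing, in the candidate release."""
    regressed: List[str] = []
    for task, old_score in previous.items():
        new_score = candidate.get(task)
        if new_score is None or new_score < old_score - tolerance:
            regressed.append(task)
    return sorted(regressed)


# Hypothetical per-task accuracy for two releases.
previous = {"stop_sign": 0.97, "lane_line": 0.97, "traffic_light": 0.95}
candidate = {"stop_sign": 0.98, "lane_line": 0.96, "traffic_light": 0.95}

blocked = find_regressions(previous, candidate)
print(blocked)
```

With a shared backbone, improving one head can quietly hurt another, so a check like this across every task would fit nicely with the test-driven approach mentioned at 13:16.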
[15:15] Autopilot is maintained by “a small, elite team — basically a few dozen people.” Wow.
[15:54] The goal of the Tesla AI team is to build infrastructure that other, more tactical people can then use to execute tasks. They call this approach Operation Vacation. (ruh-ruh)
[16:46] For example, if somebody at Tesla wants to detect a new type of stop sign, they supposedly don’t even have to bother Karpathy’s team. The AI team has already built out all the infrastructure for the rest of Tesla to plug new “landmark” images into.
[17:56] Karpathy shows an occupancy tracker that looks like something out of a 2-D laser scanner from twenty years ago. I wonder if they’re basically using cameras to fake what lidars do (Visual SLAM, etc.).
[19:36] Autopilot code used to be a lot of C++ code, written by engineers. As the neural networks get better, they’re eating up a lot of that “1.0” codebase.
[19:51] Aha! The occupancy tracker is old, “1.0” code, written by people. The future is neural networks!
[20:00] There is a “neural net fusion layer, that stitches up the feature maps and projects to bird’s-eye view.”
[20:15] There is a “temporal module” that smooths and a “BEV net decoder”. What are these things? I probably need to spend a few weeks getting back up to speed on the latest neural network research.
[22:15] Karpathy shows off how well this system works, but it’s hard to follow and judge for myself.
[22:35] Tesla takes a “pseudo-lidar approach, where you predict the depth of every single pixel and you basically simulate lidar input purely from vision.” Why not just use lidar, then? The unit price is coming down. Probably Tesla can’t depend on lidar because it already has a million vehicles on the road, none of which have lidar, and many of which have paid for full self-driving already. Realistically, though, this sounds like Tesla will start to add lidar at some point.
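The geometric half of pseudo-lidar is simple enough to sketch: once you have a depth for each pixel, the standard pinhole camera model turns the image into a point cloud. In the sketch below the depth map is hard-coded, whereas in the approach Karpathy describes a neural network would predict it. The intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values, and `depth_to_points` is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a per-pixel depth map into camera-frame 3-D points (pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):  # v is the pixel row
        for u, z in enumerate(row):  # u is the pixel column
            if z <= 0:
                continue  # treat non-positive depth as "no estimate"
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A 2x2 toy depth map in meters; the zero marks a pixel with no estimate.
depth_map = [[2.0, 2.0], [4.0, 0.0]]
cloud = depth_to_points(depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)
```

The hard part, of course, is predicting the depth accurately in the first place, and that is where the gap with real lidar would come from.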
[24:02] The gap between lidar and a camera’s ability to simulate lidar is “quickly shrinking.” What’s the gap now? Is this tracked somewhere in academic literature?
[24:36] The driving policy (the motion planning) is still human-coded. But not for long! This is where Tesla’s fleet really shines. Karpathy notes that their human drivers are basically building supervised motion planning datasets for free.
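Learning a policy from logged human driving is often called behavior cloning. Here is a deliberately tiny sketch of that idea, assuming a one-dimensional policy that maps lane offset to steering and is fit by ordinary least squares. The data, the linear model, and the `fit_linear_policy` helper are all my own illustration; the talk notes above do not say how Tesla would actually train its planner.

```python
from typing import List, Tuple


def fit_linear_policy(
    observations: List[float], actions: List[float]
) -> Tuple[float, float]:
    """Least-squares fit of action = gain * observation + bias."""
    n = len(observations)
    mean_x = sum(observations) / n
    mean_y = sum(actions) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(observations, actions))
    var = sum((x - mean_x) ** 2 for x in observations)
    gain = cov / var
    bias = mean_y - gain * mean_x
    return gain, bias


# Hypothetical logged data: lane offset in meters and the human's steering command.
lane_offsets = [-1.0, 0.0, 1.0, 2.0]
human_steering = [0.5, 0.0, -0.5, -1.0]

gain, bias = fit_linear_policy(lane_offsets, human_steering)


def policy(offset: float) -> float:
    # The cloned policy imitates what the human drivers did in similar states.
    return gain * offset + bias


print(gain, bias, policy(0.5))
```

A real planner would be a neural network over far richer inputs, but the supervised setup is the same: the human's action is the label.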
[26:17] Really nice job summarizing his own talk. It’s just amazing that one guy can be such a phenomenal computer scientist and also so skilled at communication — in a second language, no less!
[26:40] They’re hiring!
[27:30] During Q&A, Karpathy notes that Tesla builds low-definition semantic maps, which somewhat contradicts his earlier statement that every intersection is basically approached as if it were a new intersection.
[29:45] The hand-coded, “software 1.0” stack is used to keep the neural network within “guardrails.”
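A minimal sketch of what "guardrails" could mean in code: hand-written limits that clamp whatever the network proposes. The `Limits` fields, the numeric values, and the `apply_guardrails` helper are assumptions for illustration only. Karpathy does not describe the actual mechanism in this much detail.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Limits:
    max_steering_deg: float
    max_accel_mps2: float
    max_brake_mps2: float


def apply_guardrails(
    steering_deg: float, accel_mps2: float, limits: Limits
) -> Tuple[float, float]:
    """Clamp a proposed command to hand-coded bounds before it reaches the actuators."""
    steering = max(-limits.max_steering_deg, min(limits.max_steering_deg, steering_deg))
    # Negative acceleration is braking, bounded separately from positive acceleration.
    accel = max(-limits.max_brake_mps2, min(limits.max_accel_mps2, accel_mps2))
    return steering, accel


limits = Limits(max_steering_deg=30.0, max_accel_mps2=2.0, max_brake_mps2=8.0)

# Suppose the network proposes an aggressive command.
safe_command = apply_guardrails(45.0, 5.0, limits)
print(safe_command)
```

The appeal of this split is that the bounds are easy for a human to read and verify, even when the network's reasoning is not.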