The blog post, “Deep Reinforcement Learning Doesn’t Work Yet”, has been making the rounds for the last few months, but I only just sat down to read it.
It’s a terrific summary of the current state of deep reinforcement learning research, the reasons why deep RL is not yet living up to its hype, and hope for the future. It’s written by Alexander Irpan, who works on RL at Google Brain.
“For purely getting good performance, deep RL’s track record isn’t that great, because it consistently gets beaten by other methods.”
There is a lot of interest in using DeepRL for self-driving cars. While this is a super-exciting opportunity in theory, in practice DeepRL has not been effective.
“I tried to think of real-world, productionized uses of deep RL…The way I see it, either deep RL is still a research topic that isn’t robust enough for widespread use, or it’s usable and the people who’ve gotten it to work aren’t publicizing it. I think the former is more likely.”
A big challenge for RL generally, and particularly when it comes to self-driving cars, is the design of a reward function. It’s not clear what the reward function for driving a car would be. And, as Irpan makes clear, unless the reward function is designed nearly perfectly, the learning agent is going to find all sorts of disastrous shortcuts that maximize the reward at the expense of violating the implicit rules of the game.
“A friend is training a simulated robot arm to reach towards a point above a table. It turns out the point was defined with respect to the table, and the table wasn’t anchored to anything. The policy learned to slam the table really hard, making the table fall over, which moved the target point too. The target point just so happened to fall next to the end of the arm.”
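The anecdote above can be sketched in a few lines. This is a hypothetical toy model (all names and numbers invented for illustration, not from Irpan's post): the reward is the negative distance from the arm tip to a target that is mistakenly defined relative to the table, so a policy that knocks the table toward the arm can outscore one that honestly reaches.

```python
def reward(arm_tip, table_pos, target_offset):
    """Negative distance from the arm tip to the target.

    Bug: the target is defined relative to the table's position,
    so anything that moves the table also moves the target.
    """
    target = table_pos + target_offset
    return -abs(arm_tip - target)

# Intended behavior: the arm reaches toward the target above the table.
table_pos, target_offset = 5.0, 2.0   # target intended at x = 7.0
reach_reward = reward(arm_tip=6.5, table_pos=table_pos,
                      target_offset=target_offset)

# Exploit: slam the table so it falls toward the arm; the target,
# being table-relative, lands right next to the arm tip.
slam_reward = reward(arm_tip=2.4, table_pos=0.5,
                     target_offset=target_offset)

print(reach_reward, slam_reward)  # slamming the table scores higher
```

Nothing in the reward itself distinguishes "reach the point" from "move the point to the arm", which is exactly why near-perfect reward design matters.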
Irpan is hopeful about the future of RL for practical problems, but cautiously so. Definitely worth a read.