Reinforcement Learning Specialization on Coursera

Sun 08 November 2020

The value of Data Science is most often in the action taken. However, most of the tooling consists of fancy correlation computations (I'm looking at you, supervised learning). Reinforcement learning focuses on taking action: what is the best action to take in a given situation? I have done projects with RL but wanted to broaden my knowledge, so I did the Reinforcement Learning Specialization on Coursera, created by instructors from the University of Alberta, including Sutton and Barto themselves.

The specialization is divided into four courses, each building on the previous ones. The first explores the central concepts of RL and simple methods. The second covers algorithms that learn from interaction with the environment, bypassing the otherwise necessary knowledge of state transitions (how the world changes after you take an action). The third introduces ways to generalize between states with function approximation. The last is a capstone project where you build an agent that learns to safely land a lunar lander in a simulator.
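To give an idea of what "learning from interaction" means, here is a minimal tabular Q-learning sketch. This is my own illustration, not code from the course, and the `env` object with `reset`/`step` methods (returning next state, reward, and a done flag) is an assumed Gym-style interface. The point is that the agent never touches a transition model; it updates its value estimates purely from sampled experience.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learns action values from sampled
    interaction only, no transition model required."""
    Q = defaultdict(float)  # (state, action) -> value estimate

    for _ in range(episodes):
        state = env.reset()  # assumed Gym-style API
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])

            # assumed to return (next_state, reward, done)
            next_state, reward, done = env.step(action)

            # TD target uses the sampled next state, not p(s'|s,a)
            best_next = max(Q[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * (0.0 if done else best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state
    return Q
```

The third course replaces the table `Q` with a parameterized function (for example, a linear function of state features), so that value estimates generalize between states the agent has never visited.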

The good. Instructors Adam and Martha White explain the concepts very well in a series of short clips, with a lot of examples. The course follows the book Reinforcement Learning: An Introduction by Sutton and Barto, and being able to read back or ahead about the subjects in the course is a plus. Clips from other researchers are included to give an idea of the different applications in the field, as well as related topics not covered in the course or book.

I fondly remember the clip by Michael L. Littman, where he mentions evolutionary optimization of the reward signal. What if every agent has a mapping from sensory input to a reward, and evolution selects and shapes reward mappings that result in a thriving community of agents? Doesn't sound too far-fetched, right?

Also noteworthy is the clip by Joelle Pineau. She shows that even in the active academic RL field there is a problem with fair comparisons: when she ran open-source implementations of the algorithms she studied, there were large differences between implementations of the same algorithm, where you would expect the same outcome. When I did my master's thesis I encountered a paper with a similar message about repeating patterns in time series: "On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration" by Keogh & Kasetty.

The bad. The course ends where things get really interesting. It doesn't mention situations where you compete with other agents, and it basically skips the most interesting RL breakthrough of the last few years: deep neural networks learning to play Atari games. I would have loved for the specialization to continue, maybe with more open-ended projects to tinker with.

The ugly. I was disappointed with the programming assignments. A lot of the work has already been done for you; what remains is filling in a few lines, sometimes literally repeating the instructions, sometimes frustratingly unclear. I expected more of a challenge: get an environment to program an agent in, try a few algorithms to see how they differ, and learn why some algorithms are better suited to certain problems.

In conclusion, I really enjoyed the series of courses. The instructional clips were clear and interesting. And although I would have liked differently structured programming assignments, I guess I can do that myself with all the extra knowledge I have now.