Reinforcement Learning: The Strange New Kid On The Block
If Bayesian statistics is the black sheep of the statistics family (and some people think it is), reinforcement learning is the strange new kid on the data science and machine learning block. It employs many of the familiar techniques from machine learning, but the setting is fundamentally different. You don’t follow the usual ritual of taking a big bunch of data, splitting it into partitions, and then training, evaluating, and improving your model. In reinforcement learning, the data your model works with is not an entity separate from the model itself. Instead, your model must choose from a set of actions, and it gets a reward that depends on this choice. Then it chooses the next action, gets the next reward, and so on, all the while trying to maximize the reward. Hence, the data is not given in advance. It is produced as the model interacts with its environment.
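To make this loop concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `run_interaction`, the toy environment `toy_reward` with its made-up payout probabilities, and the epsilon-greedy rule for picking actions, which is just one simple strategy among many. The point is only to show that the data (the history of actions and rewards) does not exist until the agent starts acting.

```python
import random

def run_interaction(reward_fn, n_actions, n_steps, epsilon=0.1, seed=0):
    """Let a simple agent interact with an environment for n_steps.

    reward_fn(action, rng) plays the role of the environment: it takes the
    chosen action and returns a numeric reward.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_actions  # summed reward observed per action
    counts = [0] * n_actions    # how often each action was chosen
    history = []                # the "data set", produced on the fly

    for _ in range(n_steps):
        # Explore at random with probability epsilon, or while some action
        # has never been tried; otherwise pick the best average so far.
        if rng.random() < epsilon or 0 in counts:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: totals[a] / counts[a])

        reward = reward_fn(action, rng)  # the environment responds
        totals[action] += reward
        counts[action] += 1
        history.append((action, reward))

    return history, counts


def toy_reward(action, rng):
    """Hypothetical environment: action i pays 1 with a fixed probability."""
    payout_probability = [0.2, 0.5, 0.8]  # made-up numbers for illustration
    return 1.0 if rng.random() < payout_probability[action] else 0.0


history, counts = run_interaction(toy_reward, n_actions=3, n_steps=2000)
print(counts)  # how often each action was chosen
```

Note that there is no train/test split anywhere: the agent's own choices determine which rewards it ever gets to see.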
Reinforcement Learning: Why Bayes?
The best-known applications of reinforcement learning are connected to games. The defeat of an e-sports champion in the computer game Dota 2 by OpenAI’s deep reinforcement learning agents has attracted a lot of attention. The same is true for DeepMind’s board game program AlphaZero, which is also based on reinforcement learning. The computational resources invested in this kind of approach are huge: OpenAI’s agents have played a total of 45,000 years of Dota 2 in fast-forward mode. And the importance of games and simulations in reinforcement learning is not restricted to high-profile cases that make the headlines. When you look at OpenAI Gym, a popular environment for training reinforcement learning agents, you see lots of computer game classics like Pong and several other Atari games, along with physics simulations where an agent can learn to balance a pole on a cart. There is an interesting connection here to the Bayesian approach: in reinforcement learning, we often assume we know the rules of the environment and how they interact so well that we can set up a simulation as a training environment for the agents. In other words, reinforcement learning routinely works with strong assumptions, so strong that it is often applied to a purely simulated game setting that is isolated from the “real world”. What if we could use that other thing that works with strong assumptions, Bayesian statistics, to break through this isolation and use reinforcement learning in the real world?
Reinforcement Learning And Bayesian Statistics: A Child’s Game
Let’s put these abstract ideas to work and build something concrete. We will stay in the reinforcement learning tradition by using a game, but we’ll break with tradition in another way: the learning environment will not be simulated. Instead, it will be the interaction with a real human, like you. To keep things as simple as possible, the game we use will be the childhood classic rock, paper, scissors. Game theory says this game has a single equilibrium in which both players choose their actions uniformly at random. In plain English: against an opponent who plays randomly, you can’t do better than choosing randomly yourself. But game theory also makes strong assumptions, and they are rarely fulfilled when humans are involved. Humans are not good at being truly random, so it is interesting to design a reinforcement learning agent that exploits the biases of its human counterpart.
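As a first impression of what such an agent could look like, here is a minimal sketch in Python. It is not a finished design, and it rests on assumptions that are mine for the purpose of illustration: the agent models only how often the human plays each move overall (ignoring any dependence on earlier rounds), it keeps a Dirichlet distribution over those frequencies, and it picks its move by sampling from that distribution and countering the most likely move in the sample (an approach known as Thompson sampling). The names `BayesianRPSAgent`, `choose`, `observe`, and `reward` are made up for this example.

```python
import random

MOVES = ("rock", "paper", "scissors")
# COUNTER[m] is the move that beats m.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def reward(agent_move, human_move):
    """Return +1 if the agent wins, -1 if it loses, 0 for a draw."""
    if agent_move == human_move:
        return 0
    return 1 if COUNTER[human_move] == agent_move else -1


class BayesianRPSAgent:
    """Keeps a Dirichlet belief about how often the human plays each move."""

    def __init__(self, prior=1.0, seed=None):
        # prior=1.0 for every move means: initially, no move is favored.
        self.alpha = {move: prior for move in MOVES}
        self.rng = random.Random(seed)

    def choose(self):
        # Independent Gamma draws, once normalized, form a Dirichlet sample.
        # Normalizing does not change which entry is largest, so we skip it.
        draws = {m: self.rng.gammavariate(a, 1.0) for m, a in self.alpha.items()}
        predicted_human_move = max(draws, key=draws.get)
        return COUNTER[predicted_human_move]

    def observe(self, human_move):
        # Bayesian update for the Dirichlet: add one to the observed move.
        self.alpha[human_move] += 1.0


# Usage: one round against a human who happens to play "rock".
agent = BayesianRPSAgent(seed=42)
agent_move = agent.choose()
human_move = "rock"  # in a real game, this would come from user input
agent.observe(human_move)
print(agent_move, reward(agent_move, human_move))
```

Against a truly random opponent, this agent gains nothing, exactly as game theory predicts. The more a human favors one move, however, the more the belief concentrates on it and the more often the agent plays the counter move.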