Reinforcement learning (RL) is a machine learning training method based on rewarding desired behaviours and punishing undesired ones. “Reward” and “punishment” are to be understood in this context as merely numerical values which help the algorithm to find the “best” way to a solution. This approach allows an agent to learn to navigate the complex demands of the specific environment for which it was created so that over time, the agent optimises its behaviours.
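To make the role of purely numerical rewards concrete, the following is a minimal sketch of tabular Q-learning, one common RL algorithm, on a made-up toy problem. The corridor environment, the reward values (+1 for reaching the goal, -0.01 for every other step), the function name `train_corridor` and all parameter values are assumptions chosen for illustration; they are not taken from any particular system.

```python
import random


def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0 .. n_states-1. The agent starts in state 0 and an episode
    ends when it reaches the last state. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] is the learned estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                # Occasionally try a random action (trial and error).
                action = rng.randrange(2)
            else:
                # Otherwise take the best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            # Moving left from state 0 leaves the agent where it is.
            next_state = max(0, state - 1) if action == 0 else state + 1

            # "Reward" and "punishment" are just numbers.
            reward = 1.0 if next_state == goal else -0.01

            # Q-learning update: move the estimate towards
            # reward + discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_corridor()
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told that "right" is the correct direction; the preference emerges only from the accumulated numerical feedback.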
This technique is often used to train agents for purposes such as gaming and autonomous systems. For example, the AlphaGo model, which was trained with RL, was able to defeat one of the world’s best human players in the highly complex board game Go.
With its trial-and-error approach, RL most closely resembles the natural learning behaviour of humans. It thus differs from two other common methods for training a model, supervised learning and unsupervised learning, which learn from a fixed data set and, in the supervised case, from target results that are already predetermined.
Therefore, one of the great advantages of RL is that it normally does not need an extensive training data set. The disadvantages include the relatively high demand for computing power, which is partly due to the open-ended nature of the results, as well as the so-called “exploration vs. exploitation” dilemma, i.e. the question of whether an already known solution path should be used and possibly improved or whether a completely new path should be sought.
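One simple and widely used way of handling the exploration vs. exploitation dilemma is the epsilon-greedy rule: with a small probability the agent tries a random option, otherwise it uses the option that currently looks best. The sketch below applies this rule to a hypothetical three-armed bandit. The function name `run_bandit`, the reward means, the Gaussian noise and the value of `epsilon` are illustrative assumptions only.

```python
import random


def run_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a hypothetical multi-armed bandit.

    true_means holds the average reward of each arm; the agent does not
    see these values and has to estimate them from noisy rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm at random, even if it looks worse.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: use the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Noisy numerical reward (assumed Gaussian for this sketch).
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running average for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.9])
print("estimated means:", [round(e, 2) for e in estimates])
print("pull counts:", counts)
```

A larger `epsilon` spends more steps on exploration and finds good options sooner, but wastes more steps on poor ones later; a smaller value does the opposite. This trade-off is the dilemma in miniature.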
http://scholarpedia.org/article/Reinforcement_learning https://www.ki.nrw/ki-schluesselbegriffe/#23 (German)