Reinforcement learning (RL) is a type of machine learning in which agents interact with an environment and learn to make decisions through trial and error. The goal is to learn to choose actions that maximize cumulative reward.
To make the concept clearer, let’s use Pitfall (an Atari classic). The objective of this game is to guide Harry through a forest, avoiding deadly dangers and collecting as many treasures as possible in the shortest possible time.
Harry (the agent) must explore the forest (the environment) and make decisions (actions) that keep him from losing a life (punishment) and let him collect as many treasures as possible (reward) to complete the adventure.
As the game progresses, our actions have consequences: we collect treasures and earn points, or we are bitten by a snake and die. Through this interaction we adjust our behavior based on the feedback we receive from the environment.
With experience, we realize that some areas of the forest are more dangerous than others and learn to avoid them, or that to jump safely over the crocodiles it is better to wait for them to close their mouths. That’s the idea behind reinforcement learning: exploring the environment and learning, by trial and error, the best way to achieve your goal. Pretty cool, don’t you think?
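To make the trial-and-error idea concrete, here is a minimal sketch using tabular Q-learning, one common RL algorithm. It is not how Pitfall itself is solved. The "forest" is a made-up row of five cells with a snake pit at one end and a treasure at the other, and every name and number in it (N_CELLS, step, train, the learning rate and so on) is an assumption chosen for illustration.

```python
import random

# Hypothetical toy "forest": cells 0..4 in a row.
# Cell 0 is a snake pit (reward -1), cell 4 holds the treasure (reward +1).
N_CELLS, START = 5, 2
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell and return (next_state, reward, done)."""
    nxt = state - 1 if action == LEFT else state + 1
    if nxt == 0:
        return nxt, -1.0, True   # bitten by the snake: punishment
    if nxt == N_CELLS - 1:
        return nxt, 1.0, True    # treasure collected: reward
    return nxt, 0.0, False


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[state][action]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action
                action = rng.choice([LEFT, RIGHT])
            else:
                # Exploit: pick the best-known action, ties broken at random
                best = max(q[state])
                action = rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy action learned for each non-terminal cell (0 = left, 1 = right)
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in (1, 2, 3)]
print(policy)
```

After enough episodes, the agent should prefer walking toward the treasure from every cell, even though nobody told it where the snake was: it found out by being bitten a few times.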
Overall, reinforcement learning has many applications, including robotics, gaming, finance, healthcare, social networking, advertising, and many others. In this article I will mention four very cool applications that made use of this technique.
Autonomous car applications
Wayve.ai applied reinforcement learning to teach a car to drive in a day. The task was lane following, which the car had to learn from scratch. In about 20 minutes, the vehicle was able to learn and complete the task.
In each episode, the track was randomly generated and the agent explored the environment until it left the track, at which point the episode ended. The policy was then optimized on the collected data and the process repeated.
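The loop described above (drive until you leave the track, improve the policy, repeat) can be sketched in a few lines. This is only an illustration under heavy assumptions: the "car" is a single lateral offset, the policy is one steering gain, and the optimizer is simple random-search hill climbing, used here as a stand-in and not the algorithm Wayve used. All names (run_episode, hill_climb, MAX_STEPS) and constants are hypothetical.

```python
import random

MAX_STEPS = 200  # assumed cap on episode length


def run_episode(gain, rng):
    """Drive until the car leaves the lane (|offset| > 1) or time runs out."""
    offset = 0.0
    drift = rng.uniform(-0.3, 0.3)  # a new random "track" every episode
    for t in range(MAX_STEPS):
        steering = -gain * offset    # policy: steer back toward the lane center
        offset += drift + steering + rng.gauss(0, 0.05)
        if abs(offset) > 1.0:
            return t                 # left the track: the episode ends
    return MAX_STEPS


def average_length(gain, rng, n=20):
    """Average number of steps survived over n episodes."""
    return sum(run_episode(gain, rng) for _ in range(n)) / n


def hill_climb(iterations=50, seed=0):
    """Keep a perturbed gain only if it survives longer than the best so far."""
    rng = random.Random(seed)
    gain = 0.0
    best = average_length(gain, rng)
    history = [best]
    for _ in range(iterations):
        candidate = gain + rng.gauss(0, 0.2)
        score = average_length(candidate, rng)
        if score > best:
            gain, best = candidate, score
        history.append(best)
    return gain, history


final_gain, history = hill_climb()
print(f"steps survived: {history[0]:.1f} at start, {history[-1]:.1f} after training")
```

The structure is the part that matters: collect experience with the current policy, score it by how long the car stayed on the track, and keep whatever change made it stay longer.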
Natural language processing applications
In the field of natural language processing, RL can be used for machine translation, as demonstrated by researchers from the University of Colorado and the University of Maryland, who proposed a reinforcement learning approach to simultaneous machine translation.
In short, the translator uses reinforcement learning to predict with more confidence how a sentence will end before it has been fully received, so it does not have to wait for the complete input before starting to translate. According to the authors, that wait is a bottleneck for traditional translators.
Applications in robotic manipulation
Google Research combined deep learning with reinforcement learning, using 7 robots that ran for a total of 800 robot hours over 4 months to train an object-grasping policy.
At the end of the process, the resulting policy was able to generalize to a diverse set of objects not seen during training.
Applications in energy conservation
In 2018, DeepMind used AI agents to cool Google’s data centers, which reduced energy consumption by around 30% in the first months of deployment.
Every five minutes, the AI pulls a snapshot of the data center’s cooling system and feeds it to models that predict how different combinations of potential actions will affect future energy consumption. The AI then identifies which actions will minimize this consumption, and these actions are sent back to the data center, where they are implemented.
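The predict-then-choose step can be sketched as follows. Everything here is an assumption made for illustration: the snapshot field, the two control knobs (fan and pump levels), and above all predict_energy, which is a made-up formula standing in for the learned models. It is not DeepMind's system.

```python
from itertools import product


def predict_energy(snapshot, action):
    """Hypothetical stand-in for a learned predictor of future energy use (kW).

    Made-up trade-off: running fans and pumps costs energy, but running
    them harder reduces the energy needed to remove the IT heat load.
    """
    fan_level, pump_level = action
    return 3 * fan_level + 2 * pump_level + snapshot["it_load_kw"] / (fan_level * pump_level)


def choose_action(snapshot, candidate_actions):
    """Return the candidate action with the lowest predicted energy use."""
    return min(candidate_actions, key=lambda action: predict_energy(snapshot, action))


# One snapshot of the cooling system (hypothetical sensor reading)
snapshot = {"it_load_kw": 100.0}

# Every combination of fan level 1..5 and pump level 1..5
candidates = list(product(range(1, 6), range(1, 6)))

best_action = choose_action(snapshot, candidates)
print(best_action, round(predict_energy(snapshot, best_action), 2))
```

In a real deployment this selection would run again on each new snapshot, so the chosen actions track changing conditions.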
Reinforcement learning is an ever-evolving research area, and significant efforts have been made to extend its use to complex tasks normally performed by humans. One of the great advantages of this approach is that it can learn good actions without needing to know all the characteristics of the environment in advance. I hope this text has shed light on this field and sparked your interest in exploring it further.