What is deep reinforcement learning: The next step in AI and deep learning

Reinforcement learning is well-suited for autonomous decision-making where supervised learning or unsupervised learning techniques alone can’t do the job

Contributor, InfoWorld

Reinforcement learning has traditionally occupied a niche in the world of artificial intelligence, but it has started to assume a larger role in many AI initiatives in the past few years. Its sweet spot is calculating the optimal actions for agents to take in decision scenarios that depend on the surrounding environment.

Using trial-and-error approaches to maximize an algorithmic reward function, reinforcement learning is well suited to many adaptive-control and multiagent automation applications in IT operations management, energy, health care, commerce, finance, and transportation. It’s also being used to train the AI that powers both its traditional focus areas (robotics, gaming, and simulation) and a new generation of AI solutions in edge analytics, natural language processing, machine translation, computer vision, and digital assistants.

Reinforcement learning is also fundamental to the development of autonomous edge applications in the internet of things. Much of edge application development—for industrial, transportation, health care, and consumer applications—involves building AI-infused robotics that can operate with varying degrees of contextual autonomy under dynamic environmental circumstances.

How reinforcement learning works

In such application domains, edge devices’ AI brains must rely on reinforcement learning. Lacking a pre-existing “ground truth” training data set, they seek to maximize a cumulative reward function, such as one that scores how well a manufactured component was assembled against the criteria in a spec. This contrasts with how other types of AI learn: supervised learning minimizes an algorithmic loss function with respect to the ground truth data, while unsupervised learning minimizes a distance function among data points.
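To make the idea of maximizing a cumulative reward concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. The five-cell corridor environment, the function names, and the hyperparameter values are all assumptions made for illustration; none of them come from a specific framework.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and earns a reward of 1.0 only when it reaches the last cell.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error; no labeled data is involved."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: take the best-known action, breaking ties randomly.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action for each non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent is never told which move is correct. It only observes rewards, and the learned values end up favoring the step toward the goal in every cell.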

However, these AI learning methods are not necessarily silos. One of the most interesting AI trends is the convergence of reinforcement learning with supervised and unsupervised learning in more advanced applications. AI developers are blending these approaches in applications for which no single learning method is sufficient.

For example, by itself, supervised learning is useless in the absence of labeled training data, which is often lacking in applications such as autonomous driving, where every split-second environmental circumstance is essentially unlabeled and unique. Likewise, unsupervised learning, which uses techniques such as cluster analysis to detect patterns in sensor feeds and other complex unlabeled data, is not geared to identifying the optimal action that an intelligent endpoint should take in a real-world decision-making scenario.

What is deep reinforcement learning

Then there’s deep reinforcement learning, a leading-edge technique that pairs reinforcement learning’s trial-and-error algorithms and cumulative-reward functions with deep neural networks. One prominent use is accelerating neural network design itself: autonomous agents search for the designs that power many AI applications that depend on supervised and/or unsupervised learning.

Deep reinforcement learning is a core focus area in the automation of AI development and training pipelines. It involves the use of reinforcement learning-driven agents to rapidly explore the performance trade-offs associated with the myriad architectures, node types, connections, hyperparameter settings, and other options available to designers of deep learning, machine learning, and other AI models.

For example, researchers are using deep reinforcement learning to quickly ascertain which of myriad deep-learning convolutional neural network (CNN) architectures might be best suited to various challenges in feature engineering, computer vision, and image classification. The results gained through deep reinforcement learning might then be used by AI tools to autogenerate the optimal CNN, using deep-learning development tools like TensorFlow, MXNet, or PyTorch for that task.
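The search loop described above can be sketched in a few lines. This is a deliberately simplified, bandit-style illustration, not the controller used in any published research: the search space, the `proxy_score` function, and its numbers are hypothetical stand-ins for the expensive step of actually training each candidate CNN and measuring its validation accuracy.

```python
import itertools
import random

# Hypothetical search space; a real one would cover many more options.
SEARCH_SPACE = {"conv_layers": (2, 4, 6), "filters": (16, 32, 64)}


def proxy_score(arch, rng):
    """Hypothetical stand-in for training a CNN and measuring its accuracy."""
    score = 0.9
    score -= 0.02 * abs(arch["conv_layers"] - 4)
    score -= 0.001 * abs(arch["filters"] - 32)
    return score + rng.gauss(0, 0.001)  # small run-to-run noise


def search(trials=200, epsilon=0.2, seed=0):
    """Epsilon-greedy search: the reward is the score of the chosen design."""
    rng = random.Random(seed)
    keys = list(SEARCH_SPACE)
    candidates = [dict(zip(keys, combo))
                  for combo in itertools.product(*SEARCH_SPACE.values())]
    value = [0.0] * len(candidates)  # running mean reward per candidate
    pulls = [0] * len(candidates)
    for t in range(trials):
        if t < len(candidates):
            i = t  # try every candidate once first
        elif rng.random() < epsilon:
            i = rng.randrange(len(candidates))  # explore
        else:
            i = max(range(len(candidates)), key=value.__getitem__)  # exploit
        reward = proxy_score(candidates[i], rng)
        pulls[i] += 1
        value[i] += (reward - value[i]) / pulls[i]  # incremental mean
    best = max(range(len(candidates)), key=value.__getitem__)
    return candidates[best], value[best]


if __name__ == "__main__":
    print(search())
```

In practice, the chosen configuration would then be handed to a tool such as TensorFlow, MXNet, or PyTorch to build and train the final network.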

In that regard, it’s encouraging to see the emergence of open frameworks for reinforcement-learning development and training. As you explore deep reinforcement learning, you’ll probably want to look at the following reinforcement learning frameworks, which leverage, extend, and interface with TensorFlow and other widely adopted deep-learning and machine-learning modeling tools:


Reinforcement Learning Framework	What It Does and Where to Get It
TensorFlow Agents	TensorFlow Agents provides tools for building and training diverse intelligent applications through reinforcement learning. The framework, an extension to TensorFlow, extends the OpenAI Gym interface to multiple parallel environments and allows agents to be implemented in TensorFlow and perform batched computation. Its batched interface for OpenAI Gym environments fully integrates with TensorFlow for efficient algorithm implementations. The framework incorporates BatchPPO, an optimized implementation of the Proximal Policy Optimization algorithm. Its core components include an environment wrapper that constructs an OpenAI Gym environment inside of an external process; a batch integration that makes TensorFlow graph step and reset functions accessible as reinforcement learning operations; and a component that fuses in-graph TensorFlow batch processes and reinforcement learning algorithms into a single operation inside a training loop.
Ray RLlib	RLlib provides a flexible task-based programming model for building agent-based reinforcement learning applications in diverse domains. Developed at UC Berkeley and currently in version 2, RLlib works within Ray, a flexible, high-performance distributed execution framework. Noteworthy among RLlib’s developers is one of the principal creators of Apache Spark. RLlib works within the TensorFlow and PyTorch frameworks, enables sharing of models between algorithms, and integrates with the Ray Tune hyperparameter tuning tool. The framework incorporates a composable and scalable library of standard reinforcement learning components. Each RLlib component can be parallelized, extended, combined, and reused within distributed applications. RLlib includes three reinforcement learning algorithms: Proximal Policy Optimization (PPO), Asynchronous Advantage Actor-Critic (A3C), and Deep Q Networks (DQN), all of which can be run on any OpenAI Gym Markov decision process. It provides scalable primitives for developing new algorithms, a Python API for applying RLlib to new problems, a repository of agent hyperparameter settings, and pluggable distributed reinforcement learning execution strategies. It supports user creation of custom reinforcement learning algorithms.
Roboschool	Roboschool provides open-source software for building and training robot simulations through reinforcement learning. It facilitates concurrent reinforcement learning training of multiple agents together in the same environment. With multiplayer training, you can train the same agent playing for both parties (so it plays against itself), train two agents using the same algorithm, or set two algorithms against each other. Roboschool was developed by OpenAI, the industry nonprofit whose sponsors include Elon Musk, Sam Altman, Reid Hoffman, and Peter Thiel. It is integrated with OpenAI Gym, which is an open-source toolkit for developing and evaluating reinforcement learning algorithms. OpenAI Gym is compatible with TensorFlow, Theano, and other deep-learning libraries. OpenAI Gym includes code for numerical computation, gaming, and physics engines. Roboschool is based on the Bullet Physics Engine, an open-source, permissively licensed physics library that has been used by other simulation software such as Gazebo and Virtual Robot Experimentation Platform (V-REP). It includes several reinforcement-learning algorithms: Asynchronous Methods for Deep Reinforcement Learning, Actor-Critic with Experience Replay, Actor-Critic using Kronecker-Factored Trust Region, Deep Deterministic Policy Gradients, Proximal Policy Optimization, and Trust Region Policy Optimization.
Machine Learning Agents	Still in beta, Unity Technologies’ Machine Learning Agents supports development and reinforcement learning training of intelligent agents for games, simulations, self-driving vehicles, and robots. ML-Agents supports diverse reinforcement learning training scenarios, which involve different configurations and interactions among agents, brains, and rewards. The framework’s SDK supports single and multi-agent scenarios as well as discrete and continuous action spaces. It provides a Python API for access to reinforcement learning, neuroevolution, and other machine learning methods. An ML-Agents learning environment consists of agents executing reinforcement learning through interactions with automated components known as “brains.” Each agent can have a unique set of states and observations, take unique actions within the environment, and receive unique rewards for events within the environment. An agent’s actions are decided by the brain it is linked to. Each brain defines a specific state and action space and decides which actions each of its linked agents will take. In addition, each ML-Agents environment contains a single “academy” that defines the scope of the environment, in terms of engine configuration (the speed and rendering quality of the game engine in both training and inference modes), frameskip (how many engine steps to skip between each agent making a new decision), and global episode length (how long the episode will last). One of the modes to which a brain may be set is external, in which action decisions are made using TensorFlow or another machine learning library of choice through communication over an open socket with ML-Agents’ Python API. Another mode is internal, in which agent action decisions are made using a trained model embedded into the project via an embedded TensorFlowSharp agent.
Coach	Intel’s Nervana Coach is an open-source reinforcement learning framework for modeling, training, and evaluating intelligent agents for games, robotics, and other agent-based intelligent applications. Coach provides a modular sandbox, reusable components, and Python API for composing new reinforcement learning algorithms and training new intelligent apps in diverse application domains. The framework uses OpenAI Gym as the main tool for interacting with different reinforcement learning environments. It also supports external extensions to Gym such as Roboschool, gym-extensions, PyBullet, and ViZDoom. Coach’s environment wrapper allows adding other custom reinforcement learning environments to solve other learning problems. The framework enables efficient training of reinforcement learning agents on a desktop computer and uses multicore CPU processing. It provides single- and multithreaded implementations for some reinforcement learning algorithms, including Asynchronous Advantage Actor-Critic, Deep Deterministic Policy Gradient, Proximal Policy Optimization, Direct Future Prediction, and Normalized Advantage Function. All the algorithms are implemented using Intel-optimized TensorFlow, and some are also available through Intel’s Neon deep-learning framework. Coach includes implementations for many reinforcement learning agent types, including support for moving from single-threaded to multithreaded implementations. It supports development of new agents for single- and multiworker (synchronous or asynchronous) reinforcement learning implementations. It supports continuous and discrete action spaces, as well as visual observation spaces or observation spaces that include only raw measurements.
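Several of these frameworks build on the reset-and-step environment contract popularized by OpenAI Gym, and some step many environments together as a batch. The sketch below illustrates that pattern in plain Python under stated assumptions: `CountdownEnv` and `BatchEnv` are hypothetical names invented for this example, and the code does not reproduce the API of Gym, TensorFlow Agents, or any other framework in the table.

```python
class CountdownEnv:
    """Hypothetical toy environment with a Gym-style reset/step contract."""

    def __init__(self, start):
        self.start = start
        self.remaining = start

    def reset(self):
        """Begin a new episode and return the first observation."""
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        """Action 1 decrements the counter; any other action does nothing."""
        self.remaining -= 1 if action == 1 else 0
        done = self.remaining == 0
        reward = 1.0 if done else 0.0
        return self.remaining, reward, done, {}


class BatchEnv:
    """Step several environments in lockstep, one action per environment."""

    def __init__(self, envs):
        self.envs = list(envs)

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, _info = env.step(action)
            if done:
                # Restart finished episodes so the batch never stalls.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


if __name__ == "__main__":
    batch = BatchEnv([CountdownEnv(2), CountdownEnv(3)])
    print(batch.reset())
    print(batch.step([1, 1]))
    print(batch.step([1, 1]))
```

Batching this way lets a single policy call choose actions for every environment at once, which is the main reason frameworks adopt it. Real implementations typically run the environments in separate processes rather than in a simple loop.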

The reinforcement-learning skills that AI developers need

Going forward, AI developers will need to immerse themselves in the wide range of reinforcement learning algorithms implemented in these and other frameworks. They will also need to deepen their understanding of multiagent reinforcement-learning architectures, many of which lean heavily on the established body of game-theory research. And they will need to familiarize themselves with deep reinforcement learning as a tool for identifying security vulnerabilities in computer vision applications associated with an attack method known as “fuzzing.”

Last but not least, here are some excellent resources for developers needing to bootstrap their skills in the convergence of reinforcement learning and deep learning:

James Kobielus is principal analyst at Franconia Research.