Introduction to Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning that aims to find optimal solutions to control problems across a wide variety of tasks. It employs an artificial intelligence (AI) “agent” that takes in observations, chooses actions, and learns from rewards. Modern RL algorithms train agents through trial and error, interacting directly with the given environment.
In the video, we cover the basic theory behind RL and demonstrate how to use the Farama Foundation's Gymnasium and Stable Baselines3 in Python to train an AI agent to solve the classic cartpole control problem. At the end of the video, we encourage you to apply what you have learned to the slightly more advanced inverted pendulum problem.
The solution to the challenge can be found here: https://www.digikey.com/en/maker/projects/intro-to-reinforcement-learning-using-gymnasium-and-stable-baselines3/28c6602f5d1e4ce1b5a90642a1ac7efc
Code for training RL agents to solve both the cartpole and pendulum problems can be found here: https://github.com/ShawnHymel/reinforcement-learning-demos
In RL, the environment can be anything the agent interacts with, such as board games, video games, virtual settings, or the real world. We often use a code wrapper (e.g. Gymnasium) to observe this environment, perform agent-specified actions, and assign rewards. Note that rewards are considered part of the environment and are instrumental in training.
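To make this loop concrete, below is a minimal sketch of the observe-act-reward cycle using Gymnasium's API (assuming the package is installed, e.g. via pip install gymnasium). Random actions stand in for a trained agent:

```python
import gymnasium as gym

# Create the classic cartpole environment through the Gymnasium wrapper
env = gym.make("CartPole-v1")

# reset() starts an episode and returns the first observation and an info dict
observation, info = env.reset()

total_reward = 0.0
done = False
while not done:
    # A trained agent's policy would choose the action here;
    # we sample randomly as a placeholder
    action = env.action_space.sample()

    # step() applies the action and returns the next observation, the reward,
    # and flags indicating whether the episode has ended
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode finished with total reward: {total_reward}")
env.close()
```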
The decision-making process for choosing actions based on observations is known as the “policy.” During training, the agent selects actions either at random (to explore) or according to its current policy (to exploit). The environment then returns a new observation and a reward, and the training algorithm uses these to update the policy so that the agent favors actions expected to yield a higher total reward over time.
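A common way to mix random and policy-driven action selection is an epsilon-greedy rule. The sketch below is illustrative only; q_values is a hypothetical stand-in for a Q-table or Q-network, not code from the video:

```python
import random

def epsilon_greedy(q_values, observation, n_actions, epsilon=0.1):
    """Choose an action: random with probability epsilon (explore),
    otherwise the action with the highest value estimate (exploit).

    q_values(observation, action) is a hypothetical helper standing in
    for a Q-table lookup or a Q-network forward pass.
    """
    if random.random() < epsilon:
        return random.randrange(n_actions)  # explore: random action
    # Exploit: pick the action with the highest estimated value
    return max(range(n_actions), key=lambda a: q_values(observation, a))
```

Decaying epsilon over the course of training gradually shifts the agent from exploration toward exploitation.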
The cartpole problem consists of a virtual pole balanced on top of a cart that can only move left and right. The goal is to design an AI agent that can keep the pole balanced by pushing the cart left or right. In the video, we use Deep Q-Learning to train a Deep Q-Network (DQN) to solve the cartpole problem.
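A minimal Stable Baselines3 training sketch for this setup might look like the following (assuming stable-baselines3 is installed; the timestep count is illustrative and not necessarily what the video uses):

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Build the cartpole environment and a DQN agent with an MLP Q-network
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=1)

# Train by trial and error; more timesteps generally yields a better policy
model.learn(total_timesteps=100_000)
model.save("dqn_cartpole")

# Run the trained agent greedily for one episode
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(int(action))
    done = terminated or truncated
env.close()
```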
If you would like to dive deeper into reinforcement learning, we recommend the reading and viewing materials listed below.
Full courses:
Coursera: Reinforcement Learning Specialization
EdX: Fundamentals of Deep Reinforcement Learning
Udemy: Artificial Intelligence: Reinforcement Learning in Python
OpenAI: Spinning Up in Deep RL
Hugging Face: Deep RL Course
Google DeepMind: Introduction to Reinforcement Learning with David Silver

Videos:
Reinforcement Learning in 3 Hours | Full Course using Python by Nicholas Renotte
Deep RL for Robotics talk by Mat Kelcey

Books:
Reinforcement Learning: An Introduction by Sutton and Barto
Grokking Deep Reinforcement Learning by Miguel Morales

Articles:
Reinforcement Learning Algorithms — an intuitive overview
Which Reinforcement learning-RL algorithm to use where, when and in what scenario?
Q-Learning vs. Deep Q-Learning vs. Deep Q-Network
Deep Q Networks (DQN) With the Cartpole Environment
RL — Proximal Policy Optimization (PPO) Explained
Proximal Policy Optimization (PPO)
Related Videos:
Exploring Reinforcement Learning: Can AI Learn to Play QWOP?
Intro to Edge AI

Related Project Links:
Intro to Reinforcement Learning Using Gymnasium and Stable Baselines3

Related Articles:
Teach an AI to play QWOP
What is Edge AI? Machine Learning + IoT

Learn more:
Maker.io - https://www.digikey.com/en/maker
Digi-Key’s Blog – TheCircuit: https://www.digikey.com/en/blog
Connect with Digi-Key on Facebook: https://www.facebook.com/digikey.electronics/
And follow us on Twitter: https://twitter.com/digikey
00:00 - Intro
00:59 - History of reinforcement learning
02:14 - Environment and agent interaction loop
06:21 - Gymnasium and Stable Baselines3
07:55 - Hands-on: how to set up a Gymnasium environment
26:57 - Markov decision process
31:02 - Bellman equation for the state-value function
34:12 - Bellman equation for the action-value function
35:47 - Bellman optimality equations
36:43 - Exploration vs. exploitation
38:39 - Recommended textbook
39:25 - Model-based vs. model-free algorithms
40:27 - On-policy vs. off-policy algorithms
41:19 - Discrete vs. continuous action space
42:36 - Discrete vs. continuous observation space
43:56 - Overview of modern reinforcement learning algorithms
46:29 - Q-learning
49:27 - Deep Q-network (DQN)
51:59 - Hands-on: how to train a DQN agent
72:36 - Usefulness of reinforcement learning
73:26 - Challenge: inverted pendulum
74:10 - Conclusion

