Description
In this course, you will learn the basics of Reinforcement Learning, one of the three main paradigms of machine learning. You will implement, from scratch, adaptive algorithms that learn to solve control tasks from experience. You will also learn to combine these algorithms with Deep Learning techniques and neural networks, giving rise to the branch known as Deep Reinforcement Learning.
This course covers both the fundamentals of this branch and the most popular advanced algorithms, with a focus on developing practical skills: after learning the core concepts in each section, we implement the algorithms covered from scratch.
Course Trailer
Featured review
"I am doing my PhD thesis in Electrical Engineering. I had to learn RL from very beginning to implement in my thesis, and this course amazingly helped me to get the intuition and whole basic algorithms from beginning to the end. I highly recommended it".
- Sarah Allahmoradi
Try our courses risk-free
All our courses have a 30-day money-back guarantee. But we are pretty sure you'll love them.
Prerequisites
- Be comfortable programming in Python
- Know basic linear algebra and calculus (matrices, vectors, determinants, derivatives, etc.)
- Know basic statistics and probability theory (mean, variance, normal distribution, etc.)
Course Curriculum
- Elements common to all control tasks (5:44)
- The Markov decision process (MDP) (5:52)
- Types of Markov decision process (2:23)
- Trajectory vs episode (1:13)
- Reward vs return (1:39)
- Discount factor (4:19)
- Policy (2:16)
- State values and action values (1:11)
- Bellman equations (3:16)
- Solving a Markov decision process (3:21)
- Coding - The Markov decision process 1 (13:03)
- Coding - The Markov decision process 2 (13:08)
- Introduction to dynamic programming (5:19)
- Value iteration (4:00)
- Coding - Value iteration 1 (4:11)
- Coding - Value iteration 2 (5:27)
- Coding - Value iteration 3 (1:16)
- Coding - Value iteration 4 (7:40)
- Coding - Value iteration 5 (3:09)
- Policy iteration (2:19)
- Coding - Policy iteration 1 (5:09)
- Policy evaluation (2:11)
- Coding - Policy iteration 2 (8:22)
- Policy improvement (2:56)
- Coding - Policy iteration 3 (6:33)
- Coding - Policy iteration 4 (6:12)
- Policy iteration in practice (2:08)
- Generalized Policy Iteration (GPI) (2:17)
- Monte Carlo methods (3:09)
- Solving control tasks with Monte Carlo methods (6:56)
- On-policy Monte Carlo control (4:33)
- Coding - On-policy Monte Carlo control 1 (10:05)
- Coding - On-policy Monte Carlo control 2 (10:20)
- Coding - On-policy Monte Carlo control 3 (2:51)
- Coding - Constant alpha Monte Carlo (4:26)
- Off-policy Monte Carlo control (7:08)
- Coding - Off-policy Monte Carlo control 1 (11:32)
- Coding - Off-policy Monte Carlo control 2 (12:36)
- Coding - Off-policy Monte Carlo control 3 (3:13)
- Temporal difference methods (3:16)
- Solving control tasks with temporal difference methods (3:58)
- Monte Carlo vs temporal difference methods (1:23)
- SARSA (3:53)
- Coding - SARSA 1 (5:18)
- Coding - SARSA 2 (8:39)
- Q-Learning (2:22)
- Coding - Q-Learning 1 (5:09)
- Coding - Q-Learning 2 (9:08)
- Advantages of temporal difference methods (0:56)
- Coding - Classic control tasks (10:53)
- Working with continuous state spaces (3:02)
- State aggregation (4:08)
- Coding - State aggregation 1 (20:29)
- Coding - State aggregation 2 (3:04)
- Coding - State aggregation 3 (3:44)
- Tile coding (5:14)
- Coding - Tile coding 1 (21:28)
- Coding - Tile coding 2 (7:35)
- Coding - Tile coding 3 (3:03)
- Deep SARSA (2:39)
- Neural Network optimization (Deep Q-Learning) (2:41)
- Experience replay (1:57)
- Target network (3:28)
- Coding - Deep SARSA 1 (7:40)
- Coding - Deep SARSA 2 (13:47)
- Coding - Deep SARSA 3 (4:09)
- Coding - Deep SARSA 4 (1:51)
- Coding - Deep SARSA 5 (2:08)
- Coding - Deep SARSA 6 (5:42)
- Coding - Deep SARSA 7 (7:15)
- Coding - Deep SARSA 8 (6:42)
- Coding - Deep SARSA 9 (11:49)
- Coding - Deep SARSA 10 (5:21)
- Policy gradient methods (4:16)
- Representing policies using neural networks (4:43)
- Policy performance (2:16)
- The policy gradient theorem (3:20)
- REINFORCE (3:38)
- Parallel learning (3:06)
- Entropy regularization (5:39)
- REINFORCE 2 (2:03)
- Coding - REINFORCE 1 (8:10)
- Coding - REINFORCE 2 (13:12)
- Coding - REINFORCE 3 (7:56)
- Coding - REINFORCE 4 (11:15)
- Coding - REINFORCE 5 (14:37)
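To give a flavor of the from-scratch implementations the coding lectures build toward, below is a minimal, illustrative sketch of tabular Q-Learning, one of the algorithms listed in the curriculum. It is not the course's actual code: the tiny corridor environment, the variable names, and the hyperparameters are assumptions made purely for illustration.

# Illustrative sketch only (not the course's code): tabular Q-Learning on a
# hand-rolled 1-D corridor task. States 0..4, start at 0; action 0 moves left,
# action 1 moves right; reaching state 4 gives reward +1 and ends the episode.
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)                                 # 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.99, 0.1           # illustrative hyperparameters
Q = [[0.0, 0.0] for _ in range(N_STATES)]        # Q[state][action]

def step(state, action):
    """Environment dynamics: move left/right, reward 1 only at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL              # next state, reward, done flag

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # Q-Learning update: bootstrap from the greedy value of the next state.
        target = reward + (0.0 if done else gamma * max(Q[nxt]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = nxt

print("Greedy action per state:",
      [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])

The same epsilon-greedy interaction loop reappears in the SARSA lectures, with the update target taken from the action actually selected next rather than the greedy one.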