Reinforcement Learning beginner to master
Introduction
Welcome (9:18)
Course structure (2:00)
Setup - Mac (5:05)
The Markov decision process (MDP)
Elements common to all control tasks (5:44)
The Markov decision process (MDP) (5:52)
Types of Markov decision process (2:23)
Trajectory vs episode (1:13)
Reward vs return (1:39)
Discount factor (4:19)
Policy (2:16)
State values and action values (1:11)
Bellman equations (3:16)
Solving a Markov decision process (3:21)
Coding - The Markov decision process 1 (13:03)
Coding - The Markov decision process 2 (13:08)
Dynamic programming
Introduction to Dynamic programming (5:19)
Value iteration (4:00)
Coding - Value iteration 1 (4:11)
Coding - Value iteration 2 (5:27)
Coding - Value iteration 3 (1:16)
Coding - Value iteration 4 (7:40)
Coding - Value iteration 5 (3:09)
Policy iteration (2:19)
Coding - Policy iteration 1 (5:09)
Policy evaluation (2:11)
Coding - Policy iteration 2 (8:22)
Policy improvement (2:56)
Coding - Policy iteration 3 (6:33)
Coding - Policy iteration 4 (6:12)
Policy iteration in practice (2:08)
Generalized Policy Iteration (GPI) (2:17)
Monte Carlo methods
Monte Carlo methods (3:09)
Solving control tasks with Monte Carlo methods (6:56)
On-policy Monte Carlo control (4:33)
Coding - On-policy Monte Carlo control 1 (10:05)
Coding - On-policy Monte Carlo control 2 (10:20)
Coding - On-policy Monte Carlo control 3 (2:51)
Coding - Constant alpha Monte Carlo (4:26)
Off-policy Monte Carlo control (7:08)
Coding - Off-policy Monte Carlo control 1 (11:32)
Coding - Off-policy Monte Carlo control 2 (12:36)
Coding - Off-policy Monte Carlo control 3 (3:13)
Temporal difference methods
Temporal difference methods (3:16)
Solving control tasks with temporal difference methods (3:58)
Monte Carlo vs temporal difference methods (1:23)
SARSA (3:53)
Coding - SARSA 1 (5:18)
Coding - SARSA 2 (8:39)
Q-Learning (2:22)
Coding - Q-Learning 1 (5:09)
Coding - Q-Learning 2 (9:08)
Advantages of temporal difference methods (0:56)
N-step bootstrapping
N-step temporal difference methods (3:30)
Where do n-step methods fit? (2:53)
Effect of changing n (4:36)
N-step SARSA (16:14)
N-step SARSA in action (1:55)
Coding - N-step SARSA (16:14)
Continuous state spaces
Coding - Classic control tasks (10:53)
Working with continuous state spaces (3:02)
State aggregation (4:08)
Coding - State aggregation 1 (20:29)
Coding - State aggregation 2 (3:04)
Coding - State aggregation 3 (3:44)
Tile coding (5:14)
Coding - Tile coding 1 (21:28)
Coding - Tile coding 2 (7:35)
Coding - Tile coding 3 (3:03)
Brief introduction to neural networks
Function approximators (7:35)
Artificial Neural Networks (3:32)
Artificial Neurons (5:38)
How to represent a Neural Network (6:44)
Stochastic Gradient Descent (5:40)
Neural Network optimization (4:01)
Deep SARSA
Deep SARSA (2:39)
Neural Network optimization (Deep Q-Learning) (2:41)
Experience replay (1:57)
Target network (3:28)
Coding - Deep SARSA 1 (7:40)
Coding - Deep SARSA 2 (13:47)
Coding - Deep SARSA 3 (4:09)
Coding - Deep SARSA 4 (1:51)
Coding - Deep SARSA 5 (2:08)
Coding - Deep SARSA 6 (5:42)
Coding - Deep SARSA 7 (7:15)
Coding - Deep SARSA 8 (6:42)
Coding - Deep SARSA 9 (11:49)
Coding - Deep SARSA 10 (5:21)
Deep Q-Learning
Deep Q-Learning (3:02)
Coding - Deep Q-Learning 1 (6:06)
Coding - Deep Q-Learning 2 (10:15)
Coding - Deep Q-Learning 3 (9:43)
REINFORCE
Policy gradient methods (4:16)
Representing policies using neural networks (4:43)
Policy performance (2:16)
The policy gradient theorem (3:20)
REINFORCE (3:38)
Parallel learning (3:06)
Entropy regularization (5:39)
REINFORCE 2 (2:03)
Coding - REINFORCE 1 (8:10)
Coding - REINFORCE 2 (13:12)
Coding - REINFORCE 3 (7:56)
Coding - REINFORCE 4 (11:15)
Coding - REINFORCE 5 (14:37)
Advantage Actor-Critic (A2C)
Advantage Actor-Critic (A2C) (10:48)
Coding - A2C 1 (5:20)
Coding - A2C 2 (4:29)
Coding - A2C 3 (5:49)
Coding - A2C 4 (11:30)
Outro
Looking back (2:47)