CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.6 Value functions introduction NO.7 Advanced Q learning

--------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------

Understand that correlated samples cause problems for online Q-learning, and how parallel workers solve the problem.

Another solution is a replay buffer, which fully utilizes the fact that Q-learning is off-policy: stored transitions stay usable even though an older policy collected them.
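The replay buffer idea can be sketched as follows. This is a minimal illustration, not code from the lecture: the class name `ReplayBuffer`, its methods, and the uniform sampling scheme are all assumptions chosen for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions (hypothetical helper for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement mixes old and new transitions,
        # which breaks the correlation between consecutive samples.
        batch = rng.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)
```

Because sampling is decoupled from data collection, the same transition can be reused in many updates, which is only valid for an off-policy method.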

There is still a problem: Q-learning is not gradient descent, because the target value depends on the Q-function itself but no gradient is taken through it.
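To make the point concrete, the standard Q-learning update can be written out as below. The symbols ($\phi$ for the Q-function parameters, $\alpha$ for the step size, $y_i$ for the target) are notation assumed for this sketch, not taken from the notes.

```latex
\phi \leftarrow \phi - \alpha \, \frac{dQ_\phi}{d\phi}(s_i, a_i)\,\bigl(Q_\phi(s_i, a_i) - y_i\bigr),
\qquad
y_i = r(s_i, a_i) + \gamma \max_{a'} Q_\phi(s'_i, a')
```

The target $y_i$ is treated as a constant even though it depends on $\phi$, so the update is not the gradient of any fixed objective.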

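Split the Q-function into two networks: the target net, which is held fixed between periodic updates, and the evolving net, which is trained at every step.

This sacrifices learning speed in exchange for more stable convergence.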
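The split between the target net and the evolving net might look like the tabular sketch below. It assumes a Q-table stored as a list of lists in place of a neural network, and the function names (`td_target`, `train_step`, `sync_target`) are hypothetical.

```python
import copy


def td_target(reward, next_state, done, target_q, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping at terminal states."""
    if done:
        return reward
    return reward + gamma * max(target_q[next_state])


def train_step(q, target_q, transition, lr=0.1, gamma=0.99):
    """Move the evolving estimate toward a target computed from the frozen copy."""
    state, action, reward, next_state, done = transition
    y = td_target(reward, next_state, done, target_q, gamma)
    q[state][action] += lr * (y - q[state][action])


def sync_target(q):
    """Return a frozen copy of the evolving table, to be called every N steps."""
    return copy.deepcopy(q)
```

Between calls to `sync_target`, the targets do not move, so each stretch of training resembles ordinary regression. The cost is that new information reaches the targets only after the next sync, which is the speed being sacrificed.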

Natural DQN overestimates Q-values: the max in the target is taken over noisy estimates, so it is biased upward.
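A small simulation can illustrate where the overestimation comes from. Everything here is an assumption made for the demo: all true Q-values are set to 0, the estimation noise is unit Gaussian, and `max_bias_demo` is a hypothetical name. The second estimator, which selects the action with one set of estimates and evaluates it with an independent set, follows the Double Q-learning idea, which these notes do not describe.

```python
import random


def max_bias_demo(num_actions=4, trials=10000, seed=0):
    """Compare the average of max(noisy Q) with a decoupled select/evaluate estimate."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # True Q-values are all 0; each estimate is the true value plus noise.
        est_a = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        est_b = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        # Single estimator: select and evaluate with the same noisy values.
        single_total += max(est_a)
        # Decoupled estimator: select with est_a, evaluate with independent est_b.
        best = max(range(num_actions), key=lambda a: est_a[a])
        double_total += est_b[best]
    return single_total / trials, double_total / trials
```

The first returned average is clearly positive even though every true value is 0, while the second stays close to 0, because the noise that picked the action is independent of the noise in its evaluation.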

The agent gets into trouble in the left-or-right dilemma when trying to avoid bumping into a tree.

Original article: https://www.cnblogs.com/ecoflex/p/9094123.html

Date: 2024-10-05 09:08:23
