CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley): No. 6 Value Functions Introduction, No. 7 Advanced Q-Learning

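Understand why correlated samples cause problems, and how parallel workers solve them: consecutive transitions from one trajectory are strongly correlated, while a batch gathered from many workers at once is much less so.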

Another solution is a replay buffer, which takes full advantage of Q-learning being off-policy: transitions collected by older policies can be stored, sampled at random, and reused.
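A minimal sketch of a replay buffer, to make the idea concrete. The class name `ReplayBuffer`, its method names, and the tuple layout `(state, action, reward, next_state, done)` are assumptions chosen for illustration, not something the lecture prescribes.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transition once the buffer is full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement: the batch mixes transitions from
        # many different time steps, which breaks the temporal correlation.
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)
```

Because Q-learning is off-policy, it does not matter that the sampled transitions were generated by an older version of the policy.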

There is still a problem: Q-learning is not gradient descent, because no gradient flows through the target value.
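Written out, the standard Q-learning update makes this visible. The symbols here (parameters \phi, step size \alpha, discount \gamma, target y_i) are notation assumed for this sketch:

```latex
\begin{align}
y_i &= r_i + \gamma \max_{a'} Q_\phi(s'_i, a'_i) \\
\phi &\leftarrow \phi - \alpha \, \frac{dQ_\phi}{d\phi}(s_i, a_i)\,\bigl(Q_\phi(s_i, a_i) - y_i\bigr)
\end{align}
```

The target y_i itself depends on \phi, but the update treats it as a constant. The step is therefore not the gradient of any fixed objective, and the target moves every time \phi changes.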

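Split the Q-function into two parts: the target network and the evolving network. The target network is a periodically refreshed copy that is used only to compute targets.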
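A minimal tabular sketch of this split, assuming a Q-table stored as a list of per-state lists in place of a neural network. The function names `td_target`, `q_update`, and `sync_target` are hypothetical helpers for illustration only.

```python
import copy


def td_target(q_target, reward, next_state, done, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state])


def q_update(q, q_target, transition, lr=0.1, gamma=0.99):
    """Move the evolving table one step toward a target computed from the frozen table."""
    state, action, reward, next_state, done = transition
    y = td_target(q_target, reward, next_state, done, gamma)
    q[state][action] += lr * (y - q[state][action])


def sync_target(q):
    """Return a frozen copy of the evolving table (called once every N updates)."""
    return copy.deepcopy(q)
```

Between calls to `sync_target`, the targets stay fixed, so each inner phase looks like ordinary regression onto stationary labels.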

This sacrifices some learning speed in exchange for better convergence.

Overestimation in Nature DQN: the max over noisy Q-value estimates in the target is biased upward.
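A small simulation can illustrate where the bias comes from. It assumes every action has a true Q-value of exactly 0 and that the estimates carry independent Gaussian noise; the function name and parameters are made up for this sketch.

```python
import random


def mean_max_of_noisy_estimates(num_actions=5, noise_std=1.0, trials=10_000, seed=0):
    """Average of max_a (0 + noise_a) when every true Q-value is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each estimate is unbiased on its own: its mean is the true value, 0.
        noisy_q = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # The max selects whichever estimate happened to get the largest noise.
        total += max(noisy_q)
    return total / trials
```

Although each individual estimate is unbiased, the returned average is well above 0, and it grows with the number of actions. The max inside the Q-learning target systematically picks up this positive noise.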

The agent runs into trouble with the left-or-right dilemma when trying to avoid bumping into a tree.


Time: 2024-07-30 09:30:34


