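Recently, for reasons I'd rather not go into, I had to quickly put together a small reinforcement learning example, even though I knew nothing about reinforcement learning beforehand. I ended up using someone else's code, but while hunting for it I came across quite a few good reinforcement learning resources, so I'm bookmarking them here for later study.
【1】How can the Q-learning process be explained step by step with a simple example?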
https://www.zhihu.com/question/26408259
【2】The simplest example explaining the Q-Learning process
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
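Note: the site also comes with code, but unfortunately it is all written in C++ and Java, which I can't read. Still, it looks like a good resource site.
There is also a blog post that is the corresponding Chinese translation, titled "The simplest example explaining the Q-Learning process".
Someone has also reproduced the tutorial above in Python: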
https://github.com/JasonQSY/ML-Weekly/blob/master/P5-Reinforcement-Learning/Q-learning/Q-Learning-Get-Started.ipynb
The code is as follows:
import numpy as np
import random
In [44]:
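# initial
# q[state, action] holds the learned values; it starts at zero
q = np.zeros([6, 6])
q = np.matrix(q)
# r[state, action]: -1 = no door, 0 = door, 100 = door leading into goal room 5
r = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
r = np.matrix(r)
# discount factor
gamma = 0.8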
In [45]:
# training
for i in range(100):
    # one episode, starting from a random room
    state = random.randint(0, 5)
    while (state != 5):
        # choose randomly among actions with a non-negative r-value
        r_pos_action = []
        for action in range(6):
            if r[state, action] >= 0:
                r_pos_action.append(action)
        next_state = r_pos_action[random.randint(0, len(r_pos_action) - 1)]
        # Q-learning update: immediate reward plus discounted best future value
        q[state, next_state] = r[state, next_state] + gamma * q[next_state].max()
        state = next_state
In [46]:
# verify
for i in range(10):
    # one episode
    print("episode: " + str(i + 1))
    # random initial state
    state = random.randint(0, 5)
    print("the robot starts in " + str(state) + ".")
    count = 0
    while (state != 5):
        # prevent endless loop
        if count > 20:
            print('fails')
            break
        # find the maximal q-value for this state
        q_max = -100
        for action in range(6):
            if q[state, action] > q_max:
                q_max = q[state, action]
        # choose randomly among the actions that reach that maximum
        q_max_action = []
        for action in range(6):
            if q[state, action] == q_max:
                q_max_action.append(action)
        next_state = q_max_action[random.randint(0, len(q_max_action) - 1)]
        print("the robot goes to " + str(next_state) + '.')
        state = next_state
        count = count + 1
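For later reference, here is a compact sketch of the same idea that I find easier to read: train the Q-table, then follow the largest Q-value from a given room until the goal is reached. It is not the notebook author's code. The function names `train_q` and `greedy_path`, the `episodes=500` default, the `seed` parameter, and the use of plain NumPy arrays instead of `np.matrix` are all my own assumptions for illustration. The reward matrix, `gamma = 0.8`, and goal room 5 are taken from the code above.

```python
import random

import numpy as np

# Reward matrix from the tutorial code above:
# -1 = no door, 0 = door, 100 = door leading into the goal room.
R = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
GAMMA = 0.8
GOAL = 5


def train_q(episodes=500, seed=0):
    """Hypothetical helper: learn a Q-table by random exploration."""
    rng = random.Random(seed)  # seeded only so that runs are repeatable
    q = np.zeros((6, 6))
    for _ in range(episodes):
        state = rng.randint(0, 5)
        while state != GOAL:
            # Only moves through an existing door are allowed.
            valid = [a for a in range(6) if R[state, a] >= 0]
            nxt = rng.choice(valid)
            # Same update as above: reward plus discounted best next value.
            q[state, nxt] = R[state, nxt] + GAMMA * q[nxt].max()
            state = nxt
    return q


def greedy_path(q, start, max_steps=20):
    """Hypothetical helper: follow the largest Q-value from `start`."""
    path = [start]
    state = start
    # max_steps guards against looping forever on an untrained table.
    while state != GOAL and len(path) <= max_steps:
        state = int(np.argmax(q[state]))  # ties go to the lowest index
        path.append(state)
    return path


q = train_q()
path = greedy_path(q, start=2)
print(path)
```

Unlike the verify cell above, this breaks ties by taking the first maximal action rather than a random one, so the printed route is repeatable for a given Q-table.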
【3】This person's blog has a series on reinforcement learning
http://www.algorithmdog.com/ml/rl-series
【4】http://blog.csdn.net/u012192662/article/category/6394979
At a glance, it looks reasonably well written.
Time: 2024-08-26 23:47:53