(6) Value Function Approximation-LSPI code (5)

This post covers sample.py.

# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.
    # Annotation: one LSPI sample, expressed as a tuple.
    Parameters  # input parameters
    ----------

    state : numpy.array  # state vector
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int  # index of the action that was executed
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float  # reward obtained from the environment
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array  # environment state after taking the sample's action
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional  # True if this sample ends the episode
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).
    # Annotation: by default the sample is assumed not to end the episode.
    # Annotation: it is written as a class so that it can be used
    # conveniently from different calling contexts.
    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):  # initializer
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):  # used when a Sample is printed
        """Create string representation of tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
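A minimal usage sketch of the class listed above. It assumes a hypothetical two-step episode and uses plain tuples for the states so that the snippet has no third-party dependencies. The docstring suggests numpy arrays for vector states, but it also says the fields can be of any convenient type. The class body is repeated in compact form only so that the snippet runs on its own.

```python
class Sample(object):
    """Compact copy of the Sample data holder from the listing."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state, self.action,
                                               self.reward, self.next_state,
                                               self.absorb)


# Hypothetical episode: states are 1-tuples, action index 1 is taken twice.
first = Sample((0,), 1, 0.0, (1,))              # absorb defaults to False
last = Sample((1,), 1, 1.0, (2,), absorb=True)  # this sample ends the episode
episode = [first, last]

print(first)  # Sample((0,), 1, 0.0, (1,), False)
print(last)   # Sample((1,), 1, 1.0, (2,), True)

# Pick out the episode-ending samples via the absorb flag.
terminal = [s for s in episode if s.absorb]
```

Because no `__str__` is defined, `print` falls back to `__repr__`, which is why the annotated listing describes `__repr__` as the method used when printing.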

Time: 2024-10-15 12:50:52
