Introduction to the NumPy Library
NumPy Data I/O and Functions
Reading and Writing CSV Files
CSV Files
CSV (Comma-Separated Values) is a common file format for storing tabular data in bulk.
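np.savetxt(fname, X, fmt='%.18e', delimiter=' ')
- fname: a filename or an open file; the name may refer to a .gz or .bz2 compressed file.
- X: the array to write to the file (1-D or 2-D).
- fmt: the format used for each written value, e.g. %d, %.2f, %.18e.
- delimiter: the string that separates columns; the default is a single space.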
Example: saving a file with savetxt()
In [1]: import numpy as np
In [2]: a = np.arange(100).reshape(5,20)
In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')
The contents of "a.csv" are:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')
The contents of "a1.csv" are:
0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
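np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)
- fname: a file, a filename string, or a generator; the file may be .gz or .bz2 compressed.
- dtype: the data type of the resulting array (optional, float by default).
- delimiter: the string that separates columns; the default is any whitespace.
- unpack: if True, the result is transposed so that each column can be assigned to its own variable.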
Example: reading a file with loadtxt()
In [5]: b = np.loadtxt('a1.csv', delimiter=',')
In [6]: b
Out[6]:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])
In [7]: b = np.loadtxt('a1.csv', dtype=int, delimiter=',')
In [8]: b
Out[8]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
In recent NumPy releases, reading float-formatted text such as 0.0 with an integer dtype is deprecated; loading as float and converting with b.astype(int) is the safer route.
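The round trip above, plus the unpack option that the parameter list mentions but the example does not exercise, can be sketched as follows. This is a minimal illustration, assuming a temporary directory and a hypothetical file name demo.csv; neither comes from the original notes.

```python
import os
import tempfile

import numpy as np

a = np.arange(12).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")  # hypothetical file name
    np.savetxt(path, a, fmt="%d", delimiter=",")

    # Read the whole table back as integers.
    b = np.loadtxt(path, dtype=int, delimiter=",")

    # unpack=True transposes the result, so each column
    # can be assigned to its own variable.
    c0, c1, c2, c3 = np.loadtxt(path, delimiter=",", unpack=True)

print(np.array_equal(a, b))  # the integer round trip is lossless
print(c0)                    # first column of the table, as floats
```

Writing with fmt="%d" keeps the text integer-formatted, so reading it back with dtype=int does not depend on how the parser treats values like 0.0.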
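Limitations of CSV Files
CSV can only store one-dimensional and two-dimensional arrays effectively, so np.savetxt() and np.loadtxt() can only read and write 1-D and 2-D arrays.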
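To make the limitation concrete, the sketch below tries to write a 3-D array with np.savetxt() and then shows one possible workaround: flattening to 2-D before saving and restoring the shape after loading. It assumes a recent NumPy release, where the direct attempt raises ValueError, and uses an in-memory io.StringIO buffer instead of a real file. The workaround is an illustration, not something the original notes prescribe.

```python
import io

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Attempt to write the 3-D array directly.
try:
    np.savetxt(io.StringIO(), a, fmt="%d", delimiter=",")
    saved_directly = True
except ValueError:
    saved_directly = False

# Workaround: collapse the trailing axes into one, save as 2-D,
# then reshape after loading. The original shape must be known.
buf = io.StringIO()
np.savetxt(buf, a.reshape(a.shape[0], -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=",").reshape(a.shape)

print(saved_directly)
print(np.array_equal(a, restored))
```

The shape is not stored in the CSV text, so the reader still has to supply it.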
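Storing and Loading Multidimensional Data
a.tofile(fid, sep='', format='%s')
- fid: an open file or a filename string.
- sep: the separator string between elements; if it is empty, the file is written as binary.
- format: the format used for each written element (text mode only).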
Example: storing multidimensional data with tofile()
In [9]: a = np.arange(100).reshape(5,10,2)
In [10]: a.tofile('b.dat', sep=',', format='%d')
The contents of "b.dat" are:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [11]: a.tofile('b1.dat', format='%d')
The contents of "b1.dat" (a binary file, shown as a hex dump) are:
0000 0000 0100 0000 0200 0000 0300 0000 0400 0000 0500 0000 0600 0000 0700 0000 0800 0000 0900 0000 0a00 0000 0b00 0000 0c00 0000 0d00 0000 0e00 0000 0f00 0000 1000 0000 1100 0000 1200 0000 1300 0000 1400 0000 1500 0000 1600 0000 1700 0000 1800 0000 1900 0000 1a00 0000 1b00 0000 1c00 0000 1d00 0000 1e00 0000 1f00 0000 2000 0000 2100 0000 2200 0000 2300 0000 2400 0000 2500 0000 2600 0000 2700 0000 2800 0000 2900 0000 2a00 0000 2b00 0000 2c00 0000 2d00 0000 2e00 0000 2f00 0000 3000 0000 3100 0000 3200 0000 3300 0000 3400 0000 3500 0000 3600 0000 3700 0000 3800 0000 3900 0000 3a00 0000 3b00 0000 3c00 0000 3d00 0000 3e00 0000 3f00 0000 4000 0000 4100 0000 4200 0000 4300 0000 4400 0000 4500 0000 4600 0000 4700 0000 4800 0000 4900 0000 4a00 0000 4b00 0000 4c00 0000 4d00 0000 4e00 0000 4f00 0000 5000 0000 5100 0000 5200 0000 5300 0000 5400 0000 5500 0000 5600 0000 5700 0000 5800 0000 5900 0000 5a00 0000 5b00 0000 5c00 0000 5d00 0000 5e00 0000 5f00 0000 6000 0000 6100 0000 6200 0000 6300 0000
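np.fromfile(file, dtype=float, count=-1, sep='')
- file: an open file or a filename string.
- dtype: the data type of the elements to read.
- count: the number of elements to read; -1 reads the entire file.
- sep: the separator string between elements; if it is empty, the file is read as binary.
Example: reading multidimensional data with fromfile()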
In [9]: c = np.fromfile('b.dat', dtype=int, sep=',')
In [10]: c
Out[10]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
       20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
       40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
       60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [11]: c = np.fromfile('b.dat', dtype=int, sep=',').reshape(5,10,2)
In [12]: c
Out[12]:
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]], [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]], [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]], [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]], [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
In [13]: c = np.fromfile('b1.dat', dtype=int).reshape(5,10,2)
In [14]: c
Out[14]:
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]], [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]], [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]], [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]], [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
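Note:
To read the data back, this approach requires knowing the shape and element type the array had when it was written, so a.tofile() and np.fromfile() must be used as a matched pair.
The extra information can be stored in a separate metadata file. Alternatively, the array shape and element type can be encoded in the filename (for example, b1_int_5_10_2.dat).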
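One way to realize the metadata-file idea is a small JSON sidecar written next to the raw data. The sketch below is an assumption-laden illustration: the helper names save_raw and load_raw, the .json suffix, and the JSON layout are all hypothetical and not part of NumPy or the original notes.

```python
import json
import os
import tempfile

import numpy as np


def save_raw(path, arr):
    """Hypothetical helper: write raw bytes plus a JSON sidecar with dtype and shape."""
    arr.tofile(path)  # binary, because sep defaults to ''
    meta = {"dtype": arr.dtype.str, "shape": list(arr.shape)}
    with open(path + ".json", "w") as f:
        json.dump(meta, f)


def load_raw(path):
    """Hypothetical helper: read the sidecar first, then rebuild the array."""
    with open(path + ".json") as f:
        meta = json.load(f)
    flat = np.fromfile(path, dtype=np.dtype(meta["dtype"]))
    return flat.reshape(meta["shape"])


a = np.arange(100).reshape(5, 10, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b1.dat")
    save_raw(path, a)
    c = load_raw(path)

print(c.shape, c.dtype == a.dtype)
```

Storing arr.dtype.str (for example '<i8') records both the element size and the byte order, which is exactly the information np.fromfile() cannot recover on its own.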
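NumPy's Convenient File I/O
np.save(file, arr) or np.savez(file, arr)
- file: the filename; np.save() uses the .npy extension, and np.savez() writes an .npz archive (uncompressed; np.savez_compressed() produces a compressed one).
- arr: the array to save.
np.load(file)
- file: the filename, with a .npy or .npz extension.
Example: using save() and load()
In [15]: np.save('a.npy', a)
The contents of "a.npy" (shown as a hex dump) are: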
934e 554d 5059 0100 4600 7b27 6465 7363 7227 3a20 273c 6934 272c 2027 666f 7274 7261 6e5f 6f72 6465 7227 3a20 4661 6c73 652c 2027 7368 6170 6527 3a20 2835 2c20 3130 2c20 3229 2c20 7d20 2020 2020 200a 0000 0000 0100 0000 0200 0000 0300 0000 0400 0000 0500 0000 0600 0000 0700 0000 0800 0000 0900 0000 0a00 0000 0b00 0000 0c00 0000 0d00 0000 0e00 0000 0f00 0000 1000 0000 1100 0000 1200 0000 1300 0000 1400 0000 1500 0000 1600 0000 1700 0000 1800 0000 1900 0000 1a00 0000 1b00 0000 1c00 0000 1d00 0000 1e00 0000 1f00 0000 2000 0000 2100 0000 2200 0000 2300 0000 2400 0000 2500 0000 2600 0000 2700 0000 2800 0000 2900 0000 2a00 0000 2b00 0000 2c00 0000 2d00 0000 2e00 0000 2f00 0000 3000 0000 3100 0000 3200 0000 3300 0000 3400 0000 3500 0000 3600 0000 3700 0000 3800 0000 3900 0000 3a00 0000 3b00 0000 3c00 0000 3d00 0000 3e00 0000 3f00 0000 4000 0000 4100 0000 4200 0000 4300 0000 4400 0000 4500 0000 4600 0000 4700 0000 4800 0000 4900 0000 4a00 0000 4b00 0000 4c00 0000 4d00 0000 4e00 0000 4f00 0000 5000 0000 5100 0000 5200 0000 5300 0000 5400 0000 5500 0000 5600 0000 5700 0000 5800 0000 5900 0000 5a00 0000 5b00 0000 5c00 0000 5d00 0000 5e00 0000 5f00 0000 6000 0000 6100 0000 6200 0000 6300 0000
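Inspecting the binary file shows that np.save() stores more than the raw data in the .npy file: it also writes a header recording the element type, memory order, and shape (here '<i4', fortran_order False, and (5, 10, 2)). This is why np.load() can rebuild the array without any extra arguments.
In [16]: b = np.load('a.npy')
In [17]: b
Out[17]: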
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]], [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]], [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]], [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]], [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
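The example above covers np.save() only. As a complement, here is a minimal sketch of np.savez(), which bundles several arrays into one .npz archive and returns them by name on load. The array names x and y and the file name arrays.npz are placeholders chosen for this illustration.

```python
import os
import tempfile

import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")  # hypothetical file name
    # Keyword arguments become the names of the stored arrays.
    np.savez(path, x=x, y=y)

    # np.load() on an .npz file returns a dict-like object;
    # each array is read when its key is accessed.
    with np.load(path) as data:
        names = sorted(data.files)
        x2 = data["x"]
        y2 = data["y"]

print(names)
print(x2.shape, y2.shape)
```

Shape and dtype travel with each array, so no reshape or dtype argument is needed when loading.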
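NumPy Random Number Functions
NumPy's random Submodule
Basic form: np.random.*
np.random.rand(), np.random.randn(), np.random.randint()
Random number functions in np.random
Function | Description |
---|---|
rand(d0, d1, ..., dn) | Creates a random array of shape (d0, ..., dn); floats uniformly distributed over [0, 1) |
randn(d0, d1, ..., dn) | Creates a random array of shape (d0, ..., dn); standard normal distribution |
randint(low[, high, size]) | Creates a random integer or an integer array of shape size; values lie in [low, high), so high itself is excluded |
seed(s) | Sets the random number seed; s is the given seed value |
Example: trying the functions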
In [18]: a = np.random.rand(3,4,5)
In [19]: a
Out[19]:
array([[[ 0.97845512, 0.90466706, 0.92576248, 0.77775142, 0.84334893],
        [ 0.39599821, 0.31917683, 0.7961439 , 0.01324569, 0.97660396],
        [ 0.5049603 , 0.80952265, 0.67359257, 0.89334316, 0.94496225],
        [ 0.04840473, 0.04665257, 0.20956817, 0.62255095, 0.36600489]],
       [[ 0.58059326, 0.28464266, 0.23596248, 0.16677631, 0.86467069],
        [ 0.14691968, 0.60863245, 0.71725038, 0.69206766, 0.18301705],
        [ 0.73197901, 0.99051723, 0.10489076, 0.33979432, 0.0354286 ],
        [ 0.73696453, 0.48268632, 0.99294233, 0.06285961, 0.93090147]],
       [[ 0.07853777, 0.827061 , 0.66325364, 0.52289669, 0.96894828],
        [ 0.41912388, 0.01883408, 0.80978245, 0.93082898, 0.98095581],
        [ 0.58614214, 0.55996867, 0.37734444, 0.79280598, 0.03626233],
        [ 0.233132 , 0.22514788, 0.32245147, 0.13739658, 0.18866422]]])
In [20]: sn = np.random.randn(3,4,5)
In [21]: sn
Out[21]:
array([[[-0.54821321, 0.35733947, 0.74102173, -1.26679716, -0.75072289],
        [ 0.13182283, 2.32578442, -0.52208189, 2.5041796 , -0.96995644],
        [ 1.00171095, 0.97037733, 1.55386206, -0.94515087, 0.75707273],
        [-1.2481768 , 0.53095038, 0.92527818, -0.17261088, -0.13667463]],
       [[ 2.18760173, -0.93813162, 0.19032109, -1.59605908, -0.96802666],
        [ 0.30649913, 1.32375007, 0.72547761, -1.59253182, -0.72385311],
        [-2.22923637, -1.05462649, 1.82672301, 0.47343961, -0.9786459 ],
        [-0.36857965, 0.59003624, 1.80140997, 1.00965744, 1.9037593 ]],
       [[ 0.36273071, -0.0447364 , 1.27120325, 0.21076423, -0.40820945],
        [-1.22315321, -1.94670543, 0.17959233, -1.1020581 , 0.17423733],
        [-1.16368644, 0.00589158, 1.19701291, -0.4255035 , -0.7508364 ],
        [-1.61788168, 0.50386607, 0.15993032, 0.36881486, -0.41457221]]])
In [22]: b = np.random.randint(100,200,(3,4))
In [23]: b
Out[23]:
array([[163, 171, 163, 168],
       [166, 127, 160, 109],
       [135, 111, 196, 190]])
In [24]: np.random.seed(10)
In [25]: np.random.randint(100,200,(3,4))
Out[25]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
In [26]: np.random.seed(10)
In [27]: np.random.randint(100,200,(3,4))
Out[27]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
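Two points from the session above can be checked programmatically: resetting the seed reproduces the same draws, and randint() never returns the upper bound. The sketch below assumes nothing beyond the functions already used; the seed value and sample size are arbitrary choices, and the concrete numbers drawn may differ between NumPy versions and platforms.

```python
import numpy as np

# Same seed, same call, same result.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))
print(np.array_equal(first, second))

# The range is half-open: with low=0 and high=3,
# only the values 0, 1 and 2 can appear.
draws = np.random.randint(0, 3, 10000)
print(draws.min(), draws.max())
```

Fixing the seed is mainly useful for tests and for making experiments repeatable.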
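Random number functions in np.random
Function | Description |
---|---|
shuffle(a) | Randomly reorders array a along its first axis, modifying a in place |
permutation(a) | Returns a new array with a shuffled along its first axis; a itself is not modified |
choice(a[, size, replace, p]) | Draws elements from the 1-D array a with probabilities p to form a new array of shape size; replace controls whether an element may be drawn more than once and defaults to True |
Example: trying the functions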
In [28]: a = np.random.randint(100,200,(3,4))
In [29]: a
Out[29]:
array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])
In [30]: np.random.shuffle(a)
In [31]: a
Out[31]:
array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])
In [32]: np.random.shuffle(a)
In [33]: a
Out[33]:
array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])
In [34]: a = np.random.randint(100,200,(3,4))
In [35]: a
Out[35]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [36]: np.random.permutation(a)
Out[36]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [37]: a
Out[37]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [38]: b = np.random.randint(100,200,(8,))
In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])
In [40]: np.random.choice(b,(3,2))
Out[40]:
array([[122, 188],
       [123, 177],
       [174, 188]])
In [41]: np.random.choice(b,(3,2),replace=False)
Out[41]:
array([[123, 111],
       [128, 188],
       [174, 122]])
In [42]: np.random.choice(b,(3,2),p= b/np.sum(b))
Out[42]:
array([[174, 122],
       [188, 194],
       [174, 123]])
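The difference between shuffle(), permutation(), and the replace flag of choice() is easy to miss in printed output, so the sketch below checks each property explicitly. The seed and the small arrays are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

a = np.arange(12).reshape(4, 3)
original = a.copy()

# permutation() returns a reordered copy and leaves a untouched.
p = np.random.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle() reorders a in place, moving whole rows along the first axis.
np.random.shuffle(a)
rows_kept = sorted(a.tolist()) == sorted(original.tolist())

# replace=False guarantees that no element is drawn twice.
b = np.arange(8)
no_repeat = np.random.choice(b, (3, 2), replace=False)
distinct = len(set(no_repeat.ravel().tolist()))

print(unchanged, rows_kept, distinct)
```

Both functions only reorder along the first axis, so the contents of each row stay together.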
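Function | Description |
---|---|
uniform(low, high, size) | Produces an array of uniformly distributed values; low is the start value, high is the end value, size is the shape |
normal(loc, scale, size) | Produces an array of normally distributed values; loc is the mean, scale is the standard deviation, size is the shape |
poisson(lam, size) | Produces an array of Poisson-distributed values; lam is the rate at which the random event occurs, size is the shape |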
In [43]: u = np.random.uniform(0,10,(3,4))
In [44]: u
Out[44]:
array([[ 8.8393648 , 3.25511638, 1.65015898, 3.92529244],
       [ 0.93460375, 8.21105658, 1.5115202 , 3.84114449],
       [ 9.44260712, 9.87625475, 4.56304547, 8.26122844]])
In [45]: n = np.random.normal(10,5,(3,4))
In [46]: n
Out[46]:
array([[ 12.8882903 , 2.6251256 , 10.39394227, 14.59206826],
       [ 7.5365132 , 10.48231186, 6.73620032, 8.89118781],
       [ 4.65856717, 3.86153973, 1.00713488, 6.5739633 ]])
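With only twelve samples it is hard to see what the parameters mean, and poisson() is not exercised above. The following sketch draws a large sample from each distribution and compares the sample statistics with the parameters passed in. The seed, the sample size, and the rate lam=3.0 are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for repeatability
size = 100_000

u = np.random.uniform(0, 10, size)   # expected mean: (0 + 10) / 2 = 5
n = np.random.normal(10, 5, size)    # expected mean 10, standard deviation 5
k = np.random.poisson(3.0, size)     # expected mean and variance both 3

print(round(u.mean(), 2))
print(round(n.mean(), 2), round(n.std(), 2))
print(round(k.mean(), 2), round(k.var(), 2))
```

The printed values should sit close to the theoretical ones, and the Poisson draws are non-negative integers.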
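NumPy Statistical Functions
Statistical functions provided directly by NumPy
Basic form: np.*
np.std(), np.var(), np.average()
NumPy's statistical functions
Function | Description |
---|---|
sum(a, axis=None) | Sum of the elements of a along the given axis; axis is an int or a tuple of ints |
mean(a, axis=None) | Arithmetic mean of the elements of a along the given axis; axis is an int or a tuple of ints |
average(a, axis=None, weights=None) | Weighted average of the elements of a along the given axis |
std(a, axis=None) | Standard deviation of the elements of a along the given axis |
var(a, axis=None) | Variance of the elements of a along the given axis |
axis=None is the default for all of these statistical functions; it means the computation runs over every element of the array.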
In [47]: a = np.arange(15).reshape(3,5)
In [48]: a
Out[48]:
array([[ 0, 1, 2, 3, 4],
       [ 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])
In [49]: np.sum(a)
Out[49]: 105
In [50]: np.mean(a,axis=1) # 2. = (0+1+2+3+4)/5
Out[50]: array([ 2., 7., 12.])
In [51]: np.mean(a,axis=0) # 7. = (2+7+12)/3
Out[51]: array([ 5., 6., 7., 8., 9.])
In [52]: np.average(a, axis=0, weights=[10,5,1]) # weighted average: 4.1875 = (2*10+7*5+12*1)/(10+5+1)
Out[52]: array([ 2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
In [53]: np.std(a)
Out[53]: 4.3204937989385739
In [54]: np.var(a)
Out[54]: 18.666666666666668
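The weighted average and the variance from the session can be reproduced from their definitions, which makes the role of axis and weights explicit. The sketch below uses the same array and weights; the variable names manual and var_manual are just labels for this illustration.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Weighted average along axis 0: each row is scaled by its weight,
# the rows are summed, and the result is divided by the total weight.
manual = (a * w[:, None]).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

# Variance over all elements: mean squared deviation from the mean.
var_manual = np.mean((a - a.mean()) ** 2)

print(manual)
print(np.allclose(manual, builtin))
print(np.isclose(var_manual, np.var(a)), np.isclose(np.std(a) ** 2, np.var(a)))
```

np.var() and np.std() divide by the number of elements by default, which is why the hand computation uses a plain mean.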
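Function | Description |
---|---|
min(a) max(a) | Minimum and maximum of the elements of a |
argmin(a) argmax(a) | Index of the minimum and maximum element of a, counted in the flattened (1-D) array |
unravel_index(index, shape) | Converts the flat index index into a multidimensional index for the given shape |
ptp(a) | Difference between the maximum and minimum elements of a |
median(a) | Median of the elements of a |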
In [55]: b = np.arange(15,0,-1).reshape(3,5)
In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
       [10, 9, 8, 7, 6],
       [ 5, 4, 3, 2, 1]])
In [57]: np.max(b)
Out[57]: 15
In [58]: np.argmax(b) # index into the flattened array
Out[58]: 0
In [59]: np.unravel_index(np.argmax(b), b.shape) # convert back to a multidimensional index
Out[59]: (0, 0)
In [60]: np.ptp(b)
Out[60]: 14
In [61]: np.median(b)
Out[61]: 8.0
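Because the maximum of b happens to sit at position 0, the session does not show much of what unravel_index() does. A complementary sketch with the minimum, which lies in the last cell, makes the flat-to-multidimensional conversion more visible. It uses the same array b as above.

```python
import numpy as np

b = np.arange(15, 0, -1).reshape(3, 5)

# argmin() counts positions in the flattened array: row 2, column 4
# of a 3x5 array is position 2 * 5 + 4 = 14.
flat = np.argmin(b)

# unravel_index() turns that flat position back into (row, column).
pos = np.unravel_index(flat, b.shape)

print(flat)
print(pos, b[pos])
```

The tuple returned by unravel_index() can be used directly to index the array, as b[pos] shows.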
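NumPy Gradient Function
NumPy's gradient function
Function | Description |
---|---|
np.gradient(f) | Computes the gradient of the elements of array f; when f is multidimensional, returns one gradient array per dimension |
Gradient: the rate of change between consecutive values, that is, the slope. If consecutive X coordinates on the XY plane have Y values a, b, c, then the gradient at b is (c-a)/2. At the two ends, where only one neighbor exists, the difference to that neighbor is used instead.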
In [62]: a = np.random.randint(0,20,(5))
In [63]: a
Out[63]: array([14, 16, 10, 17, 0])
In [64]: np.gradient(a) # neighbors on both sides: -2. = (10-14)/2
Out[64]: array([ 2. , -2. , 0.5, -5. , -17. ])
In [65]: b = np.random.randint(0,20,(5))
In [66]: b
Out[66]: array([17, 9, 16, 9, 12])
In [67]: np.gradient(b) # neighbor on one side only: -8. = (9-17)/1
Out[67]: array([-8. , -0.5, 0. , -2. , 3. ])
In [68]: c = np.random.randint(0, 50, (3,5))
In [69]: c
Out[69]:
array([[30, 17, 17, 16, 0],
       [31, 37, 9, 0, 38],
       [22, 32, 2, 3, 31]])
In [70]: np.gradient(c)
Out[70]:
[array([[ 1. , 20. , -8. , -16. , 38. ],
        [ -4. , 7.5, -7.5, -6.5, 15.5],
        [ -9. , -5. , -7. , 3. , -7. ]]),
 array([[-13. , -6.5, -0.5, -8.5, -16. ],
        [ 6. , -11. , -18.5, 14.5, 38. ],
        [ 10. , -10. , -14.5, 14.5, 28. ]])]
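The rule described above (central difference in the interior, one-sided difference at the ends) can be written out by hand and compared with np.gradient(). The helper gradient_1d below is a hypothetical name introduced for this sketch; it assumes unit spacing and at least two elements, matching the default behavior used in the session.

```python
import numpy as np


def gradient_1d(y):
    """Hypothetical helper: gradient of a 1-D sequence with unit spacing."""
    y = np.asarray(y, dtype=float)
    g = np.empty_like(y)
    g[1:-1] = (y[2:] - y[:-2]) / 2.0  # interior: (next - previous) / 2
    g[0] = y[1] - y[0]                # left end: only a right neighbor
    g[-1] = y[-1] - y[-2]             # right end: only a left neighbor
    return g


a = np.array([14, 16, 10, 17, 0])
by_hand = gradient_1d(a)

print(by_hand)
print(np.allclose(by_hand, np.gradient(a)))
```

For a 2-D array, np.gradient() applies this same rule once along each axis, which is why it returns a list of two arrays.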