SciPy - sparse module

http://blog.csdn.net/pipisorry/article/details/41762945

I. Sparse Matrix Storage Formats

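For sparse matrices, in which many elements are zero, storing only the non-zero elements makes matrix operations more efficient.

Many storage schemes for sparse matrices exist, but most share the same basic technique: store all non-zero elements of the matrix in a linear array, and provide auxiliary arrays that describe where those elements sit in the original matrix.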

1. Coordinate Format (COO)

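The main advantages of this format are flexibility and simplicity. It stores only the non-zero elements together with the coordinates of each one.

Three arrays are used: values, rows, and columns.

values: real or complex data holding the non-zero elements of the matrix, in any order.
rows: the row index of each element.
columns: the column index of each element.

Parameter: nnz, the number of non-zero elements in the matrix. All three arrays have length nnz.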
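As a minimal sketch of the three COO arrays, the snippet below extracts them from a small dense matrix with NumPy and then rebuilds the matrix from them. The 3x4 matrix is a made-up example, not taken from any library documentation.

```python
import numpy as np

# Hypothetical example matrix (an assumption for illustration).
A = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0],
              [4.0, 0.0, 0.0, 5.0]])

# Coordinates of the non-zero entries. np.nonzero happens to return them
# in row-major order, but COO itself allows any order.
rows, columns = np.nonzero(A)
values = A[rows, columns]
nnz = len(values)

# Rebuild a dense matrix from the three arrays to check the round trip.
B = np.zeros_like(A)
B[rows, columns] = values
```

All three arrays have length nnz, and no other information besides the matrix shape is needed to reconstruct A.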

2. Diagonal Storage Format (DIA)

If the sparse matrix has diagonals containing only zero elements, then the diagonal storage format can be used to reduce the amount of information needed to locate the non-zero elements. This storage format is particularly useful in many applications where the matrix arises from a finite element or finite difference discretization.

The Intel MKL diagonal storage format is specified by two arrays, values and distance, and two parameters: ndiag, which is the number of non-empty diagonals, and lval, which is the declared leading dimension in the calling (sub)programs.

values

A real or complex two-dimensional array dimensioned as lval by ndiag. Each column of it contains the non-zero elements of a certain diagonal of A. The key point of the storage is that each element in values retains the row number of the original matrix. To achieve this, diagonals in the lower triangular part of the matrix are padded from the top, and those in the upper triangular part are padded from the bottom. Note that the absolute value of distance(i) is the number of elements to be padded for diagonal i.

distance

An integer array with dimension ndiag. Element i of the array distance is the distance between the i-th diagonal and the main diagonal. The distance is positive if the diagonal is above the main diagonal, and negative if the diagonal is below the main diagonal. The main diagonal has a distance equal to zero.
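The padding rule is easier to see on a concrete matrix. The following is a rough NumPy sketch that builds values and distance by hand for a made-up tridiagonal matrix, using zero-based indices and lval equal to the matrix size (both are assumptions of this sketch, not part of the description above).

```python
import numpy as np

# Hypothetical 4x4 tridiagonal matrix (an assumption for illustration).
A = np.array([[1, 2, 0, 0],
              [3, 4, 5, 0],
              [0, 6, 7, 8],
              [0, 0, 9, 10]], dtype=float)
n = A.shape[0]

# Offsets of the diagonals that contain at least one non-zero element.
distance = np.array([d for d in range(-n + 1, n) if np.any(np.diagonal(A, d))])
ndiag = len(distance)

# One column per diagonal; element A[i, i + d] is stored in row i,
# so every stored element keeps its original row number.
values = np.zeros((n, ndiag))
for k, d in enumerate(distance):
    for i in range(max(0, -d), min(n, n - d)):
        values[i, k] = A[i, i + d]
```

The sub-diagonal (distance -1) ends up padded with one zero at the top and the super-diagonal (distance 1) with one zero at the bottom. Note that scipy.sparse.dia_matrix uses a different layout (one row of its data array per diagonal, aligned by column index), so these arrays cannot be passed to it directly.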

3. Compressed Sparse Row Format (CSR)

The Intel MKL compressed sparse row (CSR) format is specified by four arrays: values, columns, pointerB, and pointerE. The arrays are described below in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.

values

A real or complex array that contains the non-zero elements of A. Values of the non-zero elements of A are mapped into the values array using row-major storage mapping, that is, row by row.

columns

Element i of the integer array columns is the number of the column in A that contains the i-th value in the values array.

pointerB

Element j of this integer array gives the index of the element in the values array that is the first non-zero element in row j of A. Note that this index is equal to pointerB(j) - pointerB(1) + 1.

pointerE

An integer array that contains row indices, such that pointerE(j) - pointerB(1) is the index of the element in the values array that is the last non-zero element in row j of A.
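To relate the four arrays to SciPy, here is a small sketch on a made-up matrix. It assumes zero-based indexing, under which pointerB and pointerE correspond to the first n and last n entries of SciPy's indptr array (pointerE then points one past the last element of the row); that mapping is my reading of the description above, not something the MKL text states about SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x4 example matrix (an assumption for illustration).
A = csr_matrix(np.array([[1, 0, 2, 0],
                         [0, 0, 3, 0],
                         [4, 5, 0, 6]], dtype=float))

values = A.data            # non-zero elements, stored row by row
columns = A.indices        # zero-based column of each stored value
pointerB = A.indptr[:-1]   # position in values where each row starts
pointerE = A.indptr[1:]    # position one past the end of each row

# The non-zero elements of row j and their columns.
j = 2
row_values = values[pointerB[j]:pointerE[j]]
row_columns = columns[pointerB[j]:pointerE[j]]
```

Because consecutive rows are stored back to back, pointerE[j] equals pointerB[j + 1], which is why SciPy keeps a single indptr array of length n + 1 instead of two arrays.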

4. Compressed Sparse Column Format (CSC)

The compressed sparse column format (CSC) is similar to the CSR format, but the columns are used instead of the rows. In other words, the CSC format is identical to the CSR format for the transposed matrix. The CSC format is specified by four arrays: values, rows, pointerB, and pointerE. The arrays are described below in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.

values

A real or complex array that contains the non-zero elements of A. Values of the non-zero elements of A are mapped into the values array using column-major storage mapping, that is, column by column.

rows

Element i of the integer array rows is the number of the row in A that contains the i-th value in the values array.

pointerB

Element j of this integer array gives the index of the element in the values array that is the first non-zero element in column j of A. Note that this index is equal to pointerB(j) - pointerB(1) + 1.

pointerE

An integer array that contains column indices, such that pointerE(j) - pointerB(1) is the index of the element in the values array that is the last non-zero element in column j of A.
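The claim that CSC is just CSR of the transposed matrix can be checked directly in SciPy. The sketch below uses a made-up matrix and compares the three internal arrays (data, indices, indptr) of the two representations.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

# Hypothetical 3x4 example matrix (an assumption for illustration).
dense = np.array([[1, 0, 2, 0],
                  [0, 0, 3, 0],
                  [4, 5, 0, 6]], dtype=float)

A_csc = csc_matrix(dense)      # column-compressed storage of A
At_csr = csr_matrix(dense.T)   # row-compressed storage of A transposed

# The two representations hold exactly the same arrays.
same_arrays = (np.array_equal(A_csc.data, At_csr.data)
               and np.array_equal(A_csc.indices, At_csr.indices)
               and np.array_equal(A_csc.indptr, At_csr.indptr))
```

In A_csc the indices array holds row numbers and indptr marks where each column starts, which matches the rows and pointerB/pointerE roles described above under zero-based indexing.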

5. Skyline Storage Format

The skyline storage format is important for the direct sparse solvers, and it is well suited for Cholesky or LU decomposition when no pivoting is required.

The skyline storage format accepted in Intel MKL can store only a triangular matrix or the triangular part of a matrix. This format is specified by two arrays: values and pointers. These arrays are described below.

values

A scalar array. For a lower triangular matrix it contains the set of elements from each row of the matrix starting from the first non-zero element to and including the diagonal element. For an upper triangular matrix it contains the set of elements from each column of the matrix starting with the first non-zero element down to and including the diagonal element. Encountered zero elements are included in the sets.

pointers

An integer array with dimension (m+1), where m is the number of rows for the lower triangle (columns for the upper triangle). pointers(i) - pointers(1) + 1 gives the index of the element in values that is the first non-zero element in row (column) i. The value of pointers(m+1) is set to nnz + pointers(1), where nnz is the number of elements in the array values.
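Here is a small hand-rolled sketch of the skyline arrays for a lower triangular matrix. The matrix is made up, the pointers are kept one-based to match the formulas above, and the code assumes every diagonal element is non-zero so that each row has a first non-zero element.

```python
import numpy as np

# Hypothetical lower triangular matrix with a non-zero diagonal
# (an assumption for illustration).
L = np.array([[1, 0, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 0],
              [5, 0, 6, 7]], dtype=float)
m = L.shape[0]

values = []
pointers = [1]  # one-based: pointers(1) = 1
for i in range(m):
    # Column of the first non-zero element in row i, up to the diagonal.
    first = np.nonzero(L[i, :i + 1])[0][0]
    # Store everything from there to the diagonal, zeros included.
    values.extend(L[i, first:i + 1].tolist())
    pointers.append(len(values) + 1)

nnz = len(values)
```

The last row contributes 5, 0, 6, 7: the zero between the first non-zero element and the diagonal is stored, which is what gives the format its "skyline" profile.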

6. Block Compressed Sparse Row Format (BSR)

The Intel MKL block compressed sparse row (BSR) format for sparse matrices is specified by four arrays: values, columns, pointerB, and pointerE. These arrays are described below.

values

A real array that contains the elements of the non-zero blocks of a sparse matrix. The elements are stored block-by-block in row-major order. A non-zero block is a block that contains at least one non-zero element. All elements of non-zero blocks are stored, even if some of them are equal to zero. Within each non-zero block, elements are stored in column-major order in the case of one-based indexing, and in row-major order in the case of zero-based indexing.

columns

Element i of the integer array columns is the number of the column in the block matrix that contains the i-th non-zero block.

pointerB

Element j of this integer array gives the index of the element in the columns array that is the first non-zero block in row j of the block matrix.

pointerE

Element j of this integer array gives the index of the element in the columns array that contains the last non-zero block in row j of the block matrix, plus 1.
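The block structure can be inspected with scipy.sparse.bsr_matrix. The sketch below uses a made-up 4x4 matrix with 2x2 blocks and, as an assumption, reads pointerB and pointerE off SciPy's zero-based indptr array.

```python
import numpy as np
from scipy.sparse import bsr_matrix

# Hypothetical 4x4 matrix with two non-zero 2x2 blocks
# (an assumption for illustration).
dense = np.array([[1, 2, 0, 0],
                  [3, 4, 0, 0],
                  [0, 0, 5, 0],
                  [0, 0, 0, 6]], dtype=float)

A = bsr_matrix(dense, blocksize=(2, 2))

blocks = A.data            # shape (number of blocks, 2, 2), zeros inside blocks included
columns = A.indices        # block-column of each stored block
pointerB = A.indptr[:-1]   # first stored block of each block-row
pointerE = A.indptr[1:]    # one past the last stored block of each block-row
```

The second block is stored as [[5, 0], [0, 6]]: its two zeros take up space because the whole block is kept once it contains any non-zero element.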

7. ELLPACK (ELL)

8. Hybrid (HYB)

A combination of the ELL and COO formats.

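II. Some rules of thumb for choosing a sparse matrix storage format:

1. DIA and ELL are the most efficient formats for sparse matrix-vector products, so they are the fastest formats for solving sparse linear systems with iterative methods such as conjugate gradient;

2. Compared with DIA and ELL, COO and CSR are more flexible and easier to manipulate;

3. ELL is fast and COO is flexible; HYB, which combines the two, is a good general-purpose sparse matrix format;

4. According to Nathan Bell's work:

CSR has the most stable number of bytes per non-zero entry when storing sparse matrices (about 8.5 for float, about 12.5 for double).

For DIA, the bytes per non-zero entry depend strongly on the matrix type; it suits sparse matrices that come from structured meshes (about 4.05 for float, about 8.10 for double).

For unstructured meshes and random matrices, DIA uses more than ten times as many bytes as CSR;

5. In linear algebra libraries: COO is commonly used to read and write sparse matrices from files (Matrix Market, for example, uses COO), while CSR is commonly used for sparse matrix computation once the data has been read in.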
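The bytes-per-non-zero comparison in point 4 can be approximated with SciPy by adding up the sizes of the arrays each format stores. This is only a rough sketch: the helper names are hypothetical, the matrices are tiny stand-ins (a tridiagonal matrix for the structured case, a random pattern for the unstructured one), and the numbers it produces will not match the figures quoted above.

```python
import numpy as np
from scipy import sparse

def stored_bytes(A):
    """Bytes used by the arrays SciPy keeps for a csr or dia matrix."""
    if A.format == "csr":
        return A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
    if A.format == "dia":
        return A.data.nbytes + A.offsets.nbytes
    raise ValueError("this sketch only handles csr and dia")

def bytes_per_nonzero(A_csr):
    """Compare CSR and DIA storage cost for the same matrix."""
    nnz = A_csr.nnz
    return {fmt: stored_bytes(A_csr.asformat(fmt)) / nnz for fmt in ("csr", "dia")}

n = 50
# Structured case: tridiagonal matrix, as produced by a 1-D finite difference stencil.
structured = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                          shape=(n, n), format="csr", dtype=np.float32)
# Unstructured case: random sparsity pattern.
unstructured = sparse.random(n, n, density=0.02, format="csr",
                             dtype=np.float32, random_state=0)

structured_cost = bytes_per_nonzero(structured)
unstructured_cost = bytes_per_nonzero(unstructured)
```

For the tridiagonal matrix DIA stores three dense diagonals and almost no index data, so it beats CSR; for the random pattern nearly every non-zero sits on its own diagonal, and DIA pays for a full diagonal each time.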

Sparse Matrix Representations & Iterative Solvers, Lesson 1 by Nathan Bell

Sparse Linear Systems

Sparse matrix formats used in the Intel MKL library

III. The sparse matrix storage formats correspond to the following classes in the sparse module:

bsr_matrix(arg1[, shape, dtype, copy, blocksize]) Block Sparse Row matrix

coo_matrix(arg1[, shape, dtype, copy]) A sparse matrix in COOrdinate format.

csc_matrix(arg1[, shape, dtype, copy]) Compressed Sparse Column matrix

csr_matrix(arg1[, shape, dtype, copy]) Compressed Sparse Row matrix

dia_matrix(arg1[, shape, dtype, copy]) Sparse matrix with DIAgonal storage

dok_matrix(arg1[, shape, dtype, copy]) Dictionary Of Keys based sparse matrix.

lil_matrix(arg1[, shape, dtype, copy]) Row-based linked list sparse matrix
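These classes can be converted into one another, so a common pattern is to build a matrix in a format that is easy to modify and convert it before computing. A minimal sketch, with made-up values:

```python
from scipy.sparse import lil_matrix

# Build incrementally in LIL, which supports item assignment.
A = lil_matrix((3, 3))
A[0, 0] = 1
A[1, 2] = 2
A[2, 1] = 3

# Convert to other storage formats as needed.
A_csr = A.tocsr()
A_csc = A.tocsc()
A_coo = A.tocoo()

formats = [M.format for M in (A, A_csr, A_csc, A_coo)]
```

Each conversion returns a new matrix with the same contents; the format attribute reports which storage scheme an object uses.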

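IV. Sparse matrix operations

Converting an ordinary dense matrix into a sparse matrix of a given storage format

Taking coo_matrix as an example:

1. Convert a dense matrix directly:

>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> A = coo_matrix([[1, 2], [3, 4]])

2. Build the matrix from the arrays that the storage format requires:

>>> row = np.array([0, 0, 1, 3, 1, 0, 0])
>>> col = np.array([0, 2, 1, 3, 1, 0, 0])
>>> data = np.array([1, 1, 1, 1, 1, 1, 1])
>>> coo_matrix((data, (row, col)), shape=(4, 4)).todense()
matrix([[3, 0, 1, 0],
        [0, 2, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]])

Entries that share the same (row, col) coordinates are summed during the conversion, which is why positions (0, 0) and (1, 1) hold 3 and 2.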

Stacking sparse matrices horizontally or vertically

>>> from scipy.sparse import coo_matrix, vstack
>>> A = coo_matrix([[1, 2], [3, 4]])
>>> B = coo_matrix([[5, 6]])
>>> vstack([A, B]).todense()
matrix([[1, 2],
        [3, 4],
        [5, 6]])

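If the shapes of A and B are incompatible, they cannot be stacked: vstack requires the same number of columns and hstack the same number of rows. All elements within a single matrix must share the same data type.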
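Horizontal stacking works the same way through hstack. A minimal sketch with made-up matrices:

```python
from scipy.sparse import coo_matrix, hstack

A = coo_matrix([[1, 2], [3, 4]])
C = coo_matrix([[5], [6]])   # same number of rows as A

# Place C to the right of A.
H = hstack([A, C])
H_dense = H.toarray()
```

Here H has shape (2, 3), with the single column of C appended after the two columns of A.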

The diags function builds a sparse diagonal matrix.
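A short sketch of diags, using made-up values to build a tridiagonal matrix from its main diagonal and its first super- and sub-diagonals:

```python
from scipy.sparse import diags

# One list of values per diagonal; offsets give each diagonal's position
# (0 = main diagonal, 1 = first above, -1 = first below).
T = diags([[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]], offsets=[0, 1, -1])
T_dense = T.toarray()
```

The shape is inferred from the diagonal lengths, so T is 4x4 here.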

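Several storage formats, csr and csc among them, support slicing; coo does not. Sparse matrices also support arithmetic operations (addition, subtraction, multiplication, and division), and these are fast.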
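A few of these arithmetic operations on two made-up csr matrices, as a minimal sketch:

```python
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0], [0, 2]])
B = csr_matrix([[0, 3], [4, 0]])

total = A + B                 # element-wise sum
difference = A - B            # element-wise difference
product = A @ B               # matrix product
elementwise = A.multiply(B)   # element-wise product
```

All four results are sparse matrices; call toarray() on them to inspect the values.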

To select specific columns, for example the 1st, 3rd, and 8th columns of a matrix: matrix[:, [0, 2, 7]]
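A quick check of this kind of indexing on a csr matrix, using a made-up 3x8 matrix filled with consecutive integers:

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.arange(24).reshape(3, 8))

selected_columns = M[:, [0, 2, 7]]   # 1st, 3rd, and 8th columns
selected_rows = M[1:3, :]            # 2nd and 3rd rows
```

Both results are still sparse matrices, with shapes (3, 3) and (2, 8).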

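Reading from a sparse matrix: elements can be read by index, as with an ordinary matrix. getrow(i) and getcol(i) return a specific row or column, and nonzero() returns the positions of the non-zero elements.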
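These access methods can be sketched on a small made-up csr_matrix:

```python
from scipy.sparse import csr_matrix

M = csr_matrix([[1, 0, 2],
                [0, 0, 3],
                [4, 5, 0]])

element = M[2, 1]             # single element by index
row1 = M.getrow(1)            # 1x3 sparse matrix holding row 1
col2 = M.getcol(2)            # 3x1 sparse matrix holding column 2
nz_rows, nz_cols = M.nonzero()  # coordinates of the non-zero elements
positions = sorted(zip(nz_rows.tolist(), nz_cols.tolist()))
```

getrow and getcol return sparse matrices rather than dense arrays, so the result keeps its two-dimensional shape.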

Official documentation for the sparse module: http://docs.scipy.org/doc/scipy/reference/sparse.html

from:

http://blog.csdn.net/pipisorry/article/details/41762945

ref:

http://blog.sina.com.cn/s/blog_6a90ae320101aavg.html

Time: 2024-11-05 13:51:12
