稀疏矩阵的存储格式(转)

  此文转自一博文

  更详细资料可从百度云下载。

  对于很多元素为零的稀疏矩阵,仅存储非零元素可使矩阵操作效率更高。现有许多种稀疏矩阵的存储方式,但是多数采用相同的基本技术,即存储矩阵所有的非零元素到一个线性数组中,并提供辅助数组来描述原数组中非零元素的位置。

  以下是几种常见的稀疏矩阵存储格式:

  1. Coordinate Format (COO)

  

  这种存储方式的主要优点是灵活、简单。仅存储非零元素以及每个非零元素的坐标。

  使用3个数组进行存储:valuesrows, andcolumn

  values: 实数或复数数据,包括矩阵中的非零元素, 顺序任意。
  rows: 数据所处的行。
  columns: 数据所处的列.

  这用一个结构体就可以表示。

  2. Diagonal Storage Format (DIA)

  

  If the sparse matrix has diagonals containing only zero elements, then the diagonal storage format can be used to reduce the amount of information needed to locate the non-zero elements. This storage format is particularly useful in many applications where the matrix arises from a finite element or finite difference discretization.

  The Intel MKL diagonal storage format is specified by two arrays:values anddistance, and two parameters:ndiag, which is the number of non-empty diagonals, andlval, which is the declared leading dimension in the calling (sub)programs.

  values

A real or complex two-dimensional array is dimensioned aslval byndiag. Each column of it contains the non-zero elements of certain diagonal ofA. The key point of the storage is that each element invalues retains the row number of the original matrix. To achieve this diagonals in the lower triangular part of the matrix are padded from the top, and those in the upper triangular part are padded from the bottom. Note that the value ofdistance(i) is the number of elements to be padded for diagonali.

  distance

An integer array with dimension ndiag. Elementi of the arraydistance is the distance betweeni-diagonal and the main diagonal. The distance is positive if the diagonal is above the main diagonal, and negative if the diagonal is below the main diagonal. The main diagonal has a distance equal to zero.

  3. Compressed Sparse Row Format (CSR)

  

  The Intel MKL compressed sparse row (CSR) format is specified by four arrays: thevalues,columns,pointerB, andpointerE. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrixA.

  values

A real or complex array that contains the non-zero elements ofA. Values of the non-zero elements ofA are mapped into thevalues array using the row-major storage mapping described above.

  columns

Element i of the integer array columns is the number of the column inA that contains thei-th value in thevalues array.

  pointerB

Element j of this integer array gives the index of the element in thevalues array that is first non-zero element in a rowj ofA. Note that this index is equal topointerB(j) -pointerB(1)+1 .

  pointerE

An integer array that contains row indices, such thatpointerE(j)-pointerB(1) is the index of the element in thevalues array that is last non-zero element in a row j of A.

  4. Compressed Sparse Column Format (CSC)

  The compressed sparse column format (CSC) is similar to the CSR format, but the columns are used instead the rows. In other words, the CSC format is identical to the CSR format for the transposed matrix. The CSR format is specified by four arrays: valuescolumnspointerB, and pointerE. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrixA.

  values

A real or complex array that contains the non-zero elements ofA. Values of the non-zero elements ofA are mapped into thevalues array using the column-major storage mapping.

  rows

Element i of the integer array rows is the number of the row inA that contains thei-th value in thevalues array.

  pointerB

Element j of this integer array gives the index of the element in thevalues array that is first non-zero element in a columnj ofA. Note that this index is equal topointerB(j) -pointerB(1)+1 .

  pointerE

An integer array that contains column indices, such thatpointerE(j)-pointerB(1) is the index of the element in thevalues array that is last non-zero element in a column j ofA.

  5. Skyline Storage Format

  The skyline storage format is important for the direct sparse solvers, and it is well suited for Cholesky or LU decomposition when no pivoting is required.

The skyline storage format accepted in Intel MKL can store only triangular matrix or triangular part of a matrix. This format is specified by two arrays:values andpointers. The following table describes these arrays:

  values

A scalar array. For a lower triangular matrix it contains the set of elements from each row of the matrix starting from the first non-zero element to and including the diagonal element. For an upper triangular matrix it contains the set of elements from each column of the matrix starting with the first non-zero element down to and including the diagonal element. Encountered zero elements are included in the sets.

  pointers

An integer array with dimension (m+1), where m is the number of rows for lower triangle (columns for the upper triangle).pointers(i) -pointers(1)+1 gives the index of element invalues that is first non-zero element in row (column)i. The value ofpointers(m+1) is set tonnz+pointers(1), wherennz is the number of elements in the arrayvalues.

  6. Block Compressed Sparse Row Format (BSR)

  The Intel MKL block compressed sparse row (BSR) format for sparse matrices is specified by four arrays:values,columns,pointerB, andpointerE. The following table describes these arrays.

  values

A real array that contains the elements of the non-zero blocks of a sparse matrix. The elements are stored block-by-block in row-major order. A non-zero block is the block that contains at least one non-zero element. All elements of non-zero blocks are stored, even if some of them is equal to zero. Within each non-zero block elements are stored in column-major order in the case of one-based indexing, and in row-major order in the case of the zero-based indexing.

  columns

Element i of the integer array columns is the number of the column in the block matrix that contains thei-th non-zero block.

  pointerB

Element j of this integer array gives the index of the element in thecolumns array that is first non-zero block in a rowj of the block matrix.

  pointerE

Element j of this integer array gives the index of the element in thecolumns array that contains the last non-zero block in a rowj of the block matrix plus 1.

  7.  ELLPACK (ELL)

  

  8. Hybrid (HYB)

  

  由ELL+COO两种格式结合而成。

  选择稀疏矩阵存储格式的一些经验:

  1. DIA和ELL格式在进行稀疏矩阵-矢量乘积(sparse matrix-vector products)时效率最高,所以它们是应用迭代法(如共轭梯度法)解稀疏线性系统最快的格式;

  2. COO和CSR格式比起DIA和ELL来,更加灵活,易于操作;

  3. ELL的优点是快速,而COO优点是灵活,二者结合后的HYB格式是一种不错的稀疏矩阵表示格式;

  4. 根据Nathan Bell的工作,CSR格式在存储稀疏矩阵时非零元素平均使用的字节数(Bytes per Nonzero Entry)最为稳定(float类型约为8.5,double类型约为12.5),而DIA格式存储数据的非零元素平均使用的字节数与矩阵类型有较大关系,适合于StructuredMesh结构的稀疏矩阵(float类型约为4.05,double类型约为8.10),对于Unstructured Mesh以及Random Matrix,DIA格式使用的字节数是CSR格式的十几倍;

  5. 从我使用过的一些线性代数计算库来说,COO格式常用于从文件中进行稀疏矩阵的读写,如matrix market即采用COO格式,而CSR格式常用于读入数据后进行稀疏矩阵计算。

时间: 2024-10-11 15:57:16

稀疏矩阵的存储格式(转)的相关文章

基于稀疏矩阵的k近邻(KNN)实现

一.概述 这里我们先来看看当我们的数据是稀疏时,如何用稀疏矩阵的特性为KNN算法加速.KNN算法在之前的博文中有提到,当时写的测试程序是针对稠密矩阵数据的.但实际上我们也会遇到不少的稀疏数据,而且有很多是有意而为之的,因为稀疏数据具有稠密数据无法媲美的存储和计算特性,这对工程应用中的内存需求和实时需求是很重要的.所以这里我们也来关注下稀疏矩阵的存储和其在knn算法中的应用举例. 大家都知道,所谓稀疏矩阵,就是很多元素为零的矩阵,或者说矩阵中非零元素的个数远远小于矩阵元素的总数,如上图.如果我们只

SciPy - sparse module

http://blog.csdn.net/pipisorry/article/details/41762945 一.Sparse Matrix Storage Formats 对于很多元素为零的稀疏矩阵,仅存储非零元素可使矩阵操作效率更高. 现有许多种稀疏矩阵的存储方式,但是多数采用相同的基本技术,即存储矩阵所有的非零元素到一个线性数组中,并提供辅助数组来描述原数组中非零元素的位置. 1. Coordinate Format (COO) 这种存储方式的主要优点是灵活.简单.仅存储非零元素以及每个

Python SciPy Sparse模块学习笔记

1. sparse模块的官方document地址:http://docs.scipy.org/doc/scipy/reference/sparse.html 2. sparse matrix的存储形式有很多种,见此帖子http://blog.csdn.net/anshan1984/article/details/8580952 不同的存储形式在sparse模块中对应如下: bsr_matrix(arg1[, shape, dtype, copy, blocksize]) Block Sparse

稀疏矩阵存储格式总结+存储效率对比:COO,CSR,DIA,ELL,HYB

转载请注明出处:Bin的专栏,http://blog.csdn.net/xbinworld 稀疏矩阵是指矩阵中的元素大部分是0的矩阵,事实上,实际问题中大规模矩阵基本上都是稀疏矩阵,很多稀疏度在90%甚至99%以上.因此我们需要有高效的稀疏矩阵存储格式.本文总结几种典型的格式:COO,CSR,DIA,ELL,HYB. (1)Coordinate(COO) 这是最简单的一种格式,每一个元素需要用一个三元组来表示,分别是(行号,列号,数值),对应上图右边的一列.这种方式简单,但是记录单信息多(行列)

转载:稀疏矩阵存储格式总结+存储效率对比:COO,CSR,DIA,ELL,HYB

http://www.cnblogs.com/xbinworld/p/4273506.html 稀疏矩阵是指矩阵中的元素大部分是0的矩阵,事实上,实际问题中大规模矩阵基本上都是稀疏矩阵,很多稀疏度在90%甚至99%以上.因此我们需要有高效的稀疏矩阵存储格式.本文总结几种典型的格式:COO,CSR,DIA,ELL,HYB. (1)Coordinate(COO) 这是最简单的一种格式,每一个元素需要用一个三元组来表示,分别是(行号,列号,数值),对应上图右边的一列.这种方式简单,但是记录单信息多(行

如何高效存储稀疏矩阵?

为了节省存储空间并且加快并行程序处理速度,需要对稀疏矩阵进行压缩存储,压缩存储的原则是:不重复存储相同元素:不存储零值元素.常用的几种矩阵的存储格式如下:COO,CSR,DIA,ELL,HYB等:一般情况下,稀疏矩阵指的是元素大部分是0的矩阵(有些资料定义非零元素不超过5%的矩阵,为稀疏矩阵);所以如何高效存储以及其格式如何确定,往往会影响并行程序的运行效率. 接下来介绍几种存储格式: (一)Coordinate(COO) 这种存储格式比较简单易懂,每一个元素需要用一个三元组来表示,分别是(行号

scipy稀疏矩阵

那些零元素数目远远多于非零元素数目,并且非零元素的分布没有规律的矩阵称为稀疏矩阵(sparse matrix). 不同类型的矩阵有不同的压缩方式,比如对角矩阵只存储对角元素即可.要想充分压缩,就要找到数据的特点. 压缩算法也有很多种,如:音频压缩算法.视频压缩算法.通用压缩算法.不同压缩算法有不同的使用领域,一般专用领域的压缩算法效率高于通用压缩算法.因为专用领域压缩算法抓住了数据的特点. 本文主要介绍scipy提供的八种稀疏矩阵存储格式. 坐标存储 Coordinate Format (COO

稀疏矩阵 part 5

? 目前为止能跑的所有代码及其结果(2019年2月24日),之后添加:DIA 乘法 GPU 版:其他维度的乘法(矩阵乘矩阵):其他稀疏矩阵格式之间的相互转化 1 #include <stdio.h> 2 #include <stdlib.h> 3 #include <memory.h> 4 #include <malloc.h> 5 #include <math.h> 6 #include <time.h> 7 #include &q

hive文件存储格式

hive在建表是,可以通过'STORED AS FILE_FORMAT' 指定存储文件格式 例如: [plain] view plain copy > CREATE EXTERNAL TABLE MYTEST(num INT, name STRING) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' > STORED AS TEXTFILE > LOCATION '/data/test'; 指定文件存储格式为"TEXTFI