Spark Distributed Matrix

RowMatrix (row matrix)

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val df1 = Seq(
  (1.0, 2.0, 3.0),
  (1.1, 2.1, 3.1),
  (1.2, 2.2, 3.2)).toDF("c1", "c2", "c3")
df1: org.apache.spark.sql.DataFrame = [c1: double, c2: double ... 1 more field]

df1.show
+---+---+---+
| c1| c2| c3|
+---+---+---+
|1.0|2.0|3.0|
|1.1|2.1|3.1|
|1.2|2.2|3.2|
+---+---+---+

// Convert the DataFrame to an RDD[Vector].
// All three columns are doubles, so read them with getDouble instead of round-tripping through String.
val rowsVector = df1.rdd.map { x =>
  Vectors.dense(x.getDouble(0), x.getDouble(1), x.getDouble(2))
}
rowsVector: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MapPartitionsRDD[4] at map

// Create a RowMatrix from an RDD[Vector].
val mat1: RowMatrix = new RowMatrix(rowsVector)
mat1: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@...

// Get its size.
val m = mat1.numRows()
m: Long = 3

val n = mat1.numCols()
n: Long = 3

// Convert the RowMatrix back to a DataFrame.
// Vector elements are already Double, so no further conversion is needed.
val resDF = mat1.rows.map { x =>
  (x(0), x(1), x(2))
}.toDF("c1", "c2", "c3")
resDF: org.apache.spark.sql.DataFrame = [c1: double, c2: double ... 1 more field]

resDF.show
+---+---+---+
| c1| c2| c3|
+---+---+---+
|1.0|2.0|3.0|
|1.1|2.1|3.1|
|1.2|2.2|3.2|
+---+---+---+

// take(10) fetches at most 10 rows instead of collecting the whole RDD to the driver first.
mat1.rows.take(10)
res3: Array[org.apache.spark.mllib.linalg.Vector] = Array([1.0,2.0,3.0], [1.1,2.1,3.1], [1.2,2.2,3.2])
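
The conversion above hardcodes three columns. A minimal sketch of a more general version follows, assuming every column of the DataFrame is DoubleType. The names `session`, `dfToRowMatrix`, `sample` and `sketchMat` are hypothetical and do not appear in the original transcript.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.{DataFrame, SparkSession}

// Reuses the running session inside spark-shell, or starts a local one otherwise.
val session = SparkSession.builder().master("local[*]").appName("rowmatrix-sketch").getOrCreate()

// Hypothetical helper (not part of Spark): builds a RowMatrix from a DataFrame
// whose columns are all DoubleType. getDouble throws on null or non-double cells.
def dfToRowMatrix(df: DataFrame): RowMatrix = {
  val width = df.columns.length
  val vectors = df.rdd.map { row =>
    Vectors.dense(Array.tabulate(width)(i => row.getDouble(i)))
  }
  new RowMatrix(vectors)
}

// Same data as df1 above, built without relying on the shell's implicits.
val sample = session.createDataFrame(Seq(
  (1.0, 2.0, 3.0),
  (1.1, 2.1, 3.1),
  (1.2, 2.2, 3.2))).toDF("c1", "c2", "c3")

val sketchMat = dfToRowMatrix(sample)
println(s"${sketchMat.numRows()} x ${sketchMat.numCols()}")
```

For DataFrames with integer or nullable columns, the cells would have to be cast or filtered before this helper is applied.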

CoordinateMatrix (coordinate matrix)

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Column 1: row index; column 2: column index; column 3: matrix element
val df = Seq(
  (0, 0, 1.1), (0, 1, 1.2), (0, 2, 1.3),
  (1, 0, 2.1), (1, 1, 2.2), (1, 2, 2.3),
  (2, 0, 3.1), (2, 1, 3.2), (2, 2, 3.3),
  (3, 0, 4.1), (3, 1, 4.2), (3, 2, 4.3)).toDF("row", "col", "value")
df: org.apache.spark.sql.DataFrame = [row: int, col: int ... 1 more field]

df.show
+---+---+-----+
|row|col|value|
+---+---+-----+
|  0|  0|  1.1|
|  0|  1|  1.2|
|  0|  2|  1.3|
|  1|  0|  2.1|
|  1|  1|  2.2|
|  1|  2|  2.3|
|  2|  0|  3.1|
|  2|  1|  3.2|
|  2|  2|  3.3|
|  3|  0|  4.1|
|  3|  1|  4.2|
|  3|  2|  4.3|
+---+---+-----+

// Build an RDD[MatrixEntry] from the DataFrame rows.
// "row" and "col" are int columns and "value" is a double column.
val entr = df.rdd.map { x =>
  MatrixEntry(x.getInt(0).toLong, x.getInt(1).toLong, x.getDouble(2))
}
entr: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.distributed.MatrixEntry] = MapPartitionsRDD[20] at map

// Build the CoordinateMatrix
val mat: CoordinateMatrix = new CoordinateMatrix(entr)
mat: org.apache.spark.mllib.linalg.distributed.CoordinateMatrix = org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@...

mat.numRows()
res5: Long = 4

mat.numCols()
res6: Long = 3
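
When a CoordinateMatrix is built from entries alone, its dimensions are derived from the largest indices present: the row count is the largest row index plus one, and likewise for columns. The plain-Scala sketch below mirrors that rule without Spark. `Entry` is a stand-in for `MatrixEntry`, and `inferSize` is a hypothetical helper, not a Spark API.

```scala
// Stand-in for Spark's MatrixEntry so the sketch runs in any Scala REPL.
case class Entry(i: Long, j: Long, value: Double)

// The same 4 x 3 layout as the DataFrame above.
val entries = for (i <- 0L until 4L; j <- 0L until 3L)
  yield Entry(i, j, (i + 1) + (j + 1) / 10.0)

// Hypothetical helper: dimensions are (largest row index + 1, largest column index + 1).
def inferSize(es: Seq[Entry]): (Long, Long) = {
  val (maxI, maxJ) = es.foldLeft((-1L, -1L)) { case ((mi, mj), e) =>
    (math.max(mi, e.i), math.max(mj, e.j))
  }
  (maxI + 1, maxJ + 1)
}

val (inferredRows, inferredCols) = inferSize(entries)
println(s"$inferredRows x $inferredCols") // 4 x 3
```

Because of this rule, empty rows at the bottom or empty columns at the right are not counted. If they matter, the dimensions can be passed explicitly with the three-argument constructor `new CoordinateMatrix(entries, nRows, nCols)`.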

// take(10) fetches at most 10 entries instead of collecting the whole RDD to the driver first.
mat.entries.take(10)
res7: Array[org.apache.spark.mllib.linalg.distributed.MatrixEntry] = Array(MatrixEntry(0,0,1.1), MatrixEntry(0,1,1.2), MatrixEntry(0,2,1.3), MatrixEntry(1,0,2.1), MatrixEntry(1,1,2.2), MatrixEntry(1,2,2.3), MatrixEntry(2,0,3.1), MatrixEntry(2,1,3.2), MatrixEntry(2,2,3.3), MatrixEntry(3,0,4.1))

// Convert the CoordinateMatrix to a DataFrame with a row-index column (the index is the row coordinate)
val t = mat.toIndexedRowMatrix().rows.map { x =>
  val v = x.vector
  (x.index, v(0), v(1), v(2))
}
t: org.apache.spark.rdd.RDD[(Long, Double, Double, Double)] = MapPartitionsRDD[33] at map

t.toDF().show
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  0|1.1|1.2|1.3|
|  1|2.1|2.2|2.3|
|  2|3.1|3.2|3.3|
|  3|4.1|4.2|4.3|
+---+---+---+---+
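
Conceptually, this conversion groups the entries by row index and places each value at its column position in a vector of length numCols. A plain-Scala sketch of the idea, without Spark, is given below. `Cell` and `toIndexedRows` are hypothetical names, and the sketch uses dense rows for readability, whereas Spark builds sparse vectors.

```scala
// Stand-in for Spark's MatrixEntry so the sketch runs in any Scala REPL.
case class Cell(row: Long, col: Long, value: Double)

// Hypothetical helper: group cells by row index and fill one dense row per index.
// Positions without a cell stay 0.0; row indices without any cell produce no row.
def toIndexedRows(cells: Seq[Cell], numCols: Int): Seq[(Long, Vector[Double])] =
  cells.groupBy(_.row).toSeq.sortBy(_._1).map { case (rowIdx, rowCells) =>
    val dense = Array.fill(numCols)(0.0)
    rowCells.foreach(c => dense(c.col.toInt) = c.value)
    (rowIdx, dense.toVector)
  }

// Entries in scrambled order, as they might arrive from a distributed source.
val cells = Seq(
  Cell(1, 0, 2.1), Cell(0, 2, 1.3), Cell(0, 0, 1.1),
  Cell(1, 1, 2.2), Cell(0, 1, 1.2), Cell(1, 2, 2.3))

val indexedRows = toIndexedRows(cells, numCols = 3)
indexedRows.foreach { case (i, v) => println(s"$i -> ${v.mkString(", ")}") }
```

The sketch sorts by index only to make the printed result deterministic. In Spark, the rows of an IndexedRowMatrix carry their index but are not stored in any particular order.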

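// Convert the CoordinateMatrix to a DataFrame without row indices.
// toRowMatrix() drops the indices, so the row order is not guaranteed (see the output below).
val t1 = mat.toRowMatrix().rows.map { x =>
  (x(0), x(1), x(2))
}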
t1: org.apache.spark.rdd.RDD[(Double, Double, Double)] = MapPartitionsRDD[26] at map

t1.toDF().show
+---+---+---+
| _1| _2| _3|
+---+---+---+
|1.1|1.2|1.3|
|3.1|3.2|3.3|
|2.1|2.2|2.3|
|4.1|4.2|4.3|
+---+---+---+
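
The rows above come back as 1.1, 3.1, 2.1, 4.1 because `toRowMatrix()` discards the row indices. If the original order matters, one option is to sort by index while it is still available and only then drop it. The sketch below assumes a small 3 x 2 matrix; `localSpark`, `scrambled`, `coo` and `orderedRows` are hypothetical names.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.SparkSession

// Reuses the running session inside spark-shell, or starts a local one otherwise.
val localSpark = SparkSession.builder().master("local[*]").appName("ordered-rows-sketch").getOrCreate()

// A 3 x 2 matrix whose entries arrive in scrambled order.
val scrambled = localSpark.sparkContext.parallelize(Seq(
  MatrixEntry(2, 1, 3.2), MatrixEntry(0, 0, 1.1), MatrixEntry(1, 1, 2.2),
  MatrixEntry(2, 0, 3.1), MatrixEntry(1, 0, 2.1), MatrixEntry(0, 1, 1.2)))

val coo = new CoordinateMatrix(scrambled)

// Sort by the row index while it is still available, then drop it.
val orderedRows: Array[Array[Double]] = coo.toIndexedRowMatrix().rows
  .sortBy(_.index)
  .map(_.vector.toArray)
  .collect()

orderedRows.foreach(r => println(r.mkString(", ")))
```

The sort adds a shuffle, so it is only worth doing when the row order is actually needed downstream. Row indices that have no entries do not appear in the result at all.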

Date: 2024-11-05 12:34:10
