How to convert a Matrix to RDD[Vector] in Spark

The matrix is generated from SVD, and I am using the SVD results to do clustering analysis.
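
For context, the kind of local Matrix being asked about is typically the V factor returned by computeSVD on a distributed RowMatrix. A minimal sketch (the toy data and k = 2 below are placeholders, not from the original question):

    import org.apache.spark.mllib.linalg.{Matrix, Vectors}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Toy data standing in for the real input; sc is the SparkContext.
    val data = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)))
    val svd = new RowMatrix(data).computeSVD(2, computeU = true)
    val v: Matrix = svd.V // local Matrix holding the right singular vectors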

If your clustering algorithm only supports an RDD as its input, here's how you can do the transformation:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.linalg.{DenseVector, Matrix, Vector}

    def toRDD(sc: SparkContext, m: Matrix): RDD[Vector] = {
      // m.toArray is column-major, so grouping by numRows yields the columns.
      val columns: Iterator[Array[Double]] = m.toArray.grouped(m.numRows)
      // val rows: Seq[Array[Double]] = columns.toSeq // Use this instead for a column-major RDD.
      val rows: Seq[Seq[Double]] = columns.toSeq.transpose
      val vectors: Seq[Vector] = rows.map(row => new DenseVector(row.toArray))
      sc.parallelize(vectors)
    }
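
With the helper in scope, the V factor can be converted and handed to any RDD-based clustering algorithm. A minimal sketch using spark.mllib's KMeans; the cluster count and iteration count are arbitrary placeholders, and which SVD factor you actually cluster depends on your analysis:

    import org.apache.spark.mllib.clustering.KMeans

    // svd.V is the local Matrix from the SVD step above.
    val projected: RDD[Vector] = toRDD(sc, svd.V)
    // Cluster the rows; 3 clusters and 20 iterations are placeholders.
    val model = KMeans.train(projected, 3, 20)
    val assignments = model.predict(projected) // RDD[Int] of cluster ids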

  

Time: 2024-08-03 15:36:33
