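1. Spark has four run modes:
a. local: mostly used for testing.
b. standalone: Spark's own cluster mode, which uses Spark's built-in scheduler.
c. YARN: the Hadoop resource manager introduced as the upgrade to MapReduce v1; it supports Spark.
d. Mesos: a resource-scheduling framework similar to YARN that provides efficient resource isolation and sharing across distributed applications and frameworks; it can run Hadoop, Spark, and other frameworks.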
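In practice the run mode is chosen by the master string handed to Spark. The sketch below only illustrates how the common master strings map onto the four modes. The `MasterUrlSketch` object and its `describe` helper are hypothetical names made up for this note, not part of Spark, and `host:port` is a placeholder.

```scala
object MasterUrlSketch {
  // Illustrative only: Spark parses the master string internally.
  // This helper is a hypothetical stand-in for explanation.
  private val LocalN = """local\[(\d+|\*)\]""".r

  def describe(master: String): String = master match {
    case "local"                       => "local mode, 1 worker thread"
    case LocalN("*")                   => "local mode, one worker thread per core"
    case LocalN(n)                     => s"local mode, $n worker threads"
    case m if m.startsWith("spark://") => "standalone cluster"
    case m if m.startsWith("mesos://") => "Mesos cluster"
    case m if m.startsWith("yarn")     => "YARN cluster"
    case other                         => s"unrecognized master: $other"
  }

  def main(args: Array[String]): Unit =
    Seq("local", "local[4]", "spark://host:port", "yarn", "mesos://host:port")
      .foreach(m => println(s"$m -> ${describe(m)}"))
}
```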
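2. Spark local mode (shell)
Running Spark in local mode from the shell:
Windows: run spark-shell.cmd
Linux: run spark-shell
Parameters are passed as environment variables:
- MASTER=local[4] ADD_JARS=code.jar ./spark-shell
- MASTER=spark://host:port
- Set executor memory: export SPARK_MEM=25g
(These environment variables belong to older Spark releases; newer releases accept the equivalent --master, --jars, and --executor-memory options on spark-shell.)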
3. Spark standalone mode
Loading data in standalone mode (run from spark-shell):
Read a local file: val file = sc.textFile("/root/test.txt").collect
Load a remote HDFS file: val files = sc.textFile("hdfs://192.168.2.2:8020/user/superman").collect
(Reading HDFS data still goes through a Hadoop InputFormat.)
Standalone WordCount:
sc.textFile("/root/test.txt").flatMap(_.split("\\t")).map(x => (x, 1)).reduceByKey(_ + _).collect
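To make the WordCount steps easier to follow, here is a minimal sketch of the same flatMap / map / reduce-by-key sequence on a plain Scala collection. It does not use Spark at all: `groupBy` plus a sum stands in for `reduceByKey`, and the `WordCountSketch` object and the sample lines are made up for illustration.

```scala
object WordCountSketch {
  // Same three logical steps as the RDD pipeline, on an in-memory Seq.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\t"))      // split every line on tab characters
      .map(word => (word, 1))       // pair each word with a count of 1
      .groupBy(_._1)                // stand-in for reduceByKey: group by word...
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // ...then sum the 1s

  def main(args: Array[String]): Unit = {
    val lines = Seq("a\tb\ta", "b\tc")
    println(wordCount(lines))
  }
}
```

The difference with Spark is where the work happens: on an RDD each step is recorded lazily and only executed across the cluster when an action such as collect is called, while the collection version runs eagerly in one JVM.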
Saving result sets in standalone mode:
Save to the local file system: result.saveAsTextFile("/root/tmp") (the tmp directory must not already exist)
Save to a remote HDFS path: result.saveAsTextFile("hdfs://crxy165:8020/user/superman/tmp") (the tmp directory must not already exist)
Set the number of output files: result.repartition(1).saveAsTextFile
Submitting jobs: spark-submit is recommended; other routes such as sbt run or java -jar also work.
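As a rough sketch of how the save and submit steps fit together, the application below reads an input path, counts words, and writes one output file. It assumes Spark is on the classpath; the object name `WordCountApp`, the argument order, and the local[2] fallback are choices made for this example, not something taken from the notes above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // args(0): input path, args(1): output directory (must not exist yet)
    val Array(input, output) = args.take(2)

    val conf = new SparkConf().setAppName("WordCountApp")
    // Assumption for this sketch: fall back to local mode only when
    // no master was supplied from outside (for example by spark-submit).
    conf.setIfMissing("spark.master", "local[2]")
    val sc = new SparkContext(conf)
    try {
      val result = sc.textFile(input)
        .flatMap(_.split("\\t"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      // One partition means a single part-00000 file in the output directory.
      result.repartition(1).saveAsTextFile(output)
    } finally {
      sc.stop()
    }
  }
}
```

Packaged into a jar (the name wordcount.jar is hypothetical), it could be submitted with something like: spark-submit --class WordCountApp --master spark://host:port wordcount.jar /root/test.txt /root/tmp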
4. RDD: Resilient Distributed Dataset (also translated as "recoverable distributed dataset")