Spark Cluster Testing

1. Spark Shell Test

The Spark Shell is a tool that is particularly well suited to rapid prototyping of Spark programs, and it is also a good way to become familiar with Scala. Even if you do not know Scala yet, you can still use it. The Spark Shell lets users interact with the Spark cluster and submit queries interactively, which is convenient for debugging and makes Spark easy for beginners to pick up.

Test case 1:

[Spark@Master spark]$ MASTER=spark://Master:7077 bin/spark-shell    # connect to the cluster
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/01 11:11:03 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:11:03 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:11:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:11:03 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:11:03 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:03 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36942
14/12/01 11:11:03 INFO util.Utils: Successfully started service ‘HTTP class server‘ on port 36942.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  ‘_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
14/12/01 11:11:10 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:11:10 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:11:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:11:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 11:11:11 INFO Remoting: Starting remoting
14/12/01 11:11:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:45322]
14/12/01 11:11:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:45322]
14/12/01 11:11:11 INFO util.Utils: Successfully started service ‘sparkDriver‘ on port 45322.
14/12/01 11:11:11 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 11:11:11 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 11:11:12 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201111112-e9cc
14/12/01 11:11:12 INFO util.Utils: Successfully started service ‘Connection manager for block manager‘ on port 52705.
14/12/01 11:11:12 INFO network.ConnectionManager: Bound socket to port 52705 with id = ConnectionManagerId(Master,52705)
14/12/01 11:11:12 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 11:11:12 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 11:11:12 INFO storage.BlockManagerMasterActor: Registering block manager Master:52705 with 267.3 MB RAM
14/12/01 11:11:12 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 11:11:12 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-87ad77b3-40b1-4320-958f-b1d632f2b4f5
14/12/01 11:11:12 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51107
14/12/01 11:11:12 INFO util.Utils: Successfully started service ‘HTTP file server‘ on port 51107.
14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:11:12 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 11:11:12 INFO util.Utils: Successfully started service ‘SparkUI‘ on port 4040.
14/12/01 11:11:12 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 11:11:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 11:11:14 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
14/12/01 11:11:14 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/12/01 11:11:14 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201111115-0000
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores
14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/1 on worker-20141201031041-Slave2-33833 (Slave2:33833) with 1 cores
14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/1 on hostPort Slave2:33833 with 1 cores, 512.0 MB RAM
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/0 is now RUNNING
14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/1 is now RUNNING
14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:41369/user/Executor#-1591583962] with ID 0
14/12/01 11:11:19 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:57062 with 267.3 MB RAM
14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave2:47569/user/Executor#-1622351454] with ID 1
14/12/01 11:11:20 INFO storage.BlockManagerMasterActor: Registering block manager Slave2:52207 with 267.3 MB RAM

scala> val file = sc.textFile("hdfs://Master:9000/data/test1")
14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975
14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
14/12/01 11:12:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:52705 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:12:12 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
file: org.apache.spark.rdd.RDD[String] = hdfs://Master:9000/data/test1 MappedRDD[1] at textFile at <console>:12

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
14/12/01 11:12:43 INFO mapred.FileInputFormat: Total input paths to process : 1
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:14

scala> count.collect()
14/12/01 11:12:59 INFO spark.SparkContext: Starting job: collect at <console>:17
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Registering RDD 3 (map at <console>:14)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:17) with 2 output partitions (allowLocal=false)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at <console>:17)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at <console>:14), which has no missing parents
14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(3424) called with curMem=176615, maxMem=280248975
14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(2051) called with curMem=180039, maxMem=280248975
14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
14/12/01 11:12:59 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:52705 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:12:59 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at <console>:14)
14/12/01 11:12:59 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave2, NODE_LOCAL, 1174 bytes)
14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1174 bytes)
14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43475]
14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:57062]
14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave2/192.168.8.31:43976]
14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:57062], 1 messages pending
14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave2/192.168.8.31:52207]
14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave2/192.168.8.31:52207], 1 messages pending
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:57062 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave2:52207 (size: 2.0 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:57062 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave2:52207 (size: 12.6 KB, free: 267.3 MB)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 8197 ms on Slave2 (1/2)
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Stage 1 (map at <console>:14) finished in 8.626 s
14/12/01 11:13:07 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/12/01 11:13:07 INFO scheduler.DAGScheduler: running: Set()
14/12/01 11:13:07 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 8585 ms on Slave1 (2/2)
14/12/01 11:13:07 INFO scheduler.DAGScheduler: failed: Set()
14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14), which is now runnable
14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=182090, maxMem=280248975
14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 267.1 MB)
14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(1327) called with curMem=184202, maxMem=280248975
14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1327.0 B, free 267.1 MB)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:52705 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:07 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14)
14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave2, PROCESS_LOCAL, 948 bytes)
14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 948 bytes)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:57062 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave2:52207 (size: 1327.0 B, free: 267.3 MB)
14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:36991
14/12/01 11:13:08 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 143 bytes
14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:50333
14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 149 ms on Slave2 (1/2)
14/12/01 11:13:08 INFO scheduler.DAGScheduler: Stage 0 (collect at <console>:17) finished in 0.179 s
14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 181 ms on Slave1 (2/2)
14/12/01 11:13:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/01 11:13:08 INFO spark.SparkContext: Job finished: collect at <console>:17, took 8.947687849 s
res0: Array[(String, Int)] = Array((spark,1), (hadoop,2), (hbase,1))

scala> 
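
With the log output stripped away, the whole interaction above comes down to three expressions. They are repeated below so they can be pasted into spark-shell in one go; the input is the same HDFS file used above.

// The three expressions from the session above, without the interleaved log output
val file = sc.textFile("hdfs://Master:9000/data/test1")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()    // returns Array((spark,1), (hadoop,2), (hbase,1)) for this input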

Test case 2:

Run the SparkPi test program that ships with Spark:

[Spark@Master spark]$ bin/run-example org.apache.spark.examples.SparkPi 2 spark://192.168.8.29:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/12/01 11:01:24 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 11:01:24 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 11:01:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 11:01:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 11:01:25 INFO Remoting: Starting remoting
14/12/01 11:01:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:60670]
14/12/01 11:01:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:60670]
14/12/01 11:01:25 INFO util.Utils: Successfully started service ‘sparkDriver‘ on port 60670.
14/12/01 11:01:25 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 11:01:25 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 11:01:25 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201110125-9987
14/12/01 11:01:25 INFO util.Utils: Successfully started service ‘Connection manager for block manager‘ on port 35768.
14/12/01 11:01:25 INFO network.ConnectionManager: Bound socket to port 35768 with id = ConnectionManagerId(Master,35768)
14/12/01 11:01:25 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 11:01:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 11:01:25 INFO storage.BlockManagerMasterActor: Registering block manager Master:35768 with 267.3 MB RAM
14/12/01 11:01:25 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 11:01:25 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-68503776-9126-4e30-89a3-83a560210e14
14/12/01 11:01:25 INFO spark.HttpServer: Starting HTTP Server
14/12/01 11:01:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:01:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33890
14/12/01 11:01:25 INFO util.Utils: Successfully started service ‘HTTP file server‘ on port 33890.
14/12/01 11:01:26 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 11:01:26 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 11:01:26 INFO util.Utils: Successfully started service ‘SparkUI‘ on port 4040.
14/12/01 11:01:26 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 11:01:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 11:01:27 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/lib/spark-examples-1.1.0-hadoop2.4.0.jar at http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
14/12/01 11:01:27 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@Master:60670/user/HeartbeatReceiver
14/12/01 11:01:27 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/01 11:01:27 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
14/12/01 11:01:28 INFO storage.MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=280248975
14/12/01 11:01:28 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 267.3 MB)
14/12/01 11:01:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
14/12/01 11:01:28 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1223 bytes)
14/12/01 11:01:28 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
14/12/01 11:01:28 INFO executor.Executor: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362
14/12/01 11:01:28 INFO util.Utils: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar to /tmp/fetchFileTemp7489373377783107634.tmp
14/12/01 11:01:28 INFO executor.Executor: Adding file:/tmp/spark-ad7b4d7f-9793-406b-b3a9-21bd79fddf9f/spark-examples-1.1.0-hadoop2.4.0.jar to class loader
14/12/01 11:01:28 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 701 bytes result sent to driver
14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1223 bytes)
14/12/01 11:01:28 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/01 11:01:29 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 701 bytes result sent to driver
14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 765 ms on localhost (1/2)
14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 0.936 s
14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 177 ms on localhost (2/2)
14/12/01 11:01:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/01 11:01:29 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 1.3590325 s
Pi is roughly 3.13872
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/12/01 11:01:29 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/12/01 11:01:30 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/12/01 11:01:30 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/01 11:01:30 INFO network.ConnectionManager: ConnectionManager stopped
14/12/01 11:01:30 INFO storage.MemoryStore: MemoryStore cleared
14/12/01 11:01:30 INFO storage.BlockManager: BlockManager stopped
14/12/01 11:01:30 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/12/01 11:01:30 INFO spark.SparkContext: Successfully stopped SparkContext
14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
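
For reference, SparkPi estimates pi with a Monte Carlo simulation: it samples random points in the unit square and counts how many fall inside the unit circle, so the hit ratio approaches pi/4. The code below is only a minimal sketch of that idea, not the exact source bundled with Spark; the object name PiSketch and the sample count are illustrative.

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the Monte Carlo estimate behind the SparkPi example
// (illustrative only; not the exact bundled source).
object PiSketch {
  def main(args: Array[String]) {
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val sc = new SparkContext(new SparkConf().setAppName("PiSketch"))
    val hits = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1              // random point in [-1, 1] x [-1, 1]
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0     // 1 if the point lands inside the unit circle
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * hits / n)
    sc.stop()
  }
}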

2. Write a Spark program with IntelliJ IDEA (Scala plugin), package it into a .jar file, and submit it to the Spark cluster to run

The code of com.husor.Test.WordCount.scala is as follows:

package com.husor.Test

import org.apache.spark.{SparkContext,SparkConf}
import org.apache.spark.SparkContext._

/**
 * Created by huxiu on 2014/11/27.
 */
object WordCount {
  def main(args: Array[String]) {

    println("Test is starting......")

    if (args.length < 2) {
      System.err.println("Usage: HDFS_InputFile <File> HDFS_OutputDir <Directory>")
      System.exit(1)
    }

    //System.setProperty("hadoop.home.dir", "d:\\winutil\\")

    // Read the Spark installation path from the environment instead of the literal string "SPARK_HOME"
    val conf = new SparkConf().setAppName("WordCount")
                              .setSparkHome(System.getenv("SPARK_HOME"))

    val spark = new SparkContext(conf)

    //val spark = new SparkContext("local","WordCount")

    val file = spark.textFile(args(0))

    // Alternatively, print the results to the console instead of writing to HDFS:
    //file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)
    //val wordcounts = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

    val wordCounts = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_)
    wordCounts.saveAsTextFile(args(1))
    spark.stop()

    println("Test is Succeed!!!")

  }
}
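
Before packaging and submitting to the cluster, the same pipeline can be smoke-tested locally by switching to the commented-out local SparkContext shown above. The following is a minimal sketch, assuming the same imports as WordCount.scala; "local[2]" just means two local worker threads, and the input path is a placeholder.

// Quick local smoke test of the same pipeline; no cluster is needed.
val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
val spark = new SparkContext(conf)
spark.textFile("file:///tmp/test1")     // placeholder input path
     .flatMap(_.split(" "))
     .map((_, 1))
     .reduceByKey(_ + _)
     .collect()
     .foreach(println)
spark.stop()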

The corresponding launch script runSpark.sh is as follows:

#!/bin/bash

set -x

spark-submit --class com.husor.Test.WordCount --master spark://Master:7077 \
--executor-memory 512m --total-executor-cores 1 /home/Spark/husor/spark/SparkTest.jar hdfs://Master:9000/data/test1 \
hdfs://Master:9000/user/huxiu/SparkWordCount

Make runSpark.sh executable (chmod +x runSpark.sh) and run it; the execution proceeds as follows:

[Spark@Master spark]$ ./runSpark.sh
+ spark-submit --class com.husor.Test.WordCount --master spark://Master:7077 --executor-memory 512m --total-executor-cores 1 /home/Spark/husor/spark/SparkTest.jar hdfs://Master:9000/data/test1 hdfs://Master:9000/user/huxiu/SparkWordCount
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Test is starting......
14/12/01 12:10:50 INFO spark.SecurityManager: Changing view acls to: Spark,
14/12/01 12:10:50 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/12/01 12:10:50 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/12/01 12:10:50 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/12/01 12:10:50 INFO Remoting: Starting remoting
14/12/01 12:10:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:37899]
14/12/01 12:10:51 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:37899]
14/12/01 12:10:51 INFO util.Utils: Successfully started service ‘sparkDriver‘ on port 37899.
14/12/01 12:10:51 INFO spark.SparkEnv: Registering MapOutputTracker
14/12/01 12:10:51 INFO spark.SparkEnv: Registering BlockManagerMaster
14/12/01 12:10:51 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201121051-6189
14/12/01 12:10:51 INFO util.Utils: Successfully started service ‘Connection manager for block manager‘ on port 34131.
14/12/01 12:10:51 INFO network.ConnectionManager: Bound socket to port 34131 with id = ConnectionManagerId(Master,34131)
14/12/01 12:10:51 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/12/01 12:10:51 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/12/01 12:10:51 INFO storage.BlockManagerMasterActor: Registering block manager Master:34131 with 267.3 MB RAM
14/12/01 12:10:51 INFO storage.BlockManagerMaster: Registered BlockManager
14/12/01 12:10:51 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-83b486ec-2237-4f71-be00-0418e485151f
14/12/01 12:10:51 INFO spark.HttpServer: Starting HTTP Server
14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 12:10:51 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34902
14/12/01 12:10:51 INFO util.Utils: Successfully started service ‘HTTP file server‘ on port 34902.
14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/12/01 12:10:51 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/01 12:10:51 INFO util.Utils: Successfully started service ‘SparkUI‘ on port 4040.
14/12/01 12:10:51 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/12/01 12:10:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/01 12:10:52 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/SparkTest.jar at http://Master:34902/jars/SparkTest.jar with timestamp 1417407052941
14/12/01 12:10:53 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/12/01 12:10:53 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
14/12/01 12:10:53 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201121053-0006
14/12/01 12:10:53 INFO client.AppClient$ClientActor: Executor added: app-20141201121053-0006/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores
14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201121053-0006/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM
14/12/01 12:10:54 INFO client.AppClient$ClientActor: Executor updated: app-20141201121053-0006/0 is now RUNNING
14/12/01 12:10:54 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975
14/12/01 12:10:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)
14/12/01 12:10:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:34131 (size: 12.6 KB, free: 267.3 MB)
14/12/01 12:10:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/12/01 12:10:54 INFO mapred.FileInputFormat: Total input paths to process : 1
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/12/01 12:10:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/12/01 12:10:55 INFO spark.SparkContext: Starting job: saveAsTextFile at WordCount.scala:35
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:34)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:35) with 2 output partitions (allowLocal=false)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(saveAsTextFile at WordCount.scala:35)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:34), which has no missing parents
14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(3400) called with curMem=176615, maxMem=280248975
14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)
14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(2055) called with curMem=180015, maxMem=280248975
14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)
14/12/01 12:10:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:34131 (size: 2.0 KB, free: 267.3 MB)
14/12/01 12:10:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:34)
14/12/01 12:10:55 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/01 12:10:57 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:38410/user/Executor#898843507] with ID 0
14/12/01 12:10:57 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave1, NODE_LOCAL, 1222 bytes)
14/12/01 12:10:57 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:44906 with 267.3 MB RAM
14/12/01 12:10:58 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43149]
14/12/01 12:10:58 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:44906]
14/12/01 12:10:58 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:44906], 1 messages pending
14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:44906 (size: 2.0 KB, free: 267.3 MB)
14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:44906 (size: 12.6 KB, free: 267.3 MB)
14/12/01 12:10:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1222 bytes)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 159 ms on Slave1 (1/2)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 2454 ms on Slave1 (2/2)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 1 (map at WordCount.scala:34) finished in 4.444 s
14/12/01 12:11:00 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/12/01 12:11:00 INFO scheduler.DAGScheduler: running: Set()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/12/01 12:11:00 INFO scheduler.DAGScheduler: failed: Set()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35), which is now runnable
14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(57552) called with curMem=182070, maxMem=280248975
14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.2 KB, free 267.0 MB)
14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(19863) called with curMem=239622, maxMem=280248975
14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.4 KB, free 267.0 MB)
14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:34131 (size: 19.4 KB, free: 267.2 MB)
14/12/01 12:11:00 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave1, PROCESS_LOCAL, 996 bytes)
14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:44906 (size: 19.4 KB, free: 267.2 MB)
14/12/01 12:11:00 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave1:51850
14/12/01 12:11:00 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 133 bytes
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 996 bytes)
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 412 ms on Slave1 (1/2)
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 0 (saveAsTextFile at WordCount.scala:35) finished in 0.710 s
14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 308 ms on Slave1 (2/2)
14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/01 12:11:00 INFO spark.SparkContext: Job finished: saveAsTextFile at WordCount.scala:35, took 5.556490798 s
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/12/01 12:11:00 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040
14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
14/12/01 12:11:01 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)
14/12/01 12:11:02 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/12/01 12:11:02 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/01 12:11:02 INFO network.ConnectionManager: ConnectionManager stopped
14/12/01 12:11:02 INFO storage.MemoryStore: MemoryStore cleared
14/12/01 12:11:02 INFO storage.BlockManager: BlockManager stopped
14/12/01 12:11:02 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/12/01 12:11:02 INFO spark.SparkContext: Successfully stopped SparkContext
Test is Succeed!!!
14/12/01 12:11:02 INFO Remoting: Remoting shut down
14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00001
14/12/01 12:11:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(spark,1)
(hadoop,2)
(hbase,1)
[Spark@Master spark]$ hdfs dfs -ls /user/huxiu/SparkWordCount/
14/12/01 12:11:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/_SUCCESS
-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00000
-rw-r--r--   2 Spark huxiu         31 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00001
[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00000
14/12/01 12:11:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Note:

During the run you may hit the exception "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory", even though memory is clearly sufficient; the job simply cannot obtain resources. Checking the firewall revealed the cause: the client nodes only allowed access to port 80 and blocked everything else.

Solution:

Stop the firewall on every node (service iptables stop), then run the runSpark.sh script above against the Spark cluster again.
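
If disabling the firewall outright is not acceptable, an alternative is to pin the normally random driver-side ports in SparkConf and open just those ports (plus the master port 7077 and the executor ports) between the nodes. The snippet below is only a sketch; the property names are the ones documented in the Spark 1.1 configuration guide, and the port numbers are arbitrary examples.

// Sketch: fix the driver-side ports so the firewall can allow them explicitly.
// Property names per the Spark 1.1 configuration docs; port values are examples only.
val conf = new SparkConf()
  .setAppName("WordCount")
  .set("spark.driver.port", "51000")       // driver <-> executor RPC
  .set("spark.fileserver.port", "51001")   // HTTP server that ships jars/files
  .set("spark.broadcast.port", "51002")    // HTTP broadcast server
  .set("spark.blockManager.port", "51003") // block manager transfers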
