Environment:
CentOS 7
Hadoop 2.7.3
Java 1.8
Scala
Download:
http://spark.apache.org
Extract the archive into the installation directory.
Any location works; I installed it in the same directory as Hadoop.
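The extraction step can be sketched as a small shell function. The function name extract_spark and the archive name spark-2.1.1-bin-hadoop2.7.tgz are assumptions for illustration (2.1.1 is the version that spark-shell reports further down); substitute the file you actually downloaded.

```shell
# Hypothetical helper: unpack a Spark archive into a target directory,
# stripping the archive's top-level folder so files land directly in the target.
extract_spark() {
    archive="$1"    # e.g. spark-2.1.1-bin-hadoop2.7.tgz (assumed file name)
    target="$2"     # e.g. /home/hadoop/spark
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" --strip-components=1
}

# Usage (example paths):
# extract_spark spark-2.1.1-bin-hadoop2.7.tgz /home/hadoop/spark
```

Stripping the top-level folder keeps the installation path short, which matches the SPARK_HOME=/home/hadoop/spark setting used in the configuration step.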
Configuration (run these from the conf directory inside the Spark installation directory):
cp log4j.properties.template log4j.properties
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
Append the following lines to the end of spark-env.sh to point Spark at the Hadoop, Spark, and Scala installations:
export SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop-2.7.3/bin/hadoop classpath)
export SPARK_HOME=/home/hadoop/spark
export SCALA_HOME=/home/hadoop/scala
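Before starting the cluster, it may help to confirm that the directories referenced in spark-env.sh actually exist. The following is a minimal sketch; check_dirs is a hypothetical helper name, not something Spark ships.

```shell
# Hypothetical helper: report every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the paths from this post:
# check_dirs /home/hadoop/hadoop-2.7.3 /home/hadoop/spark /home/hadoop/scala
```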
Append the hostname of each slave (worker) machine to the end of the slaves file, one per line.
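Editing the slaves file can also be scripted. This is only a sketch: add_workers is a hypothetical helper, and slave1 and slave2 are placeholder hostnames rather than names taken from this setup.

```shell
# Hypothetical helper: append worker hostnames to the slaves file,
# one per line, skipping any hostname that is already listed.
add_workers() {
    slaves_file="$1"
    shift
    touch "$slaves_file" || return 1
    for host in "$@"; do
        # -x matches the whole line, -F treats the hostname as a literal string
        grep -qxF -- "$host" "$slaves_file" || printf '%s\n' "$host" >> "$slaves_file"
    done
}

# Usage (placeholder hostnames):
# add_workers conf/slaves slave1 slave2
```

Skipping duplicates makes the helper safe to run more than once against the same file.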
Start (from the Spark installation directory):
sbin/start-master.sh
sbin/start-slaves.sh
Check that Spark is running by opening the master web UI:
http://yourIp:8080
Run a sample application
(The master URL is displayed at http://yourIp:8080.)
bin/spark-shell --master spark://master:7077
[[email protected] spark]$ bin/spark-shell --master spark://master:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/06 04:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/06 04:01:29 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.12.1.102:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20170606040119-0002).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Official example: http://spark.apache.org/docs/latest/quick-start.html
scala> var textfile=sc.textFile("hdfs://master:9000/user/lihb/in/*.log")
textfile: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/user/lihb/in/*.log MapPartitionsRDD[1] at textFile at <console>:24
scala> textfile.first()
res5: String = #Software: IIS Advanced Logging Module
scala> textfile.count()
res7: Long = 32583
scala> val wordCounts=textfile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>a+b)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
scala> wordCounts.collect()
res8: Array[(String, Int)] = Array((/space/attentionto/99335/,1), (01:41:27.777,1), (01:45:...
scala>
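To sanity-check the word count on a small local sample, the same split-on-spaces count can be approximated with standard Unix tools. This is a swapped-in technique and not part of the Spark example; word_count is a hypothetical function name. One difference to keep in mind: this version drops empty tokens, whereas line.split(" ") can produce empty strings when a line contains consecutive spaces.

```shell
# Hypothetical helper: count space-separated words in a local file.
# tr plays the role of flatMap (one token per line); awk tallies each
# token, similar in spirit to map + reduceByKey; sort gives a stable order.
word_count() {
    tr ' ' '\n' < "$1" |
        awk 'NF { count[$0]++ } END { for (w in count) print w, count[w] }' |
        sort
}

# Usage (sample.log is a placeholder for a small local copy of the data):
# word_count sample.log
```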
Time: 2024-11-06 07:20:48