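1. Create the project sp
Create New Project -> Scala -> NOT SBT -> Next -> set the project name to 'sp'
2. Import the required jars
File -> Project Structure -> Libraries -> click the green '+' -> Java -> locate spark-assembly-1.0.0-hadoop2.2.0.jar -> OK
Import scala-compiler.jar, scala-library.jar, and scala-reflect.jar the same way // these are in the lib directory of the Scala installation
3. Create the Scala source file HdfsWC.scala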
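import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit conversions that provide reduceByKey on pair RDDs

// Declared as an object rather than a class, so that main is a static entry point the JVM can launch
object HdfsWC {
  def main(args: Array[String]) {
    // args(0): master, e.g. "yarn-standalone"
    // The last argument lists extra jars, e.g. List("lib/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar")
    val sc = new SparkContext(args(0), "myWordCount", System.getenv("SPARK_HOME"), Nil)
    // args(1): input path, e.g. "hdfs://master:9101/user/root/spam.data"; should be some file on your system
    val logFile = sc.textFile(args(1))
    // val file = sc.textFile("D:\\test.txt")
    // Split each line on spaces, pair every word with 1, then sum the counts per word
    val counts = logFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    // println(counts)
    // args(2): output directory, e.g. "hdfs://master:9101/user/root/out"
    counts.saveAsTextFile(args(2))
  }
}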
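The flatMap / map / reduceByKey pipeline in the listing can be tried out without a cluster. The following is only a minimal sketch using plain Scala collections, not Spark: the object name LocalWordCount and the sample input are made up for illustration, and groupBy stands in for reduceByKey, which does not exist on ordinary collections.

```scala
// Hypothetical local stand-in for the Spark job; uses only the Scala standard library.
object LocalWordCount {
  // Same shape as the Spark pipeline: split lines into words, then count per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word
      .groupBy(identity)                  // word -> all of its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Sample input invented for this sketch
    val counts = wordCount(Seq("a b a", "b c"))
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(word + "\t" + n) }
  }
}
```

Checking the logic this way before packaging can save a round trip to the cluster, although it says nothing about how the job behaves on HDFS-sized input.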
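4. Configure artifacts
File -> Project Structure -> Artifacts -> click the green '+' -> JAR -> From modules ... -> in Main Class, click the "...." button and select HdfsWC -> OK
Select each "Extracted xxxx" entry and click the red '-' to remove it, so that the library jars are not unpacked into the output jar -> OK
5. Compile the project
Build -> Make Project
6. Package:
Build -> Build Artifacts -> Build
7. Find sp.jar in the Output directory shown in the Artifacts settings from step 4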
Time: 2024-10-05 03:45:07