Missing-jar errors when running Spark on YARN, and how to fix them

1. Error when running locally, and the fix

When running the following command:

./bin/spark-submit   --class org.apache.spark.examples.mllib.JavaALS   --master local[*]   /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar   /user/data/netflix_rating 10 10 /user/data/result

the following error occurs:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:657)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
        at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
        at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:167)
        at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:599)
        at org.apache.spark.mllib.recommendation.ALS.train(ALS.scala)
        at org.apache.spark.examples.mllib.JavaALS.main(JavaALS.java:80)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
        ... 34 more

This error occurs because the hadoop-hdfs jar is missing from Spark's classpath at runtime. It can be fixed with the --jars or --driver-class-path option of spark-submit. Once hadoop-hdfs is on the classpath, the input and output paths are interpreted as HDFS paths.
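
Under the hood, Hadoop has to map the hdfs:// URI scheme to org.apache.hadoop.hdfs.DistributedFileSystem, which ships in the hadoop-hdfs jar. The class below is a minimal sketch (not part of the example jar; the namenode address is a made-up placeholder) that triggers the same scheme lookup, so it fails with the same IOException whenever hadoop-hdfs is absent from the classpath:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsSchemeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "namenode:8020" is only a placeholder address; the scheme lookup that fails with
        // "No FileSystem for scheme: hdfs" happens before any connection is attempted.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/user/data"), conf);
        System.out.println("Resolved FileSystem class: " + fs.getClass().getName());
    }
}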

The correct way to run it is:

./bin/spark-submit   --class org.apache.spark.examples.mllib.JavaALS   --driver-class-path /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar   --master local[*]   /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar   /user/data/netflix_rating 10 10 /user/data/result
or
./bin/spark-submit   --class org.apache.spark.examples.mllib.JavaALS   --jars /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar   --master local[*]   /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar   /user/data/netflix_rating 10 10 /user/data/result

2. Error when running Spark on YARN, and the fix

When running the following command:

./bin/spark-submit   --class org.apache.spark.examples.mllib.JavaALS   --master yarn-cluster   /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar   /user/data/netflix_rating 10 10 /user/data/result

the following error occurs:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
        at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.util.Utils$.classIsLoadable(Utils.scala:143)
        at org.apache.spark.deploy.SparkSubmit$.createLaunchEnv(SparkSubmit.scala:158)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:54)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 21 more

This error occurs because the jars under the hadoop-yarn directory are missing. Here only the --driver-class-path option works, because when running Spark on YARN the hadoop-yarn jars must already be on the driver's classpath before the job is submitted.
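
In this case the failure happens in the driver JVM before anything reaches the cluster: as the stack trace shows, SparkSubmit probes via Utils.classIsLoadable whether the YARN client classes can be loaded. The following is a rough, hypothetical sketch of the same probe (the class name is taken from the error above):

public class YarnClientClassCheck {
    public static void main(String[] args) {
        try {
            // Essentially what Utils.classIsLoadable does in the trace above.
            Class.forName("org.apache.hadoop.yarn.client.api.impl.YarnClientImpl");
            System.out.println("hadoop-yarn client classes are on the classpath");
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            System.out.println("hadoop-yarn jars are missing from the classpath: " + e);
        }
    }
}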

The correct way to run it is:

./bin/spark-submit   --class org.apache.spark.examples.mllib.JavaALS   --master yarn-cluster   --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar   /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar   /user/data/netflix_rating 10 10 /user/data/result

The results of the run are shown below.

The data in /user/data/result/productFeatures/part-00000 has the following format:

22,[5.561720883259194, 1.8295046510786157, 1.456597387276617, 0.8851233321058966, -0.6750794769961516, 0.2105431165110079, 1.868136268816477, -0.7426684616337039, -0.5856268982634872, 2.2788288132587358]
31,[0.9093801231293616, 0.31519780093777366, 1.1875509370524693, 0.40381375438624073, 2.518833489342341, 1.4242427194658087, 2.0950977044322574, 0.9012256614215569, 1.1604700989497398, 0.15791920617498328]
76,[1.8285525546730474, 0.6330058247735413, 2.5686801366906984, 1.4128062599776998, 1.401816974160943, 0.1596137900376602, 1.5625150218484072, -0.9678843308247949, 2.682242352514027, 1.0599465865866935]
152,[0.014905493368344078, 0.43308346940343456, 0.2351848253710811, 0.26220235713374834, 0.055210836978533295, 0.21723689234341548, 0.09391052568889097, 0.7231946368850907, 0.02497671848923523, 0.5022350772242716]
206,[-0.5501117679008718, 0.4105849318486638, 1.0876481291363873, 2.233025299808942, 2.1038565118723387, 1.662798954470802, 1.575332336431819, 0.8167712158963146, 1.4536436809654083, -0.5224582242822096]

The data in /user/data/result/userFeatures/part-00000 has the following format:

22,[0.18595332423070562, 0.26223861694267697, 0.2220917583718615, 0.015729079507204886, 0.4450456773474982, 0.12287125816024044, 0.4644319181495295, 0.38377345920108646, 0.28428991637647794, 0.17875507467819415]
31,[0.15133710263843259, -0.02354886937021699, 0.10618787396390789, 0.03258147800653979, 0.3556889855610244, 1.021110467423965, 0.3701959855785832, 0.1524124835894395, 0.23381646690418442, -0.012011907243505829]
76,[0.2344438657777155, 0.03821305024729112, 0.230093903321136, 0.48888224387617607, 0.30121869825786685, 0.48198504753122795, 0.29543641416718835, 0.39299434584620146, 0.27798068299013984, 0.15611605797193095]
121,[0.2038917971256244, 0.7576071991072084, 0.30603993855416245, 0.41995044224403344, 0.06550681386608997, 0.20395370870960078, 0.3444359097858106, 0.4935457123179016, 0.2041119263872145, 0.3518582534508109]
130,[0.042995762604581524, -0.21177745644812881, 0.7047019111940551, 0.44978429350262916, 0.18912686527984246, 0.6349887274906566, 0.29651737861710675, 0.49758500548973844, 0.02699514514764544, 0.39330900998421187]
152,[1.9989336762046868, 1.2185456627280438, -0.14465791504370654, 0.32972894935630664, -0.6316151112173617, -0.5568528040594881, 0.007477525352213408, -0.012087520291972442, 0.4184613236246099, -0.24669307203702268]
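
Each line is a feature vector of the form id,[f1, f2, ..., f10]; the 10 factors correspond to the rank of 10 passed on the command line. As a rough sketch (not part of the example jar), the class below parses such lines and predicts the rating of user 22 for product 22 as the dot product of the two vectors copied from the output above, which is how an ALS model scores a user/product pair:

public class AlsFeatureDot {

    // Extract the numbers between '[' and ']' of an "id,[f1, f2, ...]" line into a double array.
    static double[] parseFeatures(String line) {
        String inner = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
        String[] parts = inner.split(",\\s*");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    // Dot product of two feature vectors = predicted rating.
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        // Sample lines copied from the userFeatures and productFeatures output above (both id 22).
        String userLine = "22,[0.18595332423070562, 0.26223861694267697, 0.2220917583718615, 0.015729079507204886, 0.4450456773474982, 0.12287125816024044, 0.4644319181495295, 0.38377345920108646, 0.28428991637647794, 0.17875507467819415]";
        String productLine = "22,[5.561720883259194, 1.8295046510786157, 1.456597387276617, 0.8851233321058966, -0.6750794769961516, 0.2105431165110079, 1.868136268816477, -0.7426684616337039, -0.5856268982634872, 2.2788288132587358]";

        double rating = dot(parseFeatures(userLine), parseFeatures(productLine));
        System.out.println("Predicted rating for user 22 on product 22: " + rating);
    }
}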

3. For problems that come up while Spark runs on Hadoop, check the logs of the corresponding job in Hadoop YARN.
