At this stage I have mainly been learning Scala. Answers on Zhihu recommend learning Haskell first as a stepping stone to Scala, but I don't think that detour is strictly necessary. That said, I did run into plenty of difficulties along the way; after all, Scala's feature set is widely acknowledged to be even more complex than C++'s :-)
My main motivation for learning Scala is to study Spark. Although Spark applications can also be written in Python or Java, Spark itself is a Scala project, and it is not yet a fully mature product; when I run into problems, working in Scala may let me resolve them more efficiently.
Today I skimmed part of the official Spark documentation and tried running the SparkPi example in IntelliJ. The process hit a few snags.
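For context, the core of the SparkPi example looks roughly like the sketch below (adapted from the examples bundled with Spark; the exact code in a given distribution may differ in details such as argument handling):

```scala
import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the SparkPi example: estimate Pi by sampling random points
// in the square [-1, 1] x [-1, 1] and counting the fraction that fall
// inside the unit circle. That fraction approximates Pi / 4.
object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Pi")
    val sc = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}
```

Because the sampling is random, each run prints a slightly different estimate, which is why the output below shows 3.13726 rather than a fixed value.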
First, after importing the required packages and clicking Run, I got this error:
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:185)
    at SparkDemo.SimpleApp$.main(SimpleApp.scala:13)
    at SparkDemo.SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Solution: in the IDE, go to Run -> Edit Configurations and enter "-Dspark.master=local" in the VM options field on the right, which tells the program to run locally with a single worker thread.
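Equivalently, the master can be set in code when constructing the SparkContext, which avoids touching the run configuration at all (a sketch; the app name "SimpleApp" is just a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Alternative to the -Dspark.master=local VM option: set the master
// directly on the SparkConf. "local" runs the driver and one worker
// thread in-process; "local[4]" would use four threads instead.
val conf = new SparkConf()
  .setAppName("SimpleApp") // placeholder app name
  .setMaster("local")
val sc = new SparkContext(conf)
```

Hard-coding the master is convenient for IDE experiments, but for code that will later be submitted to a real cluster, leaving the master unset and supplying it externally (via spark-submit or a system property) is the more flexible choice.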
Running again, it still failed:
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
    at akka.actor.RootActorPath.$div(ActorPath.scala:159)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:452)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:191)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:230)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:577)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:108)
    at akka.Akka$.delayedEndpoint$akka$Akka$1(Akka.scala:11)
    at akka.Akka$delayedInit$body.apply(Akka.scala:9)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:383)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at akka.Akka$.main(Akka.scala:9)
    at akka.Akka.main(Akka.scala)
As mentioned in an earlier post, I had installed Scala 2.11.5 with Spark 1.2.0. Apparently there is a compatibility problem between these versions: Scala does not maintain binary compatibility across 2.10/2.11, and the pre-built Spark 1.2.0 jars are compiled against Scala 2.10, so mixing them with the 2.11 library produces the NoSuchMethodError above. Switching Scala to 2.10.4 solved the problem. The program's output:
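For projects built with sbt, the mismatch can be avoided by keeping scalaVersion consistent with the Scala binary version of the Spark artifact; the %% operator makes sbt append the binary suffix automatically. A sketch of the relevant build.sbt lines (a config fragment, assuming the project uses sbt):

```scala
// build.sbt (sketch): Spark 1.2.0 is published for Scala 2.10,
// so scalaVersion must be a 2.10.x release.
scalaVersion := "2.10.4"

// %% appends the Scala binary suffix, so this resolves spark-core_2.10.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
```

If the two lines disagree (e.g. scalaVersion 2.11.x with a _2.10 Spark jar), the build may still compile but fail at runtime with exactly this kind of NoSuchMethodError.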
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
........
15/07/27 19:50:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1260 bytes)
15/07/27 19:50:23 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/27 19:50:23 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 727 bytes result sent to driver
15/07/27 19:50:23 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1260 bytes)
15/07/27 19:50:23 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/07/27 19:50:23 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 116 ms on localhost (1/2)
15/07/27 19:50:23 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 727 bytes result sent to driver
15/07/27 19:50:23 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 22 ms on localhost (2/2)
15/07/27 19:50:23 INFO DAGScheduler: Stage 0 (reduce at SparkPI.scala:20) finished in 0.146 s
15/07/27 19:50:23 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/27 19:50:23 INFO DAGScheduler: Job 0 finished: reduce at SparkPI.scala:20, took 0.380229 s
Pi is roughly 3.13726
15/07/27 19:50:23 INFO SparkUI: Stopped Spark web UI at http://211.66.87.51:4040
15/07/27 19:50:23 INFO DAGScheduler: Stopping DAGScheduler
15/07/27 19:50:24 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/07/27 19:50:24 INFO MemoryStore: MemoryStore cleared
15/07/27 19:50:24 INFO BlockManager: BlockManager stopped
15/07/27 19:50:24 INFO BlockManagerMaster: BlockManagerMaster stopped
15/07/27 19:50:24 INFO SparkContext: Successfully stopped SparkContext