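Notes: 1. Before compiling Spark, set up the Java and Scala environments; see http://www.cnblogs.com/kevingu/p/4418779.html.
2. Spark used to be built with sbt. Maven is now the recommended build tool, with sbt still supported, although the sbt build may be phased out over time. This article builds Spark 1.2.0 with Maven.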
Part 1. Setting up Maven
(I) Download the Maven binary package apache-maven-3.2.5-bin.tar.gz from http://maven.apache.org/download.cgi and extract it under /usr/maven.
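The extraction step might look like the following sketch. It assumes the archive has already been downloaded into the current directory; the function name unpack_maven is hypothetical, and writing to /usr/maven usually requires root privileges.

```shell
# Hypothetical helper: extract a Maven binary tarball into a target directory.
# $1 = path to the tarball, $2 = destination (defaults to /usr/maven).
unpack_maven() {
  local tarball=$1
  local dest=${2:-/usr/maven}
  # Create the destination if needed, then unpack the archive into it.
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Example call (commented out so that sourcing this file has no side effects):
# unpack_maven apache-maven-3.2.5-bin.tar.gz /usr/maven
```

After extraction the installation should sit in /usr/maven/apache-maven-3.2.5, which is the path used for M2_HOME.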
(II) Add the environment variables
export M2_HOME=/usr/maven/apache-maven-3.2.5
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
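The article does not say where these lines go; a common choice is ~/.bashrc or /etc/profile, followed by opening a new shell. As a rough sanity check after that, something like the sketch below could confirm that the variables took effect. The function name check_maven_env is hypothetical and not part of Maven.

```shell
# Hypothetical helper: verify that M2_HOME points at a Maven installation
# and that its bin directory is on PATH. Returns 0 on success, 1 otherwise.
check_maven_env() {
  if [ -z "${M2_HOME:-}" ]; then
    echo "M2_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$M2_HOME/bin/mvn" ]; then
    echo "no executable mvn under $M2_HOME/bin" >&2
    return 1
  fi
  # Wrap PATH in colons so the match works for first, middle, and last entries.
  case ":$PATH:" in
    *":$M2_HOME/bin:"*) return 0 ;;
    *) echo "$M2_HOME/bin is not on PATH" >&2; return 1 ;;
  esac
}
```

Running check_maven_env in a fresh terminal prints nothing and returns 0 when the three conditions hold.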
(III) Edit the configuration file /usr/maven/apache-maven-3.2.5/conf/settings.xml (mainly the <proxies>, <mirrors>, and <profiles> elements; the repositories point to the China-based mirror http://maven.oschina.net/).
<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<!--
 | This is the configuration file for Maven. It can be specified at two levels:
 |
 |  1. User Level. This settings.xml file provides configuration for a single user,
 |                 and is normally provided in ${user.home}/.m2/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -s /path/to/user/settings.xml
 |
 |  2. Global Level. This settings.xml file provides configuration for all Maven
 |                 users on a machine (assuming they're all using the same Maven
 |                 installation). It's normally provided in
 |                 ${maven.home}/conf/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -gs /path/to/global/settings.xml
 |
 | The sections in this sample file are intended to give you a running start at
 | getting the most out of your Maven installation. Where appropriate, the default
 | values (values used when the setting is not specified) are provided.
 |
 |-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  -->
  <!--localRepository>F:/Maven/repo/m2/</localRepository-->

  <!-- interactiveMode
   | This will determine whether maven prompts you when it needs input. If set to false,
   | maven will use a sensible default value, perhaps based on some other setting, for
   | the parameter in question.
   |
   | Default: true
  <interactiveMode>true</interactiveMode>
  -->

  <!-- offline
   | Determines whether maven should attempt to connect to the network when executing a build.
   | This will have an effect on artifact downloads, artifact deployment, and others.
   |
   | Default: false
  <offline>false</offline>
  -->

  <!-- pluginGroups
   | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
   | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
   | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
   |-->
  <pluginGroups>
    <!-- pluginGroup
     | Specifies a further group identifier to use for plugin lookup.
    <pluginGroup>com.your.plugins</pluginGroup>
    -->
  </pluginGroups>

  <!-- proxies
   | This is a list of proxies which can be used on this machine to connect to the network.
   | Unless otherwise specified (by system property or command-line switch), the first proxy
   | specification in this list marked as active will be used.
   |-->
  <proxies>
    <!--<proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>10.22.98.21</host>
      <port>8080</port>
    </proxy>
    -->
  </proxies>

  <!-- servers
   | This is a list of authentication profiles, keyed by the server-id used within the system.
   | Authentication profiles can be used whenever maven must make a connection to a remote server.
   |-->
  <servers>
    <!-- server
     | Specifies the authentication information to use when connecting to a particular server, identified by
     | a unique name within the system (referred to by the 'id' attribute below).
     |
     | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
     |       used together.
     |
    <server>
      <id>deploymentRepo</id>
      <username>repouser</username>
      <password>repopwd</password>
    </server>
    -->

    <!-- Another sample, using keys to authenticate.
    <server>
      <id>siteServer</id>
      <privateKey>/path/to/private/key</privateKey>
      <passphrase>optional; leave empty if not used.</passphrase>
    </server>
    -->
  </servers>

  <!-- mirrors
   | This is a list of mirrors to be used in downloading artifacts from remote repositories.
   |
   | It works like this: a POM may declare a repository to use in resolving certain artifacts.
   | However, this repository may have problems with heavy traffic at times, so people have mirrored
   | it to several places.
   |
   | That repository definition will have a unique id, so we can create a mirror reference for that
   | repository, to be used as an alternate download site. The mirror site will be the preferred
   | server for that repository.
   |-->
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    -->
    <mirror>
      <id>nexus-osc</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus osc</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
    </mirror>
    <mirror>
      <id>nexus-osc-thirdparty</id>
      <mirrorOf>thirdparty</mirrorOf>
      <name>Nexus osc thirdparty</name>
      <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
    </mirror>
  </mirrors>

  <!-- profiles
   | This is a list of profiles which can be activated in a variety of ways, and which can modify
   | the build process. Profiles provided in the settings.xml are intended to provide local machine-
   | specific paths and repository locations which allow the build to work in the local environment.
   |
   | For example, if you have an integration testing plugin - like cactus - that needs to know where
   | your Tomcat instance is installed, you can provide a variable here such that the variable is
   | dereferenced during the build process to configure the cactus plugin.
   |
   | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
   | section of this document (settings.xml) - will be discussed later. Another way essentially
   | relies on the detection of a system property, either matching a particular value for the property,
   | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
   | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
   | Finally, the list of active profiles can be specified directly from the command line.
   |
   | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
   |       repositories, plugin repositories, and free-form properties to be used as configuration
   |       variables for plugins in the POM.
   |
   |-->
  <profiles>
    <!-- profile
     | Specifies a set of introductions to the build process, to be activated using one or more of the
     | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
     | or the command line, profiles have to have an ID that is unique.
     |
     | An encouraged best practice for profile identification is to use a consistent naming convention
     | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
     | This will make it more intuitive to understand what the set of introduced profiles is attempting
     | to accomplish, particularly when you only have a list of profile id's for debug.
     |
     | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
    -->
    <profile>
      <id>jdk-1.8</id>
      <activation>
        <jdk>1.8</jdk>
      </activation>
      <repositories>
        <repository>
          <id>nexus</id>
          <name>local private nexus</name>
          <url>http://maven.oschina.net/content/groups/public/</url>
          <releases>
            <enabled>true</enabled>
          </releases>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
        </repository>
        <repository>
          <id>osc_thirdparty</id>
          <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>nexus</id>
          <name>local private nexus</name>
          <url>http://maven.oschina.net/content/groups/public/</url>
          <releases>
            <enabled>true</enabled>
          </releases>
          <snapshots>
            <enabled>false</enabled>
          </snapshots>
        </pluginRepository>
      </pluginRepositories>
    </profile>

    <!--
     | Here is another profile, activated by the system property 'target-env' with a value of 'dev',
     | which provides a specific path to the Tomcat instance. To use this, your plugin configuration
     | might hypothetically look like:
     |
     | ...
     | <plugin>
     |   <groupId>org.myco.myplugins</groupId>
     |   <artifactId>myplugin</artifactId>
     |
     |   <configuration>
     |     <tomcatLocation>${tomcatPath}</tomcatLocation>
     |   </configuration>
     | </plugin>
     | ...
     |
     | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
     |       anything, you could just leave off the <value/> inside the activation-property.
     |
    <profile>
      <id>env-dev</id>
      <activation>
        <property>
          <name>target-env</name>
          <value>dev</value>
        </property>
      </activation>
      <properties>
        <tomcatPath>/path/to/tomcat/instance</tomcatPath>
      </properties>
    </profile>
    -->
  </profiles>

  <!-- activeProfiles
   | List of profiles that are active for all builds.
   |
  <activeProfiles>
    <activeProfile>alwaysActiveProfile</activeProfile>
    <activeProfile>anotherAlwaysActiveProfile</activeProfile>
  </activeProfiles>
  -->
</settings>
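To see what Maven actually picks up from this file, the help plugin's effective-settings goal can print the merged settings. The wrapper below is only a sketch: the function name show_effective_settings is hypothetical, and the default path is the one used in this article. One thing worth checking in the output: the jdk-1.8 profile is conditioned on a JDK whose version starts with 1.8, so on a Java 1.7 installation it would presumably stay inactive and only the <mirrors> entries would apply.

```shell
# Hypothetical wrapper: print the settings Maven resolves from a given
# global settings file, using the maven-help-plugin goal help:effective-settings.
# $1 = path to settings.xml (defaults to the path used in this article).
show_effective_settings() {
  local settings=${1:-/usr/maven/apache-maven-3.2.5/conf/settings.xml}
  # -gs selects the global settings file explicitly.
  mvn -gs "$settings" help:effective-settings
}
```

The first run downloads the help plugin itself, so it also doubles as a quick test that the mirror is reachable.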
(IV) Verify the installation: open a terminal and type
mvn -v
If the following information is displayed, Maven has been set up successfully.
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
Maven home: /usr/maven/apache-maven-3.2.5
Java version: 1.7.0_72, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_72/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.8.1.el6.x86_64", arch: "amd64", family: "unix"
Part 2. Download the Spark 1.2.0 source package from http://spark.apache.org/downloads.html and extract it under /usr/spark.
Part 3. Open a terminal, change to the /usr/spark/spark-1.2.0 directory, and type
mvn -DskipTests clean package
The following output appears as the build starts.
[INFO] Scanning for projects...
Downloading: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom
Downloaded: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom (15 KB at 5.6 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Core
[INFO] Spark Project Bagel
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project Assembly
[INFO] Spark Project External Twitter
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External MQTT
[INFO] Spark Project External ZeroMQ
[INFO] Spark Project External Kafka
[INFO] Spark Project Examples
[INFO]
[INFO] ------------------------------------------------------------------------
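During the build, Maven downloads whatever packages it needs; given network conditions in China, this can take a long time. If a download fails because of a network problem, run the build command again and the build carries on, since artifacts that were already downloaded stay in the local repository. Warnings can be ignored. The build is complete when the following output finally appears.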
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [35:17 min]
[INFO] Spark Project Networking ........................... SUCCESS [16:53 min]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 26.230 s]
[INFO] Spark Project Core ................................. SUCCESS [32:59 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 25.566 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:45 min]
[INFO] Spark Project Streaming ............................ SUCCESS [01:54 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:56 min]
[INFO] Spark Project SQL .................................. SUCCESS [05:14 min]
[INFO] Spark Project ML Library ........................... SUCCESS [03:17 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 15.841 s]
[INFO] Spark Project Hive ................................. SUCCESS [11:33 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 54.570 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 46.018 s]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 47.342 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [04:54 min]
[INFO] Spark Project External Flume ....................... SUCCESS [ 37.416 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 34.923 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:05 min]
[INFO] Spark Project External Kafka ....................... SUCCESS [02:15 min]
[INFO] Spark Project Examples ............................. SUCCESS [11:07 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:15 h
[INFO] Finished at: 2015-01-02T17:21:15+08:00
[INFO] Final Memory: 69M/1122M
[INFO] ------------------------------------------------------------------------
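Since a flaky network can mean rerunning the build command several times by hand, the rerun could be automated with a small retry loop. This is only a sketch: the function retry and the RETRY_DELAY variable are hypothetical names introduced here, not part of Maven or Spark.

```shell
# Hypothetical helper: rerun a command until it succeeds or the attempt
# budget is used up. $1 = maximum attempts, remaining args = the command.
# RETRY_DELAY (seconds between attempts) defaults to 10.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "attempt failed; retrying ($attempt/$max_attempts)" >&2
    sleep "${RETRY_DELAY:-10}"
  done
}

# Example call from the Spark source directory (commented out here):
# retry 5 mvn -DskipTests clean package
```

Each retry reuses the artifacts already stored in the local Maven repository, so only the missing downloads are fetched again. A failure that is not network-related will simply fail on every attempt, so the attempt limit keeps the loop from running forever.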
Part 4. Start the Spark shell
In the /usr/spark/spark-1.2.0 directory, type
./bin/spark-shell
If the following output appears, Spark has started successfully.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/13 09:50:52 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:52 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:52 INFO Utils: Successfully started service 'HTTP class server' on port 55842.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/13 09:50:57 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.131.151 instead (on interface eth0)
15/04/13 09:50:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/13 09:50:57 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:58 INFO Slf4jLogger: Slf4jLogger started
15/04/13 09:50:58 INFO Remoting: Starting remoting
15/04/13 09:50:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:41278]
15/04/13 09:50:58 INFO Utils: Successfully started service 'sparkDriver' on port 41278.
15/04/13 09:50:58 INFO SparkEnv: Registering MapOutputTracker
15/04/13 09:50:58 INFO SparkEnv: Registering BlockManagerMaster
15/04/13 09:50:58 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150413095058-f481
15/04/13 09:50:58 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/04/13 09:50:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-15b2ae1c-3256-43a7-bc05-b79cb924911d
15/04/13 09:50:59 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:59 INFO Utils: Successfully started service 'HTTP file server' on port 41609.
15/04/13 09:50:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/13 09:50:59 INFO SparkUI: Started SparkUI at http://192.168.131.151:4040
15/04/13 09:50:59 INFO Executor: Using REPL class URI: http://192.168.131.151:55842
15/04/13 09:50:59 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:41278/user/HeartbeatReceiver
15/04/13 09:50:59 INFO NettyBlockTransferService: Server created on 50724
15/04/13 09:50:59 INFO BlockManagerMaster: Trying to register BlockManager
15/04/13 09:50:59 INFO BlockManagerMasterActor: Registering block manager localhost:50724 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 50724)
15/04/13 09:50:59 INFO BlockManagerMaster: Registered BlockManager
15/04/13 09:50:59 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>
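Beyond seeing the scala> prompt, a quick way to check that the freshly built Spark can actually run a job is to pipe a one-line Scala expression into the shell. The sketch below assumes the installation path used in this article; the function name spark_smoke_test is hypothetical.

```shell
# Hypothetical helper: feed a single Scala expression to spark-shell on stdin.
# The expression sums the integers 1..100 through the SparkContext `sc`.
# $1 = Spark home directory (defaults to the path used in this article).
spark_smoke_test() {
  local spark_home=${1:-/usr/spark/spark-1.2.0}
  echo 'println("sum=" + sc.parallelize(1 to 100).reduce(_ + _))' \
    | "$spark_home/bin/spark-shell"
}
```

If the build is healthy, a line containing sum=5050 should show up among the log messages before the shell exits at end of input.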
That completes the single-machine build of Spark!
References: Maven: http://maven.apache.org/
Spark: http://spark.apache.org/