Building Spark on a Single Machine (on CentOS 6)

Notes: 1. Before building Spark, set up the Java and Scala environments; see http://www.cnblogs.com/kevingu/p/4418779.html

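2. Spark used to be built with sbt. Maven is now the recommended build tool; sbt is still supported, but the sbt build is expected to be phased out. This article uses Maven to build Spark 1.2.0.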

Step 1. Set up Maven

(I) Download the Maven binary package apache-maven-3.2.5-bin.tar.gz from http://maven.apache.org/download.cgi and extract it under /usr/maven.
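The extraction step can be sketched as a small shell function. This is only an illustration: the function name `extract_maven` and its two arguments are assumptions, not something the original article defines.

```shell
# Sketch: unpack a Maven binary tarball into a target directory.
# extract_maven is a hypothetical helper name used only for illustration.
extract_maven() {
    local archive="$1"   # e.g. apache-maven-3.2.5-bin.tar.gz
    local dest="$2"      # e.g. /usr/maven
    mkdir -p "$dest" || return 1
    # -x extract, -z decompress gzip, -f archive file, -C switch to dest first
    tar -xzf "$archive" -C "$dest"
}

# Example (writing under /usr normally requires root):
#   extract_maven apache-maven-3.2.5-bin.tar.gz /usr/maven
```

After extraction the tarball's top-level directory, apache-maven-3.2.5, should sit directly under the target directory, which is the layout the M2_HOME setting in the next step expects.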

(II) Add the environment variables

export M2_HOME=/usr/maven/apache-maven-3.2.5
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
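Exports typed into a terminal last only for that session. One possible way to make them permanent is to write them to a profile script, sketched below. The helper name `write_maven_env` and the file name `/etc/profile.d/maven.sh` are assumptions for illustration; on CentOS, login shells source the `*.sh` files under /etc/profile.d/.

```shell
# Sketch: write the Maven variables to a profile script so new shells pick them up.
# write_maven_env and the target file name are hypothetical choices.
write_maven_env() {
    local target="$1"    # e.g. /etc/profile.d/maven.sh (assumed location)
    # The quoted heredoc delimiter keeps $PATH and $M2_HOME literal in the file;
    # they are expanded only when the file is sourced.
    cat > "$target" <<'EOF'
export M2_HOME=/usr/maven/apache-maven-3.2.5
export PATH=$PATH:$M2_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
EOF
}

# Example (needs root for /etc/profile.d):
#   write_maven_env /etc/profile.d/maven.sh && . /etc/profile.d/maven.sh
```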

(III) Edit the configuration file /usr/maven/apache-maven-3.2.5/conf/settings.xml (mainly the <proxies>, <mirrors>, and <profiles> elements). The mirror used here is the China-based http://maven.oschina.net/.

<?xml version="1.0" encoding="UTF-8"?>

<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
    license agreements. See the NOTICE file distributed with this work for additional
    information regarding copyright ownership. The ASF licenses this file to
    you under the Apache License, Version 2.0 (the "License"); you may not use
    this file except in compliance with the License. You may obtain a copy of
    the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
    by applicable law or agreed to in writing, software distributed under the
    License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
    OF ANY KIND, either express or implied. See the License for the specific
    language governing permissions and limitations under the License. -->

<!-- | This is the configuration file for Maven. It can be specified at two
    levels: | | 1. User Level. This settings.xml file provides configuration
    for a single user, | and is normally provided in ${user.home}/.m2/settings.xml.
    | | NOTE: This location can be overridden with the CLI option: | | -s /path/to/user/settings.xml
    | | 2. Global Level. This settings.xml file provides configuration for all
    Maven | users on a machine (assuming they're all using the same Maven | installation).
    It's normally provided in | ${maven.home}/conf/settings.xml. | | NOTE: This
    location can be overridden with the CLI option: | | -gs /path/to/global/settings.xml
    | | The sections in this sample file are intended to give you a running start
    at | getting the most out of your Maven installation. Where appropriate,
    the default | values (values used when the setting is not specified) are
    provided. | | -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <!-- localRepository | The path to the local repository maven will use to
        store artifacts. | | Default: ${user.home}/.m2/repository
    -->
        <!--localRepository>F:/Maven/repo/m2/</localRepository-->

    <!-- interactiveMode | This will determine whether maven prompts you when
        it needs input. If set to false, | maven will use a sensible default value,
        perhaps based on some other setting, for | the parameter in question. | |
        Default: true <interactiveMode>true</interactiveMode> -->

    <!-- offline | Determines whether maven should attempt to connect to the
        network when executing a build. | This will have an effect on artifact downloads,
        artifact deployment, and others. | | Default: false <offline>false</offline> -->

    <!-- pluginGroups | This is a list of additional group identifiers that
        will be searched when resolving plugins by their prefix, i.e. | when invoking
        a command line like "mvn prefix:goal". Maven will automatically add the group
        identifiers | "org.apache.maven.plugins" and "org.codehaus.mojo" if these
        are not already contained in the list. | -->
    <pluginGroups>
        <!-- pluginGroup | Specifies a further group identifier to use for plugin
            lookup. <pluginGroup>com.your.plugins</pluginGroup> -->
    </pluginGroups>

    <!-- proxies | This is a list of proxies which can be used on this machine
        to connect to the network. | Unless otherwise specified (by system property
        or command-line switch), the first proxy | specification in this list marked
        as active will be used. | -->
     <proxies>
            <!--<proxy>
            <id>optional</id>
            <active>true</active>
            <protocol>http</protocol>
            <host>10.22.98.21</host>
            <port>8080</port>
        </proxy>
        -->
    </proxies> 

    <!-- servers | This is a list of authentication profiles, keyed by the server-id
        used within the system. | Authentication profiles can be used whenever maven
        must make a connection to a remote server. | -->
    <servers>
        <!-- server | Specifies the authentication information to use when connecting
            to a particular server, identified by | a unique name within the system (referred
            to by the 'id' attribute below). | | NOTE: You should either specify username/password
            OR privateKey/passphrase, since these pairings are | used together. | <server>
            <id>deploymentRepo</id> <username>repouser</username> <password>repopwd</password>
            </server> -->

        <!-- Another sample, using keys to authenticate. <server> <id>siteServer</id>
            <privateKey>/path/to/private/key</privateKey> <passphrase>optional; leave
            empty if not used.</passphrase> </server> -->
    </servers>

    <!-- mirrors | This is a list of mirrors to be used in downloading artifacts
        from remote repositories. | | It works like this: a POM may declare a repository
        to use in resolving certain artifacts. | However, this repository may have
        problems with heavy traffic at times, so people have mirrored | it to several
        places. | | That repository definition will have a unique id, so we can create
        a mirror reference for that | repository, to be used as an alternate download
        site. The mirror site will be the preferred | server for that repository.
        | -->
    <mirrors>
        <!-- mirror | Specifies a repository mirror site to use instead of a given
            repository. The repository that | this mirror serves has an ID that matches
            the mirrorOf element of this mirror. IDs are used | for inheritance and direct
            lookup purposes, and must be unique across the set of mirrors. | -->
        <mirror>
            <id>nexus-osc</id>
            <mirrorOf>central</mirrorOf>
            <name>Nexus osc</name>
            <url>http://maven.oschina.net/content/groups/public/</url>
        </mirror>
        <mirror>
            <id>nexus-osc-thirdparty</id>
            <mirrorOf>thirdparty</mirrorOf>
            <name>Nexus osc thirdparty</name>
            <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
        </mirror>

    </mirrors>

    <!-- profiles | This is a list of profiles which can be activated in a variety
        of ways, and which can modify | the build process. Profiles provided in the
        settings.xml are intended to provide local machine- | specific paths and
        repository locations which allow the build to work in the local environment.
        | | For example, if you have an integration testing plugin - like cactus
        - that needs to know where | your Tomcat instance is installed, you can provide
        a variable here such that the variable is | dereferenced during the build
        process to configure the cactus plugin. | | As noted above, profiles can
        be activated in a variety of ways. One way - the activeProfiles | section
        of this document (settings.xml) - will be discussed later. Another way essentially
        | relies on the detection of a system property, either matching a particular
        value for the property, | or merely testing its existence. Profiles can also
        be activated by JDK version prefix, where a | value of '1.4' might activate
        a profile when the build is executed on a JDK version of '1.4.2_07'. | Finally,
        the list of active profiles can be specified directly from the command line.
        | | NOTE: For profiles defined in the settings.xml, you are restricted to
        specifying only artifact | repositories, plugin repositories, and free-form
        properties to be used as configuration | variables for plugins in the POM.
        | | -->
    <profiles>
        <!-- profile | Specifies a set of introductions to the build process, to
            be activated using one or more of the | mechanisms described above. For inheritance
            purposes, and to activate profiles via <activatedProfiles/> | or the command
            line, profiles have to have an ID that is unique. | | An encouraged best
            practice for profile identification is to use a consistent naming convention
            | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey',
            'user-brett', etc. | This will make it more intuitive to understand what
            the set of introduced profiles is attempting | to accomplish, particularly
            when you only have a list of profile id's for debug. | | This profile example
            uses the JDK version to trigger activation, and provides a JDK-specific repo. -->
        <profile>
            <id>jdk-1.8</id>

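            <!-- Activated only when the build runs on a JDK whose version starts
                with 1.8. The mvn -v output in this article reports Java 1.7.0_72,
                so use 1.7 here if the profile should apply on that JDK. -->
            <activation>
                <jdk>1.8</jdk>
            </activation>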

            <repositories>
                <repository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </repository>
                <repository>
                    <id>osc_thirdparty</id>
                    <url>http://maven.oschina.net/content/repositories/thirdparty/</url>
                </repository>
            </repositories>
            <pluginRepositories>
                <pluginRepository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.oschina.net/content/groups/public/</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
                </pluginRepository>
            </pluginRepositories>
        </profile>

        <!-- | Here is another profile, activated by the system property 'target-env'
            with a value of 'dev', | which provides a specific path to the Tomcat instance.
            To use this, your plugin configuration | might hypothetically look like:
            | | ... | <plugin> | <groupId>org.myco.myplugins</groupId> | <artifactId>myplugin</artifactId>
            | | <configuration> | <tomcatLocation>${tomcatPath}</tomcatLocation> | </configuration>
            | </plugin> | ... | | NOTE: If you just wanted to inject this configuration
            whenever someone set 'target-env' to | anything, you could just leave off
            the <value/> inside the activation-property. | <profile> <id>env-dev</id>
            <activation> <property> <name>target-env</name> <value>dev</value> </property>
            </activation> <properties> <tomcatPath>/path/to/tomcat/instance</tomcatPath>
            </properties> </profile> -->
    </profiles>

    <!-- activeProfiles | List of profiles that are active for all builds. |
        <activeProfiles> <activeProfile>alwaysActiveProfile</activeProfile> <activeProfile>anotherAlwaysActiveProfile</activeProfile>
        </activeProfiles> -->
</settings>
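Because the mirror and repository addresses are spread across a long file, a quick way to review them is to print every `<url>` value. The function below is a minimal sketch; the name `list_settings_urls` is hypothetical, and it assumes each `<url>...</url>` pair sits on a single line, as in the listing above.

```shell
# Sketch: print the distinct <url> values found in a settings.xml.
# list_settings_urls is a hypothetical helper; it assumes one <url>...</url>
# pair per line and does not distinguish active XML from commented-out XML.
list_settings_urls() {
    local settings="$1"  # e.g. /usr/maven/apache-maven-3.2.5/conf/settings.xml
    # Keep only the text between <url> and </url>, then de-duplicate.
    sed -n 's:.*<url>\(.*\)</url>.*:\1:p' "$settings" | sort -u
}

# Example:
#   list_settings_urls /usr/maven/apache-maven-3.2.5/conf/settings.xml
```

For the file above this should list only the two maven.oschina.net addresses. Once Maven itself works, `mvn help:effective-settings` (a goal of the Maven Help Plugin) gives a fuller view of the settings Maven actually applies.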

(IV) To verify, open a terminal and run

mvn -v

If the following is displayed, Maven is set up correctly.

Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
Maven home: /usr/maven/apache-maven-3.2.5
Java version: 1.7.0_72, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_72/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.8.1.el6.x86_64", arch: "amd64", family: "unix"

Step 2. Download the Spark 1.2.0 source package from http://spark.apache.org/downloads.html and extract it under /usr/spark.

Step 3. Open a terminal, change to the /usr/spark/spark-1.2.0 directory, and run

mvn -DskipTests clean package

The following output indicates that the build has started.

[INFO] Scanning for projects...
Downloading: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom
Downloaded: http://maven.oschina.net/content/groups/public/org/apache/apache/14/apache-14.pom (15 KB at 5.6 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Core
[INFO] Spark Project Bagel
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project Assembly
[INFO] Spark Project External Twitter
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External MQTT
[INFO] Spark Project External ZeroMQ
[INFO] Spark Project External Kafka
[INFO] Spark Project Examples
[INFO]
[INFO] ------------------------------------------------------------------------

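During the build, Maven downloads the packages it needs. Given network conditions in China, this can take a long time. If a download fails because of a network problem, run the build command again and the build will continue; warnings can be ignored. The build is complete when the following output appears.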
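Re-running the command by hand after each network failure can be automated with a small retry wrapper. The sketch below is one possible approach, not part of the original procedure; the function name `retry`, the attempt limit, and the `RETRY_DELAY` variable are illustrative assumptions.

```shell
# Sketch: re-run a command until it succeeds or the attempt limit is reached.
# retry and RETRY_DELAY are hypothetical names chosen for this example.
retry() {
    local max_attempts="$1"; shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying..." >&2
        attempt=$((attempt + 1))
        # Pause between attempts; defaults to 5 seconds.
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example: allow up to 5 runs of the build command.
#   retry 5 mvn -DskipTests clean package
```

Artifacts that were already downloaded stay in the local repository (${user.home}/.m2/repository by default), so each retry only has to fetch what is still missing.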

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [35:17 min]
[INFO] Spark Project Networking ........................... SUCCESS [16:53 min]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 26.230 s]
[INFO] Spark Project Core ................................. SUCCESS [32:59 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 25.566 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:45 min]
[INFO] Spark Project Streaming ............................ SUCCESS [01:54 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:56 min]
[INFO] Spark Project SQL .................................. SUCCESS [05:14 min]
[INFO] Spark Project ML Library ........................... SUCCESS [03:17 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 15.841 s]
[INFO] Spark Project Hive ................................. SUCCESS [11:33 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 54.570 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 46.018 s]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 47.342 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [04:54 min]
[INFO] Spark Project External Flume ....................... SUCCESS [ 37.416 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 34.923 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:05 min]
[INFO] Spark Project External Kafka ....................... SUCCESS [02:15 min]
[INFO] Spark Project Examples ............................. SUCCESS [11:07 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:15 h
[INFO] Finished at: 2015-01-02T17:21:15+08:00
[INFO] Final Memory: 69M/1122M
[INFO] ------------------------------------------------------------------------
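As an optional check after BUILD SUCCESS, one can look for the jar produced by the "Spark Project Assembly" module. The sketch below assumes the jar is written somewhere under assembly/target and is named spark-assembly-*.jar; the exact subdirectory and file name depend on the Scala and Hadoop versions used, and the function name `find_assembly_jar` is hypothetical.

```shell
# Sketch: list assembly jars under a Spark source tree after a build.
# find_assembly_jar is a hypothetical helper; the jar location and name
# pattern are assumptions about the build output layout.
find_assembly_jar() {
    local spark_home="$1"    # e.g. /usr/spark/spark-1.2.0
    # Prints nothing if the directory or jar does not exist.
    find "$spark_home/assembly/target" -type f -name 'spark-assembly-*.jar' 2>/dev/null
}

# Example:
#   find_assembly_jar /usr/spark/spark-1.2.0
```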

Step 4. Start the Spark shell

In the /usr/spark/spark-1.2.0 directory, run

./bin/spark-shell

The following output indicates that Spark started successfully.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/13 09:50:52 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:52 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:52 INFO Utils: Successfully started service 'HTTP class server' on port 55842.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/13 09:50:57 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.131.151 instead (on interface eth0)
15/04/13 09:50:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/13 09:50:57 INFO SecurityManager: Changing view acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: Changing modify acls to: kevin
15/04/13 09:50:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kevin); users with modify permissions: Set(kevin)
15/04/13 09:50:58 INFO Slf4jLogger: Slf4jLogger started
15/04/13 09:50:58 INFO Remoting: Starting remoting
15/04/13 09:50:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:41278]
15/04/13 09:50:58 INFO Utils: Successfully started service 'sparkDriver' on port 41278.
15/04/13 09:50:58 INFO SparkEnv: Registering MapOutputTracker
15/04/13 09:50:58 INFO SparkEnv: Registering BlockManagerMaster
15/04/13 09:50:58 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150413095058-f481
15/04/13 09:50:58 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/04/13 09:50:59 INFO HttpFileServer: HTTP File server directory is /tmp/spark-15b2ae1c-3256-43a7-bc05-b79cb924911d
15/04/13 09:50:59 INFO HttpServer: Starting HTTP Server
15/04/13 09:50:59 INFO Utils: Successfully started service 'HTTP file server' on port 41609.
15/04/13 09:50:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/13 09:50:59 INFO SparkUI: Started SparkUI at http://192.168.131.151:4040
15/04/13 09:50:59 INFO Executor: Using REPL class URI: http://192.168.131.151:55842
15/04/13 09:50:59 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:41278/user/HeartbeatReceiver
15/04/13 09:50:59 INFO NettyBlockTransferService: Server created on 50724
15/04/13 09:50:59 INFO BlockManagerMaster: Trying to register BlockManager
15/04/13 09:50:59 INFO BlockManagerMasterActor: Registering block manager localhost:50724 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 50724)
15/04/13 09:50:59 INFO BlockManagerMaster: Registered BlockManager
15/04/13 09:50:59 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>

The single-machine Spark build is now complete!

References:

Maven: http://maven.apache.org/

Spark: http://spark.apache.org/

Time: 2024-11-10 16:58:17
