Setting up a pseudo-distributed Hadoop 2.6.0 + Spark 1.4.0 cluster on Ubuntu 14.10 (tested and working)

Preface: I have built clusters like this several times before, but because I never kept notes, each time I had to dig through blog posts all over again, which was a real chore. This post records the full setup process for future reference.

0. Environment: Ubuntu 14.10, Hadoop 2.6.0, Spark 1.4.0

1. Install JDK 1.7

  (1) Download jdk-7u25-linux-i586.tar.gz.

  (2) Extract jdk-7u25-linux-i586.tar.gz and move it to /opt/java/jdk/.
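
  A minimal sketch of this step, assuming the tarball was downloaded to the current directory (the extracted folder name jdk1.7.0_25 matches the JAVA_HOME used below):

  tar -zxvf jdk-7u25-linux-i586.tar.gz
  sudo mkdir -p /opt/java/jdk
  sudo mv jdk1.7.0_25 /opt/java/jdk/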

  (3) Configure the Java environment variables:

    Append the following to /etc/profile:

  #set java env
  export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25
  export JRE_HOME=${JAVA_HOME}/jre
  export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
  export PATH=${JAVA_HOME}/bin:$PATH
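
  After saving /etc/profile, reload it so the variables take effect in the current shell:

  source /etc/profile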

  (4) Verify; output like the following means the installation succeeded:

hadoop@ubuntu:~/installs$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) Client VM (build 23.25-b01, mixed mode)

  Note in particular: I originally installed the JDK as the root user, and running java -version after switching to the hadoop user failed. It turned out the Java environment variables had been put in ~/.bashrc; after moving them into /etc/profile, the problem was resolved.

2. Install and configure SSH

  The online installation kept failing, so I opted for an offline installation instead:

  (1) Download the SSH packages

   "Search for openssh at launchpad.net/Ubuntu/ and, under the matching release codename, pick the appropriate version. This article installs on Ubuntu 12.10, whose codename is Quantal Quetzal, running on i386, so the following three files are downloaded: openssh-client_6.0p1-3ubuntu1_i386.deb, openssh-server_6.0p1-3ubuntu1_i386.deb, and ssh_6.0p1-3ubuntu1_all.deb." (This quoted passage targets Ubuntu 12.10 on i386; choose the packages that match your own release codename and architecture.)

  (2) Run the installation commands

  Run the following commands in order:

  sudo dpkg -i openssh-client_6.0p1-3ubuntu1_i386.deb
  sudo dpkg -i openssh-server_6.0p1-3ubuntu1_i386.deb
  sudo dpkg -i ssh_6.0p1-3ubuntu1_all.deb

  (3) Verify: if ssh localhost lets you log in, the installation succeeded.

  (4) Set up passwordless SSH login (as the root user)

  ssh-keygen -t rsa -P ""    # just press Enter at every prompt
  cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
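
  If ssh localhost still prompts for a password afterwards, overly permissive key-file permissions are a common cause; a fix worth trying (an extra step beyond the original notes) is:

  chmod 700 /root/.ssh
  chmod 600 /root/.ssh/authorized_keys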

3. Install and configure Hadoop

  (1) Install Hadoop

  Extract hadoop-2.6.0.tar.gz to /opt/hadoop/.
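
  A sketch of this step, assuming the tarball is in the current directory (it unpacks to /opt/hadoop/hadoop-2.6.0, the path used in the commands below):

  sudo mkdir -p /opt/hadoop
  sudo tar -zxvf hadoop-2.6.0.tar.gz -C /opt/hadoop/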

  (2) Configure Hadoop (the files below are under {HADOOP_HOME}/etc/hadoop)

  Edit hadoop-env.sh and append the Java environment variable:

  #java env
  export JAVA_HOME=/opt/java/jdk/jdk1.7.0_25

  (3) Configure core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>

</configuration>

  (4) Configure hdfs-site.xml

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
</property>
</configuration>

  (5) Configure mapred-site.xml (in a fresh Hadoop 2.6.0 install this file does not exist yet; copy it from mapred-site.xml.template first)

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

  (6) Configure yarn-site.xml

<configuration>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- Site specific YARN configuration properties -->

</configuration>

  (7) Format the NameNode and start the cluster

  bin/hdfs namenode -format

  sbin/start-all.sh
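
  As an extra sanity check beyond the original notes, jps should list the HDFS and YARN daemons once the cluster is up:

  jps
  # expected processes (PIDs will differ): NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Jps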

  You can check whether the cluster started properly through the web UIs at localhost:50070 (HDFS) and localhost:8088 (YARN), or with the bin/hdfs dfsadmin -report command (the older bin/hadoop dfsadmin -report used below still works, but prints a deprecation warning), as shown here:

hadoop@ubuntu:/opt/hadoop/hadoop-2.6.0$ bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/10/22 01:34:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 19945680896 (18.58 GB)
Present Capacity: 13635391488 (12.70 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used: 212992 (208 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: ubuntu
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 6310289408 (5.88 GB)
DFS Remaining: 13635178496 (12.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 68.36%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Oct 22 01:34:25 PDT 2015

  (8) Run WordCount

$bin/hadoop fs -mkdir /input
$bin/hadoop fs -copyFromLocal /home/test.txt /input
$cd  /opt/hadoop/hadoop-2.6.0/share/hadoop/mapreduce
$/opt/hadoop/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
View the results:
 $/opt/hadoop/hadoop-2.6.0/bin/hadoop fs -cat /output/*
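
  Note (an addition to the original write-up): the WordCount job will fail if the /output directory already exists in HDFS, so remove it before re-running:

 $/opt/hadoop/hadoop-2.6.0/bin/hadoop fs -rm -r /output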

4. Install and configure Spark 1.4

  Extract spark-1.4.0-bin-hadoop2.6.tgz to /opt/spark/.
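
  One way to do this, assuming the tarball is in the current directory; --strip-components=1 places the contents directly under /opt/spark, matching the paths in the log below:

  sudo mkdir -p /opt/spark
  sudo tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C /opt/spark/ --strip-components=1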

  Verification: check the web UI at localhost:4040 (available while an application such as spark-shell is running), or run one of the bundled examples, e.g. bin/run-example SparkPi 10, as sketched below.
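
  A minimal sketch of the SparkPi check (run from the Spark directory; near the end of the output there should be a line like "Pi is roughly 3.14..."):

  cd /opt/spark
  bin/run-example SparkPi 10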

  Installation successful: running spark-shell from the Spark directory produces output like the following:

hadoop@ubuntu:/opt/spark$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/10/22 01:44:26 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:26 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:27 INFO Utils: Successfully started service 'HTTP class server' on port 51327.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/22 01:44:36 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.111.130 instead (on interface eth0)
15/10/22 01:44:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/10/22 01:44:36 INFO SparkContext: Running Spark version 1.4.0
15/10/22 01:44:36 INFO SecurityManager: Changing view acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: Changing modify acls to: hadoop
15/10/22 01:44:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/10/22 01:44:37 INFO Slf4jLogger: Slf4jLogger started
15/10/22 01:44:37 INFO Remoting: Starting remoting
15/10/22 01:44:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.111.130:35977]
15/10/22 01:44:38 INFO Utils: Successfully started service 'sparkDriver' on port 35977.
15/10/22 01:44:38 INFO SparkEnv: Registering MapOutputTracker
15/10/22 01:44:38 INFO SparkEnv: Registering BlockManagerMaster
15/10/22 01:44:38 INFO DiskBlockManager: Created local directory at /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/blockmgr-d25aa3bd-b1af-4746-9d1a-edd7e8f1e08c
15/10/22 01:44:38 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/10/22 01:44:39 INFO HttpFileServer: HTTP File server directory is /tmp/spark-08e380aa-a102-48a2-91e3-b358cb2a6a35/httpd-4113cef7-2865-4efd-890a-19fcbde49bcb
15/10/22 01:44:39 INFO HttpServer: Starting HTTP Server
15/10/22 01:44:39 INFO Utils: Successfully started service 'HTTP file server' on port 33633.
15/10/22 01:44:39 INFO SparkEnv: Registering OutputCommitCoordinator
15/10/22 01:44:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/10/22 01:44:41 INFO SparkUI: Started SparkUI at http://192.168.111.130:4040
15/10/22 01:44:42 INFO Executor: Starting executor ID driver on host localhost
15/10/22 01:44:42 INFO Executor: Using REPL class URI: http://192.168.111.130:51327
15/10/22 01:44:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37625.
15/10/22 01:44:45 INFO NettyBlockTransferService: Server created on 37625
15/10/22 01:44:45 INFO BlockManagerMaster: Trying to register BlockManager
15/10/22 01:44:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37625 with 267.3 MB RAM, BlockManagerId(driver, localhost, 37625)
15/10/22 01:44:45 INFO BlockManagerMaster: Registered BlockManager
15/10/22 01:44:45 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/10/22 01:44:48 INFO HiveContext: Initializing execution hive, version 0.13.1
15/10/22 01:44:49 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/10/22 01:44:49 INFO ObjectStore: ObjectStore, initialize called
15/10/22 01:44:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/10/22 01:44:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/10/22 01:44:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
Thu Oct 22 01:44:51 PDT 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
15/10/22 01:44:51 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
----------------------------------------------------------------
Thu Oct 22 01:44:51 PDT 2015:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0150-8eb8-dd90-0000186374f8
on database directory /tmp/spark-ea20e824-5489-4ead-a2d7-c8b14434dc51/metastore with class loader [email protected]
Loaded from file:/opt/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_25-b15
user.dir=/opt/spark
os.name=Linux
os.arch=i386
os.version=3.16.0-23-generic
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
15/10/22 01:44:53 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/10/22 01:44:53 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:54 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/22 01:44:55 INFO ObjectStore: Initialized ObjectStore
15/10/22 01:44:56 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/10/22 01:44:56 INFO HiveMetaStore: Added admin role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: Added public role in metastore
15/10/22 01:44:56 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/10/22 01:44:57 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/10/22 01:44:57 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

References:

1. http://www.aboutyun.com/thread-10554-1-1.html

2. http://www.linuxidc.com/Linux/2013-04/82814.htm

3. http://blog.csdn.net/jediael_lu/article/details/45314317
