CDH 5.3 Cluster Installation Notes - Environment Preparation (1)

Hadoop is a complex combination of systems, and building a production-grade Hadoop environment by hand is a painful job. Fortunately, there are always capable people who solve this kind of problem for the rest of us, and if they haven't yet, it's only a matter of time. CDH is Cloudera's packaged Hadoop distribution; for background on CDH, see www.cloudera.com - I won't repeat it here. This series describes how to install a production-ready Hadoop environment with CDH 5.3. Cloudera may have solved the Hadoop installation problem, but a new one comes with it: installing Cloudera Manager is no simpler than installing Hadoop itself, and it has plenty of pitfalls, which we will step through one by one in the posts that follow.

Part 1: Environment Preparation

I. Server Preparation

We will build a small 12-node cluster. Every server runs Red Hat 6.4 Server x64. Hostnames follow the pattern server[1-12].cdhwork.org, with internal IP addresses 192.168.10.[1-12]. Every server must have a DNS server configured (202.96.209.5 or 8.8.8.8 will do), and the root password must be set identically on all servers.

Server role assignment (all hostnames under cdhwork.org):

server1     192.168.10.1     CDH local mirror, Cloudera Manager, NTP time server
server2     192.168.10.2     Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor
server3     192.168.10.3     HDFS NameNode, Hive Gateway, Impala Catalog Server, Cloudera Management Service Alert Publisher, Spark Gateway, ZooKeeper Server
server4     192.168.10.4     HDFS SecondaryNameNode, Hive Gateway, Impala StateStore, Solr Server, Spark Gateway, YARN (MR2 Included) ResourceManager, ZooKeeper Server
server5     192.168.10.5     HDFS Balancer, Hive Gateway, Hue Server, Cloudera Management Service Activity Monitor, Oozie Server, Spark Gateway, Sqoop 2 Server, ZooKeeper Server
server6     192.168.10.6     HBase Master, Hive Gateway, MapReduce JobTracker, Solr Server, Spark Gateway, YARN (MR2 Included) JobHistory Server, ZooKeeper Server
server7     192.168.10.7     HBase REST Server, HBase Thrift Server, Hive Metastore Server, HiveServer2, Key-Value Store Indexer Lily HBase Indexer, Cloudera Management Service Event Server, Spark History Server
server8-12  192.168.10.8-12  HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager (identical roles on all five worker nodes)

Perform all of the following operations as root, identically on every server.
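Since the same commands must be repeated on all 12 nodes, a small fan-out helper can save typing. This is only a sketch and a dry run - it prints the ssh invocations instead of executing them - and it assumes passwordless root SSH plus the server[1-12].cdhwork.org naming scheme above; drop the echo to run for real.

```shell
# Dry-run fan-out helper (a sketch): prints one ssh command per node.
# Assumes passwordless root SSH; remove the echo to actually execute.
run_all() {
  cmd="$1"
  for i in $(seq 1 12); do
    echo "ssh root@server${i}.cdhwork.org '${cmd}'"
  done
}

run_all "/etc/init.d/iptables stop"
```

Tools like pdsh or pssh do the same job more robustly, but a plain loop is enough for 12 nodes.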

1. Disable the firewall

/etc/init.d/iptables stop # stop the firewall now
chkconfig iptables off    # keep it from starting on boot

2. Disable SELinux

On the command line, run:
setenforce 0

Edit the configuration file so the setting survives a reboot:

vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled

Change the line to SELINUX=disabled, then save and exit. Note that setenforce 0 only switches SELinux to permissive mode; the disabled setting takes full effect after the next reboot.
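Instead of opening vi on every node, the change can be made non-interactively with sed. The sketch below works on a throwaway copy so it is safe to try anywhere; on a real node, point it at /etc/selinux/config (the -i.bak flag keeps a backup).

```shell
# Sketch: flip SELINUX to disabled with sed instead of editing by hand.
# Demonstrated on a temporary copy; on a node, target /etc/selinux/config.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"    # should now read SELINUX=disabled
```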

3. Minimize swapping so memory is released faster

Run:
sysctl vm.swappiness=0

Edit the configuration file so the setting survives a reboot:

vi /etc/sysctl.conf
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

vm.swappiness = 0

Add the line vm.swappiness = 0, then save and exit.
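The edit can also be scripted so it is safe to re-run - handy when pushing the same script to all 12 nodes. A sketch, demonstrated on a temporary copy; on a real node the target is /etc/sysctl.conf, followed by `sysctl -p` to apply:

```shell
# Sketch: append vm.swappiness = 0 only if it is not already set (idempotent).
# Demonstrated on a throwaway copy; on a node, target /etc/sysctl.conf.
conf=$(mktemp)
printf 'kernel.shmmax = 68719476736\n' > "$conf"

grep -q '^vm.swappiness' "$conf" || echo 'vm.swappiness = 0' >> "$conf"
grep -q '^vm.swappiness' "$conf" || echo 'vm.swappiness = 0' >> "$conf"  # second run is a no-op

grep '^vm.swappiness' "$conf"
```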

4. Disable Red Hat transparent hugepage defrag

Run:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

Edit the configuration file so the setting survives a reboot:

vi /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

touch /var/lock/subsys/local

Add the line echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag, then save and exit.

5. Edit /etc/hosts

On every server, /etc/hosts should contain:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# CDH local mirror
192.168.10.1	archive.cloudera.com

# Cloudera Manager
192.168.10.1 	server1.cdhwork.org

# Cloudera Management Service Host Monitor,Cloudera Management Service Service Monitor
192.168.10.2	server2.cdhwork.org

# HDFS NameNode,Hive Gateway,Impala Catalog Server,Cloudera Management Service Alert Publisher,Spark Gateway,ZooKeeper Server
192.168.10.3	server3.cdhwork.org

# HDFS SecondaryNameNode,Hive Gateway,Impala StateStore,Solr Server,Spark Gateway,YARN (MR2 Included) ResourceManager,ZooKeeper Server
192.168.10.4	server4.cdhwork.org

# HDFS Balancer,Hive Gateway,Hue Server,Cloudera Management Service Activity Monitor,Oozie Server,Spark Gateway,Sqoop 2 Server,ZooKeeper Server
192.168.10.5	server5.cdhwork.org

# HBase Master,Hive Gateway,MapReduce JobTracker,Solr Server,Spark Gateway,YARN (MR2 Included) JobHistory Server,ZooKeeper Server,Postgresql-9.2
192.168.10.6	server6.cdhwork.org

# HBase REST Server,HBase Thrift Server,Hive Metastore Server,HiveServer2,Key-Value Store Indexer Lily HBase Indexer,Cloudera Management Service Event Server,Spark History Server
192.168.10.7	server7.cdhwork.org

# HBase RegionServer,HDFS DataNode,Impala Daemon,MapReduce TaskTracker,YARN (MR2 Included) NodeManager
192.168.10.8	server8.cdhwork.org
192.168.10.9	server9.cdhwork.org
192.168.10.10	server10.cdhwork.org
192.168.10.11	server11.cdhwork.org
192.168.10.12	server12.cdhwork.org
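Since server8 through server12 share one role set and a regular IP scheme, their /etc/hosts lines can be generated rather than typed. A sketch (the 192.168.10.x / serverN.cdhwork.org scheme matches the table above):

```shell
# Sketch: emit the /etc/hosts entries for a contiguous range of nodes.
gen_hosts() {
  for i in $(seq "$1" "$2"); do
    printf '192.168.10.%s\tserver%s.cdhwork.org\n' "$i" "$i"
  done
}

gen_hosts 8 12    # the five worker nodes
```

Append the output to the file (gen_hosts 8 12 >> /etc/hosts), then spot-check resolution with something like getent hosts server8.cdhwork.org.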

6. Configure the yum repositories

cd /etc/yum.repos.d/
mv rhel-source.repo rhel-source.repo.bak
vi rhel-source.repo
[base]
name=CentOS-6.6 - Base
baseurl=http://mirrors.163.com/centos/6.6/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.163.com/centos/6.6/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#packages used/produced in the build but not released
#[addons]
#name=CentOS-$releasever - Addons
#baseurl=http://mirrors.163.com/centos/6.6/addons/x86_64/
#gpgcheck=1
#gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.163.com/centos/6.6/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirrors.163.com/centos/6.6/centosplus/x86_64/
gpgcheck=1
enabled=0

Save and exit. The point of switching the yum source is to make the later installs faster. This assumes, of course, that all your servers can reach the Internet; if they cannot, either set up a proxy server for outbound access, or build your own yum mirror to serve the packages locally. I recommend the local mirror: it is convenient and low-maintenance, and its only drawback is that it takes some disk space.

To build the mirror site, wget -r <target> can copy everything under the target site, and the httpd service can serve it over the web. If the copied site lives in /usr/site, just create a symlink under /var/www/html/:

ln -s /usr/site /var/www/html/site

The mirror is then reachable at http://<ip>/site. Honestly, Linux ships a lot of highly practical tools, and wget and ln are two of them.
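Putting the pieces together, the mirror setup might look like the sketch below. The wget line is shown commented out because it needs outbound network access and plenty of disk; the /usr/site path and the mirror URL are illustrative, and the symlink step is demonstrated on temporary directories so it can be tried safely.

```shell
# Sketch: mirror a repo tree, then link it into the httpd document root.
# On a real node (needs network and disk; paths illustrative):
# wget -r -np -nH -P /usr/site http://mirrors.163.com/centos/6.6/os/x86_64/

site=$(mktemp -d)       # stands in for /usr/site
docroot=$(mktemp -d)    # stands in for /var/www/html
ln -s "$site" "$docroot/site"

ls -l "$docroot"        # 'site' is a symlink back to the mirror directory
```

On the real server, finish with `service httpd start` so the tree is served at http://<ip>/site.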

7. Update the servers to the latest packages

yum update

This step keeps the later Cloudera installation from failing: Cloudera often complains, seemingly at random, that a required RPM package is missing or too old. At first I had no idea how to fix this; eventually I bit the bullet and did a full system update, and to my surprise that solved it. I still don't know exactly why, but a fix is a fix. So take the trouble to update - with a fast network connection it doesn't take long.

Date: 2024-08-10 15:10:19
