[Original] Big Data Basics: Cluster Setup

Cluster Platform

redhat/centos7, docker, mesos, cloudera manager (cdh)

Checklist

1 check user & password & network reachability, make sure you can log in to all remote servers by ssh client
2 check linux release, upgrade or reinstall if necessary
3 check yum repo
4 check hardware: memory, cpu, disk & partition, network device
5 check network: lan & wlan environment
6 check data disks, make sure every data disk is partitioned, formatted and mounted (fdisk/mke2fs/mount) and listed in /etc/fstab
7 check user & group, running processes, free memory, free disk, directory structure
8 check iptables, make sure only the necessary protocols and ports are allowed
9 check systemd, service
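
The data disk check above can be partly automated. The following is a minimal sketch, assuming the data disks are expected at mount points such as /data1 and /data2 (hypothetical names, the original does not say where the disks are mounted). It only inspects fstab-style text and reports which expected mount points have no entry.

```python
def missing_mounts(fstab_text, expected_mount_points):
    """Return the expected mount points that have no entry in fstab_text."""
    mounted = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry is: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [mp for mp in expected_mount_points if mp not in mounted]


if __name__ == "__main__":
    # Sample content for illustration; devices and mount points are hypothetical.
    sample = (
        "# /etc/fstab\n"
        "/dev/sda1 / xfs defaults 0 0\n"
        "/dev/sdb1 /data1 ext4 defaults 0 0\n"
    )
    print(missing_mounts(sample, ["/data1", "/data2"]))  # prints ['/data2']
```

On a real server the text would come from reading /etc/fstab, and an empty result means every expected data disk has an entry.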

Preparation

1 log in to the first server by ssh client; it will act as the deploy center
2 upload the 'deploy' directory (more than 15G: iso 8G, cdh 3G, registry 2G) and 'deploy.sh' to $HOME by sftp or other tools
3 use deploy.sh to generate deploy commands
4 deploy the cluster offline
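
Because the upload is larger than 15G, it may be worth confirming that the target filesystem has enough free space first. This is a minimal sketch, not part of the original deploy.sh; the function name `has_room` and the exact threshold of 15 GiB are assumptions for illustration.

```python
import shutil

# "more than 15G" in the upload step; 15 GiB is used here as an assumed threshold.
REQUIRED_BYTES = 15 * 1024 ** 3


def has_room(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

On the deploy center this could be called as `has_room(os.path.expanduser("~"))` (after `import os`) before starting the sftp upload.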

Offline Deployment Details

1 make sure the root user can log in by ssh without a password between all servers
2 create yum repo file using iso on local server
3 install ansible on local server
4 create & push /etc/hosts to all servers by ansible
5 install & start docker on local server
6 docker load registry image from local file
7 docker run registry on local server
8 docker run nginx on local server as 'yum repo server', 'docker registry server' and 'cloudera manager proxy server'
9 docker run mysql on local server temporarily
10 create mysql databases & users and flush privileges
11 create & push yum repo file to all servers by ansible
12 push deploy directory to all servers by ansible
13 install ansible on other servers
14 install and start docker on other servers
15 install kerberos on all servers
16 start kdc on local server
17 install jdk on all servers
18 install cloudera manager & cdh on all servers
19 start cloudera manager agent on all servers
20 start cloudera manager server on local server
21 deploy zookeeper by cloudera manager
22 install mesos and marathon on all servers
23 start mesos agent on all servers
24 start mesos master on other two servers
25 start marathon master on other two servers
26 deploy marathon-lb on marathon by api
27 docker stop the temporary mysql and redeploy mysql master & slave on marathon by api
28 deploy redis, elasticsearch, logstash, kibana, kafka, airflow on marathon by api
29 deploy hdfs, yarn, hive, hbase, oozie, kudu, impala, sentry, hue by cloudera manager
30 enable kerberos on cloudera manager
31 deploy the clients for all components
32 confirm all components are working
33 initialize data structures in mysql, hive, impala
34 deploy business components like tomcat, logstash on marathon
35 confirm all business services are working
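
The /etc/hosts file (step 4) and the yum repo file (step 11) that get pushed to every server are plain text, so they can be generated before ansible distributes them. The sketch below is one possible way to render them; the hostnames, IP addresses, repo id and URL are hypothetical placeholders, and `gpgcheck=0` is an assumption that suits a trusted offline repo rather than something stated in the original.

```python
def render_hosts(hosts):
    """Render /etc/hosts content from a {hostname: ip} mapping."""
    lines = ["127.0.0.1 localhost"]
    for name, ip in sorted(hosts.items()):
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


def render_yum_repo(repo_id, name, baseurl):
    """Render a yum .repo file pointing at the local repo server."""
    return (
        f"[{repo_id}]\n"
        f"name={name}\n"
        f"baseurl={baseurl}\n"
        "enabled=1\n"
        "gpgcheck=0\n"  # assumption: packages from the local iso are trusted
    )


if __name__ == "__main__":
    # All names, addresses and URLs below are hypothetical placeholders.
    print(render_hosts({"server1": "192.168.0.1", "server2": "192.168.0.2"}), end="")
    print(render_yum_repo("local-iso", "Local ISO repo", "http://server1/iso/"), end="")
```

The rendered text would then be written to a file on the deploy center and copied to all servers, for example with ansible's copy module.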

Original article: https://www.cnblogs.com/barneywill/p/10327831.html

Time: 2024-08-30 02:33:34
