Big Data: Setting Up a Spark HA Cluster

I. Install Scala

We install scala-2.11.8 on all five machines.

Download the required package and extract it.

Configure the environment variables:

[root@master1 ~]# vi /etc/profile
export SCALA_HOME=/opt/workspace/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
[root@master1 ~]# source /etc/profile
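
Since Scala has to be present on all five machines, the copy step can be scripted. The following is only a sketch: the host names are the ones that appear in the ZooKeeper address later in this article, 61333 is the SSH port set later in spark-env.sh, and the root account, passwordless SSH, and an identical /opt/workspace layout on every node are assumptions. It prints the scp commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Assumed values: remote hosts, SSH port and install path as used elsewhere in this article.
HOSTS="master2 slave1 slave2 slave3"
SSH_PORT=61333
SCALA_DIR=/opt/workspace/scala-2.11.8

# Print (do not run) one scp command per remote host.
print_copy_commands() {
    for host in $HOSTS; do
        echo "scp -P $SSH_PORT -r $SCALA_DIR root@$host:/opt/workspace/"
    done
}

print_copy_commands
```

Piping the output into sh would perform the copies. /etc/profile still has to be edited on each machine afterwards.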

Start Scala

[root@master1 workspace]# vim /etc/profile
[root@master1 workspace]# scala -version
-bash: /opt/workspace/scala-2.11.8/bin/scala: Permission denied
[root@master1 workspace]# chmod +x /opt/workspace/scala-2.11.8/bin/scala
[root@master1 workspace]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@master1 workspace]# scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.

scala>
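
The permission error above was fixed for the scala launcher only. If the other launchers in bin (scalac, for example) lost their execute bit in the same way, a small helper can fix them in one go. This is a sketch; make_bin_executable is a hypothetical name introduced here, not part of Scala.

```shell
#!/bin/sh
# make_bin_executable DIR: add the execute bit to every regular file in DIR/bin.
make_bin_executable() {
    for f in "$1"/bin/*; do
        [ -f "$f" ] && chmod +x "$f"
    done
    return 0
}

# Example (path as used in this article):
#   make_bin_executable /opt/workspace/scala-2.11.8
```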

II. Install Spark

1. Download the matching Spark version

We use spark-2.3.0-bin-hadoop2-without-hive.tgz, a build we compiled ourselves.

See https://blog.csdn.net/sinat_25943197/article/details/81906060 for how to compile it.

2. Extract the archive

[root@master1 workspace]# tar -zxvf spark-2.3.0-bin-hadoop2-without-hive.tgz 

3. Configure spark-env.sh, slaves, and /etc/profile

Add the following to /etc/profile:

# Spark Config
export SPARK_HOME=/opt/workspace/spark-2.3.0-bin-hadoop2-without-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${MAVEN_HOME}/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:${SPARK_HOME}/bin:$SQOOP_HOME/bin:${ZK_HOME}/bin:$PATH
source /etc/profile

Rename spark-env.sh.template to spark-env.sh and configure it as follows:

#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This file is sourced when running various Spark programs.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
#export SPARK_MASTER_IP=master1
export SPARK_SSH_OPTS="-p 61333"
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_INSTANCES=1
export SCALA_HOME=/opt/workspace/scala-2.11.8
export JAVA_HOME=/opt/workspace/jdk1.8
export HADOOP_HOME=/opt/workspace/hadoop-2.9.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/workspace/spark-2.3.0-bin-hadoop2-without-hive
export SPARK_CONF_DIR=$SPARK_HOME/conf
export SPARK_EXECUTOR_MEMORY=5120M
export SPARK_DIST_CLASSPATH=$(/opt/workspace/hadoop-2.9.1/bin/hadoop classpath)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR      Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR       Where log files are stored.  (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR       Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING  A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS      The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
# - MKL_NUM_THREADS=1        Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS=1   Disable multi-threading of OpenBLAS
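
The line that turns this into an HA setup is SPARK_DAEMON_JAVA_OPTS: it sets the recovery mode to ZOOKEEPER, lists the ZooKeeper nodes, and names the ZooKeeper directory where the masters keep their state. Because the export sits in the middle of a long template, a quick check can confirm that all three properties really appear on an uncommented line. check_ha_conf below is a hypothetical helper written for this article, not a Spark tool.

```shell
#!/bin/sh
# check_ha_conf FILE: succeed only if FILE sets all three recovery
# properties on lines that are not commented out.
check_ha_conf() {
    conf_file=$1
    for key in spark.deploy.recoveryMode=ZOOKEEPER \
               spark.deploy.zookeeper.url= \
               spark.deploy.zookeeper.dir=; do
        if ! grep -v '^[[:space:]]*#' "$conf_file" | grep -qF -- "-D$key"; then
            echo "missing: $key" >&2
            return 1
        fi
    done
    echo "HA settings present in $conf_file"
}

# Example (path as used in this article):
#   check_ha_conf /opt/workspace/spark-2.3.0-bin-hadoop2-without-hive/conf/spark-env.sh
```

The check only looks at the text of the file. It does not verify that the ZooKeeper nodes are reachable.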

Rename slaves.template to slaves and configure it as follows:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
slave1
slave2
slave3
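
Because the file mixes the license comments with the host names, it can help to print only the effective entries before starting the cluster. The helper below is a sketch; list_workers is a hypothetical name, and it simply strips comments and blank lines with sed.

```shell
#!/bin/sh
# list_workers FILE: print the host names in a slaves file,
# dropping comments, trailing whitespace and blank lines.
list_workers() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Example (path as used in this article):
#   list_workers /opt/workspace/spark-2.3.0-bin-hadoop2-without-hive/conf/slaves
```

For the file above this should list slave1, slave2 and slave3, one per line.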

4. Start Spark

[root@master1 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-all.sh

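This fails: the start script connects to the workers over SSH on the default port 22, but these machines use a different SSH port.

Add the port to spark-env.sh: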

export SPARK_SSH_OPTS="-p 61333"

Restart Spark.

This time it starts successfully.
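
One way to confirm the result is to look at the JVM list that jps (shipped with the JDK) prints on each node: the standalone master shows up as Master and each worker as Worker. The filter below is a sketch; has_daemon is a hypothetical helper that reads jps output on standard input.

```shell
#!/bin/sh
# has_daemon NAME: read `jps` output ("<pid> <name>" per line) on stdin
# and succeed if a JVM called exactly NAME is listed.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Examples (host name and SSH port as used in this article):
#   jps | has_daemon Master && echo "master is running"
#   ssh -p 61333 slave1 jps | has_daemon Worker && echo "worker on slave1 is running"
```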

Start the standby master manually:

[root@master2 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-master.sh
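
With two masters running, applications should be given both addresses so that they can follow a failover. Spark's standalone mode accepts a comma-separated list of masters in the spark:// URL. The sketch below builds that URL from the host names and the port configured above; master_url is a hypothetical helper, and the final echo only prints the command rather than launching anything.

```shell
#!/bin/sh
# Assumed values: the two master hosts and SPARK_MASTER_PORT from this article.
MASTERS="master1 master2"
MASTER_PORT=7077

# master_url: join all masters into one spark:// URL.
master_url() {
    list=""
    for host in $MASTERS; do
        list="${list:+$list,}$host:$MASTER_PORT"
    done
    echo "spark://$list"
}

echo "spark-shell --master $(master_url)"
```

The master web UI (port 8080 unless SPARK_MASTER_WEBUI_PORT is set) should then report one master as ALIVE and the other as STANDBY.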

Original article: https://www.cnblogs.com/learn-bigdata/p/10407145.html

Time: 2024-08-27 10:26:08
