============= Basic Environment Preparation ==========
1. Node planning:
The cluster has 3 nodes
Master node: dc1 --- 172.16.100.165
Slave node: dc2 --- 172.16.100.166
Slave node: dc3 --- 172.16.100.167
2. Change the hostnames to dc1/dc2/dc3 (modify only the HOSTNAME= line)
On dc1:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc1.com
On dc2:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc2.com
On dc3:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc3.com
-- vim /etc/hosts
172.16.100.165 dc1 dc1.com
172.16.100.166 dc2 dc2.com
172.16.100.167 dc3 dc3.com
3. Reboot Linux for the changes to take effect (after all of the above changes, reboot all 3 nodes)
#shutdown -r now
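-- Optional check after the reboot: each node should report its own hostname, and the other two names should resolve to the planned 172.16.100.x addresses (a quick sketch, run on any node)
hostname    # should print dc1.com / dc2.com / dc3.com depending on the node
for h in dc1 dc2 dc3; do ping -c 1 $h | head -2; done    # every name should resolve to 172.16.100.165/166/167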
4. Directory creation (create on all of dc1/dc2/dc3) (because of limited storage in this environment the directories are placed under /home; in production the software and data directories should be planned properly)
-- Directory holding datacell and impala
mkdir -p /home/geedata
-- Create the datacell data directories (create as many directories as there are data disks and mount each disk on its own directory, for balanced and redundant data placement)
mkdir /home/geedata/data # parent directory for datacell data and metadata
-- datacell data directories
mkdir -p /home/geedata/data/sdb # matches <Id>yy_webmedia_detail_0_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdc # matches <Id>yy_webmedia_detail_1_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdd # matches <Id>yy_webmedia_detail_2_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sde # matches <Id>yy_webmedia_detail_3_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdf # matches <Id>yy_webmedia_detail_4_volume</Id> in DataCell.xml
-- Create the datacell metadata directory
mkdir -p /home/geedata/data/meta4dc
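-- The same layout can also be created on all three nodes in one pass (a minimal sketch; it assumes the commands run as root and that dc1/dc2/dc3 resolve, e.g. after the SSH trust in step 5 is in place)
for node in dc1 dc2 dc3; do
  ssh $node "mkdir -p /home/geedata/data/{sdb,sdc,sdd,sde,sdf} /home/geedata/data/meta4dc"
done
-- Verify the layout on every node
for node in dc1 dc2 dc3; do echo "== $node =="; ssh $node "ls -ld /home/geedata/data/*"; done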
5. SSH mutual-trust configuration (disable selinux, otherwise the permissions of the .ssh directory get changed automatically; also disable iptables)
-- Run on all 3 nodes dc1/dc2/dc3
mkdir ~/.ssh
chmod 755 ~/.ssh ## 700 caused problems here; 755 is required
cd ~/.ssh
ssh-keygen -t rsa ## there are 3 prompts (press Enter at all 3)
ssh-keygen -t dsa ## there are 3 prompts (press Enter at all 3)
-- Run on dc1
-- Append dc1's public keys to the shared authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
-- Append dc2's public keys to the shared authorized_keys file
ssh dc2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys ## enter the current user's (root) password: www.geedata.com
ssh dc2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys ## enter the current user's (root) password: www.geedata.com
-- Append dc3's public keys to the shared authorized_keys file
ssh dc3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys ## enter the current user's (root) password: www.geedata.com
ssh dc3 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys ## enter the current user's (root) password: www.geedata.com
-- Copy the populated authorized_keys file to dc2 and dc3
scp ~/.ssh/authorized_keys dc2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys dc3:~/.ssh/authorized_keys
-- Verify: run the following on each node; if the date prints normally, SSH mutual trust works (the first connection may ask to confirm the host key, which is recorded in known_hosts and not asked again)
ssh dc1 date
ssh dc2 date
ssh dc3 date
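-- To confirm trust works in every direction (not only from the node you are sitting on), the same check can be run as a full mesh; a small sketch, assuming root on all three nodes:
for src in dc1 dc2 dc3; do
  for dst in dc1 dc2 dc3; do
    echo -n "$src -> $dst : "
    ssh $src "ssh -o BatchMode=yes $dst date" || echo FAILED
  done
done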
PS:
1) The build has 2 major phases: the datacell setup and the impala setup
2) An example problem (make sure the IPs and the hosts file are configured correctly)
[root@dc1 ~]# ssh dc3
ssh: connect to host dc3 port 22: Connection refused
[root@dc1 ~]# ping dc3 (high latency: 172 had been mistyped as 176, so the traffic went out to a public address)
PING dc3 (176.16.100.167) 56(84) bytes of data.
64 bytes from dc3 (176.16.100.167): icmp_seq=1 ttl=39 time=3927 ms
64 bytes from dc3 (176.16.100.167): icmp_seq=3 ttl=39 time=2093 ms
============== DataCell Cluster Installation and Deployment ============
---------------------- Copy the "installation package" ---------------------
-- Installation method: the "installation package" here means copying the directory tree of an already-installed environment; no official installation package/ISO is provided yet
-- Copy the "installation package" (scp the impala and datacell directories from the development environment to the corresponding machines) (mind the machine-to-machine mapping)
On dc1:
mkdir -p /home/geedata
scp -r 172.16.100.146:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.146:/application/yoyosys/impala /home/geedata/
On dc2:
mkdir -p /home/geedata
scp -r 172.16.100.147:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.147:/application/yoyosys/impala /home/geedata/
On dc3:
mkdir -p /home/geedata
scp -r 172.16.100.148:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.148:/application/yoyosys/impala /home/geedata/
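-- After the copies finish, it is worth checking that nothing was lost in transit; a minimal sketch run on each new node (SRC is the source IP used for that node above, and the ssh back to it may still ask for a password):
SRC=172.16.100.146    # .146 for dc1, .147 for dc2, .148 for dc3
for d in datacell impala; do
  echo "== $d =="
  echo -n "source files: "; ssh $SRC "find /application/yoyosys/$d -type f | wc -l"
  echo -n "local files:  "; find /home/geedata/$d -type f | wc -l
done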
--------------------- Modify the datacell configuration files ----------------------
0. DataCell.xml
Configuration file: /home/geedata/datacell/conf/DataCell.xml
If the datacell cluster has 3 nodes, set the following two values in DataCell.xml to 3; with 4 nodes set them to 4. Otherwise the cluster cannot be started/stopped properly, because it considers the node count incomplete.
<Bootstrap-Pending-Threshold>3</Bootstrap-Pending-Threshold>
<Bootstrap-Start-Threshold>3</Bootstrap-Start-Threshold>
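-- A quick way to confirm the thresholds are consistent on every node (a sketch, usable once the files have been distributed):
for node in dc1 dc2 dc3; do
  echo "== $node =="
  ssh $node "grep -E 'Bootstrap-(Pending|Start)-Threshold' /home/geedata/datacell/conf/DataCell.xml"
done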
1. DataCell.xml
Modify on dc1, then scp to dc2/dc3 (this part of the file is identical on all nodes)
Find the following and change it:
Before: <Root>/application/data/meta4dc</Root>
After:  <Root>/home/geedata/data/meta4dc</Root>
Before: <Name>172.16.100.146</Name>
After:  <Name>172.16.100.165</Name>
datacell currently has 3 tables; the section for each of the 3 tables must be modified, i.e. 3 places, as follows
Before (yy_webmedia_detail):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>1</Id>
<Name>yy_webmedia_detail</Name>
After (yy_webmedia_detail):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>1</Id>
<Name>yy_webmedia_detail</Name>
Before (yy_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_news</Name>
After (yy_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_news</Name>
Before (yy_user_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_user_news</Name>
After (yy_user_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_user_news</Name>
-- Modify the storage locations
Before:
</Storages>
<Volumes>
<Volume>
<Id>yy_webmedia_detail_0_volume</Id>
<Path>/application/data/data4dc</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
</Volumes>
</DataCell>
</Configuration>
After:
</Storages>
<Volumes>
<Volume>
<Id>yy_webmedia_detail_0_volume</Id>
<Path>/home/geedata/data/sdb</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
<Volume>
<Id>yy_webmedia_detail_1_volume</Id>
<Path>/home/geedata/data/sdc</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
<Volume>
<Id>yy_webmedia_detail_2_volume</Id>
<Path>/home/geedata/data/sdd</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
<Volume>
<Id>yy_webmedia_detail_3_volume</Id>
<Path>/home/geedata/data/sde</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
<Volume>
<Id>yy_webmedia_detail_4_volume</Id>
<Path>/home/geedata/data/sdf</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
</Volumes>
</DataCell>
</Configuration>
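-- Every <Path> above must exist as a directory (created in step 4 of the environment preparation); a small sketch to check that the XML and the filesystem agree on a node:
grep '<Path>' /home/geedata/datacell/conf/DataCell.xml | sed 's|.*<Path>\(.*\)</Path>.*|\1|' | while read p; do
  [ -d "$p" ] && echo "OK      $p" || echo "MISSING $p"
done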
2. On dc2/dc3: modify DataCell.xml (after scp-ing it from dc1, the IP still has to be changed as shown below)
[root@dc1 conf]# scp /home/geedata/datacell/conf/DataCell.xml 172.16.100.166:/home/geedata/datacell/conf/
[root@dc1 conf]# scp /home/geedata/datacell/conf/DataCell.xml 172.16.100.167:/home/geedata/datacell/conf/
Then change the following:
dc2: in DataCell.xml change the IP to dc2's IP
Before: <Name>172.16.100.165</Name>
After:  <Name>172.16.100.166</Name>
dc3: in DataCell.xml change the IP to dc3's IP
Before: <Name>172.16.100.165</Name>
After:  <Name>172.16.100.167</Name>
3. agent.xml: mainly the IP address is changed
Path: /home/geedata/datacell/conf/agent.xml
On dc1:
Before: <Network-Interface>172.16.100.146</Network-Interface>
After:  <Network-Interface>172.16.100.165</Network-Interface>
On dc2:
Before: <Network-Interface>172.16.100.147</Network-Interface>
After:  <Network-Interface>172.16.100.166</Network-Interface>
On dc3:
Before: <Network-Interface>172.16.100.148</Network-Interface>
After:  <Network-Interface>172.16.100.167</Network-Interface>
#### PS: there is also an <Agent-Address>172.16.246.131</Agent-Address> entry; it can be ignored, as it is a cross-subnet proxy address that this environment does not need
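-- The per-node IP edits of steps 2 and 3 can also be scripted from dc1 after the files have been copied; a minimal sketch, assuming each node should carry its own address from the planning table:
declare -A ip=( [dc2]=172.16.100.166 [dc3]=172.16.100.167 )
for node in dc2 dc3; do
  ssh $node "sed -i 's|<Name>172.16.100.165</Name>|<Name>${ip[$node]}</Name>|' /home/geedata/datacell/conf/DataCell.xml"
  ssh $node "sed -i 's|<Network-Interface>.*</Network-Interface>|<Network-Interface>${ip[$node]}</Network-Interface>|' /home/geedata/datacell/conf/agent.xml"
done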
4. bf_setenv.sh changes
On dc1:
Before: export BITSFLOW_HOME=/application/yoyosys/datacell
After:  export BITSFLOW_HOME=/home/geedata/datacell
On dc2/dc3: (scp the file from dc1)
[root@dc1 ~]# scp /home/geedata/datacell/bf_setenv.sh 172.16.100.166:/home/geedata/datacell/
[root@dc1 ~]# scp /home/geedata/datacell/bf_setenv.sh 172.16.100.167:/home/geedata/datacell/
5. agent_start.sh changes
On dc1:
Before: BITSFLOW_HOME=/application/yoyosys/datacell
After:  BITSFLOW_HOME=/home/geedata/datacell
On dc2/dc3: (scp the file from dc1)
[root@dc1 ~]# scp /home/geedata/datacell/agent_start.sh 172.16.100.166:/home/geedata/datacell/
[root@dc1 ~]# scp /home/geedata/datacell/agent_start.sh 172.16.100.167:/home/geedata/datacell/
6. agent_stop.sh changes
On dc1:
Before: BITSFLOW_HOME=/application/yoyosys/datacell
After:  BITSFLOW_HOME=/home/geedata/datacell
On dc2/dc3: (scp the file from dc1)
[root@dc1 ~]# scp /home/geedata/datacell/agent_stop.sh 172.16.100.166:/home/geedata/datacell/
[root@dc1 ~]# scp /home/geedata/datacell/agent_stop.sh 172.16.100.167:/home/geedata/datacell/
7. agent_status.sh changes
On dc1:
Before: BITSFLOW_HOME=/application/yoyosys/datacell
After:  BITSFLOW_HOME=/home/geedata/datacell
On dc2/dc3: (scp the file from dc1)
[root@dc1 ~]# scp /home/geedata/datacell/agent_status.sh 172.16.100.166:/home/geedata/datacell/
[root@dc1 ~]# scp /home/geedata/datacell/agent_status.sh 172.16.100.167:/home/geedata/datacell/
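-- Steps 4 through 7 are the same edit repeated in four scripts; a minimal sketch that patches them once on dc1 and pushes them to dc2/dc3 (it assumes the scripts still contain the old /application/yoyosys path):
cd /home/geedata/datacell
for f in bf_setenv.sh agent_start.sh agent_stop.sh agent_status.sh; do
  sed -i 's|/application/yoyosys/datacell|/home/geedata/datacell|g' $f
  scp $f 172.16.100.166:/home/geedata/datacell/
  scp $f 172.16.100.167:/home/geedata/datacell/
done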
8. rm -rf /home/geedata/datacell/db/groupd*.db (skip this if you want to keep existing data; delete only if the data does not need to be kept)
9. datacell verification
Note: if this datacell cluster shares a LAN segment with another datacell cluster, groupd will fail to start and report that a group node already exists (in this case there was indeed a development-environment group node on the same segment).
Fix: change the port number, e.g. from the original 31060 to another port such as 31068.
Files that need the change (a global replace in vim works: :%s/31060/31068/g):
on the master node: DataCell.xml / agent.xml / groupd.xml (3 config files)
on the slave nodes: DataCell.xml / agent.xml (2 config files)
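-- The same port replacement can be done non-interactively on every node; a sketch (back up the files first, and only run it if you really have to move off 31060):
OLD=31060; NEW=31068
ssh dc1 "cd /home/geedata/datacell/conf && cp -a DataCell.xml DataCell.xml.bak && sed -i 's/$OLD/$NEW/g' DataCell.xml agent.xml groupd.xml"
for node in dc2 dc3; do
  ssh $node "cd /home/geedata/datacell/conf && sed -i 's/$OLD/$NEW/g' DataCell.xml agent.xml"
done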
-- Start datacell (in the order below; each step is run in parallel on the nodes listed)
cd /home/geedata/datacell
1) On dc1/dc2/dc3, run the following to start the agent service
./agent_start.sh
2) Only on the master node dc1, start the group service
./start_groupd.sh
3) On dc1/dc2/dc3, run the following to start the datacell service
./start_datacell.sh
-- Check the datacell processes
On dc1:
[root@dc1 datacell]# cd /home/geedata/datacell
[root@dc1 datacell]# ps -ef |grep `pwd`|grep -v grep # the master node dc1 has 3 processes, one more than the slave nodes (the extra one is groupd)
root 28712 1 0 19:44 ? 00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
root 28733 1 0 19:44 ? 00:00:00 /home/geedata/datacell/bin/groupd -cfgfile /home/geedata/datacell/conf/groupd.xml -logfile /home/geedata/datacell/logs/groupd.log -pidfile /home/geedata/datacell/run/groupd.pid -daemon -dbdir /home/geedata/datacell/db
root 28745 1 0 19:44 ? 00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
On dc2:
[root@dc2 datacell]# cd /home/geedata/datacell
[root@dc2 datacell]# ps -ef |grep `pwd`|grep -v grep # this slave node dc2 has 2 processes
root 26109 1 0 19:41 ? 00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
root 26176 1 0 19:42 ? 00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
On dc3:
[root@dc3 ~]# cd /home/geedata/datacell
[root@dc3 datacell]# ps -ef |grep `pwd`|grep -v grep # this slave node dc3 has 2 processes
root 22047 1 0 19:36 ? 00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
root 22106 1 0 19:37 ? 00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
-- Stop datacell
cd /home/geedata/datacell
1) On dc1/dc2/dc3, run the following to stop the datacell service
./stop_datacell.sh
2) On dc1, stop the group service
./stop_groupd.sh
3) On dc1/dc2/dc3, run the following to stop the agent service
./agent_stop.sh
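-- A quick way to see whether the expected processes are up on every node after starting (or gone after stopping); a sketch assuming the install path above:
for node in dc1 dc2 dc3; do
  echo "== $node =="
  ssh $node "ps -ef | grep /home/geedata/datacell/bin | grep -v grep | awk '{print \$8}'"
done
-- expected while running: agent + groupd + DataCell on dc1, agent + DataCell on dc2/dc3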
-- Verification: log in to datacell
# DataCellShell -agent localhost:31060 -service 1121 # (the port here must match the config files; the default is 31060, but it will have been changed if there are 2 groups on the LAN)
[root@dc1 datacell]# . bf_setenv.sh
[root@dc1 datacell]# DataCellShell -agent localhost:31068 -service 1121 # this is the changed port
Open MX Channel
Copyright (c) 2007-2013 Yoyo Systems. All rights reserved.
Welcome to the DataCell shell. Commands end with ";"
agent : localhost:31068
service : 1121
login : 2015-10-27 11:12:19.262550
Type 'help;' for help. Type 'clear;' to clear the buffer.
DataCellShell> show storages;
Supported features: FILESYSTEM, OBJECT, STRUCTURED, TIMESERIES
------------------------------------------------------------------------------------
NO STORAGE
------------------------------------------------------------------------------------
1 yy_webmedia_detail (STRUCTURED | TIMESERIES)
2 yy_news (STRUCTURED | TIMESERIES)
3 yy_user_news (STRUCTURED | TIMESERIES)
------------------------------------------------------------------------------------
Done at 2015-10-27 11:12:41.847477, took time: 2300 microseconds
DataCellShell>
DataCellShell> use yy_webmedia_detail as STRUCTURED;
The storage "yy_webmedia_detail" is ready for you to enter commands.
Type 'close;' to closed this storage. Type 'quit;', 'bye;' or 'exit;' to back to terminal.
Done at 2015-10-27 11:13:58.880032, took time: 41889 microseconds
yy_webmedia_detail>
yy_webmedia_detail> select count(*) from yy_webmedia_detail;
Count result: 0
Done at 2015-10-27 11:14:33.844168, took time: 6235 microseconds
yy_webmedia_detail>
=============== Impala Cluster Installation and Deployment ==============
1. Hadoop config file core-site.xml: change Hadoop's temporary-file directory
Location: /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
Before: <value>/application/yoyosys/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp</value>
After:  <value>/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp</value>
Before: <value>/application/yoyosys/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs/socket._PORT</value>
After:  <value>/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs/socket._PORT</value>
Before: <value>hdfs://dc146.yoyosys.com:20500</value>
After:  <value>hdfs://dc1.com:20500</value>
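-- After the edits, a grep for the old path and old hostname helps confirm nothing was missed (a quick check):
cd /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop
grep -n -E '/application/yoyosys|dc146.yoyosys.com' core-site.xml || echo "core-site.xml is clean"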
2. setenv.sh changes (keep it identical on all 3 nodes)
On dc1:
Before: [ -z "$IMPALA_HOME" ] && IMPALA_HOME=/application/yoyosys/impala
After:  [ -z "$IMPALA_HOME" ] && IMPALA_HOME=/home/geedata/impala
Before: export DATACELL_HOME=/application/yoyosys/datacell
After:  export DATACELL_HOME=/home/geedata/datacell
Before: export MYSQL_SERVER=dc146.yoyosys.com
After:  export MYSQL_SERVER=dc1.com
Before: export SERVER_HOST_NAME=dc146.yoyosys.com
After:  export SERVER_HOST_NAME=dc1.com
Note: MYSQL_SERVER above must be set to the local hostname, not to the hostname of the MySQL database server.
3. On dc2/dc3: scp setenv.sh from dc1
[root@dc1 impala]# scp /home/geedata/impala/setenv.sh 172.16.100.166:/home/geedata/impala/
[root@dc1 impala]# scp /home/geedata/impala/setenv.sh 172.16.100.167:/home/geedata/impala/
4. To reach the MySQL databases, odbc.ini has to be modified (keep it identical on all 3 nodes)
Path: /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini
cp -a /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini.bak
Empty the file and fill in the following:
On dc1: vim odbc.ini
Set the account and the MySQL server IP as shown below; gee_business is the example, and one section is needed per MySQL database
[gee_business]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_business
Server=172.16.100.31 ---------------- MySQL server IP
User=admin ---------------- MySQL account (note the key is User, not UserName; in some environments UserName also works, so if the MySQL connection fails this is a place to check)
Password=admin ---------------- MySQL password
Port=
Socket=/var/lib/mysql/mysql.sock ----------- actual path of the MySQL socket file
charset = UTF8
[gee_operate]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_operate
User=admin
Password=admin
Server=172.16.100.31
Port=
Socket=/var/lib/mysql/mysql.sock
charset = UTF8
[gee_person]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_person
User=admin
Password=admin
Server=172.16.100.31
Port=
Socket=/var/lib/mysql/mysql.sock
charset = UTF8
[gee_crawler]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_crawler
User=admin
Password=admin
Server=172.16.100.178
Port=
Socket=/var/lib/mysql/mysql.sock
5. On dc2/dc3: scp odbc.ini from dc1
[root@dc1 ~]# scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini 172.16.100.166:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
[root@dc1 ~]# scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini 172.16.100.167:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
6. odbcinst.ini (change the old /application/yoyosys/ prefix to /home/geedata/), as follows (change every occurrence)
Path: /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini
Before: Driver=/application/yoyosys/impala/thirdparty/unixodbc-2.3.2/drivers/mysql/x64/5.3.2/lib/libmyodbc5w.so
After:  Driver=/home/geedata/impala/thirdparty/unixodbc-2.3.2/drivers/mysql/x64/5.3.2/lib/libmyodbc5w.so
7. On dc2/dc3: scp odbcinst.ini from dc1
scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini 172.16.100.166:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini 172.16.100.167:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
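-- Once odbc.ini and odbcinst.ini are in place everywhere, the DSNs can be smoke-tested on all nodes in one pass; a minimal sketch (it assumes isql is on the PATH after sourcing setenv.sh; each session exits immediately):
for node in dc1 dc2 dc3; do
  for dsn in gee_business gee_operate gee_person gee_crawler; do
    echo "== $node / $dsn =="
    ssh $node "cd /home/geedata/impala && . setenv.sh && echo quit | isql $dsn -v | head -3"
  done
done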
####### The following steps also follow the README at /home/geedata/impala/README #######
These 3 steps from the README are skipped:
-- 1: mkdir /opt/yoyosys/impala ## skip
-- 2: tar xzf impala-2.5.0-cdh5.2.0.tar.gz -C /opt/yoyosys/impala ## skip
-- 5: copy to all nodes in the cluster ## skip (this refers to scp-ing the impala directory to the other nodes)
8. impalactl.sh init
On dc1:
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# impalactl.sh init ## run only once per node (dc2/dc3 run it below as well)
mkdir: cannot create directory `/home/geedata/impala/logs': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/logs': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs': File exists
sed -i "s/_hadoophome_/\/home\/geedata\/impala\/thirdparty\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/conf/core-site.xml
sed -i "s/_hadoophome_/\\/home\\/geedata\\/impala\\/thirdparty\\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/conf/hdfs-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/hive-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/yarn-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/mapred-site.xml
sed -i "s/_hadoophome_/\/home\/geedata\/impala\/thirdparty\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/masters
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/slaves
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/yarn-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/mapred-site.xml
sed -i "s/_javahome_/\/home\/geedata\/impala\/thirdparty\/jdk1.8.0_25-x64/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/hadoop-env.sh
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-site.xml
sed -i "s/_hivehome_/\/home\/geedata\/impala\/thirdparty\/hive-1.1.0-cdh5.4.2/g" /home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
sed -i "s/_unixodbchome_/\/home\/geedata\/impala\/thirdparty\/unixodbc-2.3.2/g" /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini
sed -i "s/_mysqlserver_/dc1.com/g" /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini
/home ....
Change is finished!
Suceed init impala.
[root@dc1 impala]#
On dc2/dc3:
[root@dc2 impala]# . setenv.sh
[root@dc2 impala]# impalactl.sh init
[root@dc3 impala]# . setenv.sh
[root@dc3 impala]# impalactl.sh init
9. impalactl.sh format (run only once)
On dc1:
[root@dc1 impala]# impalactl.sh format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/10/26 17:11:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
...........
...........
SHUTDOWN_MSG: Shutting down NameNode at dc1/172.16.100.165
************************************************************/
Suceed in format namenode.
On dc2/dc3:
[root@dc2 impala]# impalactl.sh format
Suceed in format namenode.
[root@dc3 impala]# impalactl.sh format
Suceed in format namenode.
---------------------- Start the services ----------------------
10. Start hadoop:
-- On dc1: (only the master node dc1 runs the hadoop services; the other nodes do not start it)
[root@dc1 impala]# ./impalactl.sh start hadoop
begin to start hadoop
Starting namenodes on [dc1.com]
The authenticity of host 'dc1.com (172.16.100.165)' can't be established.
RSA key fingerprint is e3:19:42:cc:8a:5a:b2:4f:90:e4:f8:c1:f1:19:cc:64.
Are you sure you want to continue connecting (yes/no)? yes
dc1.com: Warning: Permanently added 'dc1.com' (RSA) to the list of known hosts.
dc1.com: starting namenode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-namenode-dc1.com.out
dc1.com: starting datanode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-datanode-dc1.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is e3:19:42:cc:8a:5a:b2:4f:90:e4:f8:c1:f1:19:cc:64.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-secondarynamenode-dc1.com.out
Suceed in starting hadoop.
Verify:
[root@dc1 impala]# ps -ef |grep `pwd`|grep hadoop
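The Hadoop daemons can also be listed by name with jps from the bundled JDK (the path matches the _javahome_ substitution shown in the init output above):
/home/geedata/impala/thirdparty/jdk1.8.0_25-x64/bin/jps
# expected on dc1: NameNode, DataNode and SecondaryNameNode (plus Jps itself)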
11. Start hive:
-- On dc1: (only the master node dc1 runs the hive service; the other nodes do not start it)
[root@dc1 impala]# ./impalactl.sh start hive -d
begin to start hive yes
/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/bin/hive --service metastore 1>/dev/null 2>&1 &
hive pid:19796, status:0
Suceed in starting hive.
Verify:
[root@dc1 impala]# ps -ef |grep `pwd`|grep hive
Log in to hive:
[root@dc1 impala]# cd /home/geedata/impala
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
Logging initialized using configuration in file:/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
dc
default
gee_business
gee_operate
gee_person
Time taken: 0.669 seconds, Fetched: 5 row(s)
hive>
12. Start impala: (start it on every node, as follows)
-- On dc1:
[root@dc1 impala]# ./impalactl.sh start all -d
begin to start statestored yes
/home/geedata/impala/be/build/release/statestore/statestored -log_dir=/home/geedata/impala/logs -state_store_host=dc1.com -state_store_port=24000 -state_store_subscriber_port=23000 1>/dev/null 2>&1 &
statestored pid:20121, status:0
begin to start catalogd yes
/home/geedata/impala/bin/start-catalogd.sh -log_dir=/home/geedata/impala/logs -use_statestore -state_store_port=24000 -state_store_host=dc1.com -catalog_service_host=dc1.com -catalog_service_port=26000 -state_store_subscriber_port=23020 1>/dev/null 2>&1 &
catalogd pid:20148, status:0
begin to start impalad yes
/home/geedata/impala/bin/start-impalad.sh -log_dir=/home/geedata/impala/logs -use_statestore -state_store_port=24000 -state_store_host=dc1.com -state_store_subscriber_port=23030 -catalog_service_host=dc1.com -catalog_service_port=26000 -be_port=22001 -beeswax_port=21001 -webserver_port=25001 -hs2_port=21051 -default_pool_max_queued=1 -default_pool_max_requests=1 -default_query_options='exec_single_node_rows_threshold=0' -max_result_cache_size=21474836480 -mem_limit=6G -fe_service_threads=8 -be_service_threads=16 1>/dev/null 2>&1 &
impalad pid:20217, status:0
Suceed in starting all.
[root@dc1 impala]#
-- On dc2/dc3:
[root@dc2 impala]# ./impalactl.sh -d start impala
[root@dc3 impala]# ./impalactl.sh -d start impala
Check the impala processes:
[root@dc1 impala]# ps -ef |grep `pwd`|grep release
13. Verify impala
Connectivity test against the MySQL databases: (test on every node)
[root@dc1 impala]# isql gee_business -v
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
-- Log in to impala
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./bin/impala-shell.sh -i dc1.com:21001
Starting Impala Shell without Kerberos authentication
Connected to dc1.com:21001
Server version: impalad version 2.2.0-cdh5.4.2 RELEASE (build a0c23b5c27c4209cc22e138c72173842664fa98a)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: build version not available)
[dc1.com:21001] > show databases;
Query: show databases
+------------------+
| name |
+------------------+
| _impala_builtins |
| dc |
| default |
| gee_business |
| gee_operate |
| gee_person |
+------------------+
[dc1.com:21001] > use gee_business;
[dc1.com:21001] > select * from sv_recommend;
Query: select * from sv_recommend
+----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
| id | tenantid | recommend_type | website_type | isread | uuid | ruserid | suserid | status | create_time |
+----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
| 2 | 1 | 2 | 0 | 0 | 7-1432725332639-fb16769d17e10ed1f8175c1e9225109b | 2 | 2 | 1 | 2015-09-01 12:49:54 |
| 9 | 1 | 2 | 0 | 0 | 1 | 2 | 2 | 1 | 2015-09-06 17:21:10 |
| 10 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | 1 | 2015-09-06 17:21:10 |
| 11 | 1 | 2 | 0 | 0 | 1 | 3 | 2 | 1 | 2015-09-06 17:21:10 |
| 12 | 1 | 2 | 0 | 0 | 2938214647 | 2 | 2 | 1 | 2015-09-21 15:01:42 |
| 13 | 1 | 2 | 0 | 0 | f92d9515-3bfd-443b-84dc-681771f15afa | 2 | 2 | 1 | 2015-09-22 16:31:43 |
| 14 | 1 | 2 | 0 | 0 | 5e3f2f4a-46a5-4bc5-ae96-41338224921d | 2 | 2 | 1 | 2015-09-25 16:37:57 |
+----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
Fetched 7 row(s) in 4.643058s, Rpc time:4.637681s
14. Rebuild the MySQL table mappings
If the existing mappings are unusable, drop them and recreate them.
[dc1.com:21001] > show create table yy_webmedia_detail; # the main cause is that the old definition pins 'backend.odbc.executor.host'='dc147.yoyosys.com', which has to point to the new database host
Query: show create table yy_webmedia_detail
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| result |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE gee_business.yy_webmedia_detail ( |
| id INT, |
| title VARCHAR(300), |
.....................
..................... |
| version INT |
| ) |
| STORED AS TEXTFILE |
| LOCATION ‘hdfs://dc146.yoyosys.com:20500/hive/warehouse/gee_business.db/yy_webmedia_detail‘ |
| TBLPROPERTIES (‘transient_lastDdlTime‘=‘1443598444‘, ‘backend.odbc.dsn‘=‘dsn=gee_business;uid=admin;pwd=admin‘, ‘backend.odbc.executor.host‘=‘dc147.yoyosys.com‘, ‘backend.type‘=‘odbc‘, ‘column.type.mapping‘=‘content:TEXT‘) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Log in to impala:
In each database (use <db>;), drop all of its tables:
drop table sv_app_version;
drop table sv_classification;
drop table sv_classification_type;
.................
.................
drop table sv_collect;
If dropping a table in impala fails with an error like the one below, drop it from hive instead; once the drops are done, restart impala:
[dc1.com:21001] > drop table sv_feedback ;
Query: drop table sv_feedback
ERROR:
ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore:
CAUSED BY: MetaException: java.lang.IllegalArgumentException: java.net.UnknownHostException: dc146.yoyosys.com
Log in to hive:
[root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
Rebuild the mappings, using one table as an example:
15. Rebuild the table mappings for the datacell database
-- Log in to impala, drop the old tables, and recreate them
If a datacell table cannot be dropped from impala, drop it from hive instead, as follows:
Log in to hive:
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
Logging initialized using configuration in file:/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
dc
default
gee_business
gee_operate
gee_person
Time taken: 0.669 seconds, Fetched: 5 row(s)
hive>
> drop database dc;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database dc is not empty. One or more tables exist.)
hive>
> use dc;
OK
Time taken: 0.01 seconds
hive> show tables;
OK
yy_news
yy_user_news
yy_webmedia_detail
Time taken: 0.025 seconds, Fetched: 3 row(s)
hive> drop table yy_news;
OK
Time taken: 8.812 seconds
hive> drop table yy_user_news;
OK
Time taken: 7.091 seconds
hive> drop table yy_webmedia_detail;
OK
Time taken: 7.098 seconds
hive>
> show tables;
OK
Time taken: 0.01 seconds
hive>
> drop database dc;
OK
Time taken: 7.07 seconds
hive> show databases;
OK
default
gee_business
gee_operate
gee_person
Time taken: 0.009 seconds, Fetched: 4 row(s)
hive>
Then log in to impala and recreate the datacell database (the dc database and its tables)
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./bin/impala-shell.sh -i dc1.com:21001
Create the database: dc
[dc1.com:21001] > create database dc;
Query: create database dc
Create the tables, one table shown as an example: ## note the port in the SQL must match the config files ('127.0.0.1:31060')
CREATE TABLE dc.yy_news (
id BIGINT,
create_time TIMESTAMP,
webmedia_area_type TINYINT,
website_type TINYINT,
update_time TIMESTAMP,
publish_time TIMESTAMP,
version INT,
title VARCHAR(300),
digest VARCHAR(500),
dynamic_abstract VARCHAR(1000),
content VARCHAR(5000),
url VARCHAR(2049),
imageurl VARCHAR(2049),
webmedia_author VARCHAR(50),
webmedia_original_source VARCHAR(255),
website_icp VARCHAR(20),
websitename VARCHAR(20),
keywords VARCHAR(255)
)
TBLPROPERTIES ('backend.datacell.service'='1121', 'backend.type'='datacell', 'backend.datacell.schema'='yy_news', 'backend.datacell.agent'='127.0.0.1:31060', 'backend.datacell.storage'='yy_news');
After the tables are created, verify:
[dc1.com:21001] > show tables;
Query: show tables
+--------------------+
| name |
+--------------------+
| yy_news |
| yy_user_news |
| yy_webmedia_detail |
+--------------------+
Fetched 3 row(s) in 0.004810s, Rpc time:0.003613s
[dc1.com:21001] > select count(*) from yy_webmedia_detail;
Query: select count(*) from yy_webmedia_detail
+----------+
| count(*) |
+----------+
| 0 |
+----------+
Fetched 1 row(s) in 3.508856s, Rpc time:3.507191s
[dc1.com:21001] >
16. Overall verification
Log in to impala and connect to each MySQL database and to the datacell database; if DDL (create/drop) and DML (select/update/delete) statements all work, the whole deployment is successful.
17. Automatic environment setup
Run the following on dc1/dc2/dc3 so the environment variables are set automatically (then ". setenv.sh" and ". bf_setenv.sh" no longer have to be sourced by hand every time)
[root@dc1 ~]# cp -a ~/.bash_profile ~/.bash_profile.bak
[root@dc1 ~]# echo "source /home/geedata/impala/setenv.sh" >> ~/.bash_profile
[root@dc1 ~]# echo "source /home/geedata/datacell/bf_setenv.sh" >> ~/.bash_profile
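To confirm the profile change took effect, open a new shell (or source the profile) and check that the variables point at the new locations; a quick check:
source ~/.bash_profile
echo $IMPALA_HOME     # expected: /home/geedata/impala
echo $DATACELL_HOME   # expected: /home/geedata/datacell
echo $BITSFLOW_HOME   # expected: /home/geedata/datacell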
--------------------------------------------------
---- PS: handy test commands ----
--------------------------------------------------
-- ODBC connection
[root@dc1 impala]# cd /home/geedata/impala/thirdparty/unixodbc-2.3.2/bin
[root@dc1 bin]# ./odbcinst -q -s
[odbc-mysql]
[odbc-oracle]
[odbc-iq]
-- Test connectivity to MySQL
[root@dc1 impala]# isql gee_business -v
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
-- Start/stop impala
// How to start impala
dc1:
[root@dc1 impala]# ./impalactl.sh start all -d
dc2/dc3:
[root@dc2 impala]# ./impalactl.sh -d start impala
[root@dc3 impala]# ./impalactl.sh -d start impala
// Log in to impala
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./bin/impala-shell.sh -i dc1.com:21001
// How to stop impala:
dc1:
[root@dc1 impala]# ./impalactl.sh stop all
dc2:
[root@dc2 impala]# ./impalactl.sh -d stop impala
dc3:
[root@dc3 impala]# ./impalactl.sh -d stop impala
-- Start/stop hive
// Start hive:
[root@dc1 impala]# ./impalactl.sh start hive -d
// Log in to hive:
[root@dc1 impala]# cd /home/geedata/impala
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
// Stop hive
[root@dc1 impala]# ./impalactl.sh stop hive
-- Start/stop hadoop
// Start hadoop:
[root@dc1 impala]# ./impalactl.sh start hadoop
// Stop hadoop
[root@dc1 impala]# ./impalactl.sh stop hadoop
-- Start/stop datacell
// Start datacell (in the order below; each step is run in parallel on the nodes listed)
cd /home/geedata/datacell
1) On dc1/dc2/dc3, run the following to start the agent service
./agent_start.sh
2) Only on the master node dc1, start the group service
./start_groupd.sh
3) On dc1/dc2/dc3, run the following to start the datacell service
./start_datacell.sh
// Check the datacell processes
[root@dc1 datacell]# cd /home/geedata/datacell
[root@dc1 datacell]# ps -ef |grep `pwd`|grep -v grep
// Stop datacell
cd /home/geedata/datacell
1) On dc1/dc2/dc3, run the following to stop the datacell service
./stop_datacell.sh
2) Only on dc1, stop the group service
./stop_groupd.sh
3) On dc1/dc2/dc3, run the following to stop the agent service
./agent_stop.sh