1. Preliminary work:
(1) Prepare three machines:
Install Ubuntu 14.04 on each (ideally with the same username, hadoop, on all three; this makes the later file transfers easier).
Network mapping:
Change the hostname on each of the three machines (sudo vim /etc/hostname) to master, slave1, and slave2 respectively, and note their IP addresses; call them ip1, ip2, and ip3.
Edit the network mapping: sudo vim /etc/hosts
You can comment out the 127.0.1.1 line,
then add the following entries (see the sample below):
ip1 master
ip2 slave1
ip3 slave2
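For reference, a minimal sketch of the resulting /etc/hosts on every node; the 192.168.1.x addresses are placeholders for ip1, ip2, ip3:
# /etc/hosts (same content on master, slave1 and slave2)
127.0.0.1      localhost
# 127.0.1.1    <old hostname>      <- commented out
192.168.1.101  master
192.168.1.102  slave1
192.168.1.103  slave2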
(2) Install openssh-server (sudo apt-get install openssh-server)
Set up passwordless SSH login between the three machines; see http://www.cnblogs.com/xiaomila-study/p/4971385.html. A minimal sketch is given below.
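A minimal sketch of the usual key-based setup, assuming the hadoop user and the hostnames configured above (run on each node):
# generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa
# copy the public key to every node, including the local one
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2
# verify that login no longer asks for a password
ssh hadoop@slave1 hostname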
(3) Install the JDK: download the JDK archive, extract it, and add the environment variables with sudo vim /etc/profile
#JAVA environment setting
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Save, exit, and run source /etc/profile. A quick check is shown below.
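To confirm the JDK is picked up (assuming the paths above):
source /etc/profile
echo $JAVA_HOME     # should print /usr/lib/jvm/jdk1.7.0_79
java -version       # should report 1.7.0_79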
2. Setting up the Hadoop environment:
(1) Download the hadoop-2.5.2 tar.gz package and extract it (a sketch follows below).
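A sketch of the extract step, assuming the archive was downloaded as hadoop-2.5.2.tar.gz and the /home/hadoop/my_project directory used later:
mkdir -p /home/hadoop/my_project
tar -zxvf hadoop-2.5.2.tar.gz -C /home/hadoop/my_project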
(2) Add the environment variables, using the same method as step 1 (3); a quick sanity check is shown after the block:
#hadoop environment
export HADOOP_HOME=/home/hadoop/my_project/hadoop-2.5.2
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH
export PATH=${HADOOP_HOME}/sbin:$PATH
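After another source /etc/profile, a quick sanity check that the Hadoop binaries are on the PATH:
source /etc/profile
hadoop version     # should report Hadoop 2.5.2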
(3) Edit the configuration files under etc/hadoop in the Hadoop installation directory: hadoop-env.sh, core-site.xml, mapred-site.xml, yarn-site.xml, yarn-env.sh, hdfs-site.xml, and slaves (if a file only exists as a *.template, copy it to the name without the .template suffix).
hadoop-env.sh:
# The java implementation to use.
export JAVA_HOME=/home/hadoop/my_project/jdk1.7.0_79    # the JDK install directory
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip1:9000</value>
</property>
<!--property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property-->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ip1:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!--property>
<name>mapreduce.jobhistory.address</name>
<value>ip1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ip1:19888</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value> -Xmx4096m</value>
</property>
<property>
<name>mapreduce.admin.map.child.java.opts</name>
<value>-XX:-UseGCOverheadLimit</value>
</property-->
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
</configuration>
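The local directories referenced in core-site.xml and hdfs-site.xml above need to exist on every node; a small sketch, assuming the paths used here:
# run on master, slave1 and slave2
mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/dfs/name /home/hadoop/dfs/data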
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ip1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ip1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ip1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>ip1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ip1:8088</value>
</property>
<!-- property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.9</value>
</property-->
</configuration>
yarn-env.sh
# some Java parameters
export JAVA_HOME=/home/hadoop/my_project/jdk1.7.0_79
slaves
master
slave1
slave2
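The same Hadoop tree and configuration must be present on the slave nodes as well; one way to distribute it (a sketch, assuming the same /home/hadoop/my_project layout and the passwordless SSH set up earlier):
scp -r /home/hadoop/my_project/hadoop-2.5.2 hadoop@slave1:/home/hadoop/my_project/
scp -r /home/hadoop/my_project/hadoop-2.5.2 hadoop@slave2:/home/hadoop/my_project/
# /etc/profile on each slave also needs the same JAVA and Hadoop entries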
(4) Format the NameNode: hdfs namenode -format
(5) Start Hadoop: start-all.sh
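A quick way to check that the daemons came up (jps ships with the JDK):
jps
# on master you would expect NameNode, SecondaryNameNode and ResourceManager
# (plus DataNode and NodeManager, since master is also listed in slaves);
# on slave1/slave2 you would expect DataNode and NodeManager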
(6) Run the bundled wordcount example; see http://www.cnblogs.com/xiaomila-study/p/4973662.html and the sketch below.
If it runs successfully, Hadoop is installed correctly; otherwise go back and adjust the configuration files from step (3).
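A minimal sketch of a wordcount run; the input file name and the HDFS paths are placeholders:
# put some text into HDFS
hdfs dfs -mkdir -p /input
hdfs dfs -put some_text.txt /input
# run the bundled example jar
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /input /output
# inspect the result
hdfs dfs -cat /output/part-r-00000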
3. HBase installation:
(1) Download the hbase-0.98.7 tar.gz file and extract it
(2) Configure the environment variables:
#HBASE environment
export HBASE_HOME=/home/hadoop/my_project/hbase-0.98.7
export PATH=${HBASE_HOME}/bin:$PATH
(3) Edit the config files in the conf folder under the HBase installation directory: hbase-env.sh, hbase-site.xml, and regionservers
hbase-env.sh
export JAVA_HOME=/home/hadoop/my_project/jdk1.7.0_79
export HBASE_MANAGES_ZK=true    # use the ZooKeeper instance bundled with HBase
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ip1:9000/hbase</value>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://ip1:60000</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip1,ip2,ip3</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>100</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>
Zookeeper session expired will force regionserver exit.
Enable this will make the regionserver restart.
</description>
</property>
</configuration>
regionservers
master
slave1
slave2
(4) Start HBase: start-hbase.sh
(5) Enter the HBase shell: hbase shell
(6) Verify the installation: list
create 'test','info'
If the table is created successfully, HBase is installed correctly. A short follow-up check is sketched below.
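A short follow-up check in the HBase shell, using the test table created above:
put 'test', 'row1', 'info:name', 'hello'
scan 'test'
# clean up afterwards if the table is no longer needed
disable 'test'
drop 'test'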
4. Sqoop installation:
(1) Download the sqoop-1.4.6 tar.gz package and extract it
(2) Add the environment variables:
#sqoop environment
export SQOOP_HOME=/home/hadoop/my_project/sqoop-1.4.6
export PATH=${SQOOP_HOME}/bin:$PATH
(3) Edit the config file: conf/sqoop-env.sh (copy it from sqoop-env-template.sh if it does not exist yet)
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/my_project/hadoop-2.5.2
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/my_project/hadoop-2.5.2
configure-sqoop (in the bin directory): comment out the blocks below. I am not sure whether leaving the HBASE_HOME check uncommented causes any problem; in my environment it is commented out.
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HBASE_HOME}" ]; then
# echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
# echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#fi
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
# echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
(4) Put the JDBC jars into the lib folder under the Sqoop installation path: mysql-connector-java-5.1.32-bin.jar, sqljdbc4.jar and sqoop-sqlserver-1.0.jar. One way to do this is sketched below.
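A sketch of the copy step, assuming the jars have been downloaded into the current directory:
cp mysql-connector-java-5.1.32-bin.jar $SQOOP_HOME/lib/
cp sqljdbc4.jar $SQOOP_HOME/lib/
cp sqoop-sqlserver-1.0.jar $SQOOP_HOME/lib/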
(5) If testing against MySQL, run the following on the MySQL server so that it accepts remote connections: grant all privileges on *.* to 'root'@'%' identified by '123' with grant option; Similarly, enable remote access on SQL Server if that is the source database. A quick connectivity check is sketched below.
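A quick way to confirm remote access works before involving Sqoop, assuming the mysql client is installed on the Hadoop node and ip is the MySQL server address:
mysql -h ip -u root -p123 -e "show databases;"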
(6) Testing:
Test the Sqoop connection to MySQL:
sqoop list-databases --connect jdbc:mysql://ip:3306/ --username root --password 123
sqoop import --connect 'jdbc:sqlserver://ip;username=sa;password=123;database=WebHotPub' --query 'select * from channelType where $CONDITIONS' --split-by channelType.chnTypeID --hbase-create-table --hbase-table chnType1 --column-family channelInfo --hbase-row-key chnTypeID -m 3
sqoop import --connect 'jdbc:sqlserver://ip;username=sa;password=123;database=WebHotPub' --table channelType --hbase-create-table --hbase-table chnType --column-family channelInfo --hbase-row-key chnTypeID -m 1
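If the imports complete, the rows should be visible from the HBase shell (chnType and chnType1 are the table names used in the commands above):
hbase shell
list
scan 'chnType'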