This article uses Pig version pig-0.12.0.tar.gz. Hadoop should already be installed before you begin; for the Hadoop setup, see the "hadoop-1.2.1 installation guide".
Installing Pig is simple: just configure the environment. Pig has two execution modes: local mode and MapReduce mode (the default).
1. Upload and extract pig-0.12.0.tar.gz
[hadoop@master temp]$ tar zxf pig-0.12.0.tar.gz
2. Configure the Pig environment variables (in ~/.bashrc) and make them take effect
export PIG_HOME=/home/hadoop/pig-0.12.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$PIG_HOME/bin
[hadoop@master ~]$ source ~/.bashrc
3. Verify the installation with the pig command (local mode)
[hadoop@master ~]$ pig -x local
2015-06-12 00:23:30,823 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2015-06-12 00:23:30,824 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1434093810822.log
2015-06-12 00:23:30,876 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-06-12 00:23:30,964 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> quit;
[hadoop@master ~]$
Seeing the grunt> prompt means the configuration works. The file:/// URI shows that Pig is in local mode. To use MapReduce mode, the Hadoop cluster must be correctly configured and running, and Pig must be able to read Hadoop's configuration files (the files in Hadoop's conf directory).
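As a quick smoke test of local mode, a small Pig Latin session can be run against a local file. This is a sketch, not from the original article: it assumes /etc/passwd exists and uses ':' as the field delimiter.

```pig
-- Local mode resolves paths against the local file system (file:///).
A = LOAD '/etc/passwd' USING PigStorage(':');
-- Keep only the first field of each line (the user name).
B = FOREACH A GENERATE $0 AS user;
-- Print the result to the console.
DUMP B;
```

These statements can be typed at the grunt> prompt started with `pig -x local`, or saved to a .pig file and run with `pig -x local script.pig`.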
4. Configure PIG_CLASSPATH in the .bashrc file and make it take effect
export PIG_CLASSPATH=/home/hadoop/hadoop-1.2.1/conf
PIG_CLASSPATH must point at the directory containing Hadoop's configuration files (e.g. core-site.xml, hdfs-site.xml, mapred-site.xml) so that Pig can locate the cluster.
5. Verify the installation with the pig command (MapReduce mode)
[hadoop@master ~]$ pig
2015-06-12 00:35:43,322 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2015-06-12 00:35:43,322 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1434094543321.log
2015-06-12 00:35:43,342 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-06-12 00:35:43,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2015-06-12 00:35:43,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001
grunt> quit;
[hadoop@master ~]$
The "Connecting to" lines above show that Pig is now using the HDFS file system (hdfs://master:9000) and the JobTracker (master:9001), unlike in local mode.
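The same kind of Pig Latin runs unchanged in MapReduce mode, except that paths now resolve against HDFS and each DUMP or STORE launches MapReduce jobs on the cluster. A minimal sketch (the input path /user/hadoop/input.txt is a hypothetical example, not from the article):

```pig
-- In MapReduce mode this path resolves against hdfs://master:9000.
A = LOAD '/user/hadoop/input.txt' USING PigStorage('\t');
-- Group everything into a single bag and count the records;
-- this runs as a MapReduce job on the cluster.
B = GROUP A ALL;
C = FOREACH B GENERATE COUNT(A);
DUMP C;
```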
Pig is now installed. By default, Pig writes its log file into whatever directory the pig command was launched from (so the log location varies with the working directory), which makes the logs hard to analyze and manage. It is therefore common to configure a fixed log directory, as follows:
1. Create a log directory for Pig; here it is placed under the hadoop user's home directory, in pig/logs
[hadoop@master ~]$ mkdir -p /home/hadoop/pig/logs
2. Edit the /home/hadoop/pig-0.12.0/conf/pig.properties file, uncomment the pig.logfile parameter, and set it as follows
pig.logfile=/home/hadoop/pig/logs
Pig's logs are now written to the specified directory, as shown below:
[hadoop@master conf]$ pig
2015-06-12 00:51:12,399 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
2015-06-12 00:51:12,399 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig/logs/pig_1434095472397.log
2015-06-12 00:51:12,418 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-06-12 00:51:12,524 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2015-06-12 00:51:12,659 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001
grunt>
Copyright notice: This is an original post by the author. Please include a link to this article when reposting.