Analyzing logs for a small website on the Hadoop platform

0. Upload the log files to the Linux machine and use Flume to collect them into HDFS.

Run the following command:

/home/cloud/flume/bin/flume-ng agent -n a4 -c conf -f /home/cloud/flume/conf/a4.conf -Dflume.root.logger=DEBUG,console
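The contents of a4.conf are not shown in this post. As a rough sketch only, an agent named a4 that picks up finished log files from a local directory and writes them to HDFS by date could be configured along the following lines. The spooling directory /home/cloud/logs, the component names r1, c1 and k1, and the channel capacities are assumptions made for illustration. The HDFS path /flume/%Y%m%d is chosen so that it lines up with the /flume/$CURRENT input directory that daily.sh reads.

```shell
# Hypothetical a4.conf, written to a scratch file so no real config is overwritten.
CONF_FILE="${CONF_FILE:-$(mktemp)}"

cat > "$CONF_FILE" <<'EOF'
# Name the components of agent a4
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Source: watch a local directory for completed log files (assumed path)
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/cloud/logs
a4.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Sink: write plain text into one HDFS directory per day
a4.sinks.k1.type = hdfs
a4.sinks.k1.channel = c1
a4.sinks.k1.hdfs.path = /flume/%Y%m%d
a4.sinks.k1.hdfs.fileType = DataStream
a4.sinks.k1.hdfs.writeFormat = Text
a4.sinks.k1.hdfs.useLocalTimeStamp = true
EOF
```

The agent name after -n on the command line has to match the prefix used in the file (a4 here), otherwise Flume starts with no components configured.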

1. Create the Hive table

create external table bbslog (ip string,logtime string,url string) partitioned by (logdate string) row format delimited fields terminated by '\t' location '/cleaned';
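The table definition expects every cleaned record to carry three tab-separated fields: ip, logtime and url. The cleaner.jar that produces those records (invoked from daily.sh) is a MapReduce job whose source is not included here. Purely to illustrate the record shape, and assuming the raw logs are in Apache common log format (an assumption, since the post does not state the format), the same extraction can be sketched locally with awk. The function name clean_log is hypothetical.

```shell
# Illustration only: turn Apache common-log lines into "ip<TAB>logtime<TAB>url".
# Assumed input shape:
#   1.2.3.4 - - [30/May/2013:17:38:20 +0800] "GET /path HTTP/1.1" 200 1127
clean_log() {
  awk 'NF >= 7 {
    ip  = $1
    ts  = substr($4, 2)   # field 4 is "[30/May/2013:17:38:20"; drop the "["
    url = $7              # field 7 is the requested path
    printf "%s\t%s\t%s\n", ip, ts, url
  }'
}
```

Running clean_log < access.log on a sample file prints one tab-separated line per request and silently skips lines with fewer than seven whitespace-separated fields. The real cleaner may normalize the timestamp or filter static resources; none of that is modeled here.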

2. Create the shell script

touch daily.sh

Add execute permission:

chmod +x daily.sh

daily.sh:

CURRENT=`date +%Y%m%d`

# Clean the data and save it under the /cleaned directory, in a subdirectory named after the current date

/home/cloud/hadoop/bin/hadoop jar /home/cloud/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT

# Alter the Hive table to add a partition for the current date

/home/cloud/hive/bin/hive -e "alter table bbslog add partition (logdate=$CURRENT) location '/cleaned/$CURRENT'"

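# Run the analysis with Hive; the exact queries depend on business requirements

# Count page views (PV) and store the result in that day's pv table

/home/cloud/hive/bin/hive -e "create table pv_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT;"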

# Find potential users: IPs with more than 20 hits

/home/cloud/hive/bin/hive -e "create table vip_$CURRENT row format delimited fields terminated by '\t' as select $CURRENT,ip,count(*) as hits from bbslog where logdate=$CURRENT group by ip having hits > 20 order by hits desc"

# Count unique visitors (UV)

/home/cloud/hive/bin/hive -e "create table uv_$CURRENT row format delimited fields terminated by '\t' as select count(distinct ip) from bbslog where logdate=$CURRENT"

# Count the day's registrations

/home/cloud/hive/bin/hive -e "create table reg_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT AND instr(url,'member.php?mod=register')>0"

# Export the data from the Hive table to MySQL

/home/cloud/sqoop/bin/sqoop export --connect jdbc:mysql://cloud3:3306/jchubby --username root --password JChubby123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by '\t'
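As written, daily.sh always processes today's date, and a second run for the same day is likely to fail because the partition and the per-day tables already exist. A minimal sketch of one way to make the date selectable and the partition step repeatable follows. The helper names resolve_date and partition_sql are hypothetical and not part of the original script; "add if not exists partition" is standard Hive DDL.

```shell
# Pick the date to process: the first argument if given, otherwise today.
resolve_date() {
  _d="${1:-$(date +%Y%m%d)}"
  # Accept exactly eight digits (YYYYMMDD) so a typo never reaches Hive or HDFS paths.
  case "$_d" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) printf '%s\n' "$_d" ;;
    *) echo "invalid date: $_d" >&2; return 1 ;;
  esac
}

# Build an ALTER TABLE statement that can be run twice for the same day.
partition_sql() {
  printf "alter table bbslog add if not exists partition (logdate='%s') location '/cleaned/%s'" "$1" "$1"
}
```

In daily.sh the first line could then become CURRENT=$(resolve_date "$1") || exit 1, and the partition step could pass "$(partition_sql "$CURRENT")" to hive -e, so that ./daily.sh 20241009 reprocesses an earlier day. The create table steps would need a similar guard (for example, dropping the per-day table first), which is left out of this sketch.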

Time: 2024-10-10 23:15:51
