互联网营销精准决策项目总结

  在今天我已经吧互联网营销精准决策项目的所有数据处理和分析的工作都完成了,包括按照范围分类,给表格打标签,设置权重,添加宽表,价格分类分析,销售情况分类分析等

  在这几天的开发过程中,自己学到了很多,包括一些hive的使用方式、hql的语法,jdbc对hive的连接,hive的运行体制等,更重要是根据老师给的需求一步步自己探索,让我对数据分析有了一个新的认知,知道了拿到怎样的数据应该按照怎样的步骤,按照怎么样的范围去划分数据去分类去分析。

  下面附上我的具体操作代码,其中有基本的注释,但由于写的过程中太投入了,就没有截图。。。。。。

  以下代码的总结主要是用于以后自己在写hive时回顾,对于一些小白就算没有数据集也可以看一些基本的语法,很多代码还可以优化,但毕竟我刚开始学,多多包涵。

//手机表
create table iphone(
    >     id string,
    >     name array<string>,
    >     title string,
    >     storename string,
    >     storeid string,
    >     link string,
    >     price int,
    >     keyword string,
    >     comment string,
    >     goodcom int,
    >     brand string,
    >     model string,
    >     color string,
    >     uptime string,
    >     system string)
    >     row format delimited fields terminated by ‘,‘
    >     collection items terminated by ‘\t‘
    >     lines terminated by ‘\n‘;

create table iphone1(
    >     id string,
    >     name array<string>,
    >     title string,
    >     storename string,
    >     storeid string,
    >     link string,
    >     price int,
    >     keyword string,
    >     comment string,
    >     goodcom int,
    >     brand string,
    >     model string,
    >     color string,
    >     uptime string,
    >     system string)
    >     row format delimited fields terminated by ‘,‘
    >     collection items terminated by ‘\t‘
    >     lines terminated by ‘\n‘;
//清洗数据后的手机表
insert into iphone1(
    select * from iphone where system!="");

//用户信息表,该地区表中place中分割符与表中不符,建议使用userinfo1表
 create table userinfo(
    id string,
    name string,
    place Array<string>,
    sex string,
    birthday string)
        row format delimited fields terminated by ‘,‘
        lines terminated by ‘\n‘;

//行为表
create table action(
    > userid string,
    > commodityid string,
    > behavior string,
    > month string,
    > day string)
    > row format delimited fields terminated by ‘,‘
    > lines terminated by ‘\n‘;

//拼接日期后的行为表
create table action1(
    userid string,
    commodityid string,
    behavior string,
    time string)
    row format delimited fields terminated by ‘\t‘
    lines terminated by ‘\n‘;

//评论表
 create table comment(
    commodityid string,
    discuss string,
    time string,
    userid string,
    name string,
    rank string,
    color string,
    replynum string,
    score int,
    source string)
    row format delimited fields terminated by ‘\t‘
    lines terminated by ‘\n‘;

//拼接日期
insert into action1
    > select userid,commodityid,behavior,concat(month,"-",day) from action;

//用户表+评论表
create table com_user(
    commodityid string,//用户id
    name string,
    place Array<string>,
    sex string,
    agetitle int,
    agerange string,
    rank string)
    row format delimited fields terminated by ‘\t‘
    lines terminated by ‘\n‘;

select userinfo.id,userinfo.name,userinfo.place,userinfo.sex,
case when (2019-cast(substr(userinfo.birthday,1,4)as int))<18 then 1
when (2019-cast(substr(userinfo.birthday,1,4)as int))<24 then 2
when (2019-cast(substr(userinfo.birthday,1,4)as int))<29 then 3
when (2019-cast(substr(userinfo.birthday,1,4)as int))<34 then 4
when (2019-cast(substr(userinfo.birthday,1,4)as int))<39 then 5
when (2019-cast(substr(userinfo.birthday,1,4)as int))<49 then 6
else 7 end,
case when (2019-cast(substr(userinfo.birthday,1,4)as int))<18 then "18岁以下"
when (2019-cast(substr(userinfo.birthday,1,4)as int))<24 then "24岁以下"
when (2019-cast(substr(userinfo.birthday,1,4)as int))<29 then "29岁以下"
when (2019-cast(substr(userinfo.birthday,1,4)as int))<34 then "34岁以下"
when (2019-cast(substr(userinfo.birthday,1,4)as int))<39 then "39岁以下"
when (2019-cast(substr(userinfo.birthday,1,4)as int))<49 then "49岁以下"
else "50岁以上" end
,comment.rank from userinfo join comment on userinfo.id=comment.userid;

//查询销售量前十的

select brand,count(*) as num from action join iphone1 on action.commodityid=iphone1.id where brand!=""  group by brand order by num desc limit 10;

//销售量前10表
hive> create table saleTop10(
    brand string,
    scount int)
    row format delimited fields terminated by ‘\t‘
        lines terminated by ‘\n‘;

//查询华为旗下的top 20
 select name,count(*) as num from iphone where brand=‘华为(HUAWEI)‘ group by name order by num desc limit 20;

//华为top20表
create table HUAWETopSale20(
    name array<string>,
    mcount int)
    row format delimited fields terminated by ‘\t‘
    collection items terminated by ‘ ‘
    lines terminated by ‘\n‘;

//插入数据
insert into HUAWETopSale20  select name,count(*) as num from iphone where brand=‘华为(HUAWEI)‘ group by name order by num desc limit 20;

//苹果top20表
create table AppleTopSale20(
    name array<string>,
    mcount int)
    row format delimited fields terminated by ‘\t‘
    collection items terminated by ‘ ‘
    lines terminated by ‘\n‘;

//插入数据
insert into AppleTopSale20  select name,count(*) as num from iphone where brand=‘Apple‘ or brand=‘苹果‘ group by name order by num desc limit 20;

//各年龄段手机销售情况表
create table agesale(
agetitle int,
model string,
count int)
row format delimited fields terminated by ‘\t‘
lines terminated by ‘\n‘;

//插入数据
Insert into agesale
select com_user.agetitle,iphone1.model,count(*) from action  join iphone1 on action.commodityid=iphone1.id  join com_user on action.userid=com_user.commodityid group by agetitle,model order by agetitle desc limit 50; 

//用户信息表,该地区表中place中分割符与表中不符,建议使用userinfo1表
 create table userinfo1(
    id string,
    name string,
    place Array<string>,
    sex string,
    birthday string)
        row format delimited fields terminated by ‘,‘
        collection items terminated by ‘ ‘
        lines terminated by ‘\n‘;

//地区销售表1
create table placesale(
    addr string,
    model string,
    count int);

//地区销售表2
create table placesale1(
    addr string,
    model string,
    count int);

//建立用户标签
//用户性别标签表
 create table profile_tag_user_gender(
    userid string,
    tagid string,
    tagname string,
    tagtype string);
//用户年龄段标签表
create table profile_tag_user_age_region(
    userid string,
    tagid string,
    tagname string,
    tagetype string);
//插入用户年龄表数据
insert into profile_tag_user_age_region
    > select commodityid,
    > concat("A111U00100",cast(agetitle as string)),agerange,
    > "用户年龄段"
    > from com_user;

//用户等级标签表
 create table profile_tag_user_grade(
    > userid string,
    > tagid string,
    > tagname string,
    > tagtype string);

//插入用户等级表
insert into profile_tag_user_grade
select commodityid,
    case when rank="注册会员" then "A111U003_001"
    when rank="企业会员" then "A111U003_002"
    when rank="铜牌会员" then "A111U003_003"
    when rank="银牌会员" then "A111U003_004"
    when rank="金牌会员" then "A111U003_005"
    when rank="钻石会员" then "A111U003_006"
    when rank="PLUS会员[试用]" then "A111U003_007"
    when rank="PLUS会员" then "A111U003_008"
    else "A111U003_000" end,
    rank,"用户等级"
    from com_user;

//创建用户行为标签表
create table person_user_tag_action(
userid string,
tagid string,
tagname string,
tagtype string,
actioncount int);

//插入数据
insert into person_user_tag_action
select userid,concat("B21U001_00",behavior),
    case when behavior="0" then "点击"
    when behavior="1" then "添加购物车"
    when behavior="2" then "购买"
    when behavior="3" then "关注"
    else "" end,
    "用户行为",
    count(*) as c from action group by userid,behavior;

//自定义权重:购买:5,加入购物车:4,关注:3,点击:2  自定义冷却系数1.68

//权重表
create table act_weight_detail(
userid string,
tagid string,
tagname string,
cnt int,
tagtypeid int,
actweight float);

//计算时间,在插入数据中会使用
select month,day,(11-cast(month as int))*30,(12-cast(day as int)),(11-cast(month as int))*30+(12-cast(day as int)) as c from action order by c limit 20;

//计算tf,在插入数据中会使用
select userid,sum(actioncount)over(partition by userid,tagid) as tf1,sum(actioncount)over(partition by userid) from person_user_tag_action order by userid limit 20;
//计算idf,在插入数据中会使用
select tagid,(sum(actioncount)over(partition by tagid))/477008 from person_user_tag_action  limit 20;

//插入数据
insert into act_weight_detail
select action.userid,person_user_tag_action.tagid,person_user_tag_action.tagname,person_user_tag_action.actioncount,cast(action.behavior as int),
case when action.behavior="0" then 2/(exp((11-cast(action.month as int))*30+(12-cast(action.day as int))))*(sum(actioncount)over(partition by person_user_tag_action.userid,tagid)/sum(actioncount)over(partition by person_user_tag_action.userid))*477008/(sum(actioncount)over(partition by tagid))*person_user_tag_action.actioncount
when action.behavior="1" then 4/(exp((11-cast(action.month as int))*30+(12-cast(action.day as int))))*(sum(actioncount)over(partition by person_user_tag_action.userid,tagid)/sum(actioncount)over(partition by person_user_tag_action.userid))*477008/(sum(actioncount)over(partition by tagid))*person_user_tag_action.actioncount
when action.behavior="2" then 5/(exp((11-cast(action.month as int))*30+(12-cast(action.day as int))))*(sum(actioncount)over(partition by person_user_tag_action.userid,tagid)/sum(actioncount)over(partition by person_user_tag_action.userid))*477008/(sum(actioncount)over(partition by tagid))/477008*person_user_tag_action.actioncount
when action.behavior="3" then 3/(exp((11-cast(action.month as int))*30+(12-cast(action.day as int))))*(sum(actioncount)over(partition by person_user_tag_action.userid,tagid)/sum(actioncount)over(partition by person_user_tag_action.userid))*477008/(sum(actioncount)over(partition by tagid))/477008*person_user_tag_action.actioncount
else 0 end
from person_user_tag_action join action on person_user_tag_action.userid=action.userid;

//生成宽表
create table profile_user_tb(
    > userid string,
    > tagid1 string,
    > tagname1 string,
    > tagtype1 string,
    > tagid2 string,
    > tagname2 string,
    > tagtype2 string,
    > tagid3 string,
    > tagname3 string,
    > tagtype3 string,
    > tagid4 string,
    > tagname4 string,
    > actioncount int,
    > actionweight float,
    > tagtype4 string);

//插入数据
insert into profile_user_tb
select profile_tag_user_gender.userid,profile_tag_user_gender.tagid,profile_tag_user_gender.tagname,profile_tag_user_gender.tagtype,profile_tag_user_age_region.tagid,profile_tag_user_age_region.tagname,profile_tag_user_age_region.tagetype,profile_tag_user_grade.tagid,profile_tag_user_grade.tagname,profile_tag_user_grade.tagtype,act_weight_detail.tagid,act_weight_detail.tagname,act_weight_detail.cnt,act_weight_detail.actweight,"用户行为" from profile_tag_user_gender join profile_tag_user_age_region on profile_tag_user_gender.userid=profile_tag_user_age_region.userid join profile_tag_user_grade on profile_tag_user_gender.userid=profile_tag_user_grade.userid join act_weight_detail on  profile_tag_user_gender.userid=act_weight_detail.userid;

//在pycharm中操作python
//提取评论
f1 = open("../comment1.txt",encoding="utf-8")
f2 = open("../common","r+",encoding="utf-8")
line = f1.readline()
while line:
    f2.write(line.split("\t")[1]+"\n")
    line = f1.readline()
f1.close()
f2.close()

//计算评论的字数  输出字数: 12421790
content =""
try:
    f=open("../common",encoding="utf-8")
    for line in f.readlines():
        content+=line.strip();
    print("字数:",len(content))
except ValueError:
    print("AAAAAA")

//建立分词统计表
 create table comment_word_count_tb(
    word string,
    count int)row format delimited fields terminated by ‘\t‘
    lines terminated by ‘\n‘;
//导入分词统计数据
load data local inpath "/home/hadoop/file/wordcutsum" into table comment_word_count_tb;

//分时间销售表,显示某一天的销售量
create table date_range_sail_count(
 sailcount int,
 datarange string);

//价格区间销售表
create table price_range_sail_count(
pricerange string,
sailcount int);

//为了向价格区间表输入数据先建立一个中间表
create table mid1(
     pricerange string);
//向中间表导入数据
insert into mid1
    select
    case when price<=1000 then "1000元以下"
    when price<=2000 then "1000-2000元"
    when price<=3000 then "2000-3000元"
    when price<=5000 then "3000-5000元"
    when price<=10000 then "5000-10000元"
    when price>=10000 then "10000以上"
    else 0 end
    from iphone1;

//向价格区间销售表写入数据
 insert into price_range_sail_count
    select pricerange,count(*) from mid1 group by pricerange;

原文地址:https://www.cnblogs.com/837634902why/p/11505174.html

时间: 2024-10-31 17:35:34

互联网营销精准决策项目总结的相关文章

大数据技术暑期实习七___互联网营销精准决策(加载数据源)

1. 进入Hadoop环境(在Hadoop安装目录下运行命令.若配置好ssh则可以直接运行启动命令) 2. 启动hive进程(按照网上或林子雨的配置教程来就可以,不再赘述) 进入到shell 3.加载数据到hive数据库(在项目实操中不建议查询语句为select *,而应根据列名查询,若只是查看表结构及数据效果,建议加limit,不然要机子要崩~~沙卡拉卡) hive> show tables; ##查看表 hive> desc formatted hive_table; ##描述表信息 de

大数据技术暑期实习六___互联网营销精准决策(手机数据爬取)

一.解决方案 二.电商数据的爬取和清洗 2.1 Python爬取京东手机销售历史数据 1).环境 python3 环境.第三方包有 scrapy,re Pycharm .NotePad++.SublimeText 等代码编辑工具 2).爬虫步骤 采用 scrapy 爬虫框架编写爬虫脚本,选取核心代码讲解爬取京东手机销售数据的爬取逻辑.具体步骤如下: 1> 获取电商网站目标数据信息 2>根据手机品牌作为搜索关键词 withopen('./mobile_project/data/手机品牌.csv'

互联网营销新机遇,你需要更专业的APP在线制作平台

移动互联网的快速发展日新月异,如今手机APP是各行业企业的营销利器,但是目前应用市场上的APP同质化严重,有个性和特色的APP太少.那么如何定制开发一款个性化APP应用软件呢,制作一份好的APP策划方案尤其重要.那么一份好的APP策划方案该怎么入手呢?APP在线制作的专业平台APICloud给出自己的答案,为中小企业甚至草根创业者提供了敢想敢做的新方式,促成了互联网营销很多新机遇. 当今面临很多产业转型的经济形态下,传统行业的各大中小企业措手不及,包括个体户,想追上互联网步伐,却无从下手,对AP

2014移动互联网营销所带来的8大重要趋势

最近跟朋友聊电子书的行业前景时,说到中国人到底有没有阅读习惯这个问题,因为时常会听到一些人批判中国人不爱读书,西方很多国家的人在坐车时基本 都在看书,而在中国的车上则很少会看到这种场景.所以我最近坐车时就格外留心了一下,结果却有了以下这个感触:原来中国人并不是不爱阅读,而是大家都在看手机.而受限于网速等各种问题,多数人还仅仅将手机当做一台单机小电脑,即使用手机上网的人一般数据量也不大.或许,中国已经站在移动互联网的爆发临界点了. 2014年,移动互联网营销一定会在2013年的基础上进一步的深化,

西安移动互联网营销公司

如果说PC互联网知识俘获了部分70后.80后和90后这些人群的芳心的话,那么移动互联网已经全面拥抱新一批数字移民,在这 些数字移民中包括50后.60后,他们这部分人绝大多数没有用过PC,大多数人也从没有在亚马逊.淘宝.京东.当当上买过一 件东西,但是现在,他们已经习惯每天通过手机或平板电脑与移动互联网这个世界连接,他们也成为微信朋友圈刷屏的主力军 . 随着科技的不断发展,广告也在不断进步,现在移动互联网营销广告从客户体验,舒适度,感观都有很大的提升.移动互联网 营销广告优势在于丰富的展示,和精准

数据分析变革 大数据时代精准决策之道——互动出版网

这篇是计算机类的优质预售推荐>>>><数据分析变革 大数据时代精准决策之道> 畅销书<驾驭大数据>作者.Teradata公司的首席分析官Bill Franks力作 内容简介 能够快速适应不断变化的市场环境的能力是获得成功的关键.本书旨在将数据分析嵌入运营流程,帮助读者将从数据(包括大数据和小数据)分析中获得的业务洞察与日常运营紧密集成在一起. 本书确切地讲述了使分析运营化到底意味着哪些变革,并告诉读者如何建立团队.创建文化.升级分析方法论并利用技术,使企业向

互联网营销+教育培训机构=学校APP

互联网营销+教育培训机构=学校APP 随着互联网生活逐渐成为市民日常生活密不可分的一部分,利用互联网进行品牌传播,也自然而然的成为越来越多的企业进行品牌宣传的重要渠道. 合理的利用互联网的品牌营销渠道,可以更加科学有效的减少营销成本,并提高品牌营销的传播价值.所以,作为主打线下市场的教育培训机构,也在这些年将品牌传播的目光逐渐转移到互联网当中. 那么,对于教育培训机构而言,如果有自己的学校APP那么就能把互联网营销应用到教育培训机构的招生中,APP拥有以下五点传统招生所不能比拟的营销传播优势 精

场景应用:技术驱动移动互联网营销

随着移动互联网及大数据挖掘的不断深入,技术在营销中的作用越来越重要,甚至起到决定性作用.在移动互联网行业,最直观的目标就是变现,无论是在互联网.电商,还是游戏.支付都是如此.而用户对广告的认知,还停留在电视媒体和平面媒体上,认为只是强推的信息.如今,互联网和移动互联网已迈向大数据营销时代,通过对数据的深入挖掘,基于用户的特点为其推送感兴趣的广告,对用户来说,广告不再是广告而是有用的信息.不言而喻,技术在其中起着至关重要的作用,据此点入移动形成了以技术为根基服务客户的理念.根据客户的需求和服务提出

汇桔宝时代:重新定义互联网营销与流量格局!

在这个流量为王的时代,酒香也怕巷子深!时代在变,那个靠口口相传来获客盈利的年代已经一去不复返.如今,不会运营推广几乎意味着死路一条. 当手机品牌商玩起了情怀,奶茶出现了网红店,美妆博主摇身一变成为带货女王,连小区门口卖桔子的老奶奶都打出"甜过初恋"的宣传语吸引人,你或许已经发觉这个时代只有产品和服务还远远不够,要在行业里走得更深更远,除了要有能与竞争对手一较高下的产品,能够在消费者心中留下美好印象的服务,还必须要有足够的流量和转化. 流量从哪来 以前我们说酒深不怕巷子深,那是在互联网极