最近学习Hive。实现了其中几个作业,如下。Hql语句没怎么调整格式,见谅。
Hive练习二中的题目
(1). 2017 年4 月1 日各个商品品牌的交易笔数,按照销售交易从多到少排序
1 select 2 brand, 3 count(*) as totalCount 4 from 5 record 6 join brand_dimension on record.bid = brand_dimension.bid 7 where record.trancation_date= ‘2017-04-01‘ 8 group by brand_dimension.brand 9 order by totalCount desc;
输出
+------------+-------------+--+ | brand | totalcount | +------------+-------------+--+ | SAMSUNG | 2 | | WULIANGYE | 1 | | PUMA | 1 | | OPPO | 1 | | DELL | 1 | +------------+-------------+--+
(2). 不同性别消费的商品类别情况(不同性别消费不同商品类别的总价)
1 select 2 gender, 3 category, 4 sum(price) as totalPrice 5 from record 6 join user_dimension on record.uid = user_dimension.uid 7 join brand_dimension on record.bid = brand_dimension.bid 8 group by gender, 9 category 10 order by 11 gender, 12 category, 13 totalPrice;
输出
+---------+------------+-------------+--+ | gender | category | totalprice | +---------+------------+-------------+--+ | M | computer | 252 | | M | food | 429 | | M | sports | 120 | | M | telephone | 1669 | +---------+------------+-------------+--+
Hive练习三中的题目
(1). 谁不是经理
1 select 2 name 3 from 4 employees 5 where 6 size(subordinates)<=0;
输出
+-------------------+---------------+--+ | name | subordinates | +-------------------+---------------+--+ | Todd Jones | [] | | Bill King | [] | | Stacy Accountant | [] | +-------------------+---------------+--+ 3 rows selected (0.092 seconds)
(2). 谁住在邮编比60500 大的地区
1 select name,address.zip from employees where address.zip> 60500;
输出
+-------------------+--------+--+ | name | zip | +-------------------+--------+--+ | John Doe | 60600 | | Mary Smith | 60601 | | Todd Jones | 60700 | | Stacy Accountant | 60563 | +-------------------+--------+--+ 4 rows selected (0.123 seconds)
(3). 联邦税超过0.15 的雇员
1 select 2 name, 3 deductions[‘Federal Taxes‘] 4 from 5 employees 6 where deductions[‘Federal Taxes‘] > 0.15+1e-5;
输出
+---------------+---------------+--+ | name | federaltaxes | +---------------+---------------+--+ | John Doe | 0.2 | | Mary Smith | 0.2 | | Boss Man | 0.3 | | Fred Finance | 0.3 | +---------------+---------------+--+ 4 rows selected (0.126 seconds)
(4). 谁住在Drive 或Par 街道
1 select 2 name, 3 address 4 from 5 employees 6 where address.street rlike ‘^.*(Drive|Par).*$‘;
输出
---------------+------------------------------------------------------------------------------+--+ | name | address | +---------------+------------------------------------------------------------------------------+--+ | Boss Man | {"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500} | | Fred Finance | {"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500} | +---------------+------------------------------------------------------------------------------+--+ 2 rows selected (0.218 seconds)
Hive练习十中的题目
1. 建表
创建并 使用 database tmall
1 create database tmall; 2 use tmall;
建表 product,格式 text
1 create table if not exists product( 2 item_id string, 3 pic_url string, 4 category string, 5 brand_id string, 6 seller_id string 7 ) 8 row format delimited 9 fields terminated by ‘,‘ 10 lines terminated by ‘\n‘ 11 stored as textfile;
建表 review,格式 text
1 create table if not exists review( 2 item_id string, 3 user_id string, 4 feedback int, 5 feedback_time timestamp, 6 feedback_pic_url string 7 ) 8 row format delimited 9 fields terminated by ‘,‘ 10 lines terminated by ‘\n‘ 11 stored as textfile;
建表 log_orc,格式 orc
1 # log as textfile 2 create table if not exists log( 3 item_id string, 4 user_id string, 5 action string, 6 action_time timestamp 7 ) 8 row format delimited 9 fields terminated by ‘,‘ 10 lines terminated by ‘\n‘ 11 stored as textfile; 12 13 # log as orc 14 create table if not exists log_orc 15 like log 16 stored as orc;
2. 载入数据
1 #product 2 load data local inpath ‘/home/zkpk/test/tmall/tmall_product.csv‘ overwrite into table product; 3 #验证:select * from product limit 100; 4 #review 5 load data local inpath ‘/home/zkpk/test/tmall/tmall_review.csv‘ overwrite into table review; 6 #验证:select * from review limit 100; 7 #log 8 load data local inpath ‘/home/zkpk/test/tmall/tmall_log.csv‘ overwrite into table log; 9 #验证:select * from log limit 100; 10 insert into table log_orc select * from log;
3. 载入数据
- 热度前十名的商家(商品被浏览的最多)
1 select 2 p.seller_id, 3 count(*) as click_count 4 from log_orc l 5 inner join product p on l.item_id = p.item_id 6 where l.action=‘click‘ 7 group by p.seller_id 8 order by click_count desc 9 limit 10;
输出
+--------------+--------------+--+ | p.seller_id | click_count | +--------------+--------------+--+ | s403 | 143 | | s190 | 116 | | s284 | 86 | | s161 | 78 | | s227 | 59 | | s61 | 59 | | s29 | 57 | | s82 | 45 | | s464 | 42 | | s261 | 42 | +--------------+--------------+--+
- 好评率(feedback=5为好评)低于60%的商品
1 select 2 item_id, 3 count(case when feedback =5 then 1 else null end) as best, 4 count(1) as total, 5 count(case when feedback =5 then 1 else null end)/ count(1) as best_rate 6 from review 7 group by item_id 8 having count(case when feedback =5 then 1 else null end) < count(1) *0.6 ;
输出
+----------+-------+--------+----------------------+--+ | item_id | best | total | best_rate | +----------+-------+--------+----------------------+--+ | 221 | 0 | 1 | 0.0 | | 256 | 0 | 4 | 0.0 | | 287 | 1 | 3 | 0.3333333333333333 | | 288 | 2 | 6 | 0.3333333333333333 | | 378 | 0 | 1 | 0.0 | | 379 | 1 | 6 | 0.16666666666666666 | | 397 | 0 | 2 | 0.0 | | 398 | 0 | 1 | 0.0 | | 423 | 2 | 7 | 0.2857142857142857 | | 449 | 1 | 5 | 0.2 | | 45 | 0 | 6 | 0.0 | | 450 | 0 | 1 | 0.0 | | 451 | 0 | 3 | 0.0 | | 453 | 1 | 2 | 0.5 | | 505 | 1 | 2 | 0.5 | +----------+-------+--------+----------------------+--+
- 找出潜在购买用户(收藏了商品,但是没有购买)
1 select distinct 2 m.item_id, 3 m.user_id 4 from log_orc m 5 left outer join log_orc s 6 on (m.item_id = s.item_id 7 and m.user_id = s.user_id 8 and s.action=‘alipay‘) 9 where m.action=‘collect‘ 10 and s.item_id is null;
输出
+------------+------------+--+ | m.item_id | m.user_id | +------------+------------+--+ | 152 | 3286 | | 24 | 3389 | | 242 | 39 | | 422 | 3423 | | 468 | 2727 | +------------+------------+--+
时间: 2024-10-11 11:49:29