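Analyzing Movie Data with pandas
Life is short, use Python.
For data analysis in Python, pandas is one of the most important packages; other key packages include numpy and matplotlib.
Install pandas (the same command works on Linux, Mac, and Windows):
```shell
pip install pandas
```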
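Movie data source: http://grouplens.org/datasets/movielens/
Download the data archive (this article uses the 1M dataset, ml-1m) and unzip it. It contains the following 4 files:
- users.dat: user data
- movies.dat: movie data
- ratings.dat: rating data
- README: description of the files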
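The README describes the format of each data file:
- users.dat (UserID::Gender::Age::Occupation::Zip-code)
- movies.dat (MovieID::Title::Genres)
- ratings.dat (UserID::MovieID::Rating::Timestamp)
Notes: Occupation is the user's occupation, Zip-code is the user's ZIP code, Timestamp is the time of the rating in seconds since the Unix epoch, and Genres lists the movie's genres (see the README for details).
The fields within each record are separated by ::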
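To make the format concrete, here is a minimal sketch that splits one record of each file on `::` and pairs the values with the README's field names. The helper `parse_record` and the constant names are hypothetical, and the sample lines are rebuilt from the first rows displayed later in this article (the apostrophe in the genre string is assumed to be a plain ASCII one).

```python
# Field names for the three record layouts described in the README.
USER_FIELDS = ["UserID", "Gender", "Age", "Occupation", "Zip-code"]
MOVIE_FIELDS = ["MovieID", "Title", "Genres"]
RATING_FIELDS = ["UserID", "MovieID", "Rating", "Timestamp"]


def parse_record(line, fields):
    """Split one '::'-separated line and pair each value with its field name (hypothetical helper)."""
    values = line.rstrip("\n").split("::")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {line!r}")
    return dict(zip(fields, values))


# Sample lines rebuilt from rows shown later in the article.
user = parse_record("1::F::1::10::48067\n", USER_FIELDS)
movie = parse_record("1::Toy Story (1995)::Animation|Children's|Comedy", MOVIE_FIELDS)
rating = parse_record("1::1193::5::978300760", RATING_FIELDS)

print(user["Zip-code"])            # every value is still a string at this point
print(movie["Genres"].split("|"))  # the Genres field is itself '|'-separated
```

Every value comes back as a string; converting types (and deciding that a ZIP code should stay text) is what `pd.read_table` with `names=` takes care of in the next steps.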
Environment:
- OS: Windows
- Language: Python 3.4
- Editor: Jupyter
Reading the data with pandas
Import the required modules:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```
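First define the column names: the source files have no header row, only records whose fields are separated by '::'.
```python
user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']  # column names for the users table
```
Read the data; adjust the file path to wherever you saved the files.
```python
users = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\users.dat', sep='::', header=None, names=user_names)
```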
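D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':
The warning above can be safely ignored. It only means that pandas parsed the file with its python engine instead of the C engine: a multi-character separator such as '::' is treated as a regular expression, which the C engine does not support. Passing engine='python' explicitly suppresses the warning.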
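If you'd rather not see the warning, a sketch along these lines passes `engine='python'` up front. It reads two sample lines (taken from the first rows of `users` shown below) from an in-memory buffer instead of the real file. `dtype={'zip': str}` is an addition of this sketch, so that a ZIP code such as `02460` keeps its leading zero even when every value in the column looks numeric, as in this two-row sample.

```python
import io

import pandas as pd

user_names = ["user_id", "gender", "age", "occupation", "zip"]

# Two lines in users.dat format, copied from the users.head() output.
sample = io.StringIO(
    "1::F::1::10::48067\n"
    "4::M::45::7::02460\n"
)

users_demo = pd.read_csv(
    sample,
    sep="::",
    engine="python",      # chosen explicitly, so no ParserWarning is emitted
    header=None,          # the file has no header row
    names=user_names,
    dtype={"zip": str},   # keep ZIP codes as text so '02460' is not turned into 2460
)
print(users_demo)
```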
Check how many records there are and look at the first 5 rows:
```python
print(len(users))
users.head()
```
6040
| | user_id | gender | age | occupation | zip |
---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 |
1 | 2 | M | 56 | 16 | 70072 |
2 | 3 | M | 25 | 15 | 55117 |
3 | 4 | M | 45 | 7 | 02460 |
4 | 5 | M | 25 | 20 | 55455 |
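Read the movies and ratings data in the same way:
```python
ratings_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\ratings.dat', sep='::', header=None, names=ratings_names)
movies_names = ['movie_id', 'title', 'genres']
# movies.dat contains some non-ASCII titles; where the default encoding is UTF-8,
# reading it can fail unless a single-byte encoding such as latin-1 is given
movies = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\movies.dat', sep='::', header=None, names=movies_names, encoding='latin-1')
```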
Loading takes a moment, because the ratings file contains about a million records.
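Some titles in movies.dat contain non-ASCII characters (for example the Zero Kelvin entry further down). If reading the file fails with a `UnicodeDecodeError`, passing an explicit `encoding` usually helps. The sketch below assumes a Latin-1 style encoding and uses a made-up title, `Café Example (1995)`, encoded that way in memory; decoding those bytes as UTF-8 would fail on the `é`.

```python
import io

import pandas as pd

movies_names = ["movie_id", "title", "genres"]

# Hypothetical movies.dat-style line, stored as Latin-1 bytes ('é' becomes the single byte 0xE9).
raw = "1::Café Example (1995)::Comedy\n".encode("latin-1")

movies_demo = pd.read_csv(
    io.BytesIO(raw),
    sep="::",
    engine="python",
    header=None,
    names=movies_names,
    encoding="latin-1",  # decode the bytes the same way they were written
)
print(movies_demo.loc[0, "title"])
```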
Take a look at the ratings and movies tables:
```python
print(len(ratings))
ratings.head()
```
1000209
| | user_id | movie_id | rating | timestamp |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
```python
print(len(movies))
movies.head()
```
3883
| | movie_id | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Animation\|Children’s\|Comedy |
1 | 2 | Jumanji (1995) | Adventure\|Children’s\|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
There are just over one million ratings (1,000,209).
Merge the 3 tables into a single table, data:
```python
data = pd.merge(pd.merge(users, ratings), movies)
print(len(data))
data.head()
```
1000209
| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1 | 2 | M | 56 | 16 | 70072 | 1193 | 5 | 978298413 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
2 | 12 | M | 25 | 12 | 32793 | 1193 | 4 | 978220179 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
3 | 15 | M | 25 | 7 | 22903 | 1193 | 4 | 978199279 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
4 | 17 | M | 50 | 1 | 95350 | 1193 | 5 | 978158471 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
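`pd.merge` above is called without saying which columns to join on. A toy sketch (all values made up) shows what happens: pandas joins on every column name the two frames share and, by default, keeps only rows matched on both sides (an inner join). That is why `len(data)` equals `len(ratings)` above, assuming every rating refers to an existing user and movie.

```python
import pandas as pd

# Toy tables with the same key columns as users / ratings / movies (values are made up).
users_t = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})
ratings_t = pd.DataFrame({"user_id": [1, 1, 2], "movie_id": [10, 20, 10], "rating": [5, 3, 4]})
movies_t = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})

# users_t and ratings_t share 'user_id'; that result and movies_t share 'movie_id'.
# The default how='inner' drops rows that find no partner on the other side.
data_t = pd.merge(pd.merge(users_t, ratings_t), movies_t)

print(len(data_t))  # one row per rating, since every rating has a matching user and movie
print(data_t.sort_values(["user_id", "movie_id"]).to_string(index=False))
```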
View all the ratings given by the user whose user_id is 1:
```python
data[data.user_id == 1]
```
| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
1725 | 1 | F | 1 | 10 | 48067 | 661 | 3 | 978302109 | James and the Giant Peach (1996) | Animation\|Children’s\|Musical |
2250 | 1 | F | 1 | 10 | 48067 | 914 | 3 | 978301968 | My Fair Lady (1964) | Musical\|Romance |
2886 | 1 | F | 1 | 10 | 48067 | 3408 | 4 | 978300275 | Erin Brockovich (2000) | Drama |
4201 | 1 | F | 1 | 10 | 48067 | 2355 | 5 | 978824291 | Bug’s Life, A (1998) | Animation\|Children’s\|Comedy |
5904 | 1 | F | 1 | 10 | 48067 | 1197 | 3 | 978302268 | Princess Bride, The (1987) | Action\|Adventure\|Comedy\|Romance |
8222 | 1 | F | 1 | 10 | 48067 | 1287 | 5 | 978302039 | Ben-Hur (1959) | Action\|Adventure\|Drama |
8926 | 1 | F | 1 | 10 | 48067 | 2804 | 5 | 978300719 | Christmas Story, A (1983) | Comedy\|Drama |
10278 | 1 | F | 1 | 10 | 48067 | 594 | 4 | 978302268 | Snow White and the Seven Dwarfs (1937) | Animation\|Children’s\|Musical |
11041 | 1 | F | 1 | 10 | 48067 | 919 | 4 | 978301368 | Wizard of Oz, The (1939) | Adventure\|Children’s\|Drama\|Musical |
12759 | 1 | F | 1 | 10 | 48067 | 595 | 5 | 978824268 | Beauty and the Beast (1991) | Animation\|Children’s\|Musical |
13819 | 1 | F | 1 | 10 | 48067 | 938 | 4 | 978301752 | Gigi (1958) | Musical |
14006 | 1 | F | 1 | 10 | 48067 | 2398 | 4 | 978302281 | Miracle on 34th Street (1947) | Drama |
14386 | 1 | F | 1 | 10 | 48067 | 2918 | 4 | 978302124 | Ferris Bueller’s Day Off (1986) | Comedy |
15859 | 1 | F | 1 | 10 | 48067 | 1035 | 5 | 978301753 | Sound of Music, The (1965) | Musical |
16741 | 1 | F | 1 | 10 | 48067 | 2791 | 4 | 978302188 | Airplane! (1980) | Comedy |
18472 | 1 | F | 1 | 10 | 48067 | 2687 | 3 | 978824268 | Tarzan (1999) | Animation\|Children’s |
18914 | 1 | F | 1 | 10 | 48067 | 2018 | 4 | 978301777 | Bambi (1942) | Animation\|Children’s |
19503 | 1 | F | 1 | 10 | 48067 | 3105 | 5 | 978301713 | Awakenings (1990) | Drama |
20183 | 1 | F | 1 | 10 | 48067 | 2797 | 4 | 978302039 | Big (1988) | Comedy\|Fantasy |
21674 | 1 | F | 1 | 10 | 48067 | 2321 | 3 | 978302205 | Pleasantville (1998) | Comedy |
22832 | 1 | F | 1 | 10 | 48067 | 720 | 3 | 978300760 | Wallace & Gromit: The Best of Aardman Animatio… | Animation |
23270 | 1 | F | 1 | 10 | 48067 | 1270 | 5 | 978300055 | Back to the Future (1985) | Comedy\|Sci-Fi |
25853 | 1 | F | 1 | 10 | 48067 | 527 | 5 | 978824195 | Schindler’s List (1993) | Drama\|War |
28157 | 1 | F | 1 | 10 | 48067 | 2340 | 3 | 978300103 | Meet Joe Black (1998) | Romance |
28501 | 1 | F | 1 | 10 | 48067 | 48 | 5 | 978824351 | Pocahontas (1995) | Animation\|Children’s\|Musical\|Romance |
28883 | 1 | F | 1 | 10 | 48067 | 1097 | 4 | 978301953 | E.T. the Extra-Terrestrial (1982) | Children’s\|Drama\|Fantasy\|Sci-Fi |
31152 | 1 | F | 1 | 10 | 48067 | 1721 | 4 | 978300055 | Titanic (1997) | Drama\|Romance |
32698 | 1 | F | 1 | 10 | 48067 | 1545 | 4 | 978824139 | Ponette (1996) | Drama |
32771 | 1 | F | 1 | 10 | 48067 | 745 | 3 | 978824268 | Close Shave, A (1995) | Animation\|Comedy\|Thriller |
33428 | 1 | F | 1 | 10 | 48067 | 2294 | 4 | 978824291 | Antz (1998) | Animation\|Children’s |
34073 | 1 | F | 1 | 10 | 48067 | 3186 | 4 | 978300019 | Girl, Interrupted (1999) | Drama |
34504 | 1 | F | 1 | 10 | 48067 | 1566 | 4 | 978824330 | Hercules (1997) | Adventure\|Animation\|Children’s\|Comedy\|Musical |
34973 | 1 | F | 1 | 10 | 48067 | 588 | 4 | 978824268 | Aladdin (1992) | Animation\|Children’s\|Comedy\|Musical |
36324 | 1 | F | 1 | 10 | 48067 | 1907 | 4 | 978824330 | Mulan (1998) | Animation\|Children’s |
36814 | 1 | F | 1 | 10 | 48067 | 783 | 4 | 978824291 | Hunchback of Notre Dame, The (1996) | Animation\|Children’s\|Musical |
37204 | 1 | F | 1 | 10 | 48067 | 1836 | 5 | 978300172 | Last Days of Disco, The (1998) | Drama |
37339 | 1 | F | 1 | 10 | 48067 | 1022 | 5 | 978300055 | Cinderella (1950) | Animation\|Children’s\|Musical |
37916 | 1 | F | 1 | 10 | 48067 | 2762 | 4 | 978302091 | Sixth Sense, The (1999) | Thriller |
40375 | 1 | F | 1 | 10 | 48067 | 150 | 5 | 978301777 | Apollo 13 (1995) | Drama |
41626 | 1 | F | 1 | 10 | 48067 | 1 | 5 | 978824268 | Toy Story (1995) | Animation\|Children’s\|Comedy |
43703 | 1 | F | 1 | 10 | 48067 | 1961 | 5 | 978301590 | Rain Man (1988) | Drama |
45033 | 1 | F | 1 | 10 | 48067 | 1962 | 4 | 978301753 | Driving Miss Daisy (1989) | Drama |
45685 | 1 | F | 1 | 10 | 48067 | 2692 | 4 | 978301570 | Run Lola Run (Lola rennt) (1998) | Action\|Crime\|Romance |
46757 | 1 | F | 1 | 10 | 48067 | 260 | 4 | 978300760 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Fantasy\|Sci-Fi |
49748 | 1 | F | 1 | 10 | 48067 | 1028 | 5 | 978301777 | Mary Poppins (1964) | Children’s\|Comedy\|Musical |
50759 | 1 | F | 1 | 10 | 48067 | 1029 | 5 | 978302205 | Dumbo (1941) | Animation\|Children’s\|Musical |
51327 | 1 | F | 1 | 10 | 48067 | 1207 | 4 | 978300719 | To Kill a Mockingbird (1962) | Drama |
52255 | 1 | F | 1 | 10 | 48067 | 2028 | 5 | 978301619 | Saving Private Ryan (1998) | Action\|Drama\|War |
54908 | 1 | F | 1 | 10 | 48067 | 531 | 4 | 978302149 | Secret Garden, The (1993) | Children’s\|Drama |
55246 | 1 | F | 1 | 10 | 48067 | 3114 | 4 | 978302174 | Toy Story 2 (1999) | Animation\|Children’s\|Comedy |
56831 | 1 | F | 1 | 10 | 48067 | 608 | 4 | 978301398 | Fargo (1996) | Crime\|Drama\|Thriller |
59344 | 1 | F | 1 | 10 | 48067 | 1246 | 4 | 978302091 | Dead Poets Society (1989) | Drama |
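The expression `data[data.user_id == 1]` is boolean-mask indexing. A toy sketch (made-up rows) shows that the mask keeps the original index labels, which is why the row labels in the table above jump (0, 1725, 2250, ...), and that `DataFrame.query` selects the same rows.

```python
import pandas as pd

# Made-up rows with the same column names as the merged table.
data_t = pd.DataFrame({
    "user_id": [1, 2, 1, 3],
    "title": ["Movie A", "Movie A", "Movie B", "Movie C"],
    "rating": [5, 4, 3, 2],
})

# data_t.user_id == 1 is a True/False Series with one entry per row;
# indexing with it keeps only the True rows and preserves their index labels.
by_mask = data_t[data_t.user_id == 1]

# The same filter written as a query string.
by_query = data_t.query("user_id == 1")

print(by_mask)
print(by_mask.equals(by_query))
```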
Average rating of each movie, broken down by gender:
```python
mean_ratings_by_gender = data.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')
mean_ratings_by_gender.head(10)  # show the first 10 rows
```
title | F | M |
---|---|---|
$1,000,000 Duck (1971) | 3.375000 | 2.761905 |
‘Night Mother (1986) | 3.388889 | 3.352941 |
‘Til There Was You (1997) | 2.675676 | 2.733333 |
‘burbs, The (1989) | 2.793478 | 2.962085 |
…And Justice for All (1979) | 3.828571 | 3.689024 |
1-900 (1994) | 2.000000 | 3.000000 |
10 Things I Hate About You (1999) | 3.646552 | 3.311966 |
101 Dalmatians (1961) | 3.791444 | 3.500000 |
101 Dalmatians (1996) | 3.240000 | 2.911215 |
12 Angry Men (1957) | 4.184397 | 4.328421 |
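A small made-up example helps to see what `pivot_table` computes here: it matches grouping by title and gender, averaging, and unstacking the gender level into columns. It also shows that a movie rated by only one gender gets NaN in the other column, which later carries over into the `diff` column.

```python
import pandas as pd

# Made-up ratings: Movie B has no ratings from men.
data_t = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B"],
    "gender": ["F", "M", "M", "F", "F"],
    "rating": [4, 2, 3, 5, 3],
})

# One row per title, one column per gender, each cell = mean rating.
pivot = data_t.pivot_table(values="rating", index="title", columns="gender", aggfunc="mean")

# Same table via groupby: average per (title, gender), then move 'gender' into the columns.
grouped = data_t.groupby(["title", "gender"])["rating"].mean().unstack("gender")

print(pivot)
```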
Add a column to mean_ratings_by_gender holding the difference between the female and male average ratings (F - M):
```python
mean_ratings_by_gender['diff'] = mean_ratings_by_gender.F - mean_ratings_by_gender.M
mean_ratings_by_gender.head()
```
title | F | M | diff |
---|---|---|---|
$1,000,000 Duck (1971) | 3.375000 | 2.761905 | 0.613095 |
‘Night Mother (1986) | 3.388889 | 3.352941 | 0.035948 |
‘Til There Was You (1997) | 2.675676 | 2.733333 | -0.057658 |
‘burbs, The (1989) | 2.793478 | 2.962085 | -0.168607 |
…And Justice for All (1979) | 3.828571 | 3.689024 | 0.139547 |
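Which movies show the largest gap between men and women? First, the movies that men rate higher than women:
```python
# men rate higher than women: most negative diff first
mean_ratings_by_gender.sort_values(by='diff', ascending=True).head()
```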
title | F | M | diff |
---|---|---|---|
Tigrero: A Film That Was Never Made (1994) | 1.0 | 4.333333 | -3.333333 |
Neon Bible, The (1995) | 1.0 | 4.000000 | -3.000000 |
Enfer, L’ (1994) | 1.0 | 3.750000 | -2.750000 |
Stalingrad (1993) | 1.0 | 3.593750 | -2.593750 |
Killer: A Journal of Murder (1995) | 1.0 | 3.428571 | -2.428571 |
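Then the movies that women rate higher than men:
```python
# women rate higher than men: most positive diff first
mean_ratings_by_gender.sort_values(by='diff', ascending=False).head()
```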
title | F | M | diff |
---|---|---|---|
James Dean Story, The (1957) | 4.000000 | 1.000000 | 3.000000 |
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919) | 4.000000 | 1.000000 | 3.000000 |
Country Life (1994) | 5.000000 | 2.000000 | 3.000000 |
Babyfever (1994) | 3.666667 | 1.000000 | 2.666667 |
Woman of Paris, A (1923) | 5.000000 | 2.428571 | 2.571429 |
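`sort_values(...).head()` can also be written with `nsmallest`/`nlargest`. In this toy sketch (made-up means), `Movie D` has no female ratings, so its `diff` is NaN; like `sort_values`, these methods rank a NaN diff after every real value, so it appears in neither top 2.

```python
import pandas as pd

# Made-up mean ratings by gender; Movie D was never rated by women.
mean_by_gender = pd.DataFrame(
    {"F": [1.0, 4.0, 3.0, float("nan")], "M": [4.0, 1.0, 3.0, 2.0]},
    index=pd.Index(["Movie A", "Movie B", "Movie C", "Movie D"], name="title"),
)
mean_by_gender["diff"] = mean_by_gender.F - mean_by_gender.M

# Shortcuts for sort_values(by='diff', ...).head(n).
men_higher = mean_by_gender.nsmallest(2, "diff")    # most negative diff first
women_higher = mean_by_gender.nlargest(2, "diff")   # most positive diff first

print(men_higher)
print(women_higher)
```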
Number of ratings for each movie:
```python
total_rating_by_title = data.groupby('title').size()
total_rating_by_title  # index: movie title; values: number of ratings
```
title
$1,000,000 Duck (1971) 37
‘Night Mother (1986) 70
‘Til There Was You (1997) 52
‘burbs, The (1989) 303
...And Justice for All (1979) 199
1-900 (1994) 2
10 Things I Hate About You (1999) 700
101 Dalmatians (1961) 565
101 Dalmatians (1996) 364
12 Angry Men (1957) 616
13th Warrior, The (1999) 750
187 (1997) 55
2 Days in the Valley (1996) 286
20 Dates (1998) 139
20,000 Leagues Under the Sea (1954) 575
200 Cigarettes (1999) 181
2001: A Space Odyssey (1968) 1716
2010 (1984) 470
24 7: Twenty Four Seven (1997) 5
24-hour Woman (1998) 9
28 Days (2000) 505
3 Ninjas: High Noon On Mega Mountain (1998) 47
3 Strikes (2000) 4
301, 302 (1995) 9
39 Steps, The (1935) 253
400 Blows, The (Les Quatre cents coups) (1959) 187
42 Up (1998) 88
52 Pick-Up (1986) 140
54 (1998) 259
7th Voyage of Sinbad, The (1958) 258
...
Wrongfully Accused (1998) 123
Wyatt Earp (1994) 270
X-Files: Fight the Future, The (1998) 996
X-Men (2000) 1511
X: The Unknown (1956) 12
Xiu Xiu: The Sent-Down Girl (Tian yu) (1998) 69
Yankee Zulu (1994) 2
Yards, The (1999) 77
Year My Voice Broke, The (1987) 27
Year of Living Dangerously (1982) 391
Year of the Horse (1997) 4
Yellow Submarine (1968) 399
Yojimbo (1961) 215
You Can't Take It With You (1938) 77
You So Crazy (1994) 13
You've Got Mail (1998) 838
Young Doctors in Love (1982) 79
Young Frankenstein (1974) 1193
Young Guns (1988) 562
Young Guns II (1990) 369
Young Poisoner's Handbook, The (1995) 79
Young Sherlock Holmes (1985) 379
Young and Innocent (1937) 10
Your Friends and Neighbors (1998) 109
Zachariah (1971) 2
Zed & Two Noughts, A (1985) 29
Zero Effect (1998) 301
Zero Kelvin (Kjærlighetens kjøtere) (1995) 2
Zeus and Roxanne (1997) 23
eXistenZ (1999) 410
dtype: int64
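`size()` counts rows per group. A toy sketch (made-up data) contrasts it with `count()`, which skips missing ratings, and with `value_counts()`, which gives the same numbers as `size()` but already sorted from largest to smallest. On data with no missing ratings, `size()` and `count()` should agree.

```python
import pandas as pd

# Made-up rows; one rating for Movie A is missing.
data_t = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie A", "Movie A"],
    "rating": [5, 3, None, 4],
})

sizes = data_t.groupby("title").size()               # rows per title, missing ratings included
counts = data_t.groupby("title")["rating"].count()   # non-missing ratings per title
vc = data_t["title"].value_counts()                  # like size(), sorted by count, largest first

print(sizes)
print(counts)
print(vc)
```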
The 10 most-rated movies:
```python
top_10_total_rating = total_rating_by_title.sort_values(ascending=False).head(10)
top_10_total_rating
```
title
American Beauty (1999) 3428
Star Wars: Episode IV - A New Hope (1977) 2991
Star Wars: Episode V - The Empire Strikes Back (1980) 2990
Star Wars: Episode VI - Return of the Jedi (1983) 2883
Jurassic Park (1993) 2672
Saving Private Ryan (1998) 2653
Terminator 2: Judgment Day (1991) 2649
Matrix, The (1999) 2590
Back to the Future (1985) 2583
Silence of the Lambs, The (1991) 2578
dtype: int64
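As you can see, the most-rated movies are mostly well-known titles, so the number of ratings is a reasonable indicator of a popular movie.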
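Now look at the 10 highest-rated movies (note: the maximum rating is 5.0):
```python
# groupby(...).mean() returns a Series indexed by title; recent pandas versions return
# a DataFrame from pivot_table here, and sort_values() without by= would then fail
mean_ratings_by_title = data.groupby('title')['rating'].mean()
top_10_mean_ratings = mean_ratings_by_title.sort_values(ascending=False).head(10)
top_10_mean_ratings
```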
title
Gate of Heavenly Peace, The (1995) 5.0
Lured (1947) 5.0
Ulysses (Ulisse) (1954) 5.0
Smashing Time (1967) 5.0
Follow the Bitch (1998) 5.0
Song of Freedom (1936) 5.0
Bittersweet Motel (2000) 5.0
Baby, The (1973) 5.0
One Little Indian (1973) 5.0
Schlafes Bruder (Brother of Sleep) (1995) 5.0
Name: rating, dtype: float64
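The result printed above is a Series (`Name: rating`), which is apparently what the article's pandas version returned from `pivot_table` with only `values` and `index`. On recent pandas versions, as far as I know, that call returns a one-column DataFrame instead, and `sort_values(ascending=False)` on a DataFrame requires a `by=` argument. This toy sketch (made-up ratings) compares the two forms; `groupby(...)['rating'].mean()` gives a Series directly.

```python
import pandas as pd

# Made-up ratings for two titles.
data_t = pd.DataFrame({"title": ["A", "A", "B"], "rating": [4, 5, 3]})

via_pivot = data_t.pivot_table(values="rating", index="title", aggfunc="mean")  # DataFrame, column 'rating'
via_groupby = data_t.groupby("title")["rating"].mean()                           # Series named 'rating'

# Selecting the column turns the pivot result into the same Series.
ranked = via_pivot["rating"].sort_values(ascending=False)

print(type(via_pivot).__name__, type(via_groupby).__name__)
print(ranked)
```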
Average ratings of the 10 most-rated movies:
```python
mean_ratings_by_title[top_10_total_rating.index]
```
title
American Beauty (1999) 4.317386
Star Wars: Episode IV - A New Hope (1977) 4.453694
Star Wars: Episode V - The Empire Strikes Back (1980) 4.292977
Star Wars: Episode VI - Return of the Jedi (1983) 4.022893
Jurassic Park (1993) 3.763847
Saving Private Ryan (1998) 4.337354
Terminator 2: Judgment Day (1991) 4.058513
Matrix, The (1999) 4.315830
Back to the Future (1985) 3.990321
Silence of the Lambs, The (1991) 4.351823
Name: rating, dtype: float64
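The 10 most-rated movies do not appear among the 10 highest-rated ones, and several of the highest-rated movies are little known. Is something wrong with the data? Not really: if only a handful of people watch a little-seen (possibly even bad) movie and those few give it top marks, its average rating ends up very high even though it rests on very few ratings.
To check this, look at how many ratings each of the 10 highest-rated movies received: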
```python
total_rating_by_title[top_10_mean_ratings.index]
```
title
Gate of Heavenly Peace, The (1995) 3
Lured (1947) 1
Ulysses (Ulisse) (1954) 1
Smashing Time (1967) 2
Follow the Bitch (1998) 1
Song of Freedom (1936) 1
Bittersweet Motel (2000) 1
Baby, The (1973) 1
One Little Indian (1973) 1
Schlafes Bruder (Brother of Sleep) (1995) 1
dtype: int64
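To read each mean next to the number of ratings behind it, both statistics can be computed in one table with `agg`. In this toy sketch (made-up data), a title with a single 5-star rating outranks one with four ratings averaging 4.5, which is exactly the effect described above.

```python
import pandas as pd

# Made-up ratings: one obscure title rated once, one popular title rated four times.
data_t = pd.DataFrame({
    "title": ["Obscure"] + ["Popular"] * 4,
    "rating": [5, 5, 4, 4, 5],
})

# 'mean' and 'size' become the two columns of the result, one row per title.
stats = data_t.groupby("title")["rating"].agg(["mean", "size"])
ranked = stats.sort_values("mean", ascending=False)
print(ranked)
```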
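Now redo the top-10 ranking using only popular movies, defined here as movies with more than 1000 ratings.
First, select the popular movies:
```python
hot_movie = total_rating_by_title[total_rating_by_title > 1000]
print(len(hot_movie))
hot_movie
```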
207
title
2001: A Space Odyssey (1968) 1716
Abyss, The (1989) 1715
African Queen, The (1951) 1057
Air Force One (1997) 1076
Airplane! (1980) 1731
Aladdin (1992) 1351
Alien (1979) 2024
Aliens (1986) 1820
Amadeus (1984) 1382
American Beauty (1999) 3428
American Pie (1999) 1389
American President, The (1995) 1033
Animal House (1978) 1207
Annie Hall (1977) 1334
Apocalypse Now (1979) 1176
Apollo 13 (1995) 1251
Arachnophobia (1990) 1367
Armageddon (1998) 1110
As Good As It Gets (1997) 1424
Austin Powers: International Man of Mystery (1997) 1205
Austin Powers: The Spy Who Shagged Me (1999) 1434
Babe (1995) 1751
Back to the Future (1985) 2583
Back to the Future Part II (1989) 1158
Back to the Future Part III (1990) 1148
Batman (1989) 1431
Batman Returns (1992) 1031
Beauty and the Beast (1991) 1060
Beetlejuice (1988) 1495
Being John Malkovich (1999) 2241
...
Superman (1978) 1222
Talented Mr. Ripley, The (1999) 1331
Taxi Driver (1976) 1240
Terminator 2: Judgment Day (1991) 2649
Terminator, The (1984) 2098
Thelma & Louise (1991) 1417
There's Something About Mary (1998) 1371
This Is Spinal Tap (1984) 1118
Thomas Crown Affair, The (1999) 1089
Three Kings (1999) 1021
Time Bandits (1981) 1010
Titanic (1997) 1546
Top Gun (1986) 1010
Total Recall (1990) 1996
Toy Story (1995) 2077
Toy Story 2 (1999) 1585
True Lies (1994) 1400
Truman Show, The (1998) 1005
Twelve Monkeys (1995) 1511
Twister (1996) 1110
Untouchables, The (1987) 1127
Usual Suspects, The (1995) 1783
Wayne's World (1992) 1120
When Harry Met Sally... (1989) 1568
Who Framed Roger Rabbit? (1988) 1799
Willy Wonka and the Chocolate Factory (1971) 1313
Witness (1985) 1046
Wizard of Oz, The (1939) 1718
X-Men (2000) 1511
Young Frankenstein (1974) 1193
dtype: int64
```python
# mean ratings of the popular movies
hot_movie_mean_rating = mean_ratings_by_title[hot_movie.index]
print(len(hot_movie_mean_rating))
hot_movie_mean_rating
```
207
title
2001: A Space Odyssey (1968) 4.068765
Abyss, The (1989) 3.683965
African Queen, The (1951) 4.251656
Air Force One (1997) 3.588290
Airplane! (1980) 3.971115
Aladdin (1992) 3.788305
Alien (1979) 4.159585
Aliens (1986) 4.125824
Amadeus (1984) 4.251809
American Beauty (1999) 4.317386
American Pie (1999) 3.709863
American President, The (1995) 3.793804
Animal House (1978) 4.053024
Annie Hall (1977) 4.141679
Apocalypse Now (1979) 4.243197
Apollo 13 (1995) 4.073541
Arachnophobia (1990) 3.002926
Armageddon (1998) 3.191892
As Good As It Gets (1997) 3.950140
Austin Powers: International Man of Mystery (1997) 3.710373
Austin Powers: The Spy Who Shagged Me (1999) 3.388424
Babe (1995) 3.891491
Back to the Future (1985) 3.990321
Back to the Future Part II (1989) 3.343696
Back to the Future Part III (1990) 3.242160
Batman (1989) 3.600978
Batman Returns (1992) 2.976722
Beauty and the Beast (1991) 3.885849
Beetlejuice (1988) 3.567893
Being John Malkovich (1999) 4.125390
...
Superman (1978) 3.536825
Talented Mr. Ripley, The (1999) 3.503381
Taxi Driver (1976) 4.183871
Terminator 2: Judgment Day (1991) 4.058513
Terminator, The (1984) 4.152050
Thelma & Louise (1991) 3.680311
There's Something About Mary (1998) 3.904449
This Is Spinal Tap (1984) 4.179785
Thomas Crown Affair, The (1999) 3.641873
Three Kings (1999) 3.807052
Time Bandits (1981) 3.694059
Titanic (1997) 3.583441
Top Gun (1986) 3.686139
Total Recall (1990) 3.682365
Toy Story (1995) 4.146846
Toy Story 2 (1999) 4.218927
True Lies (1994) 3.634286
Truman Show, The (1998) 3.861692
Twelve Monkeys (1995) 3.945731
Twister (1996) 3.173874
Untouchables, The (1987) 4.007986
Usual Suspects, The (1995) 4.517106
Wayne's World (1992) 3.600893
When Harry Met Sally... (1989) 4.073342
Who Framed Roger Rabbit? (1988) 3.679822
Willy Wonka and the Chocolate Factory (1971) 3.861386
Witness (1985) 3.996176
Wizard of Oz, The (1939) 4.247963
X-Men (2000) 3.820649
Young Frankenstein (1974) 4.250629
Name: rating, dtype: float64
```python
# the 10 highest-rated movies among those with more than 1000 ratings
top_10_rating_movie = hot_movie_mean_rating.sort_values(ascending=False).head(10)
top_10_rating_movie
```
title
Shawshank Redemption, The (1994) 4.554558
Godfather, The (1972) 4.524966
Usual Suspects, The (1995) 4.517106
Schindler's List (1993) 4.510417
Raiders of the Lost Ark (1981) 4.477725
Rear Window (1954) 4.476190
Star Wars: Episode IV - A New Hope (1977) 4.453694
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) 4.449890
Casablanca (1942) 4.412822
Sixth Sense, The (1999) 4.406263
Name: rating, dtype: float64
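The count, filter and rank steps can be folded into one helper. `top_rated` and its `min_ratings` parameter are hypothetical names and the data below is made up; on the real `data` table, `top_rated(data, min_ratings=1000)` should give the same ranking as the list above (as a DataFrame with `mean` and `size` columns), since it uses the same `> 1000` cutoff.

```python
import pandas as pd


def top_rated(data, min_ratings, n=10):
    """Hypothetical helper: highest mean ratings among titles with more than `min_ratings` ratings."""
    stats = data.groupby("title")["rating"].agg(["mean", "size"])
    popular = stats[stats["size"] > min_ratings]          # drop titles with too few ratings
    return popular.sort_values("mean", ascending=False).head(n)


# Made-up data: one title rated once, one rated four times, one rated three times.
data_t = pd.DataFrame({
    "title": ["Obscure"] + ["Hit"] * 4 + ["Flop"] * 3,
    "rating": [5] + [5, 4, 5, 4] + [2, 3, 2],
})

print(top_rated(data_t, min_ratings=2))   # 'Obscure' is filtered out
print(top_rated(data_t, min_ratings=0))   # without a cutoff, 'Obscure' ranks first
```

The cutoff trades coverage for reliability: a higher `min_ratings` keeps only means backed by many ratings, at the cost of excluding less-watched movies entirely.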
```python
# only needed in IPython/Jupyter; keep the magic on its own line
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)               # ranks 1..10
y = top_10_rating_movie.values     # mean ratings
name = top_10_rating_movie.index   # movie titles
# draw the line chart
plt.plot(x, y, 'r-o')
# annotate each point with its movie title
for i in range(10):
    plt.text(x[i], y[i], name[i])
# set the axis ranges
plt.xlim(0, 15)
plt.ylim(4.4, 4.56)
# set the axis labels
plt.xlabel('Rank')
plt.ylabel('Rating')
# plt.show()  # needed outside IPython/Jupyter
```
That chart is rather ugly, so here is a nicer one:
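```python
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()
people = name                      # movie titles
y_pos = np.arange(len(people))
performance = y                    # mean ratings
# no error bars: random values such as np.random.rand() would not reflect the data
plt.barh(y_pos, performance, align='center', alpha=0.4)
plt.yticks(y_pos, people)
plt.gca().invert_yaxis()           # highest-rated movie at the top
plt.xlim(4.3, 4.6)                 # the ratings are close together, so zoom in
plt.xlabel('Rating')
plt.title('Top 10 rated movies (more than 1000 ratings)')
# plt.show()  # needed outside IPython/Jupyter
```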
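For a script run outside Jupyter, an object-oriented version of the bar chart might look like the sketch below. It renders off-screen with the Agg backend, writes each value next to its bar, and zooms the x-axis because the ratings are close together. The three titles and ratings are placeholders, not results from the data, and the PNG goes to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; inside Jupyter, %matplotlib inline is enough
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder titles and mean ratings standing in for top_10_rating_movie.
ratings_t = pd.Series({"Movie A": 4.55, "Movie B": 4.52, "Movie C": 4.41})

fig, ax = plt.subplots(figsize=(6, 2.5))
bars = ax.barh(ratings_t.index, ratings_t.values, alpha=0.6)  # one horizontal bar per title
ax.invert_yaxis()              # first (highest-rated) title at the top
ax.set_xlim(4.3, 4.6)          # zoom in, since the values are close together
ax.set_xlabel("Mean rating")

# Write each value just to the right of the end of its bar.
for bar, value in zip(bars, ratings_t.values):
    ax.text(value, bar.get_y() + bar.get_height() / 2, f" {value:.2f}", va="center")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # save to memory instead of a file
```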
Date: 2024-10-09 19:25:34