Analyzing a Million Movie Ratings with pandas

Life is short, use Python.

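This walkthrough does data analysis in Python. pandas is one of Python's key data analysis packages; other important ones are numpy and matplotlib.

Install pandas (the same on Linux, Mac, and Windows):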

pip install pandas

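Movie data source: http://grouplens.org/datasets/movielens/

Download the data file and unzip it; it contains the following 4 files:

  • users.dat user data
  • movies.dat movie data
  • ratings.dat rating data
  • README description of the files

The README describes the format of the source data files: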

  • users.dat (UserID::Gender::Age::Occupation::Zip-code)
  • movies.dat (MovieID::Title::Genres)
  • ratings.dat (UserID::MovieID::Rating::Timestamp)

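Notes: Occupation is the user's occupation, Zip-code is the postal code, Timestamp is the time of the rating, and Genres is the movie's genres (see the README for more details).

The fields of each record in these files are separated by ::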



Environment:

  • OS: Windows
  • Language: Python 3.4
  • Editor: Jupyter

Read the data with pandas.

Import the required packages:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

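Read the data. Define the column names first, because the source files have no header row, only records whose fields are separated by '::'.

user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']  # column names for the users table

Read the file; adjust the path to wherever you unpacked the data.

users = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\users.dat', sep='::', header=None, names=user_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':

The warning above can be ignored: it only says that the data was loaded with the Python engine rather than the C engine. As the message itself notes, passing engine='python' to the call avoids it.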
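As a minimal sketch of the engine='python' option that the warning mentions, the snippet below parses three rows in the users.dat layout from an in-memory string instead of the real file. The names sample and users_sample are made up for this illustration, and the dtype setting for zip is an optional extra (an assumption, not part of the original session) that keeps zip codes as text.

```python
import io
import pandas as pd

# Three sample rows in the users.dat layout (UserID::Gender::Age::Occupation::Zip-code)
sample = io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n4::M::45::7::02460\n")

user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']
users_sample = pd.read_csv(
    sample,
    sep='::',            # multi-character separator, interpreted as a regex
    engine='python',     # request the Python engine explicitly, so no fallback warning
    header=None,
    names=user_names,
    dtype={'zip': str},  # keep zip codes as text so leading zeros survive
)
print(users_sample)
```

The same keyword arguments should work for the real files; only the first argument changes to the file path.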

Check how many records there are.

Show the first 5 rows.

print(len(users))
users.head()

6040

user_id gender age occupation zip
0 1 F 1 10 48067
1 2 M 56 16 70072
2 3 M 25 15 55117
3 4 M 45 7 02460
4 5 M 25 20 55455

Read the movies and ratings data the same way.

ratings_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\ratings.dat', sep='::', header=None, names=ratings_names)
movies_names = ['movie_id', 'title', 'genres']
movies = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\movies.dat', sep='::', header=None, names=movies_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  from ipykernel import kernelapp as app
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.

Loading takes a little while, because there are about a million rating records.

Look at the ratings and movies tables.

print(len(ratings))
ratings.head()

1000209

user_id movie_id rating timestamp
0 1 1193 5 978300760
1 1 661 3 978302109
2 1 914 3 978301968
3 1 3408 4 978300275
4 1 2355 5 978824291
print(len(movies))
movies.head()

3883

movie_id title genres
0 1 Toy Story (1995) Animation|Children’s|Comedy
1 2 Jumanji (1995) Adventure|Children’s|Fantasy
2 3 Grumpier Old Men (1995) Comedy|Romance
3 4 Waiting to Exhale (1995) Comedy|Drama
4 5 Father of the Bride Part II (1995) Comedy
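The genres column packs several labels into one string separated by |. The following is a small sketch, on two hand-copied rows rather than the real file, of how such a column could be split into one row per genre. The names movies_sample and genre_rows are hypothetical, and DataFrame.explode assumes a reasonably recent pandas (0.25 or later), which is newer than the environment used in this post.

```python
import pandas as pd

movies_sample = pd.DataFrame({
    'movie_id': [1, 3],
    'title': ['Toy Story (1995)', 'Grumpier Old Men (1995)'],
    'genres': ["Animation|Children's|Comedy", 'Comedy|Romance'],
})

# Split each string on the literal "|" and give every genre its own row
genre_rows = movies_sample.assign(
    genres=movies_sample['genres'].str.split('|')
).explode('genres')

# Count how many movies carry each genre label
print(genre_rows['genres'].value_counts())
```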

There are just over 1 million movie ratings.

Merge the 3 tables into a single table, data.

data = pd.merge(pd.merge(users, ratings), movies)
print(len(data))
data.head()

1000209

user_id gender age occupation zip movie_id rating timestamp title genres
0 1 F 1 10 48067 1193 5 978300760 One Flew Over the Cuckoo’s Nest (1975) Drama
1 2 M 56 16 70072 1193 5 978298413 One Flew Over the Cuckoo’s Nest (1975) Drama
2 12 M 25 12 32793 1193 4 978220179 One Flew Over the Cuckoo’s Nest (1975) Drama
3 15 M 25 7 22903 1193 4 978199279 One Flew Over the Cuckoo’s Nest (1975) Drama
4 17 M 50 1 95350 1193 5 978158471 One Flew Over the Cuckoo’s Nest (1975) Drama
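pd.merge called without an on argument joins on the column names the two tables share, which here are user_id and then movie_id. The toy frames below (users_s, ratings_s, movies_s are made-up stand-ins for the real tables) sketch the same merge with the keys spelled out, which may be easier to read and guards against an accidental extra shared column.

```python
import pandas as pd

users_s = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
ratings_s = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [1193, 661, 1193],
    'rating': [5, 3, 5],
})
movies_s = pd.DataFrame({
    'movie_id': [1193, 661],
    'title': ["One Flew Over the Cuckoo's Nest (1975)",
              'James and the Giant Peach (1996)'],
})

# Explicit join keys: users to ratings on user_id, then to movies on movie_id
merged = users_s.merge(ratings_s, on='user_id').merge(movies_s, on='movie_id')

# Implicit form, as used above: the shared column names are picked automatically
implicit = pd.merge(pd.merge(users_s, ratings_s), movies_s)

print(merged)
```

With the default inner join, the merged table has one row per rating, which matches the 1000209 rows reported for data.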

Look at all the ratings given by the user with id 1.

data[data.user_id==1]
user_id gender age occupation zip movie_id rating timestamp title genres
0 1 F 1 10 48067 1193 5 978300760 One Flew Over the Cuckoo’s Nest (1975) Drama
1725 1 F 1 10 48067 661 3 978302109 James and the Giant Peach (1996) Animation|Children’s|Musical
2250 1 F 1 10 48067 914 3 978301968 My Fair Lady (1964) Musical|Romance
2886 1 F 1 10 48067 3408 4 978300275 Erin Brockovich (2000) Drama
4201 1 F 1 10 48067 2355 5 978824291 Bug’s Life, A (1998) Animation|Children’s|Comedy
5904 1 F 1 10 48067 1197 3 978302268 Princess Bride, The (1987) Action|Adventure|Comedy|Romance
8222 1 F 1 10 48067 1287 5 978302039 Ben-Hur (1959) Action|Adventure|Drama
8926 1 F 1 10 48067 2804 5 978300719 Christmas Story, A (1983) Comedy|Drama
10278 1 F 1 10 48067 594 4 978302268 Snow White and the Seven Dwarfs (1937) Animation|Children’s|Musical
11041 1 F 1 10 48067 919 4 978301368 Wizard of Oz, The (1939) Adventure|Children’s|Drama|Musical
12759 1 F 1 10 48067 595 5 978824268 Beauty and the Beast (1991) Animation|Children’s|Musical
13819 1 F 1 10 48067 938 4 978301752 Gigi (1958) Musical
14006 1 F 1 10 48067 2398 4 978302281 Miracle on 34th Street (1947) Drama
14386 1 F 1 10 48067 2918 4 978302124 Ferris Bueller’s Day Off (1986) Comedy
15859 1 F 1 10 48067 1035 5 978301753 Sound of Music, The (1965) Musical
16741 1 F 1 10 48067 2791 4 978302188 Airplane! (1980) Comedy
18472 1 F 1 10 48067 2687 3 978824268 Tarzan (1999) Animation|Children’s
18914 1 F 1 10 48067 2018 4 978301777 Bambi (1942) Animation|Children’s
19503 1 F 1 10 48067 3105 5 978301713 Awakenings (1990) Drama
20183 1 F 1 10 48067 2797 4 978302039 Big (1988) Comedy|Fantasy
21674 1 F 1 10 48067 2321 3 978302205 Pleasantville (1998) Comedy
22832 1 F 1 10 48067 720 3 978300760 Wallace & Gromit: The Best of Aardman Animatio… Animation
23270 1 F 1 10 48067 1270 5 978300055 Back to the Future (1985) Comedy|Sci-Fi
25853 1 F 1 10 48067 527 5 978824195 Schindler’s List (1993) Drama|War
28157 1 F 1 10 48067 2340 3 978300103 Meet Joe Black (1998) Romance
28501 1 F 1 10 48067 48 5 978824351 Pocahontas (1995) Animation|Children’s|Musical|Romance
28883 1 F 1 10 48067 1097 4 978301953 E.T. the Extra-Terrestrial (1982) Children’s|Drama|Fantasy|Sci-Fi
31152 1 F 1 10 48067 1721 4 978300055 Titanic (1997) Drama|Romance
32698 1 F 1 10 48067 1545 4 978824139 Ponette (1996) Drama
32771 1 F 1 10 48067 745 3 978824268 Close Shave, A (1995) Animation|Comedy|Thriller
33428 1 F 1 10 48067 2294 4 978824291 Antz (1998) Animation|Children’s
34073 1 F 1 10 48067 3186 4 978300019 Girl, Interrupted (1999) Drama
34504 1 F 1 10 48067 1566 4 978824330 Hercules (1997) Adventure|Animation|Children’s|Comedy|Musical
34973 1 F 1 10 48067 588 4 978824268 Aladdin (1992) Animation|Children’s|Comedy|Musical
36324 1 F 1 10 48067 1907 4 978824330 Mulan (1998) Animation|Children’s
36814 1 F 1 10 48067 783 4 978824291 Hunchback of Notre Dame, The (1996) Animation|Children’s|Musical
37204 1 F 1 10 48067 1836 5 978300172 Last Days of Disco, The (1998) Drama
37339 1 F 1 10 48067 1022 5 978300055 Cinderella (1950) Animation|Children’s|Musical
37916 1 F 1 10 48067 2762 4 978302091 Sixth Sense, The (1999) Thriller
40375 1 F 1 10 48067 150 5 978301777 Apollo 13 (1995) Drama
41626 1 F 1 10 48067 1 5 978824268 Toy Story (1995) Animation|Children’s|Comedy
43703 1 F 1 10 48067 1961 5 978301590 Rain Man (1988) Drama
45033 1 F 1 10 48067 1962 4 978301753 Driving Miss Daisy (1989) Drama
45685 1 F 1 10 48067 2692 4 978301570 Run Lola Run (Lola rennt) (1998) Action|Crime|Romance
46757 1 F 1 10 48067 260 4 978300760 Star Wars: Episode IV - A New Hope (1977) Action|Adventure|Fantasy|Sci-Fi
49748 1 F 1 10 48067 1028 5 978301777 Mary Poppins (1964) Children’s|Comedy|Musical
50759 1 F 1 10 48067 1029 5 978302205 Dumbo (1941) Animation|Children’s|Musical
51327 1 F 1 10 48067 1207 4 978300719 To Kill a Mockingbird (1962) Drama
52255 1 F 1 10 48067 2028 5 978301619 Saving Private Ryan (1998) Action|Drama|War
54908 1 F 1 10 48067 531 4 978302149 Secret Garden, The (1993) Children’s|Drama
55246 1 F 1 10 48067 3114 4 978302174 Toy Story 2 (1999) Animation|Children’s|Comedy
56831 1 F 1 10 48067 608 4 978301398 Fargo (1996) Crime|Drama|Thriller
59344 1 F 1 10 48067 1246 4 978302091 Dead Poets Society (1989) Drama

Average rating of each movie by gender.

mean_ratings_by_gender = data.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')
mean_ratings_by_gender.head(10)  # show the first 10 rows
gender F M
title
$1,000,000 Duck (1971) 3.375000 2.761905
'Night Mother (1986) 3.388889 3.352941
'Til There Was You (1997) 2.675676 2.733333
'burbs, The (1989) 2.793478 2.962085
...And Justice for All (1979) 3.828571 3.689024
1-900 (1994) 2.000000 3.000000
10 Things I Hate About You (1999) 3.646552 3.311966
101 Dalmatians (1961) 3.791444 3.500000
101 Dalmatians (1996) 3.240000 2.911215
12 Angry Men (1957) 4.184397 4.328421
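To make clear what the pivot table computes, here is a small sketch on made-up data (the frame toy and its values are invented for illustration): pivot_table with aggfunc='mean' gives the same numbers as grouping by title and gender, averaging, and moving gender into the columns.

```python
import pandas as pd

toy = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'F', 'M', 'F', 'M'],
    'rating': [4, 2, 5, 3, 1],
})

# One row per title, one column per gender, each cell the mean rating
by_pivot = toy.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')

# The same table built step by step: group, average, then unstack gender into columns
by_group = toy.groupby(['title', 'gender'])['rating'].mean().unstack('gender')

print(by_pivot)
```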

Add a column to mean_ratings_by_gender: the difference between the female and male average ratings.

mean_ratings_by_gender['diff'] = mean_ratings_by_gender.F - mean_ratings_by_gender.M
mean_ratings_by_gender.head()
gender F M diff
title
$1,000,000 Duck (1971) 3.375000 2.761905 0.613095
'Night Mother (1986) 3.388889 3.352941 0.035948
'Til There Was You (1997) 2.675676 2.733333 -0.057658
'burbs, The (1989) 2.793478 2.962085 -0.168607
...And Justice for All (1979) 3.828571 3.689024 0.139547

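Which movies show the largest gap between male and female ratings (rated high by men and low by women, or high by women and low by men)?

mean_ratings_by_gender.sort_values(by='diff', ascending=True).head()
# men high, women low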
gender F M diff
title
Tigrero: A Film That Was Never Made (1994) 1.0 4.333333 -3.333333
Neon Bible, The (1995) 1.0 4.000000 -3.000000
Enfer, L’ (1994) 1.0 3.750000 -2.750000
Stalingrad (1993) 1.0 3.593750 -2.593750
Killer: A Journal of Murder (1995) 1.0 3.428571 -2.428571
mean_ratings_by_gender.sort_values(by='diff', ascending=False).head()
# women high, men low
gender F M diff
title
James Dean Story, The (1957) 4.000000 1.000000 3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919) 4.000000 1.000000 3.000000
Country Life (1994) 5.000000 2.000000 3.000000
Babyfever (1994) 3.666667 1.000000 2.666667
Woman of Paris, A (1923) 5.000000 2.428571 2.571429
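Averages of exactly 1.0 or 5.0 in the tables above suggest that some of these gaps may rest on only a handful of ratings. One possible refinement, sketched here on invented data, is to require a minimum number of ratings from each gender before comparing. The threshold min_ratings and the frame toy are assumptions made for this example, not values from the original analysis.

```python
import pandas as pd

toy = pd.DataFrame({
    'title':  ['Obscure'] * 2 + ['Popular'] * 6,
    'gender': ['F', 'M', 'F', 'F', 'F', 'M', 'M', 'M'],
    'rating': [1, 5, 4, 5, 4, 3, 4, 3],
})
min_ratings = 3  # hypothetical threshold, chosen only for this toy data

means = toy.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')
counts = toy.pivot_table(values='rating', index='title', columns='gender', aggfunc='count')

# Keep only titles where both genders supplied at least min_ratings ratings
enough = (counts >= min_ratings).all(axis=1)
filtered = means[enough].copy()
filtered['diff'] = filtered['F'] - filtered['M']

print(filtered.sort_values(by='diff'))
```

In the toy data the title with a single rating per gender drops out, even though its raw gap is the largest.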

Number of ratings per movie.

total_rating_by_title = data.groupby('title').size()
total_rating_by_title    # the first column is the movie title, the second is the number of ratings
title
$1,000,000 Duck (1971)                              37
'Night Mother (1986)                                70
'Til There Was You (1997)                           52
'burbs, The (1989)                                 303
...And Justice for All (1979)                      199
1-900 (1994)                                         2
10 Things I Hate About You (1999)                  700
101 Dalmatians (1961)                              565
101 Dalmatians (1996)                              364
12 Angry Men (1957)                                616
13th Warrior, The (1999)                           750
187 (1997)                                          55
2 Days in the Valley (1996)                        286
20 Dates (1998)                                    139
20,000 Leagues Under the Sea (1954)                575
200 Cigarettes (1999)                              181
2001: A Space Odyssey (1968)                      1716
2010 (1984)                                        470
24 7: Twenty Four Seven (1997)                       5
24-hour Woman (1998)                                 9
28 Days (2000)                                     505
3 Ninjas: High Noon On Mega Mountain (1998)         47
3 Strikes (2000)                                     4
301, 302 (1995)                                      9
39 Steps, The (1935)                               253
400 Blows, The (Les Quatre cents coups) (1959)     187
42 Up (1998)                                        88
52 Pick-Up (1986)                                  140
54 (1998)                                          259
7th Voyage of Sinbad, The (1958)                   258
                                                  ...
Wrongfully Accused (1998)                          123
Wyatt Earp (1994)                                  270
X-Files: Fight the Future, The (1998)              996
X-Men (2000)                                      1511
X: The Unknown (1956)                               12
Xiu Xiu: The Sent-Down Girl (Tian yu) (1998)        69
Yankee Zulu (1994)                                   2
Yards, The (1999)                                   77
Year My Voice Broke, The (1987)                     27
Year of Living Dangerously (1982)                  391
Year of the Horse (1997)                             4
Yellow Submarine (1968)                            399
Yojimbo (1961)                                     215
You Can't Take It With You (1938)                   77
You So Crazy (1994)                                 13
You've Got Mail (1998)                             838
Young Doctors in Love (1982)                        79
Young Frankenstein (1974)                         1193
Young Guns (1988)                                  562
Young Guns II (1990)                               369
Young Poisoner's Handbook, The (1995)               79
Young Sherlock Holmes (1985)                       379
Young and Innocent (1937)                           10
Your Friends and Neighbors (1998)                  109
Zachariah (1971)                                     2
Zed & Two Noughts, A (1985)                         29
Zero Effect (1998)                                 301
Zero Kelvin (Kjærlighetens kjøtere) (1995)             2
Zeus and Roxanne (1997)                             23
eXistenZ (1999)                                    410
dtype: int64

The 10 movies with the most ratings.

top_10_total_rating = total_rating_by_title.sort_values(ascending=False).head(10)
top_10_total_rating
title
American Beauty (1999)                                   3428
Star Wars: Episode IV - A New Hope (1977)                2991
Star Wars: Episode V - The Empire Strikes Back (1980)    2990
Star Wars: Episode VI - Return of the Jedi (1983)        2883
Jurassic Park (1993)                                     2672
Saving Private Ryan (1998)                               2653
Terminator 2: Judgment Day (1991)                        2649
Matrix, The (1999)                                       2590
Back to the Future (1985)                                2583
Silence of the Lambs, The (1991)                         2578
dtype: int64
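As you can see, the most frequently rated movies are generally ones we know well, so they can reasonably be regarded as popular movies.
Next, look at the 10 movies with the highest average rating (note: the maximum rating is 5.0).
mean_ratings_by_title = data.groupby('title')['rating'].mean()  # a Series, as the sort below expects; newer pandas versions return a DataFrame from pivot_table(values='rating', index='title', aggfunc='mean')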
top_10_mean_ratings = mean_ratings_by_title.sort_values(ascending=False).head(10)
top_10_mean_ratings
title
Gate of Heavenly Peace, The (1995)           5.0
Lured (1947)                                 5.0
Ulysses (Ulisse) (1954)                      5.0
Smashing Time (1967)                         5.0
Follow the Bitch (1998)                      5.0
Song of Freedom (1936)                       5.0
Bittersweet Motel (2000)                     5.0
Baby, The (1973)                             5.0
One Little Indian (1973)                     5.0
Schlafes Bruder (Brother of Sleep) (1995)    5.0
Name: rating, dtype: float64
Average ratings of the 10 most-rated movies.
mean_ratings_by_title[top_10_total_rating.index]
title
American Beauty (1999)                                   4.317386
Star Wars: Episode IV - A New Hope (1977)                4.453694
Star Wars: Episode V - The Empire Strikes Back (1980)    4.292977
Star Wars: Episode VI - Return of the Jedi (1983)        4.022893
Jurassic Park (1993)                                     3.763847
Saving Private Ryan (1998)                               4.337354
Terminator 2: Judgment Day (1991)                        4.058513
Matrix, The (1999)                                       4.315830
Back to the Future (1985)                                3.990321
Silence of the Lambs, The (1991)                         4.351823
Name: rating, dtype: float64
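Notice that the 10 most-rated movies do not rank particularly high by average rating, while some of the top-rated movies are ones we have hardly heard of. Is something wrong with the data? Not really.
Suppose a poor movie is seen by very few people, and those few give it a very high rating; the result is a movie with very few ratings but a very high average.
If you doubt it, look at the data: the number of ratings for each of the 10 top-rated movies.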
total_rating_by_title[top_10_mean_ratings.index]
title
Gate of Heavenly Peace, The (1995)           3
Lured (1947)                                 1
Ulysses (Ulisse) (1954)                      1
Smashing Time (1967)                         2
Follow the Bitch (1998)                      1
Song of Freedom (1936)                       1
Bittersweet Motel (2000)                     1
Baby, The (1973)                             1
One Little Indian (1973)                     1
Schlafes Bruder (Brother of Sleep) (1995)    1
dtype: int64
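Now recompute the top 10 among popular movies only; here a movie counts as popular if it has more than 1,000 ratings.
Select the popular movies: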
hot_movie = total_rating_by_title[total_rating_by_title>1000]
print(len(hot_movie))
hot_movie
207

title
2001: A Space Odyssey (1968)                          1716
Abyss, The (1989)                                     1715
African Queen, The (1951)                             1057
Air Force One (1997)                                  1076
Airplane! (1980)                                      1731
Aladdin (1992)                                        1351
Alien (1979)                                          2024
Aliens (1986)                                         1820
Amadeus (1984)                                        1382
American Beauty (1999)                                3428
American Pie (1999)                                   1389
American President, The (1995)                        1033
Animal House (1978)                                   1207
Annie Hall (1977)                                     1334
Apocalypse Now (1979)                                 1176
Apollo 13 (1995)                                      1251
Arachnophobia (1990)                                  1367
Armageddon (1998)                                     1110
As Good As It Gets (1997)                             1424
Austin Powers: International Man of Mystery (1997)    1205
Austin Powers: The Spy Who Shagged Me (1999)          1434
Babe (1995)                                           1751
Back to the Future (1985)                             2583
Back to the Future Part II (1989)                     1158
Back to the Future Part III (1990)                    1148
Batman (1989)                                         1431
Batman Returns (1992)                                 1031
Beauty and the Beast (1991)                           1060
Beetlejuice (1988)                                    1495
Being John Malkovich (1999)                           2241
                                                      ...
Superman (1978)                                       1222
Talented Mr. Ripley, The (1999)                       1331
Taxi Driver (1976)                                    1240
Terminator 2: Judgment Day (1991)                     2649
Terminator, The (1984)                                2098
Thelma & Louise (1991)                                1417
There's Something About Mary (1998)                   1371
This Is Spinal Tap (1984)                             1118
Thomas Crown Affair, The (1999)                       1089
Three Kings (1999)                                    1021
Time Bandits (1981)                                   1010
Titanic (1997)                                        1546
Top Gun (1986)                                        1010
Total Recall (1990)                                   1996
Toy Story (1995)                                      2077
Toy Story 2 (1999)                                    1585
True Lies (1994)                                      1400
Truman Show, The (1998)                               1005
Twelve Monkeys (1995)                                 1511
Twister (1996)                                        1110
Untouchables, The (1987)                              1127
Usual Suspects, The (1995)                            1783
Wayne's World (1992)                                  1120
When Harry Met Sally... (1989)                        1568
Who Framed Roger Rabbit? (1988)                       1799
Willy Wonka and the Chocolate Factory (1971)          1313
Witness (1985)                                        1046
Wizard of Oz, The (1939)                              1718
X-Men (2000)                                          1511
Young Frankenstein (1974)                             1193
dtype: int64
# average rating of the popular movies
hot_movie_mean_rating = mean_ratings_by_title[hot_movie.index]
print(len(hot_movie_mean_rating))
hot_movie_mean_rating
207

title
2001: A Space Odyssey (1968)                          4.068765
Abyss, The (1989)                                     3.683965
African Queen, The (1951)                             4.251656
Air Force One (1997)                                  3.588290
Airplane! (1980)                                      3.971115
Aladdin (1992)                                        3.788305
Alien (1979)                                          4.159585
Aliens (1986)                                         4.125824
Amadeus (1984)                                        4.251809
American Beauty (1999)                                4.317386
American Pie (1999)                                   3.709863
American President, The (1995)                        3.793804
Animal House (1978)                                   4.053024
Annie Hall (1977)                                     4.141679
Apocalypse Now (1979)                                 4.243197
Apollo 13 (1995)                                      4.073541
Arachnophobia (1990)                                  3.002926
Armageddon (1998)                                     3.191892
As Good As It Gets (1997)                             3.950140
Austin Powers: International Man of Mystery (1997)    3.710373
Austin Powers: The Spy Who Shagged Me (1999)          3.388424
Babe (1995)                                           3.891491
Back to the Future (1985)                             3.990321
Back to the Future Part II (1989)                     3.343696
Back to the Future Part III (1990)                    3.242160
Batman (1989)                                         3.600978
Batman Returns (1992)                                 2.976722
Beauty and the Beast (1991)                           3.885849
Beetlejuice (1988)                                    3.567893
Being John Malkovich (1999)                           4.125390
                                                        ...
Superman (1978)                                       3.536825
Talented Mr. Ripley, The (1999)                       3.503381
Taxi Driver (1976)                                    4.183871
Terminator 2: Judgment Day (1991)                     4.058513
Terminator, The (1984)                                4.152050
Thelma & Louise (1991)                                3.680311
There's Something About Mary (1998)                   3.904449
This Is Spinal Tap (1984)                             4.179785
Thomas Crown Affair, The (1999)                       3.641873
Three Kings (1999)                                    3.807052
Time Bandits (1981)                                   3.694059
Titanic (1997)                                        3.583441
Top Gun (1986)                                        3.686139
Total Recall (1990)                                   3.682365
Toy Story (1995)                                      4.146846
Toy Story 2 (1999)                                    4.218927
True Lies (1994)                                      3.634286
Truman Show, The (1998)                               3.861692
Twelve Monkeys (1995)                                 3.945731
Twister (1996)                                        3.173874
Untouchables, The (1987)                              4.007986
Usual Suspects, The (1995)                            4.517106
Wayne's World (1992)                                  3.600893
When Harry Met Sally... (1989)                        4.073342
Who Framed Roger Rabbit? (1988)                       3.679822
Willy Wonka and the Chocolate Factory (1971)          3.861386
Witness (1985)                                        3.996176
Wizard of Oz, The (1939)                              4.247963
X-Men (2000)                                          3.820649
Young Frankenstein (1974)                             4.250629
Name: rating, dtype: float64
# the 10 highest-rated movies among those with more than 1000 ratings
top_10_rating_movie = hot_movie_mean_rating.sort_values(ascending=False).head(10)
top_10_rating_movie
title
Shawshank Redemption, The (1994)                                               4.554558
Godfather, The (1972)                                                          4.524966
Usual Suspects, The (1995)                                                     4.517106
Schindler's List (1993)                                                        4.510417
Raiders of the Lost Ark (1981)                                                 4.477725
Rear Window (1954)                                                             4.476190
Star Wars: Episode IV - A New Hope (1977)                                      4.453694
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)    4.449890
Casablanca (1942)                                                              4.412822
Sixth Sense, The (1999)                                                        4.406263
Name: rating, dtype: float64
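The count-then-filter-then-rank steps above can also be sketched in a single pass that computes the rating count and the mean side by side. The data below is invented, and min_count is a hypothetical threshold standing in for the 1000 used on the real data.

```python
import pandas as pd

toy = pd.DataFrame({
    'title':  ['Hit'] * 4 + ['Niche'] * 1 + ['Solid'] * 3,
    'rating': [5, 4, 4, 5, 5, 4, 5, 5],
})
min_count = 2  # hypothetical threshold for this toy data

# One row per title with the number of ratings and the average rating
stats = toy.groupby('title')['rating'].agg(['count', 'mean'])

# Drop rarely rated titles, then rank the rest by average rating
popular = stats[stats['count'] > min_count]
ranked = popular.sort_values(by='mean', ascending=False)

print(ranked)
```

Keeping count and mean in one table makes it easy to see why a title with a perfect average but a single rating is excluded.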
# Use the following magic only in IPython (or Jupyter); it is not needed elsewhere
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1,11)
y = top_10_rating_movie.values
name = top_10_rating_movie.index

# draw the plot
plt.plot(x, y, 'r-o')

# annotate each point with the movie title
for i in range(10):
    plt.text(x[i], y[i], name[i])

# set the axis ranges
plt.xlim(0, 15)
plt.ylim(4.4, 4.56)

# set the axis labels
#plt.xlabel('Rank')
#plt.ylabel('Rating')

#plt.show()  # needed outside IPython

That plot is rather ugly, so here is a better one:
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()

people = name
y_pos = np.arange(len(people))
performance = y
error = np.random.rand(len(people))  # random values, purely decorative error bars; they do not measure real uncertainty

plt.barh(y_pos, performance, xerr=error, align='center', alpha=0.4)
plt.yticks(y_pos, people)

#plt.xlabel('Rating')
#plt.title('Rank')

#plt.show()  # needed outside IPython

Time: 2024-10-09 19:25:34
