Python Data Analysis in Action - Chapter 9 - Data Analysis Example: Meteorological Data

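Chapter 9 Data Analysis Example: Meteorological Data
9.1 A Hypothesis to Test: The Influence of Proximity to the Sea on Climate
9.2 Data Source
9.3 Data Analysis with the IPython Notebook
9.4 Wind Rose (Wind Direction Frequency) Chart
9.5 Summary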

import numpy as np
import pandas as pd
import datetime
ferrara = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Ferrara,IT')
torino = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Torino,IT')
mantova = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Mantova,IT')
milano = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Milano,IT')
ravenna = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Ravenna,IT')
asti = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Asti,IT')
bologna = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Bologna,IT')
piacenza = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Piacenza,IT')
cesena = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Cesena,IT')
faenza = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Faenza,IT')
def prepare(city_list, city_name):
    # One list per output column, filled row by row from the nested records
    temp = []
    humidity = []
    pressure = []
    description = []
    dt = []
    wind_speed = []
    wind_deg = []
    for row in city_list:
        temp.append(row['main']['temp'] - 273.15)  # Kelvin to Celsius
        humidity.append(row['main']['humidity'])
        pressure.append(row['main']['pressure'])
        description.append(row['weather'][0]['description'])
        dt.append(row['dt'])
        wind_speed.append(row['wind']['speed'])
        wind_deg.append(row['wind']['deg'])
    headings = ['temp', 'humidity', 'pressure', 'description', 'dt', 'wind_speed', 'wind_deg']
    data = [temp, humidity, pressure, description, dt, wind_speed, wind_deg]
    df = pd.DataFrame(data, index=headings)
    city = df.T
    city['city'] = city_name
    # Unix timestamp to (local-time) datetime
    city['day'] = city['dt'].apply(datetime.datetime.fromtimestamp)
    return city
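The two conversions inside prepare can be checked on a single record. The sketch below is only an illustration: the record values and the helper name `flatten_row` are hypothetical, and the record is merely shaped like the entries that prepare reads. It uses UTC so the result does not depend on the machine's time zone, whereas the notebook's `datetime.datetime.fromtimestamp` returns local time.

```python
import datetime

# Hypothetical record, shaped like one element of the list that prepare() iterates over.
row = {
    "main": {"temp": 300.15, "humidity": 60, "pressure": 1012},
    "weather": [{"description": "clear sky"}],
    "dt": 1435363200,
    "wind": {"speed": 2.5, "deg": 90},
}

def flatten_row(row):
    """Flatten one nested record into the columns that prepare() builds."""
    return {
        "temp": row["main"]["temp"] - 273.15,  # Kelvin to Celsius
        "humidity": row["main"]["humidity"],
        "pressure": row["main"]["pressure"],
        "description": row["weather"][0]["description"],
        "dt": row["dt"],
        "wind_speed": row["wind"]["speed"],
        "wind_deg": row["wind"]["deg"],
        # Unix timestamp to a timezone-aware datetime in UTC
        "day": datetime.datetime.fromtimestamp(row["dt"], tz=datetime.timezone.utc),
    }

flat = flatten_row(row)
print(round(flat["temp"], 2))
print(flat["day"].isoformat())
```

Building a list of such dictionaries and passing it to `pd.DataFrame` would give the same columns without the transpose step, though that is a design alternative rather than what the notebook does.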
df_ferrara = prepare(ferrara.list, 'Ferrara')
df_milano = prepare(milano.list, 'Milano')
df_mantova = prepare(mantova.list, 'Mantova')
df_ravenna = prepare(ravenna.list, 'Ravenna')
df_torino = prepare(torino.list, 'Torino')
#df_alessandria = prepare(alessandria.list, 'Alessandria')
df_asti = prepare(asti.list, 'Asti')
df_bologna = prepare(bologna.list, 'Bologna')
df_piacenza = prepare(piacenza.list, 'Piacenza')
df_cesena = prepare(cesena.list, 'Cesena')
df_faenza = prepare(faenza.list, 'Faenza')
print(df_ferrara.shape)
print(df_milano.shape)
print(df_mantova.shape)
print(df_ravenna.shape)
print(df_torino.shape)
print(df_asti.shape)
print(df_bologna.shape)
print(df_piacenza.shape)
print(df_cesena.shape)
print(df_faenza.shape)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
(24, 9)
#http://it.thetimenow.com/distance-calculator.php
#(Comacchio)
df_ravenna['dist'] = 8
df_cesena['dist'] = 14
df_faenza['dist'] = 37
df_ferrara['dist'] = 47
df_bologna['dist'] = 71
df_mantova['dist'] = 121
df_piacenza['dist'] = 200
df_milano['dist'] = 250
df_asti['dist'] = 315
df_torino['dist'] = 357
import pandas as pd
#df_ferrara.to_csv('ferrara_270615.csv')
#df_milano.to_csv('milano_270615.csv')
#df_mantova.to_csv('mantova_270615.csv')
#df_ravenna.to_csv('ravenna_270615.csv')
#df_torino.to_csv('torino_270615.csv')
#df_asti.to_csv('asti_270615.csv')
#df_bologna.to_csv('bologna_270615.csv')
#df_piacenza.to_csv('piacenza_270615.csv')
#df_cesena.to_csv('cesena_270615.csv')
#df_faenza.to_csv('faenza_270615.csv')
df_ferrara = pd.read_csv('ferrara_270615.csv')
df_milano = pd.read_csv('milano_270615.csv')
df_mantova = pd.read_csv('mantova_270615.csv')
df_ravenna = pd.read_csv('ravenna_270615.csv')
df_torino = pd.read_csv('torino_270615.csv')
df_asti = pd.read_csv('asti_270615.csv')
df_bologna = pd.read_csv('bologna_270615.csv')
df_piacenza = pd.read_csv('piacenza_270615.csv')
df_cesena = pd.read_csv('cesena_270615.csv')
df_faenza = pd.read_csv('faenza_270615.csv')
df_cesena.columns
Index(['Unnamed: 0', 'temp', 'humidity', 'pressure', 'description', 'dt',
       'wind_speed', 'wind_deg', 'city', 'day', 'dist'],
      dtype='object')
dist = [df_ravenna['dist'][0],
        df_cesena['dist'][0],
        df_faenza['dist'][0],
        df_ferrara['dist'][0],
        df_bologna['dist'][0],
        df_mantova['dist'][0],
        df_piacenza['dist'][0],
        df_milano['dist'][0],
        df_asti['dist'][0],
        df_torino['dist'][0]]
temp_max = [df_ravenna['temp'].max(),
            df_cesena['temp'].max(),
            df_faenza['temp'].max(),
            df_ferrara['temp'].max(),
            df_bologna['temp'].max(),
            df_mantova['temp'].max(),
            df_piacenza['temp'].max(),
            df_milano['temp'].max(),
            df_asti['temp'].max(),
            df_torino['temp'].max()]
temp_min = [df_ravenna['temp'].min(),
            df_cesena['temp'].min(),
            df_faenza['temp'].min(),
            df_ferrara['temp'].min(),
            df_bologna['temp'].min(),
            df_mantova['temp'].min(),
            df_piacenza['temp'].min(),
            df_milano['temp'].min(),
            df_asti['temp'].min(),
            df_torino['temp'].min()]
hum_min = [df_ravenna['humidity'].min(),
           df_cesena['humidity'].min(),
           df_faenza['humidity'].min(),
           df_ferrara['humidity'].min(),
           df_bologna['humidity'].min(),
           df_mantova['humidity'].min(),
           df_piacenza['humidity'].min(),
           df_milano['humidity'].min(),
           df_asti['humidity'].min(),
           df_torino['humidity'].min()]
hum_max = [df_ravenna['humidity'].max(),
           df_cesena['humidity'].max(),
           df_faenza['humidity'].max(),
           df_ferrara['humidity'].max(),
           df_bologna['humidity'].max(),
           df_mantova['humidity'].max(),
           df_piacenza['humidity'].max(),
           df_milano['humidity'].max(),
           df_asti['humidity'].max(),
           df_torino['humidity'].max()]
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# maximum temperature
plt.plot(dist, temp_max, 'ro')
[<matplotlib.lines.Line2D at 0xd697650>]

x = np.array(dist)
y = np.array(temp_max)
x1 = x[x<100]
x1 = x1.reshape((x1.size, 1))
y1 = y[x<100]
x2 = x[x>50]
x2 = x2.reshape((x2.size, 1))
y2 = y[x>50]
from sklearn.svm import SVR
svr_lin1 = SVR(kernel='linear', C=1e3)
svr_lin2 = SVR(kernel='linear', C=1e3)
svr_lin1.fit(x1, y1)
svr_lin2.fit(x2, y2)
xp1 = np.arange(10, 100, 10).reshape((9, 1))
xp2 = np.arange(50, 400, 50).reshape((7, 1))
yp1 = svr_lin1.predict(xp1)
yp2 = svr_lin2.predict(xp2)
plt.plot(xp1, yp1, c='r', label='Strong sea effect')
plt.plot(xp2, yp2, c='b', label='Light sea effect')
plt.axis((0, 400, 20, 40))
plt.scatter(x, y, c='k', label='data')
<matplotlib.collections.PathCollection at 0x18627cf8>

from scipy.optimize import fsolve

# The slope enters with a minus sign here; check line1 against
# svr_lin1.predict for the scikit-learn version in use.
def line1(x):
    a1 = svr_lin1.coef_[0][0]
    b1 = svr_lin1.intercept_[0]
    return -a1*x + b1

def line2(x):
    a2 = svr_lin2.coef_[0][0]
    b2 = svr_lin2.intercept_[0]
    return -a2*x + b2

def findIntersection(fun1, fun2, x0):
    # Root of fun1(x) - fun2(x), starting the search at x0
    return fsolve(lambda x: fun1(x) - fun2(x), x0)

result = findIntersection(line1, line2, 0.0)
print("[x,y] = [ %d , %d ]" % (result[0], line1(result)[0]))
x = np.linspace(0, 300, 31)
plt.plot(x, line1(x), x, line2(x), result, line1(result), 'ro')
[x,y] = [ 101 , 34 ]
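Because both fits are straight lines, the crossing point that fsolve searches for numerically also has a closed form, which makes a convenient cross-check. The following is a minimal sketch under stated assumptions: the helper `intersect` is not part of the notebook, and the slopes and intercepts are hypothetical values, not the fitted coefficients.

```python
def intersect(a1, b1, a2, b2):
    """Return (x, y) where y = a1*x + b1 meets y = a2*x + b2."""
    if a1 == a2:
        raise ValueError("parallel lines have no single intersection point")
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1

# Hypothetical coefficients chosen for illustration only.
xi, yi = intersect(0.05, 29.0, -0.01, 35.0)
print(xi, yi)
```

If the value returned by fsolve differs noticeably from this formula when fed the same coefficients, the starting point or the sign convention of the slopes is worth a second look.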
# minimum temperature
plt.axis((0, 400, 15, 25))
plt.plot(dist, temp_min, 'bo')
[<matplotlib.lines.Line2D at 0x18716320>]

# minimum humidity
plt.plot(dist, hum_min, 'bo')
[<matplotlib.lines.Line2D at 0x18b3de80>]

# maximum humidity
plt.plot(dist, hum_max, 'bo')
[<matplotlib.lines.Line2D at 0x18bc8080>]

# temperature
y1 = df_milano['temp']
x1 = df_milano['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
ax.plot(x1, y1, 'r')
[<matplotlib.lines.Line2D at 0x1a109f28>]

# humidity
y1 = df_milano['humidity']
x1 = df_milano['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
ax.plot(x1, y1, 'r')
[<matplotlib.lines.Line2D at 0x1a2f47f0>]

y1 = df_ravenna['temp']
x1 = df_ravenna['day']
y2 = df_ferrara['temp']
x2 = df_ferrara['day']
y3 = df_milano['temp']
x3 = df_milano['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
plt.plot(x1, y1, 'r', x2, y2, 'b', x3, y3, 'g')
[<matplotlib.lines.Line2D at 0x1a432e10>,
 <matplotlib.lines.Line2D at 0x1a586748>,
 <matplotlib.lines.Line2D at 0x1a586b38>]

y1 = df_ravenna['humidity']
x1 = df_ravenna['day']
y2 = df_ferrara['humidity']
x2 = df_ferrara['day']
y3 = df_milano['humidity']
x3 = df_milano['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
plt.plot(x1, y1, 'r', x2, y2, 'b', x3, y3, 'g')
[<matplotlib.lines.Line2D at 0x1a5d6f60>,
 <matplotlib.lines.Line2D at 0x1a7fb9b0>,
 <matplotlib.lines.Line2D at 0x1a7fbda0>]

y1 = df_ravenna['humidity']
x1 = df_ravenna['day']
y2 = df_faenza['humidity']
x2 = df_faenza['day']
y3 = df_cesena['humidity']
x3 = df_cesena['day']
y4 = df_milano['humidity']
x4 = df_milano['day']
y5 = df_asti['humidity']
x5 = df_asti['day']
y6 = df_torino['humidity']
x6 = df_torino['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
plt.plot(x1, y1, 'r', x2, y2, 'r', x3, y3, 'r')
plt.plot(x4, y4, 'g', x5, y5, 'g', x6, y6, 'g')
[<matplotlib.lines.Line2D at 0x18606668>,
 <matplotlib.lines.Line2D at 0x1a86ec18>,
 <matplotlib.lines.Line2D at 0x1a861470>]

y1 = df_ravenna['temp']
x1 = df_ravenna['day']
y2 = df_faenza['temp']
x2 = df_faenza['day']
y3 = df_cesena['temp']
x3 = df_cesena['day']
y4 = df_milano['temp']
x4 = df_milano['day']
y5 = df_asti['temp']
x5 = df_asti['day']
y6 = df_torino['temp']
x6 = df_torino['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
plt.plot(x1, y1, 'r', x2, y2, 'r', x3, y3, 'r')
plt.plot(x4, y4, 'g', x5, y5, 'g', x6, y6, 'g')
[<matplotlib.lines.Line2D at 0x1aa22a90>,
 <matplotlib.lines.Line2D at 0x1ac54ba8>,
 <matplotlib.lines.Line2D at 0x1ac49518>]

hum_mean = [df_ravenna['humidity'].mean(),
            df_cesena['humidity'].mean(),
            df_faenza['humidity'].mean(),
            df_ferrara['humidity'].mean(),
            df_bologna['humidity'].mean(),
            df_mantova['humidity'].mean(),
            df_piacenza['humidity'].mean(),
            df_milano['humidity'].mean(),
            df_asti['humidity'].mean(),
            df_torino['humidity'].mean()]
plt.plot(dist, hum_mean, 'bo')
[<matplotlib.lines.Line2D at 0x1acbfb70>]

# wind speed is scaled by 20 so that it shares an axis with humidity
y1 = df_ravenna['wind_speed']*20
y2 = df_ravenna['humidity']
x = df_ravenna['day']
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
plt.plot(x, y1, 'r', x, y2, 'b')
[<matplotlib.lines.Line2D at 0x1ab2ee80>,
 <matplotlib.lines.Line2D at 0x1b0a0668>]

plt.plot(df_ravenna['wind_deg'], df_ravenna['wind_speed'], 'ro')
[<matplotlib.lines.Line2D at 0x1b11c4e0>]

plt.subplot(211)
plt.plot(df_cesena['wind_deg'], df_cesena['humidity'], 'bo')
plt.subplot(212)
plt.plot(df_cesena['wind_deg'], df_cesena['wind_speed'], 'bo')
[<matplotlib.lines.Line2D at 0x1b4db6d8>]

# 8 bins of 45 degrees each over the range [0, 360]
hist, bins = np.histogram(df_ravenna['wind_deg'], 8, [0, 360])
print(hist)
print(bins)
[3 4 9 6 1 1 0 0]
[   0.   45.   90.  135.  180.  225.  270.  315.  360.]
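To make the binning explicit, here is a plain-Python sketch of what the 8-bin histogram over [0, 360] counts: each direction falls into a 45-degree sector, every bin is half-open except the last one, which also includes 360, and values outside the range are ignored. The function name `sector_counts` and the sample directions are hypothetical and are not data from the notebook.

```python
def sector_counts(degrees, n_sectors=8):
    """Count wind directions per sector, mimicking an 8-bin histogram on [0, 360]."""
    width = 360.0 / n_sectors
    counts = [0] * n_sectors
    for deg in degrees:
        if not 0 <= deg <= 360:
            continue  # values outside the range are not counted
        # 360 itself belongs to the last (closed) bin
        idx = min(int(deg // width), n_sectors - 1)
        counts[idx] += 1
    return counts

# Hypothetical sample of wind directions in degrees.
sample = [0, 10, 44.9, 45, 100, 359, 360]
counts = sector_counts(sample)
print(counts)
```

The resulting list has the same layout as `hist` above, so it can be passed to a polar bar chart in the same way.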
def showRoseWind(values, city_name, max_value):
    N = 8
    theta = np.arange(0., 2 * np.pi, 2 * np.pi / N)  # start angle of each sector
    radii = np.array(values)
    plt.axes([0.025, 0.025, 0.95, 0.95], polar=True)
    # larger values give a more saturated blue
    colors = [(1-x/max_value, 1-x/max_value, 0.75) for x in radii]
    plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0, color=colors)
    plt.title(city_name, x=0.2, fontsize=20)
hist, bins = np.histogram(df_ravenna['wind_deg'], 8, [0, 360])
print(hist)
showRoseWind(hist, 'Ravenna', 15.0)
[3 4 9 6 1 1 0 0]

hist, bins = np.histogram(df_piacenza['wind_deg'], 8, [0, 360])
print(hist)
showRoseWind(hist, 'Piacenza', 15.0)
[8 3 4 2 4 1 1 1]

print(df_milano[df_milano['wind_deg']<45]['wind_speed'])
print(df_milano[df_milano['wind_deg']<45]['wind_speed'].mean())
1     2.6
3     2.1
5     2.1
13    0.5
14      1
18      1
21      1
Name: wind_speed, dtype: object
1.47142857143
print(df_milano[df_milano['wind_deg']<45]['wind_speed'].mean())
#print(df_milano[(df_milano['wind_deg']>0) & (df_milano['wind_deg']<45)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>44) & (df_milano['wind_deg']<90)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>89) & (df_milano['wind_deg']<135)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>134) & (df_milano['wind_deg']<180)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>179) & (df_milano['wind_deg']<225)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>224) & (df_milano['wind_deg']<270)]['wind_speed'].mean())
print(df_milano[(df_milano['wind_deg']>269) & (df_milano['wind_deg']<315)]['wind_speed'].mean())
#print(df_milano[(df_milano['wind_deg']>314) & (df_milano['wind_deg']<360)]['wind_speed'].mean())
print(df_milano[df_milano['wind_deg']>314]['wind_speed'].mean())
1.47142857143
2.04
2.06666666667
2.05
2.68333333333
2.1
nan
nan
degs = np.arange(45, 361, 45)
print(degs)
[ 45  90 135 180 225 270 315 360]
tmp = []
for deg in degs:
    #print(df_milano[(df_milano['wind_deg']>(deg-46)) & (df_milano['wind_deg']<deg)]['wind_speed'].mean())
    tmp.append(df_milano[(df_milano['wind_deg']>(deg-46)) & (df_milano['wind_deg']<deg)]['wind_speed'].mean())
speeds = np.array(tmp)
print(speeds)
[ 1.675              nan         nan         nan  2.93333333  3.13636364
  2.58               nan]
N = 8
theta = np.arange(0., 2 * np.pi, 2 * np.pi / N)
radii = np.array(speeds)
plt.axes([0.025, 0.025, 0.95, 0.95], polar=True)
colors = [(1-x/10.0, 1-x/10.0, 0.75) for x in radii]
bars = plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0, color=colors)
plt.title('Milano', x=0.2, fontsize=20)
<matplotlib.text.Text at 0x1be13ef0>

def RoseWind_Speed(df_city):
    # Mean wind speed for each of the eight 45-degree sectors
    degs = np.arange(45, 361, 45)
    tmp = []
    for deg in degs:
        tmp.append(df_city[(df_city['wind_deg']>(deg-46)) & (df_city['wind_deg']<deg)]['wind_speed'].mean())
    return np.array(tmp)
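The sector test in RoseWind_Speed uses the strict bounds `> deg-46` and `< deg`, which partitions whole-number directions cleanly (0 to 44, 45 to 89, and so on) but would place a fractional value such as 44.5 in two sectors and leaves out a reading of exactly 360. The sketch below shows one possible alternative with half-open sectors in plain Python; the function name `mean_speed_by_sector` and the sample readings are hypothetical, and wrapping 360 back to 0 is an assumption of this sketch, not the notebook's behaviour.

```python
import math

def mean_speed_by_sector(degrees, speeds, n_sectors=8):
    """Mean speed per sector; sector k covers [k*width, (k+1)*width) degrees."""
    width = 360.0 / n_sectors
    totals = [0.0] * n_sectors
    counts = [0] * n_sectors
    for deg, speed in zip(degrees, speeds):
        idx = int((deg % 360) // width)  # 360 wraps around to sector 0 (assumption)
        totals[idx] += speed
        counts[idx] += 1
    # Empty sectors yield NaN, matching the mean of an empty selection
    return [t / c if c else math.nan for t, c in zip(totals, counts)]

# Hypothetical readings: directions in degrees and speeds in m/s.
degs_sample = [10, 44.5, 90, 100]
speeds_sample = [1.0, 3.0, 2.0, 4.0]
means = mean_speed_by_sector(degs_sample, speeds_sample)
print(means)
```

Each reading lands in exactly one sector here, so the per-sector counts always add up to the number of readings.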
def showRoseWind_Speed(speeds, city_name):
    N = 8
    theta = np.arange(0., 2 * np.pi, 2 * np.pi / N)
    radii = np.array(speeds)
    plt.axes([0.025, 0.025, 0.95, 0.95], polar=True)
    colors = [(1-x/10.0, 1-x/10.0, 0.75) for x in radii]
    bars = plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0, color=colors)
    plt.title(city_name, x=0.2, fontsize=20)
showRoseWind(RoseWind_Speed(df_milano), 'Milano', 10)

showRoseWind_Speed(RoseWind_Speed(df_ravenna), 'Ravenna')

showRoseWind_Speed(RoseWind_Speed(df_faenza), 'Faenza')

showRoseWind_Speed(RoseWind_Speed(df_cesena), 'Cesena')

showRoseWind_Speed(RoseWind_Speed(df_ferrara), 'Ferrara')

showRoseWind_Speed(RoseWind_Speed(df_torino), 'Torino')

showRoseWind_Speed(RoseWind_Speed(df_mantova), 'Mantova')

ferrara = pd.read_json('http://api.openweathermap.org/data/2.5/history/city?q=Ferrara,IT')
df_ferrara.to_csv('ferrara.csv')
df_milano.to_csv('milano.csv')
df_mantova.to_csv('mantova.csv')
df_ravenna.to_csv('ravenna.csv')
df_torino.to_csv('torino.csv')
df_asti.to_csv('asti.csv')
df_bologna.to_csv('bologna.csv')
df_piacenza.to_csv('piacenza.csv')
df_cesena.to_csv('cesena.csv')
df_faenza.to_csv('faenza.csv')

Original article: https://www.cnblogs.com/LearnFromNow/p/9349935.html

Time: 2024-08-01 07:11:18
