Python Web Scraping (Part 2): Analyzing 58同城 (58.com) Rental Listing Data for Kaifeng

Rental area (area)

Rental price (price)

Comparison of area and price

Code

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_style('dark')
# kf.csv holds the scraped listings; only its 'area' and 'price' columns are used here.
kf = pd.read_csv('kf.csv')


def sinplotone():
    # Violin plot of the rental price distribution.
    fig, ax = plt.subplots()
    ax.violinplot(kf['price'])
    plt.show()


def sinplottwo():
    # Horizontal box plot of rental prices.
    sns.set_style('whitegrid')
    sns.boxplot(x=kf['price'])
    # sns.despine(left=True)
    plt.show()


def sinplotthree():
    # Histogram with a KDE curve; histplot replaces the deprecated distplot.
    sns.histplot(kf['price'], kde=True)
    plt.show()


def s():
    # Joint plot of area against price, read directly from the kf columns.
    sns.jointplot(x='area', y='price', data=kf)
    plt.show()


if __name__ == '__main__':
    # Scatter plot of area against price with marker size 12.
    fig, ax = plt.subplots()
    ax.scatter(kf['area'], kf['price'], s=12)
    plt.show()
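
The plots compare area and price visually. A few numbers can back up that comparison. The sketch below is not part of the original script: `summarize_rentals` is a hypothetical helper, and the sample rows are made up so the block runs without kf.csv. It assumes only that the table has numeric `area` and `price` columns, as the plotting code does.

```python
import pandas as pd


def summarize_rentals(df: pd.DataFrame) -> dict:
    """Return basic statistics for a table with 'area' and 'price' columns."""
    # Drop rows where either value is missing so both columns cover the same listings.
    clean = df[["area", "price"]].dropna()
    return {
        "count": len(clean),
        "median_area": float(clean["area"].median()),
        "median_price": float(clean["price"].median()),
        # Pearson correlation between area and price.
        "corr": float(clean["area"].corr(clean["price"])),
    }


# Made-up rows for illustration only; with the real data, use pd.read_csv("kf.csv").
sample = pd.DataFrame(
    {"area": [40, 60, 80, 100, None], "price": [800, 1200, 1600, 2000, 1500]}
)
stats = summarize_rentals(sample)
print(stats)
```

A correlation close to 1 would suggest that price rises roughly linearly with area. A value near 0 would suggest that other factors, which this data set may or may not record, matter more.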

Original article: https://www.cnblogs.com/LexMoon/p/58tc2.html

Time: 2024-10-15 08:32:54
