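General-purpose code framework:

import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()              # raise HTTPError on a 4xx/5xx status
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        return r.text
    except requests.RequestException:
        return "An exception occurred"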
Time taken to crawl a page 100 times:

import requests
import time

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "An exception occurred"

if __name__ == '__main__':
    url = 'http://www.baidu.com'
    a = time.time()               # start time
    for i in range(100):
        getHTMLText(url)
    b = time.time()               # end time, taken after the loop finishes
    print('Crawling the page 100 times took %d seconds' % (b - a))
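The timing pattern above can be pulled out into a small helper. This is only a sketch: `time_calls` is a hypothetical name that does not appear in the original, and it uses `time.perf_counter()` instead of `time.time()` because that clock is intended for measuring short intervals. The stand-in workload keeps the example runnable offline; in practice you would pass something like `lambda: getHTMLText(url)`.

```python
import time

def time_calls(func, n):
    """Call func() n times and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        func()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Stand-in for getHTMLText(url) so the sketch runs without network access.
    elapsed = time_calls(lambda: sum(range(1000)), 100)
    print("100 calls took %.3f seconds" % elapsed)
```

Returning the elapsed time as a float, rather than printing inside the helper, lets the caller decide how to format it; the original `%d` truncates to whole seconds.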
Crawling a JD.com product page:

import requests

url = 'https://item.jd.com/5369026.html'
try:
    r = requests.get(url)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])          # first 1000 characters of the page
except requests.RequestException:
    print('Crawl failed')
Crawling a page that restricts crawlers:

import requests

url = 'http://yzb.tju.edu.cn/xwzx/tkbs_xw/201609/t20160914_285521.htm'
try:
    kv = {'user-agent': 'Mozilla/5.0'}   # present the request as coming from a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print('Crawl failed')
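Baidu keyword search:

import requests

keyword = 'Python'
try:
    kv = {'wd': keyword}          # Baidu takes the keyword in the wd parameter
    r = requests.get('http://www.baidu.com/s', params=kv)
    print(r.request.url)          # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print('Crawl failed')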
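Full code for a 360 keyword search:

import requests

keyword = 'Python'
try:
    kv = {'q': keyword}           # 360 takes the keyword in the q parameter
    r = requests.get('http://www.so.com/s', params=kv)
    print(r.request.url)          # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print('Crawl failed')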
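Both search examples pass the keyword through `params=` and then print `r.request.url`. To see what kind of URL that produces without making a request, the sketch below builds the query string with the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` is hypothetical, and this only approximates what `requests` does for simple dictionaries of strings; it is not the library's own code path.

```python
from urllib.parse import urlencode

def build_search_url(base, params):
    """Append URL-encoded query parameters to a base URL that has no query yet."""
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    # The two endpoints and parameter names are the ones used in the examples above.
    print(build_search_url("http://www.baidu.com/s", {"wd": "Python"}))
    print(build_search_url("http://www.so.com/s", {"q": "Python"}))
    # Spaces in a keyword are encoded as '+', so the keyword stays a single value.
    print(build_search_url("http://www.so.com/s", {"q": "web crawler"}))
```

Letting the encoder build the query string is safer than concatenating the keyword by hand, since spaces and non-ASCII characters are escaped for you.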
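Downloading an image:

import requests
import os

url = 'http://image.nationalgeographic.com.cn/2017/0905/20170905114825283.jpg'
root = 'E://pics//'
path = root + url.split('/')[-1]   # save under the file name at the end of the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()       # do not save an error page as an image
        with open(path, 'wb') as f:
            f.write(r.content)     # the with block closes the file automatically
        print('File saved successfully')
    else:
        print('File already exists')
except (requests.RequestException, OSError):
    print('Crawl failed')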
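The image example derives the local file name with `url.split('/')[-1]`, which would also pick up a query string if the URL had one. A slightly more defensive sketch, using only the standard library, is shown below. The helper name `local_path_for` and the `pics` directory are assumptions made for illustration.

```python
import os
from urllib.parse import urlsplit

def local_path_for(url, root):
    """Return root joined with the file name taken from the URL's path component."""
    name = os.path.basename(urlsplit(url).path)  # ignores any ?query or #fragment
    if not name:
        raise ValueError("URL has no file name: %r" % url)
    return os.path.join(root, name)

if __name__ == "__main__":
    url = "http://image.nationalgeographic.com.cn/2017/0905/20170905114825283.jpg"
    print(local_path_for(url, "pics"))
```

If the target directory may be nested or may already exist, `os.makedirs(root, exist_ok=True)` covers both cases in one call, where `os.mkdir` creates a single level only.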
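IP address lookup:

import requests

url = 'http://m.ip138.com/ip.asp?ip='
try:
    r = requests.get(url + '202.204.80.112')   # append the IP address to query
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])          # last 500 characters of the page
except requests.RequestException:
    print('Crawl failed')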
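Because the lookup URL is built by appending the address to a fixed prefix, it can help to check that the text really is an IP address before sending the request. The sketch below does that with the standard library's `ipaddress` module. The helper name `build_ip_query_url` is hypothetical, and whether the lookup site accepts IPv6 addresses is not something the original states, so the sketch accepts IPv4 only.

```python
import ipaddress

BASE_URL = "http://m.ip138.com/ip.asp?ip="  # prefix taken from the example above

def build_ip_query_url(ip_text):
    """Validate ip_text as an IPv4 address and return the full lookup URL."""
    # IPv4Address raises a ValueError subclass if the text is malformed.
    ip = ipaddress.IPv4Address(ip_text.strip())
    return BASE_URL + str(ip)

if __name__ == "__main__":
    print(build_ip_query_url("202.204.80.112"))
    try:
        build_ip_query_url("999.1.1.1")
    except ValueError as err:
        print("rejected:", err)
```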
Time: 2024-10-08 17:14:57