Example 1 -- Scraping a page
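import requests

url = "https://item.jd.com/2646846.html"
try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()              # raise an exception on a 4xx/5xx status
    r.encoding = r.apparent_encoding  # use the encoding guessed from the body
    print(r.text[:1000])              # first 1000 characters of the page
except requests.RequestException:
    print("Fetch failed")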
This is an ordinary page fetch; the site places no special restrictions on the request.
Example 2 -- Scraping a page
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present the request as coming from a browser
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Fetch failed")
The site restricts access based on the User-Agent header, so the request sets that header to imitate a browser.
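To see what the header override changes, the request can be prepared without being sent. The snippet below is a minimal sketch: it assumes the requests library is installed, and the variable names default_ua and sent_ua are only illustrative. By default requests identifies itself as python-requests/<version>; the headers argument replaces that value.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
kv = {"user-agent": "Mozilla/5.0"}

# The User-Agent that requests sends when no header is supplied.
default_ua = requests.utils.default_user_agent()

# Prepare (but do not send) the request, so the outgoing headers can be inspected.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", url, headers=kv))
sent_ua = prepared.headers["User-Agent"]  # header lookup is case-insensitive

print(default_ua)
print(sent_ua)
```

No network traffic is generated here; the same check after a real request is print(r.request.headers).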
Example 3 -- Scraping search engines
# Baidu keyword interface: http://www.baidu.com/s?wd=keyword
# 360 keyword interface:   http://www.so.com/s?q=keyword
import requests

keyword = "python"
try:
    kv = {'wd': keyword}  # Baidu expects the keyword in "wd"
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=10)
    print(r.request.url)  # the final URL, including the query string
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")

The same search against 360:

import requests

keyword = "python"
try:
    kv = {'q': keyword}  # 360 expects the keyword in "q"
    r = requests.get("http://www.so.com/s", params=kv, timeout=10)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")
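The params argument is turned into a URL-encoded query string appended to the base URL, which is what print(r.request.url) displays. The sketch below reproduces that step with the standard library only, so it runs without a network connection; build_search_url is a hypothetical helper name, not part of requests.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    return base_url + "?" + urlencode(params)


baidu_url = build_search_url("http://www.baidu.com/s", {"wd": "python"})
so_url = build_search_url("http://www.so.com/s", {"q": "python"})

print(baidu_url)
print(so_url)
```

Only the parameter name differs between the two engines (wd for Baidu, q for 360), so one function covers both.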
Example 4 -- Scraping an image
import requests
import os

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F://pics//"
path = root + url.split('/')[-1]  # reuse the file name from the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url, timeout=10)
        r.raise_for_status()      # do not save an error page as an image
        with open(path, 'wb') as f:
            f.write(r.content)    # r.content is the raw bytes of the response
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Fetch failed")
This fetches the image and saves it to disk under its original file name.
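Taking the last segment of the URL as the file name works for this address, but it would also pick up any query string. The following is a minimal sketch of a slightly more defensive variant using only the standard library; local_path_for and the "pics" directory are hypothetical names chosen for illustration.

```python
import os
import posixpath
from urllib.parse import urlsplit


def local_path_for(url, root):
    """Map an image URL to a path under root, named after the URL's last path segment."""
    # urlsplit separates the path from any query string or fragment.
    filename = posixpath.basename(urlsplit(url).path)
    if not filename:
        raise ValueError("URL has no file name: " + url)
    return os.path.join(root, filename)


url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
print(local_path_for(url, "pics"))
```

The returned path can be passed to open(path, 'wb') exactly as in the example above.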
Example 5 -- IP address location lookup
http://m.ip138.com/ip.asp?ip=ipaddress
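import requests

url = "http://www.ip138.com/iplookup.asp?ip="
try:
    r = requests.get(url + '202.204.80.112' + '&action=2', timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # the result appears near the end of the page
except requests.RequestException:
    print("Fetch failed")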
The site now has anti-scraping measures in place, so this request may no longer succeed.
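Whether or not the site answers, the query URL itself is built by plain string concatenation, so a malformed address would go out unchecked. The sketch below validates the address with the standard ipaddress module first; build_lookup_url is a hypothetical helper, and the URL pattern is the one used in the example above.

```python
import ipaddress

LOOKUP_URL = "http://www.ip138.com/iplookup.asp?ip="


def build_lookup_url(ip):
    """Return the lookup URL for ip, raising ValueError if ip is not a valid address."""
    ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return LOOKUP_URL + ip + "&action=2"


print(build_lookup_url("202.204.80.112"))
```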
Original post: https://www.cnblogs.com/cy2268540857/p/12424091.html
Time: 2024-10-09 03:11:57