Web crawler
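Ways to fetch web resources in Python 3
1. The simplest way:
- import urllib.request, open 'http://www.baidu.com/', and print the response.
2. Through a Request object:
- import urllib.request, wrap 'http://www.baidu.com' in a Request, open it, and print the response.
1. import urllib.request, then send the parameters 'wd': 'python', 'opt-webpage': 'on', 'ie': 'gbk' with the request.
GET and POST requests differ in that POST requests usually have "side effects".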
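The two fetching styles listed above, a bare URL string and a Request object, might look roughly like the sketch below. The function names are hypothetical, and a data: URL is used as a stand-in page (an assumption made only so the example runs without network access).

```python
import urllib.request


def fetch_simple(url):
    """1. Simplest: hand the URL string straight to urlopen."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_request(url):
    """2. Through a Request object, which can later carry data and headers."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        return response.read()


# Assumption: a data: URL stands in for a real page so the sketch runs
# offline; replace it with 'http://www.baidu.com/' to fetch over the network.
page_url = "data:text/plain,hello%20crawler"
print(fetch_simple(page_url))
print(fetch_with_request(page_url))
```

Both functions return the body as bytes; decode it with the page's encoding before treating it as text.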
A request can also carry a 'User-Agent' header, for example 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'.
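As a rough illustration of how the parameters and the User-Agent string above could be attached to a request, the sketch below builds one GET and one POST Request and inspects them without sending anything. Pairing the values as wd=python, opt-webpage=on, ie=gbk and the /s path are assumptions, not something the original listing confirms.

```python
import urllib.parse
import urllib.request

# Parameter pairing is an assumption based on the fragments in the text.
params = {"wd": "python", "opt-webpage": "on", "ie": "gbk"}
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
encoded = urllib.parse.urlencode(params)

# GET: the encoded parameters go into the URL's query string.
# The /s path is a hypothetical endpoint used for illustration.
get_req = urllib.request.Request(
    "http://www.baidu.com/s?" + encoded, headers=headers
)

# POST: the same parameters travel in the request body, as bytes.
post_req = urllib.request.Request(
    "http://www.baidu.com/s", data=encoded.encode("ascii"), headers=headers
)

# Nothing is sent here; the Request objects are only inspected.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
print(get_req.get_header("User-agent"))
```

Passing data is what turns the request into a POST. Since a POST may change state on the server, a crawler should generally prefer GET for plain page retrieval.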
import urllib.request
from urllib.error import URLError, HTTPError

# Build the request, then report the reason if the URL cannot be opened.
req = urllib.request.Request('http://www.baidu.com')
try:
    urllib.request.urlopen(req)
except URLError as e:
    print(e.reason)
HTTPError
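HTTPError is a subclass of URLError that is raised when the server replies with an error status, and it exposes that status as code. A minimal sketch of handling both follows; the local test server, the handler class, and the describe function are hypothetical names introduced only so the example runs offline.

```python
import http.server
import threading
import urllib.request
from urllib.error import HTTPError, URLError


class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging


# An opener with an empty ProxyHandler ignores proxy settings, so the
# request reaches the local test server directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def describe(url):
    """Return a short description of what happened when opening url."""
    try:
        with opener.open(url, timeout=5) as response:
            return "ok %d" % response.status
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return "http error %d" % e.code
    except URLError as e:
        return "url error: %s" % e.reason


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
result = describe("http://127.0.0.1:%d/missing" % port)
server.shutdown()
server.server_close()
print(result)
```

The order of the except clauses matters: if URLError were caught first, the HTTPError branch would never run.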
1. Openers:
2. Handlers:
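import urllib.request

# Create a password manager and register credentials for the URL prefix.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, 'why', '1223')

# Build an opener that answers HTTP Basic authentication challenges.
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# Use the opener directly to fetch a URL.
a_url = 'http://www.baidu.com/'
opener.open(a_url)

# Install it so that later urllib.request.urlopen calls use it too.
urllib.request.install_opener(opener)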
The latter includes the port number.
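To see an opener with a Basic authentication handler actually answer a challenge, the sketch below runs the same steps against a throwaway local server. The server, its handler class, and the "demo" realm are assumptions for illustration; the credentials are the ones from the listing above, and the registered URL prefix includes the port number.

```python
import base64
import http.server
import threading
import urllib.request

USER, PASSWORD = "why", "1223"  # credentials from the listing above
EXPECTED = "Basic " + base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()


class AuthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that demands HTTP Basic credentials."""

    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"secret page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Challenge the client; the realm name is arbitrary.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), AuthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
top_level_url = "http://127.0.0.1:%d/" % server.server_address[1]

# Realm None means "use these credentials for any realm under this prefix".
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, top_level_url, USER, PASSWORD)
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# The empty ProxyHandler bypasses proxy settings for the local test server.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), auth_handler
)

# The first attempt gets a 401; the handler retries with credentials.
with opener.open(top_level_url + "private", timeout=5) as response:
    page = response.read()

server.shutdown()
server.server_close()
print(page)
```

The caller sees only the final successful response, because the handler deals with the 401 challenge and the retry internally.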
Time: 2024-12-14 18:41:53