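Python 3's urllib.request makes it easy to scrape content from web pages.
1. urllib.request.urlopen(url) opens the page, and read() returns its content
2. A Python regular expression extracts the image links, e.g. <photo='http://img3A.hualvtu.com/272492/20150223/2143e9d2b51b397cda16.jpg'>
3. urllib.request.urlretrieve(url, filename) downloads the image at url and saves it to filename
In addition, os.makedirs() creates the output directory, and log.txt records the download results.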
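
Step 2 can be tried on its own, without touching the network. The sketch below applies the same pattern to a made-up HTML fragment; SAMPLE_HTML, the example.com URL, and the helper name extract_image_urls are assumptions for illustration, and the real page markup may differ.

```python
import re

# Hypothetical HTML fragment for illustration only; the real page may differ.
SAMPLE_HTML = (
    "<div photo='http://img3A.hualvtu.com/272492/20150223/2143e9d2b51b397cda16.jpg' dt='1'></div>"
    "<div photo='http://example.com/a/b.jpg' dt='2'></div>"
)

# Capture everything between photo=' and the closing quote, ending in .jpg
PHOTO_RE = re.compile(r"photo='(.*?\.jpg)' dt=")


def extract_image_urls(html):
    """Return the list of .jpg URLs found in photo='...' attributes."""
    return PHOTO_RE.findall(html)


urls = extract_image_urls(SAMPLE_HTML)
for u in urls:
    print(u)
```

Because `.*?` can also match quote characters, a non-.jpg photo attribute placed before a .jpg one could make the match span both. Replacing `.*?` with `[^']*` would likely be stricter if the page mixes image types.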
See the code for details:
# -*- coding: utf-8 -*-
# by qiu
import os
import re
import urllib.request

page = 'http://fm.hualvtu.com/viewQuark.action?id=10150223231300000165&un=woshiyyh&reply=false'


# download html
def download_html(url):
    html = urllib.request.urlopen(url).read()
    return html.decode()  # bytes -> str, UTF-8 by default


def getImage(ht):
    # capture the .jpg URL inside photo='...' dt=
    reg = r'photo=\'(.*?\.jpg)\' dt='
    obj = re.compile(reg)
    imglist = re.findall(obj, ht)
    folder = 'G:/download/photos/'
    if not os.path.exists(folder):
        os.makedirs(folder)
    # the with block closes the log even if an error occurs
    with open(folder + 'log.txt', 'wt', encoding='utf-8') as logfile:
        logfile.write('Image source: ' + page + '\n')
        s = 1
        for i in imglist:
            try:
                print('Downloading image %d...' % s)
                urllib.request.urlretrieve(i, folder + 'pic%s.jpg' % s)
            except Exception:
                # log the failed link and move on; s is not advanced
                print('Download failed')
                logfile.write(i + ' download failed\n')
                continue
            logfile.write('Image %d link--' % s + i + '\n')
            s += 1
    print('Download finished')


html = download_html(page)
getImage(html)
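
The directory creation and log.txt bookkeeping can also be exercised offline. The following is a minimal sketch, not the original author's code: download_all, its fetch parameter, fake_fetch, and the example.com URLs are all hypothetical names introduced here so that urllib.request.urlretrieve can be swapped for a stand-in during a dry run.

```python
import os
import tempfile
import urllib.request


def download_all(urls, folder, fetch=urllib.request.urlretrieve):
    """Save each URL into folder as picN.jpg; return (saved_paths, failed_urls)."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    saved, failed = [], []
    with open(os.path.join(folder, 'log.txt'), 'w', encoding='utf-8') as log:
        for url in urls:
            # number files by successful downloads only, as in the script above
            target = os.path.join(folder, 'pic%d.jpg' % (len(saved) + 1))
            try:
                fetch(url, target)
            except Exception as err:
                failed.append(url)
                log.write('%s failed: %s\n' % (url, err))
                continue
            saved.append(target)
            log.write('%s -> %s\n' % (url, target))
    return saved, failed


# Hypothetical stand-in for urlretrieve so the demo needs no network access.
def fake_fetch(url, filename):
    if 'bad' in url:
        raise OSError('simulated failure')
    with open(filename, 'wb') as f:
        f.write(b'fake image bytes')


with tempfile.TemporaryDirectory() as tmp:
    demo_folder = os.path.join(tmp, 'photos')
    saved, failed = download_all(
        ['http://example.com/1.jpg',
         'http://example.com/bad.jpg',
         'http://example.com/2.jpg'],
        demo_folder, fetch=fake_fetch)
    saved_names = [os.path.basename(p) for p in saved]
    with open(os.path.join(demo_folder, 'log.txt'), encoding='utf-8') as f:
        log_lines = f.read().splitlines()
```

For a real run, calling download_all(urls, 'G:/download/photos/') without the fetch argument would fall back to urllib.request.urlretrieve.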
Time: 2024-10-13 15:18:02