Python crawler: downloading the China.com photo library

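# -*- coding: utf-8 -*-
import os
import re

import requests

if __name__ == '__main__':
    url = 'http://photostock.china.com.cn/Web_CHN/SpecialTopicPhoto.aspx?Id=296'
    html = requests.get(url)
    # Capture the part of each <img> src that follows the leading "..".
    # The dots are escaped so that they match literal dots only.
    img_src = re.findall(r'<img alt=.*?src="\.\.(.*?)".*?/>', html.text, re.S)
    # Turn the site-relative paths into absolute URLs.
    imgUrl = []
    for each_src in img_src:
        imgUrl.append("http://photostock.china.com.cn" + each_src)
    # Create the output directory if it does not exist yet.
    if not os.path.isdir("lovelyAnimals"):
        os.makedirs("lovelyAnimals")
    picName = 100  # files are saved as 100.jpg, 101.jpg, ...
    for each in imgUrl:
        imgContext = requests.get(each).content
        with open("lovelyAnimals/" + str(picName) + ".jpg", "wb") as code:
            code.write(imgContext)
        picName += 1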
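
The link-extraction step can be tried without touching the network. The sketch below is a variant under stated assumptions, not the script's own method: it keeps the regular-expression idea but captures the whole src value and resolves it against the page URL with urllib.parse.urljoin (Python 3). The helper name extract_image_urls and the sample markup are hypothetical; the real page's HTML may differ.

```python
import re
from urllib.parse import urljoin

# Hypothetical helper, not part of the original script.
# Matches the src attribute of an <img> tag, wherever it appears in the tag.
IMG_SRC_RE = re.compile(r'<img\s[^>]*?src="([^"]+)"', re.S)


def extract_image_urls(html_text, page_url):
    """Return an absolute URL for every <img src="..."> found in html_text."""
    return [urljoin(page_url, src) for src in IMG_SRC_RE.findall(html_text)]


# Made-up markup that mimics a relative "../" path.
sample_html = '<img alt="panda" src="../Upload/a.jpg" />'
page_url = 'http://photostock.china.com.cn/Web_CHN/SpecialTopicPhoto.aspx?Id=296'
print(extract_image_urls(sample_html, page_url))
```

Letting urljoin resolve the "../" prefix avoids hard-coding the host name and also copes with src values that are already absolute.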

'''
Three ways to download a file (written for Python 2; Python 3 names in parentheses)
(1) Use urllib.urlretrieve (urllib.request.urlretrieve in Python 3); the callbackfunc callback can report download progress
import urllib
def callbackfunc(blocknum, blocksize, totalsize):
    # Callback function
    # @blocknum:
    #     number of data blocks downloaded so far

    # @blocksize:
    #     size of each data block

    # @totalsize:
    #     size of the remote file
    percent = 100.0 * blocknum * blocksize / totalsize
    if percent > 100:
        percent = 100
    print("%.2f%%" % percent)
url = 'http://www.sina.com.cn'
local = 'lovelyAnimals/sina.html'
urllib.urlretrieve(url, local, callbackfunc)

(2) Use urllib2.urlopen (urllib.request.urlopen in Python 3)
import urllib2
url = 'http://www.sina.com.cn'
f = urllib2.urlopen(url)
data = f.read()
with open("lovelyAnimals/sina.html", "wb") as code:
    code.write(data)

(3) Use the requests module
import requests
url = 'http://www.sina.com.cn'
html = requests.get(url)
with open("lovelyAnimals/sina.html", "wb") as code:
    code.write(html.content)
'''
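
For readers on Python 3, method (1) might look roughly like the sketch below. It assumes urllib.request.urlretrieve with its reporthook argument; the names progress_percent, report and download are hypothetical helpers introduced here for illustration. The guard for a non-positive totalsize is an addition, since the reported size can be -1 when the server does not state it.

```python
import urllib.request


def progress_percent(blocknum, blocksize, totalsize):
    """Percentage downloaded so far, capped at 100; 0.0 if the size is unknown."""
    if totalsize <= 0:
        return 0.0
    return min(100.0, 100.0 * blocknum * blocksize / totalsize)


def report(blocknum, blocksize, totalsize):
    """Report hook passed to urlretrieve; prints the current percentage."""
    print("%.2f%%" % progress_percent(blocknum, blocksize, totalsize))


def download(url, local_path):
    """Save url to local_path, printing progress after each block."""
    urllib.request.urlretrieve(url, local_path, reporthook=report)
```

A call such as download('http://www.sina.com.cn', 'lovelyAnimals/sina.html') would then mirror the Python 2 example, provided the lovelyAnimals directory already exists.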

Time: 2024-10-10 03:55:19
