Scraping pictures of pretty girls~ -《狗嗨默示录》-

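#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import urllib.request


# Fetch the page source
def gethtml():
    papg = urllib.request.urlopen('https://www.4493.com/')
    html = papg.read().decode('gbk')  # the site serves GBK-encoded pages
    # print(html)
    return html


# Download the images
def getimg(html):
    imgre = re.compile(r' lazysrc="(.*?)" ')
    imglist = re.findall(imgre, html)  # collect every lazysrc image URL
    print(imglist)
    x = 0
    for imgurl in imglist:
        # Raw string so the backslashes in the Windows path are not read as escapes;
        # the target folder must already exist.
        urllib.request.urlretrieve(imgurl, r'D:\学习资料\pic\%s.jpg' % x)
        x += 1


html = gethtml()
getimg(html)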
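The extraction step in the script above can be tried offline before pointing it at a live site. The following is only a minimal sketch: the sample markup, the `example.com` URLs, the `D:\pic` folder, and the helper names `extract_image_urls` and `plan_downloads` are all assumptions made up for illustration, not part of the original script. It reuses the same `lazysrc` pattern and shows which file name each image would get, without downloading anything.

```python
import re
from pathlib import PureWindowsPath

# Same pattern as the script: captures the value of a lazysrc attribute
# that has a space before the attribute name and after the closing quote.
IMG_RE = re.compile(r' lazysrc="(.*?)" ')


def extract_image_urls(html):
    """Return every lazysrc URL found in html, in document order."""
    return IMG_RE.findall(html)


def plan_downloads(urls, target_dir):
    """Pair each URL with a numbered .jpg path inside target_dir (hypothetical helper)."""
    folder = PureWindowsPath(target_dir)
    return [(url, str(folder / f"{i}.jpg")) for i, url in enumerate(urls)]


# Hypothetical sample markup, invented for this example only.
SAMPLE_HTML = (
    '<img class="a" lazysrc="https://example.com/a.jpg" alt="a">'
    '<img class="b" lazysrc="https://example.com/b.jpg" alt="b">'
)

urls = extract_image_urls(SAMPLE_HTML)
plan = plan_downloads(urls, r"D:\pic")
for url, path in plan:
    print(url, "->", path)
```

Each `(url, path)` pair in `plan` could then be passed to `urllib.request.urlretrieve(url, path)`, which is the call the script itself uses. Note that the pattern only matches when `lazysrc` is followed by another attribute, because it requires a space after the closing quote.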

Time: 2024-12-26 01:02:28
