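Python Crawler: Scraping Women's Email Addresses from Jobbole (伯乐在线)

1. Log in: set the url, cookie, data, and headers for the login page.

2. Open the home page and follow the email link: this needs a new url, the cookie (read back from the saved cookie file), data, and headers.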
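As a rough illustration of what "setting the data and headers" amounts to with urllib, the sketch below builds a form-encoded POST request without sending it. The URL and field values here are placeholders chosen for the example, not the site's real credentials.

```python
from urllib import parse, request

# Placeholder endpoint and form fields (assumptions for illustration only).
url = "http://example.com/wp-login.php"
form = {"log": "user", "pwd": "secret", "rememberme": "on"}

# urlencode() produces "key=value&..." text; encode() converts it to the
# bytes that urllib expects for a request body.
body = parse.urlencode(form).encode()

headers = {"User-Agent": "Mozilla/5.0", "Connection": "keep-alive"}
req = request.Request(url, data=body, headers=headers)

# Supplying data makes urllib issue a POST instead of a GET.
print(req.get_method())
print(body)
```

Nothing is sent over the network here; the request only goes out once it is passed to an opener's `open()`.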

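'''
Scrape the contact details of women listed on Jobbole (伯乐在线).
Requirements:
1. Log in.
2. While logged in and with enough reputation points, extract the other person's email.
'''

from urllib import request, error, parse
from http import cookiejar
import json
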
def login():
    '''
    Submit the username and password,
    obtain the login cookie,
    and write the cookie to a file.
    :return:
    '''

    # 1. Locate the login endpoint
    url = "http://date.jobbole.com/wp-login.php"

    # 2. Prepare the login form data
    data = {
        "log": "augsnano",
        "pwd": "123456789",
        # Address to redirect to after login
        "redirect_to": "http://date.jobbole.com/4965/",
        "rememberme": "on"
    }

    # Form-encode the dict and convert it to bytes for the POST body
    data = parse.urlencode(data).encode()

    # 3. File that will hold the cookies
    # The r prefix makes this a raw string (no escape processing)
    f = r'jobbole_cookie.txt'

    # 4. Prepare the request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
        "Connection": "keep-alive"
    }

    # 5. Prepare the cookie jar
    cookie_handler = cookiejar.MozillaCookieJar(f)

    # 6. Prepare the HTTP cookie processor
    http_handler = request.HTTPCookieProcessor(cookie_handler)

    # 7. Build the opener
    opener = request.build_opener(http_handler)

    # 8. Build the request object
    req = request.Request(url, data=data, headers=headers)

    # 9. Send the request
    try:
        rsp = opener.open(req)

        # Persist the cookies, including session cookies and expired ones
        cookie_handler.save(f, ignore_discard=True, ignore_expires=True)

        html = rsp.read().decode()
        print(html)
    except error.URLError as e:
        print(e)


def getInfo():
    # 1. Set the URL
    url = "http://date.jobbole.com/wp-admin/admin-ajax.php"

    # 2. Load the previously saved cookies
    f = r'jobbole_cookie.txt'
    cookie = cookiejar.MozillaCookieJar()
    cookie.load(f, ignore_expires=True, ignore_discard=True)

    # 3. Build the HTTP cookie processor
    http_handler = request.HTTPCookieProcessor(cookie)

    # 4. Build the opener
    opener = request.build_opener(http_handler)

    # The steps below prepare the request object

    # 5. Build the form data
    data = {
        "action": "get_date_contact",
        "postId": "4965"
    }

    data = parse.urlencode(data).encode()

    # 6. Build the request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36",
        "Connection": "keep-alive"
    }

    # 7. Build the request object
    req = request.Request(url, data=data, headers=headers)

    # 8. Open it with the opener
    try:
        rsp = opener.open(req)
        html = rsp.read().decode()

        # json.loads returns a Python object (e.g. a dict), not a string
        result = json.loads(html)
        print(result)

        # Serialize back to text before writing, since write() needs a str
        out_path = "rsp.html"
        with open(out_path, 'w', encoding='utf-8') as out:
            out.write(json.dumps(result, ensure_ascii=False))

    except Exception as e:
        print(e)


if __name__ == '__main__':
    # Assumes login() has already been run once to create jobbole_cookie.txt
    getInfo()
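The hand-off between login() and getInfo() depends on the cookie file surviving a save and a later load. The sketch below reproduces that round trip offline, under the assumption of a hand-built session cookie in a temporary file; `make_cookie`, the cookie name, and the domain are hypothetical and exist only for this demo.

```python
import os
import tempfile
from http import cookiejar

def make_cookie(name, value, domain):
    """Build a session cookie by hand (hypothetical helper for this demo)."""
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

path = os.path.join(tempfile.mkdtemp(), "demo_cookie.txt")

# Save: ignore_discard=True keeps session cookies (discard=True) in the file.
jar = cookiejar.MozillaCookieJar(path)
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
jar.save(ignore_discard=True, ignore_expires=True)

# Load into a fresh jar, the same way getInfo() reads jobbole_cookie.txt.
restored = cookiejar.MozillaCookieJar()
restored.load(path, ignore_discard=True, ignore_expires=True)
names = [c.name for c in restored]
print(names)

# Loading with the default flags skips session cookies.
strict = cookiejar.MozillaCookieJar()
strict.load(path)
print(len(strict))
```

Login cookies are often session cookies with no expiry, which is the likely reason both functions pass `ignore_discard=True`; without it such a cookie would be dropped and the second request would go out unauthenticated.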

Original article: https://www.cnblogs.com/xuxaut-558/p/10086450.html

Time: 2024-08-28 11:31:16
