Two ways to forge HTTP headers with urllib2 in Python

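When scraping web pages, you often need to forge the request headers so that the server treats the scraping script like an ordinary browser and the script runs successfully.

Below, we use the header support in urllib2 to send forged headers while fetching a page.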

Method 1: pass a headers dictionary to urllib2.Request

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Filename: urllib2-header.py

import sys
import urllib2

# Fetch a page while sending forged headers (variant 1: headers dict)
url = "http://www.jb51.net"

send_headers = {
    'Host': 'www.jb51.net',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Connection': 'keep-alive'
}

req = urllib2.Request(url, headers=send_headers)
r = urllib2.urlopen(req)

html = r.read()            # body of the page
receive_header = r.info()  # response headers sent back by the server

# Re-encode from UTF-8 to the local encoding (sys.getfilesystemencoding())
# so the output is not garbled; 'replace' avoids errors on unmappable characters
html = html.decode('utf-8', 'replace').encode(sys.getfilesystemencoding(), 'replace')

print receive_header
# print '####################################'
print html
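The listing above targets Python 2, where urllib2 exists. In Python 3 the module was split into urllib.request and urllib.error, so a rough equivalent of Method 1 might look like the sketch below. The helper names `build_request` and `fetch` are hypothetical, chosen only for illustration, and the charset fallback to UTF-8 is an assumption rather than something the original script does.

```python
import urllib.request

# Same forged headers as Method 1 (Host is omitted here; urllib fills it in from the URL)
SEND_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_request(url, headers=SEND_HEADERS):
    """Hypothetical helper: build a Request that carries the forged headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url, timeout=10):
    """Hypothetical helper: send the request, return (response headers, decoded body)."""
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: fall back to UTF-8 when the server declares no charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.headers, resp.read().decode(charset, "replace")


# Inspect the headers that would be sent, without touching the network
req = build_request("http://www.jb51.net")
for name, value in sorted(req.header_items()):
    print(name + ": " + value)
```

Note that Request stores header names in capitalized form, so the User-Agent value is looked up with `req.get_header("User-agent")`.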

Method 2: add headers one at a time with add_header()

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Filename: urllib2-header.py

import sys
import urllib2

# Fetch a page while sending forged headers (variant 2: add_header)
url = 'http://www.jb51.net'

req = urllib2.Request(url)
req.add_header('Referer', 'http://www.jb51.net/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0')

r = urllib2.urlopen(req)

html = r.read()            # body of the page
receive_header = r.info()  # response headers sent back by the server

# Re-encode to the local encoding; 'replace' avoids errors on bad or unmappable bytes
html = html.decode('utf-8', 'replace').encode(sys.getfilesystemencoding(), 'replace')

print receive_header
print '#####################################'
print html
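As a hedged illustration of how add_header() behaves, the sketch below uses Python 3's urllib.request (an assumption, since the article's code is Python 2) and never opens a connection. The helper `make_request` is hypothetical. It shows that calling add_header() again with the same name replaces the earlier value rather than appending a second header.

```python
import urllib.request


def make_request(url, referer, user_agent):
    """Hypothetical helper: build a Request and attach headers one by one."""
    req = urllib.request.Request(url)
    req.add_header("Referer", referer)
    req.add_header("User-Agent", user_agent)
    return req


req = make_request(
    "http://www.jb51.net",
    "http://www.jb51.net/",
    "Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0",
)

# A second add_header() call with the same name overwrites the first value
req.add_header("Referer", "http://www.jb51.net/list/")

# Header names are stored capitalized ("User-Agent" becomes "User-agent")
for name, value in sorted(req.header_items()):
    print(name + ": " + value)
```

Compared with Method 1, this style is convenient when headers are decided step by step, for example when the Referer depends on the page visited previously.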

Time: 2024-10-25 05:12:57
