0. Preface
(1) Take apart some of the interesting pieces of requests.
(2) Summarize some of the Pythonic idioms that caught my attention.
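1. Using object.__setattr__ when initializing an object
I have always passed every argument to the constructor when instantiating an object, as most people do. This early version of requests instead creates an empty Request and assigns the attributes afterwards, validating them in __setattr__. That approach apparently did not work out well (possibly), because the author later changed it. I honestly can't say which of the two is better.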
class Request(object):
    """The :class:`Request` object. It carries out all functionality of
    Requests. Recommended interface is with the Requests functions.
    """

    _METHODS = ('GET', 'HEAD', 'PUT', 'POST', 'DELETE')

    def __init__(self):
        self.url = None
        self.headers = dict()
        self.method = None
        self.params = {}
        self.data = {}
        self.response = Response()
        self.auth = None
        self.sent = False

    def __repr__(self):
        try:
            repr = '<Request [%s]>' % (self.method)
        except:
            repr = '<Request object>'
        return repr

    def __setattr__(self, name, value):
        if (name == 'method') and (value):
            if not value in self._METHODS:
                raise InvalidMethod()
        object.__setattr__(self, name, value)
Initialization then looks like this:
def get(url, params={}, headers={}, auth=None):
    """Sends a GET request. Returns :class:`Response` object.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary of GET Parameters to send with the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to sent with the :class:`Request`.
    :param auth: (optional) AuthObject to enable Basic HTTP Auth.
    """
    r = Request()
    r.method = 'GET'
    r.url = url
    r.params = params
    r.headers = headers
    r.auth = _detect_auth(url, auth)
    r.send()
    return r.response
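To make the idea concrete, here is a minimal, self-contained sketch of validating an attribute in __setattr__. The class name SimpleRequest and the use of ValueError are assumptions made for illustration; the original code raises its own InvalidMethod exception.

```python
class SimpleRequest(object):
    """Hypothetical stand-in for the early Request class shown above."""

    _METHODS = ('GET', 'HEAD', 'PUT', 'POST', 'DELETE')

    def __init__(self):
        # Each assignment below also passes through __setattr__.
        self.url = None
        self.method = None

    def __setattr__(self, name, value):
        # Reject unknown HTTP methods; falsy values (None) are let through
        # so that __init__ can set the default.
        if name == 'method' and value and value not in self._METHODS:
            raise ValueError('invalid method: %r' % (value,))
        # Store the value via the base implementation; writing
        # "self.<name> = value" here would call __setattr__ again and recurse.
        object.__setattr__(self, name, value)


r = SimpleRequest()
r.method = 'GET'            # accepted
try:
    r.method = 'FETCH'      # rejected inside __setattr__
    rejected = False
except ValueError:
    rejected = True
```

One visible trade-off: validation happens on every assignment, not only at construction time, so an object can never be moved into an invalid state later on.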
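2. Using **kwargs when many parameters have to be passed around
**kwargs lets you pass a large number of parameters from one function to another without building a dict by hand every time (yes, that is the dumb thing I used to do).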
def get(url, params={}, headers={}, cookies=None, auth=None):
    return request('GET', url, params=params, headers=headers,
                   cookiejar=cookies, auth=auth)

def request(method, url, **kwargs):
    data = kwargs.pop('data', dict()) or kwargs.pop('params', dict())
    r = Request(method=method, url=url, data=data,
                headers=kwargs.pop('headers', {}),
                cookiejar=kwargs.pop('cookies', None),
                files=kwargs.pop('files', None),
                auth=kwargs.pop('auth', auth_manager.get_auth(url)))
    r.send()
    return r.response
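The forwarding pattern can be tried in isolation with the sketch below. The function build_request and the dict it returns are hypothetical; they only stand in for the Request construction above so that the example runs without requests.

```python
def build_request(method, url, **kwargs):
    """Hypothetical low-level function: takes what it needs out of kwargs."""
    return {
        'method': method,
        'url': url,
        # pop() removes the key from kwargs and supplies a default if absent.
        'params': kwargs.pop('params', {}),
        'headers': kwargs.pop('headers', {}),
        'auth': kwargs.pop('auth', None),
        # Whatever was not consumed above is still in kwargs.
        'extra': kwargs,
    }


def get(url, **kwargs):
    # The wrapper does not need to know which options exist;
    # it simply forwards everything it received.
    return build_request('GET', url, **kwargs)


req = get('http://example.org', params={'q': 'x'}, timeout=5)
```

Here req['params'] holds the forwarded dict, while the unrecognized timeout option ends up in req['extra'].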
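3. Monkey patch
Monkey patching is a hot-fix technique. Coroutine libraries are a good reference: to achieve asynchronous behavior, they replace many of Python's built-in libraries. The idea is to swap your own implementation in for the original module before the rest of the program loads it, and so achieve your own (undisclosed) goal.
Strictly speaking, it is not requests that uses monkey patching here but the pyopenssl module (imported as urllib3.contrib.pyopenssl in the snippet further down). It works around the SNI bug in Python 2.7 by replacing the original ssl_wrap_socket function. (I could not find any place where requests performs the injection itself, which is annoying...)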
# Replace
def inject_into_urllib3():
    'Monkey-patch urllib3 with PyOpenSSL-backed SSL-support.'
    connection.ssl_wrap_socket = ssl_wrap_socket
    util.HAS_SNI = HAS_SNI
    util.IS_PYOPENSSL = True

# Restore
def extract_from_urllib3():
    'Undo monkey-patching by :func:`inject_into_urllib3`.'
    connection.ssl_wrap_socket = orig_connection_ssl_wrap_socket
    util.HAS_SNI = orig_util_HAS_SNI
    util.IS_PYOPENSSL = False
If you run into an SNIMissing problem while making HTTPS requests, you can try the following:
# In the shell:
pip install pyopenssl ndg-httpsclient pyasn1

# In Python:
try:
    import urllib3.contrib.pyopenssl
    urllib3.contrib.pyopenssl.inject_into_urllib3()
except ImportError:
    pass
This amounts to performing the injection yourself (but shouldn't requests take care of this on its own...).
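The inject/extract pair boils down to "remember the original attribute, overwrite it, put it back later". A minimal sketch follows; the connection module, the wrap_socket functions, and the string return values are all made up for illustration and have nothing to do with real SSL handling.

```python
import types

# Hypothetical module standing in for the library being patched.
connection = types.ModuleType('connection')


def _original_wrap(sock):
    return 'plain:' + sock


def _patched_wrap(sock):
    return 'pyopenssl:' + sock


connection.wrap_socket = _original_wrap

# Keep a reference to the original so the patch can be undone.
_orig_wrap_socket = connection.wrap_socket


def inject():
    """Replace the module attribute with our own implementation."""
    connection.wrap_socket = _patched_wrap


def extract():
    """Undo inject() by restoring the saved original."""
    connection.wrap_socket = _orig_wrap_socket


before = connection.wrap_socket('s')
inject()
patched = connection.wrap_socket('s')
extract()
restored = connection.wrap_socket('s')
```

Callers that look the function up through the module at call time (connection.wrap_socket(...)) see the replacement; code that imported the function by name before inject() ran keeps the old one, which is why patches like this are normally applied as early as possible.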
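4. Hook functions
requests has a hook mechanism. Judging from older versions, it used to offer several callback entry points; today only response is left. The test code is as follows: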
import requests

def print_url(r, *args, **kwargs):
    print(r.content)
    print(r.url)

requests.get('http://httpbin.org', hooks=dict(response=print_url))
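What happens here? requests calls print_url before the requests.Response is returned. As the code below shows, the callback only runs after requests has received the result of the request.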
def send(self, request, **kwargs):
    """
    Send a given PreparedRequest.

    :rtype: requests.Response
    """
    ...
    # Get the appropriate adapter to use
    adapter = self.get_adapter(url=request.url)

    # Start time (approximately) of the request
    start = datetime.utcnow()

    # Send the request
    r = adapter.send(request, **kwargs)

    # Total elapsed time of the request (approximately)
    r.elapsed = datetime.utcnow() - start

    # Response manipulation hooks
    r = dispatch_hook('response', hooks, r, **kwargs)
So what does dispatch_hook do?
def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Dispatches a hook dictionary on a given piece of data."""
    hooks = hooks or dict()
    hooks = hooks.get(key)
    if hooks:
        if hasattr(hooks, '__call__'):
            hooks = [hooks]
        for hook in hooks:
            _hook_data = hook(hook_data, **kwargs)
            if _hook_data is not None:
                hook_data = _hook_data
    return hook_data
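As you can see, dispatch_hook itself is extensible; unfortunately requests currently exposes only the response entry point, perhaps for safety reasons.
To be honest, the hooks in requests are not that pleasant to use. For hooks that are really nice to work with, have a look at flask.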
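Two details of dispatch_hook are easy to miss: a single callable and a list of callables are both accepted, and a hook that returns None leaves the data unchanged while any other return value replaces it. The sketch below restates the function (using callable() instead of hasattr) so that it runs on its own; the hooks log and upper and the string 'body' are made up for the demonstration.

```python
def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Restatement of the function quoted above, for a standalone demo."""
    hooks = (hooks or {}).get(key)
    if hooks:
        if callable(hooks):
            hooks = [hooks]          # normalize a single hook to a list
        for hook in hooks:
            result = hook(hook_data, **kwargs)
            if result is not None:   # None means "keep the data as it is"
                hook_data = result
    return hook_data


calls = []


def log(data, **kwargs):
    # Observes the data and returns None, so the data passes through.
    calls.append(data)


def upper(data, **kwargs):
    # Returns a new value, which replaces the data for later hooks.
    return data.upper()


out = dispatch_hook('response', {'response': [log, upper]}, 'body')
# No hook is registered under 'request', so the data comes back unchanged.
untouched = dispatch_hook('request', {'response': upper}, 'body')
```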
5. Context manager (historical version)
with requests.settings(timeout=0.5):
    requests.get('http://example.org')
    requests.get('http://example.org', timeout=10)
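Inside the with block, every setting applies only locally. Even with the call requests.get('http://example.org', timeout=10), the timeout attribute on the requests module is still 0.5, not 10. How is this implemented?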
class settings:
    """Context manager for settings."""

    cache = {}

    def __init__(self, timeout):
        self.module = inspect.getmodule(self)

        # Cache settings
        self.cache['timeout'] = self.module.timeout
        self.module.timeout = timeout

    def __enter__(self):
        pass

    def __exit__(self, type, value, traceback):
        # Restore settings
        for key in self.cache:
            setattr(self.module, key, self.cache[key])
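It is actually very simple: save the original attributes when the context is set up (here in __init__, which runs as the with statement is evaluated), and set them back when the context exits.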
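The same save-and-restore idea can be written with contextlib.contextmanager. This is a sketch under assumptions, not how requests did it: the config namespace is a hypothetical stand-in for the module-level settings, and the function accepts arbitrary keyword overrides instead of only timeout.

```python
import contextlib
import types

# Hypothetical settings holder standing in for module-level attributes.
config = types.SimpleNamespace(timeout=None)


@contextlib.contextmanager
def settings(**overrides):
    # Remember the current values of everything we are about to change.
    saved = {key: getattr(config, key) for key in overrides}
    for key, value in overrides.items():
        setattr(config, key, value)
    try:
        yield config
    finally:
        # Restore the old values even if the block raised an exception.
        for key, value in saved.items():
            setattr(config, key, value)


with settings(timeout=0.5):
    inside = config.timeout   # the override is visible here
after = config.timeout        # the original value is back
```

Unlike the class above, which keeps its cache as a class attribute shared by all instances, each call here gets its own saved dict, so nested with blocks restore correctly.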
6. Redirects
requests checks every send call for a redirect. If the response is a redirect, the following method is executed:
def resolve_redirects(self, resp, req, stream=False, timeout=None,
                      verify=True, cert=None, proxies=None, **adapter_kwargs):
    """Receives a Response. Returns a generator of Responses."""

    i = 0
    hist = []  # keep track of history

    while resp.is_redirect:
        prepared_request = req.copy()

        if i > 0:
            # Update history and keep track of redirects.
            hist.append(resp)
            new_hist = list(hist)
            resp.history = new_hist
        ...
        url = resp.headers['location']

        # Handle redirection without scheme (see: RFC 1808 Section 4)
        if url.startswith('//'):
            parsed_rurl = urlparse(resp.url)
            url = '%s:%s' % (parsed_rurl.scheme, url)
        ...
        extract_cookies_to_jar(prepared_request._cookies, req, resp.raw)
        prepared_request._cookies.update(self.cookies)
        prepared_request.prepare_cookies(prepared_request._cookies)

        # Rebuild auth and proxy information.
        proxies = self.rebuild_proxies(prepared_request, proxies)
        self.rebuild_auth(prepared_request, resp)

        # Override the original request.
        req = prepared_request

        resp = self.send(
            req,
            stream=stream,
            timeout=timeout,
            verify=verify,
            cert=cert,
            proxies=proxies,
            allow_redirects=False,
            **adapter_kwargs
        )

        extract_cookies_to_jar(self.cookies, prepared_request, resp.raw)

        i += 1
        yield resp
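As you can see, requests reads the redirect target from url = resp.headers['location'], appends resp to the history, rebuilds the headers, cookies, proxies, and auth, and then calls self.send. After yield resp it enters the next iteration and checks again whether the new response is a redirect. The maximum number of redirects is 30.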
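The control flow of that loop can be reproduced without any network access. The sketch below is a heavily simplified model: FakeResponse, ROUTES, and send are invented for the demonstration, cookies/auth/proxies are left out entirely, the history bookkeeping is simpler than in the original, and the limit check is written out explicitly because that part is elided in the excerpt above.

```python
class FakeResponse(object):
    """Hypothetical response object with just enough for the demo."""

    def __init__(self, url, status_code, location=None):
        self.url = url
        self.status_code = status_code
        self.headers = {'location': location} if location else {}
        self.history = []

    @property
    def is_redirect(self):
        return ('location' in self.headers
                and self.status_code in (301, 302, 303, 307, 308))


# Hypothetical "server": URL -> (status code, redirect target).
ROUTES = {
    'http://a.example/': (302, 'http://b.example/'),
    'http://b.example/': (301, 'http://c.example/'),
    'http://c.example/': (200, None),
}


def send(url):
    status_code, location = ROUTES[url]
    return FakeResponse(url, status_code, location)


def resolve_redirects(resp, max_redirects=30):
    """Follow redirects one hop at a time, yielding each new response."""
    hist = []
    while resp.is_redirect:
        if len(hist) >= max_redirects:
            raise RuntimeError('too many redirects')
        hist.append(resp)
        # Issue the next request to the target given in the Location header.
        resp = send(resp.headers['location'])
        resp.history = list(hist)
        yield resp


first = send('http://a.example/')
chain = list(resolve_redirects(first))
final = chain[-1]
```

Because resolve_redirects is a generator, no follow-up request is made until the caller asks for the next response; list() here simply drains it to the end of the chain.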