Learning Web Scraping from Scratch, Part 10: Interacting with the GitHub API via urllib and requests

Using the urllib library

# coding=utf-8
import urllib2
import urllib

# Test endpoints served by a local httpbin instance
URL_IP = "http://10.11.0.215:8080"
URL_GET = "http://10.11.0.215:8080/get"

def use_simple_urllib2():
    # Send a plain GET request and dump the response
    response = urllib2.urlopen(URL_IP)
    print '>>>Response Headers:'
    print response.info()
    print '>>>Response Body:'
    print ''.join(response.readlines())

def use_params_urllib2():
    # Build and encode the request parameters
    params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
    print 'Request Params:'
    print params
    # Send the request with the query string appended to the URL
    response = urllib2.urlopen('%s?%s' % (URL_GET, params))
    # Inspect the response
    print '>>>Response Headers:'
    print response.info()
    print '>>>Status Code:'
    print response.getcode()
    print '>>>Response Body:'
    print ''.join(response.readlines())

if __name__ == '__main__':
    # print '>>>Use simple urllib2'
    # use_simple_urllib2()
    print '>>>Use params urllib2'
    use_params_urllib2()
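
Note that urllib2 exists only in Python 2. For readers on Python 3, here is a minimal equivalent sketch using urllib.request and urllib.parse, assuming the same local httpbin server as above:

# coding=utf-8
# Python 3 sketch of the example above, using urllib.request/urllib.parse.
# Assumes the same local httpbin instance at 10.11.0.215:8080.
from urllib import parse, request

URL_GET = "http://10.11.0.215:8080/get"

def use_params_urllib():
    # Encode the query parameters and append them to the URL
    params = parse.urlencode({'param1': 'hello', 'param2': 'world'})
    response = request.urlopen('%s?%s' % (URL_GET, params))
    print('>>>Status Code:')
    print(response.getcode())
    print('>>>Response Body:')
    print(response.read().decode('utf-8'))

if __name__ == '__main__':
    use_params_urllib()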

Basic usage of the requests library

# coding=utf-8

import requests

URL_IP="http://10.11.0.215:8080/ip"
URL_GET="http://10.11.0.215:8080/get"

def use_simple_requests():
    response = requests.get(URL_IP)
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print "Response Body:"
    print response.text

def use_params_requests():
    # Pass the query parameters via params; requests builds the final URL
    params = {'param1': 'hello', 'param2': 'world'}
    response = requests.get(URL_GET, params=params)
    print ">>>Response Headers:"
    print response.headers
    print ">>>Response Code:"
    print response.status_code
    print response.reason
    print "Response Body:"
    print response.json()

if __name__ == "__main__":
    # print "simple requests:"
    # use_simple_requests()
    print "params requests:"
    use_params_requests()
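
Because httpbin's /get endpoint echoes the request back as JSON, you can verify that the parameters were actually sent. A quick check, assuming the same local httpbin server:

# Sketch: httpbin echoes the query string back under the "args" key.
import requests

response = requests.get("http://10.11.0.215:8080/get",
                        params={'param1': 'hello', 'param2': 'world'})
print response.json()['args']  # expected: {u'param1': u'hello', u'param2': u'world'}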

Interacting with the GitHub API via requests

# coding=utf-8
import json
import requests
from requests import exceptions

URL = "https://api.github.com"

def build_uri(endpoint):
    # Join the base URL and the endpoint into the final API path
    return '/'.join([URL, endpoint])

def better_print(json_str):
    # Pretty-print a JSON string with a 4-space indent
    return json.dumps(json.loads(json_str), indent=4)

def request_method():
    # Fetch user information
    # response = requests.get(build_uri('users/reblue520'))
    # response = requests.get(build_uri('user/emails'), auth=('reblue520', 'reblue520'))
    response = requests.get(build_uri('user/public_emails'), auth=('reblue520', 'reblue520'))
    print better_print(response.text)

def params_request():
    response = requests.get(build_uri('users'), params={'since': 11})
    print better_print(response.text)
    print response.request.headers
    print response.url

def json_request():
    # Update user info; the email must already be verified on the account
    # response = requests.patch(build_uri('user'), auth=('reblue520', 'reblue520'), json={'name': 'hellojack2019', 'email': '[email protected]'})
    response = requests.post(build_uri('user/emails'), auth=('reblue520', 'Reblue0225520'), json=['[email protected]'])
    print better_print(response.text)
    print response.request.headers
    print response.request.body
    print response.status_code

def timeout_request():
    # API error handling: timeouts and HTTP errors
    try:
        response = requests.get(build_uri('user/emails'), timeout=10)
        response.raise_for_status()
    except exceptions.Timeout as e:
        print e
    except exceptions.HTTPError as e:
        print e
    else:
        print response.status_code
        print response.text

def hard_requests():
    # Build a Request by hand, prepare it, and send it through a Session
    from requests import Request, Session
    s = Session()
    headers = {'User-Agent': 'fake1.3.4'}
    req = Request('GET', build_uri('user/emails'), auth=('reblue520', 'Reblue0225520'), headers=headers)
    prepped = req.prepare()
    print prepped.body
    print prepped.headers

    resp = s.send(prepped, timeout=5)
    print resp.status_code
    print resp.request.headers
    print resp.text

if __name__ == '__main__':
    # request_method()
    # params_request()
    # json_request()
    # timeout_request()
    hard_requests()
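
One caveat: GitHub has since removed username/password basic authentication for its API, so the auth=(user, password) calls above now return 401. A personal access token sent in the Authorization header works instead; a minimal sketch, where GH_TOKEN is a placeholder you would generate under GitHub's developer settings:

# coding=utf-8
# Sketch: GitHub API authentication with a personal access token.
# GH_TOKEN is a placeholder; substitute a real token of your own.
import requests

GH_TOKEN = "ghp_your_token_here"
headers = {'Authorization': 'token %s' % GH_TOKEN}

response = requests.get("https://api.github.com/user/emails", headers=headers)
print response.status_code
print response.json()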

Common APIs on the response object

The basic response APIs, demonstrated in an IPython session:
In [1]: import requests

In [2]: response = requests.get("https://api.github.com")

In [3]: response.status_code
Out[3]: 200

In [4]: response.reason
Out[4]: 'OK'

In [5]: response.headers
Out[5]: {'Date': 'Sat, 20 Jul 2019 03:48:51 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com', 'Status': '200 OK', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '47', 'X-RateLimit-Reset': '1563598131', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept, Accept-Encoding', 'ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': '33D9:591B:9D084B:CF860E:5D328F23'}

In [6]: response.url
Out[6]: 'https://api.github.com/'

In [7]: response.history
Out[7]: []

In [8]: response = requests.get("http://api.github.com")

In [9]: response.history
Out[9]: [<Response [301]>]

In [10]: response = requests.get("https://api.github.com")

In [11]: response.elapsed
Out[11]: datetime.timedelta(microseconds=459174)

In [12]: response.request
Out[12]: <PreparedRequest [GET]>

In [13]: response.request.headers
Out[13]: {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

In [14]: response.encoding
Out[14]: 'utf-8'

In [15]: response.raw.read(10)
Out[15]: b''

In [16]: response.content
Out[16]: b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

In [17]: response.json()
Out[17]:
{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': 'https://api.github.com/user/keys',
 'notifications_url': 'https://api.github.com/notifications',
 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
 'organization_url': 'https://api.github.com/orgs/{org}',
 'public_gists_url': 'https://api.github.com/gists/public',
 'rate_limit_url': 'https://api.github.com/rate_limit',
 'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
 'starred_gists_url': 'https://api.github.com/gists/starred',
 'team_url': 'https://api.github.com/teams',
 'user_url': 'https://api.github.com/users/{user}',
 'user_organizations_url': 'https://api.github.com/user/orgs',
 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
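
One detail worth noting: response.raw.read(10) returned b'' above because requests had already consumed the body into response.content. To read from the raw, undecoded stream, pass stream=True; a minimal sketch:

# Sketch: stream=True defers the body download, so the raw stream is still readable.
# The bytes returned here are the gzip-compressed payload, not decoded JSON.
import requests

response = requests.get("https://api.github.com", stream=True)
print response.raw.read(10)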

Source: https://www.cnblogs.com/reblue520/p/11230814.html
