A Brief Look at the text and content Attributes of the requests Response Object

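When writing a web crawler, the requests library is all but indispensable for fetching pages, and we constantly call res = requests.get(url). To get a page's HTML we usually read the response's text attribute, html = res.text. To download an image or file we usually use its content attribute: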

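with open(filename, 'wb') as fp:  # 'wb': open the file for writing in binary mode
    fp.write(res.content)         # res.content holds the raw bytes of the response body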
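A minimal sketch of the same idea, wrapped in a hypothetical helper `save_binary()` so it can run without a network connection. In a real crawler the `data` argument would be `res.content`.

```python
def save_binary(data: bytes, filename: str) -> int:
    """Hypothetical helper: write raw bytes (e.g. res.content) to a file."""
    if not isinstance(data, bytes):
        # Passing res.text (a str) by mistake is caught here with a clear message
        raise TypeError("expected bytes such as res.content, not str")
    # 'wb' opens the file in binary mode, so the bytes are written unchanged
    with open(filename, 'wb') as fp:
        return fp.write(data)  # number of bytes written
```

For example, `save_binary(res.content, 'picture.jpg')`, where the file name is only illustrative. Writing a str to a file opened with 'wb' would fail anyway; the explicit check just gives a clearer error message.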

Let's look at how 'text' and 'content' differ:

Printing the text of the response for this blog's home page

import requests

url = 'https://www.cnblogs.com/huwt/'
res = requests.get(url, timeout=6)  # timeout in seconds
print(res.text)                     # the response body as a str

(Output truncated at the <title> tag.)

Printing the content of the response for this blog's home page

import requests

url = 'https://www.cnblogs.com/huwt/'
res = requests.get(url, timeout=6)  # timeout in seconds
print(res.content)                  # the response body as raw bytes

(Output truncated at the <title> tag.)

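Looking at the <title> tag, we can see that res.text prints the Chinese characters directly, while res.content shows them as hexadecimal escape sequences of the form \x.., one per byte.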
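The difference comes from how Python displays the two types. The snippet below reproduces it offline, using a hypothetical title string in place of the real page: a str prints its characters, while bytes prints every non-ASCII byte as a \x.. escape. In UTF-8, each of these Chinese characters takes three bytes.

```python
# Hypothetical title text standing in for the page's <title> content
title = '浅析requests库'

# Encoding to UTF-8 gives bytes, the same kind of object res.content holds
raw = title.encode('utf-8')

print(title)  # 浅析requests库
print(raw)    # b'\xe6\xb5\x85\xe6\x9e\x90requests\xe5\xba\x93'
```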

To understand text and content better, let's check their types:

import requests

url = 'https://www.cnblogs.com/huwt/'
res = requests.get(url, timeout=6)
print(type(res.text))     # <class 'str'>
print(type(res.content))  # <class 'bytes'>

We can see that res.text is of type str, while res.content is of type bytes, i.e. binary data.
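The type difference also shows up in how the objects behave. The sketch below again uses a hypothetical title string rather than a live response: a str is a sequence of characters, whereas bytes is a sequence of integers from 0 to 255, so their lengths and indexing differ.

```python
title = '浅析requests库'        # hypothetical str, like res.text
raw = title.encode('utf-8')     # bytes, like res.content

print(type(title), type(raw))   # <class 'str'> <class 'bytes'>
print(len(title), len(raw))     # 11 17 (each Chinese character is 3 bytes in UTF-8)
print(title[0] == '浅', raw[0]) # True 230 (indexing bytes yields an int, here 0xE6)
```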

To confirm this, we decode content as UTF-8 with the bytes decode() method and print the result:

import requests

url = 'https://www.cnblogs.com/huwt/'
res = requests.get(url, timeout=6)
print(res.content.decode('utf-8'))  # decode the raw bytes as UTF-8 text

The output is identical to what res.text printed.

So we can conclude:

res.text returns Unicode text, i.e. a str.

res.content returns bytes, i.e. raw binary data.

Use res.text to get text; use res.content to get images or other files.
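Why files should go through content: turning binary data into text can be lossy. The sketch below uses the fixed 8-byte PNG file signature as stand-in image data (no real download) and assumes the bytes are decoded as UTF-8 with invalid bytes replaced. Any byte that is not valid UTF-8 becomes U+FFFD, so the original bytes cannot be recovered.

```python
# The first 8 bytes of every PNG file (the PNG signature)
png_bytes = b'\x89PNG\r\n\x1a\n'

# Decode as UTF-8, replacing invalid bytes (an assumed encoding for illustration)
as_text = png_bytes.decode('utf-8', errors='replace')

# Converting back to bytes does not restore the original data
round_trip = as_text.encode('utf-8')

print(as_text[0] == '\ufffd')   # True: 0x89 is not valid UTF-8, so it was replaced
print(round_trip == png_bytes)  # False: the "image" is now corrupted
```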

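A few additional notes:

text is the string obtained by decoding content, so which encoding is used for the decoding?

When building text, requests makes an educated guess at the response's encoding based on the HTTP headers (the charset given in the Content-Type header). The guess is not always correct, and a wrong guess produces garbled text.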
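A simplified, offline illustration of how a wrong guess garbles the text. As far as we know, text decodes content roughly as str(content, encoding, errors='replace'), and when a text/* Content-Type header carries no charset, requests assumes ISO-8859-1. The hypothetical decode_like_text() below mimics only that decoding step; it is not the library's actual code.

```python
def decode_like_text(content: bytes, encoding: str) -> str:
    """Hypothetical stand-in for the decoding step behind res.text."""
    return str(content, encoding, errors='replace')


content = '浅析requests库'.encode('utf-8')  # assumed UTF-8 page body

right = decode_like_text(content, 'utf-8')
wrong = decode_like_text(content, 'ISO-8859-1')  # e.g. the header had no charset

print(right)        # 浅析requests库
print(repr(wrong))  # mojibake starting with 'æµ': one Latin-1 character per byte
# ISO-8859-1 maps every byte to exactly one character, so this mistake is reversible
print(wrong.encode('ISO-8859-1').decode('utf-8') == right)  # True
```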

To fix that, we can set the encoding manually: res.encoding = 'the encoding you need' (for example 'utf-8')

or let requests guess it from the body: res.encoding = res.apparent_encoding
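apparent_encoding relies on a character-detection library (charset_normalizer or chardet, depending on the requests version and what is installed) to guess the encoding from the body itself. The toy guess_encoding() below is a hypothetical and much cruder stand-in that just tries a short list of candidate encodings in order; it only illustrates the idea of guessing from the bytes rather than from the headers.

```python
from typing import Optional, Sequence


def guess_encoding(content: bytes,
                   candidates: Sequence[str] = ('utf-8', 'gbk')) -> Optional[str]:
    """Toy guesser (hypothetical): return the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            content.decode(enc)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return enc
    return None  # no candidate decodes without errors


body = '中文网页'.encode('gbk')  # assumed GBK-encoded page body
enc = guess_encoding(body)
print(enc)               # gbk
print(body.decode(enc))  # 中文网页
```

UTF-8 is tried first on purpose: its byte rules are strict, so GBK data rarely passes as UTF-8, whereas UTF-8 bytes can often be decoded as GBK into plausible-looking nonsense. In a real crawler you would simply set res.encoding = res.apparent_encoding before reading res.text.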


Original article: https://www.cnblogs.com/huwt/p/10368803.html

Date: 2024-10-29 17:37:37
