Encoding problems when reading and writing files in Python 3

1. Reading a remote resource returns bytes (encoded as UTF-8, GBK, or similar), which must be decoded with decode() into str (Unicode)

For example:

# coding=gbk
import urllib.request
import re

url = 'http://www.163.com'
file = 'd:/test.html'
# read() returns bytes, not str
data = urllib.request.urlopen(url).read()
r1 = re.compile('<.*?>')
# Fails here: a str pattern cannot be applied to a bytes object
c_t = r1.findall(data)
print(c_t)

After the page has been downloaded, the script fails at the r1.findall(data) call with:

can't use a string pattern on a bytes-like object

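A quick search shows that this behavior changed in Python 3: read() now returns a bytes-like object, while a str pattern requires a str (character) argument. Decoding the data first fixes the problem: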

data = data.decode('GBK')

With this line placed before the regular expression is applied, the script runs normally.
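
The same fix can be tried without a network connection. What follows is only a minimal sketch: it assumes a hard-coded GBK-encoded byte string in place of the real response body, and the names TAG_PATTERN, find_tags and raw are illustrative and do not appear in the original script.

```python
import re

TAG_PATTERN = re.compile('<.*?>')  # str pattern, same as in the example above


def find_tags(raw, encoding='gbk'):
    """Decode raw bytes to str, then return all tag-like matches."""
    text = raw.decode(encoding)  # bytes -> str
    return TAG_PATTERN.findall(text)


# Hypothetical stand-in for urllib.request.urlopen(url).read()
raw = '<html><body>中文</body></html>'.encode('gbk')

try:
    TAG_PATTERN.findall(raw)  # str pattern applied to bytes
except TypeError as exc:
    print('without decode:', exc)

print(find_tags(raw))
```

An alternative is to compile a bytes pattern such as re.compile(b'<.*?>') and match the raw bytes directly; the matches are then bytes as well, so decoding is still needed before treating them as text.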

2. Reading a local text file with open(fname) returns str (Unicode); call encode('utf-8') when bytes are needed

For example:

fname = 'e:/data/html.txt'
# Text mode ('r') decodes the file contents and returns str
with open(fname, 'r') as f:
    html = f.read()
#print(html)
print(type(html))  # prints <class 'str'>
u = html.encode('utf-8')
print(type(u))  # prints <class 'bytes'>

In Python 3, the str type holds Unicode text.
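
The relationship between str and bytes when working with files can be sketched as a round trip. This is an illustration under stated assumptions, not part of the original note: it writes to a temporary scratch file (the path and the variable names are hypothetical) and passes encoding='utf-8' explicitly, because when the encoding is omitted open() may fall back to a platform-dependent default.

```python
import os
import tempfile

text = '中文 text'

# Hypothetical scratch file; the example above reads e:/data/html.txt instead
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# Text mode with an explicit encoding: write and read str
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)
with open(path, 'r', encoding='utf-8') as f:
    as_str = f.read()

# Binary mode: read the raw bytes, no decoding is performed
with open(path, 'rb') as f:
    as_bytes = f.read()

print(type(as_str))    # <class 'str'>
print(type(as_bytes))  # <class 'bytes'>
print(as_bytes.decode('utf-8') == as_str)
```

Reading in 'rb' mode is a convenient way to get bytes directly, without reading str and calling encode() afterwards.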

Time: 2024-10-14 10:28:33
