Python: encoding and decoding problems when reading files

UnicodeDecodeError: 'gbk' codec can't decode byte 0xb1 in position 94: illegal multibyte sequence

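Sometimes opening and reading a file with open() raises this error: the 'GBK' codec cannot decode byte 0xb1 at position 94 (illegal multibyte sequence). The message itself shows that 'GBK' was used for decoding.

1. Analysis

PyCharm uses 'UTF-8' by default, so at first glance nothing looks wrong. Why does the error occur? The docstring of open() contains this passage:

    encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. *The default encoding is platform dependent*, but any encoding supported by Python can be passed. See the codecs module for the list of supported encodings.

"The default encoding is platform dependent": when no encoding is passed, open() picks one based on the platform. That explains why 'GBK' was used. The default differs from platform to platform, so a file saved as UTF-8 fails to decode when it is read on a platform whose default is GBK.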
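The mismatch described above can be reproduced without any file. The sketch below is only an illustration: the helper names `default_text_encoding` and `can_decode` are made up for this example, and it assumes that the locale's preferred encoding (as reported by `locale.getpreferredencoding(False)`) is what text-mode open() falls back to on your Python version. Newer releases and UTF-8 mode may behave differently.

```python
import locale


def default_text_encoding() -> str:
    """Return the locale's preferred encoding (hypothetical helper).

    Assumption: this is the value text-mode open() uses when no
    encoding argument is given.
    """
    return locale.getpreferredencoding(False)


def can_decode(data: bytes, encoding: str) -> bool:
    """Report whether data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# One Chinese character encoded as UTF-8 (three bytes).
sample = "中".encode("utf-8")

print("default encoding:", default_text_encoding())
print("decodes as utf-8:", can_decode(sample, "utf-8"))
print("decodes as gbk:", can_decode(sample, "gbk"))
```

On a Chinese-language Windows installation the first line typically reports a GBK-family code page such as cp936, which is why the same script can work on one machine and fail on another.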
2. Solution

    # Option 1: read the file as bytes, then decode as 'utf-8'
    # fp = open(filename, 'rb')
    # content = fp.read()
    # self.content = content.decode('utf-8')
    # fp.close()
    # Option 2: specify the encoding when opening the file
    fp = open(filename, encoding='utf-8')
    content = fp.read()
    self.content = content
    fp.close()
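Both options can also be written with a `with` block, so the file is closed even if reading fails. This is a minimal sketch rather than the author's code: the function names `read_text` and `read_text_via_bytes` and the temporary file `sample.txt` are assumptions made for the demonstration.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Option 2: let open() decode the file with an explicit encoding."""
    with open(path, encoding=encoding) as fp:
        return fp.read()


def read_text_via_bytes(path, encoding="utf-8"):
    """Option 1: read raw bytes, then decode them manually."""
    with open(path, "rb") as fp:
        return fp.read().decode(encoding)


# Write a small UTF-8 file into a temporary directory, then read it both ways.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("编码 test")
    text_a = read_text(path)
    text_b = read_text_via_bytes(path)

print(text_a == text_b)
```

The sample text contains no line breaks on purpose: binary mode performs no newline translation, so on Windows the two functions can return different line endings for the same file.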

If you see this differently, feel free to share.

Original source: http://blog.51cto.com/14094286/2323006

Time: 2025-01-01 12:33:20
