Read Large Files in Python

I have a large file ( ~4G) to process in Python. I wonder whether it is OK to "read" such a large file. So I tried in the following several ways:

The original large file to deal with is not "./CentOS-6.5-i386.iso", I just take this file as an example here.

1:  Normal Method. (ignore try/except/finally)

def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()

2: "With" Method.

def main():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()

3:  "readlines" Method. [Bad Idea]

#NO. readlines() is really bad for large files.
#Memory Error.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")

if __name__ == "__main__":
    main()

4: "fileinput" Method.

import fileinput

def main():
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()

5: "Generator" Method.

def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()

The methods above, all work well for small files, but not always for large files(readlines Method). The readlines() function loads the entire file into memory as it runs.

When I run the readlines Method, I got the following error message:

When using the readlines Method, the Percentage of Used CPU and Used Memory rises rapidly(in the following figure). And when the percentage of Used Memory reaches over 50%, I got the "MemoryError" in Python.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) works well for large files. And when using these methods, the workload for CPU and memory which is shown in the following figure does not get a distinct rise.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.

Reference:

How to read large file, line by line in python

时间: 2024-10-10 16:49:38

Read Large Files in Python的相关文章

Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest linkedin google+ I, like most people, never realized I'd be dealing with large files. Oh, I knew there would be some files with megabytes of data, but I

【bugRecord4】Fatal error in launcher: Unable to create process using '""D:\Program Files\Python36\python.exe"" "D:\Program Files\Python36\Scripts\pip.exe" '

环境信息: python版本:V3.6.4 安装路径:D:\Program Files\python36 环境变量PATH:D:\Program Files\Python36;D:\Program Files\Python36\Scripts; 问题描述:命令行执行pip报错 解决方法: 1.切换到D:\Program Files\Python36\Scripts 2.执行python pip.exe install SomePackage进行安装 3.安装成功后执行pip仍报错 4.查看安装成

Working with Excel Files in Python

Working with Excel Files in Python from: http://www.python-excel.org/ This site contains pointers to the best information available about working with Excel files in the Python programming language. The Packages There are python packages available to

“Unicode Error ”unicodeescape" codec can't decode bytes… Cannot open text files in Python 3

“Unicode Error ”unicodeescape" codec can't decode bytes… Cannot open text files in Python 3 问题于字符串 "C:\Users\Eric\Desktop\beeline.txt" 在这里,\U启动一个八字符的Unicode转义,例如'\ U00014321`.在你的代码中,转义后面跟着字符's',这是无效的. 需要复制所有反斜杠,或者在字符串前加上r(以生成原始字符串). python

Read a large file with python

python读取大文件 较pythonic的方法,使用with结构 文件可以自动关闭 异常可以在with块内处理 with open(filename, 'rb') as f: for line in f: <do someting with the line> 最大的优点:对可迭代对象 f,进行迭代遍历:for line in f,会自动地使用缓冲IO(buffered IO)以及内存管理,而不必担心任何大文件的问题. There should be one – and preferably

Creating Excel files with Python and XlsxWriter——Introduction

XlsxWriter 是用来写Excel2007版本以上的xlsx文件的Python模块. XlsxWriter 在供选择的可以写Excel的Python模块中有自己的优缺点. #--------------------------------------------------------------------------------------- 优点: 1. 它支持相比其他模块而言更多的Excel 特征: 2. 它生成的Excel文件有很高的保真度,大多数情况下生成的文件和Excel生成

字符串以及文件的hashlib的md5和sha1等的运用

hashlib的md5和sha1等的运用 import hashlib print(hashlib.algorithms_available) print(hashlib.algorithms_guaranteed) #MD5 import hashlib hash_object = hashlib.md5(b'Hello World') print(hash_object.hexdigest()) # import hashlib mystring = input('Enter String

Python框架、库以及软件资源汇总

转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世界各地的程序员们都能够贡献他们的代码与创新. Python就是这样一门受到全世界各地开源社区支持的语言.Python可以用来开发各种小工具软件.web应用.科学计算.数据分析等等,Python拥有大量的流行框架,比如Django.使用Python框架时,可以根据自己的需求插入不同的模块,比如可以用S

Machine and Deep Learning with Python

Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstitions cheat sheet Introduction to Deep Learning with Python How to implement a neural network How to build and run your first deep learning network Neur