Java – Reading a Large File Efficiently (Repost)

Original article: http://www.baeldung.com/java-read-lines-large-file

1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the “Java – Back to Basic” tutorial here on Baeldung.

2. Reading In Memory

The standard way of reading the lines of a file is to load them all into memory. Both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);

FileUtils.readLines(new File(path), "UTF-8");

The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError if the file is large enough.

For example, reading a ~1 Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with only a small amount of memory being consumed (about 12 Mb, going by the figures below):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we end up with about 2 Gb consumed:

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process. The reason is simple: all the lines of the file are now being held in memory.
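
The article does not show how the "Total Memory" and "Free Memory" figures were logged. A minimal sketch, assuming they come from java.lang.Runtime, could look like the following; the class name MemoryUsageSketch and the helper usedMb are hypothetical and only illustrate the "consumed = total - free" arithmetic used above:

```java
public class MemoryUsageSketch {

    private static final long MB = 1024L * 1024L;

    // Heap currently reserved by the JVM, in megabytes.
    static long totalMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Portion of the reserved heap that is currently unused, in megabytes.
    static long freeMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    // Hypothetical helper: memory in use is total minus free.
    static long usedMb(long totalMb, long freeMb) {
        return totalMb - freeMb;
    }

    public static void main(String[] args) {
        long total = totalMb();
        long free = freeMb();
        System.out.println("Total Memory: " + total + " Mb");
        System.out.println("Free Memory: " + free + " Mb");
        System.out.println("Used Memory: " + usedMb(total, free) + " Mb");

        // Figures taken from the log after the in-memory read:
        // 2666 Mb total and 490 Mb free
        System.out.println(usedMb(2666, 490)); // prints 2176
    }
}
```

Applied to the logged values, 2666 - 490 gives 2176 Mb, which is the roughly 2.1 Gb quoted in the text.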

It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much that actually is.

What’s more, we usually don’t need all of the lines in the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing and throw it away. So this is exactly what we’re going to do: iterate through the lines without holding them in memory.

3. Streaming Through the File

Let’s now look at a solution: we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

// requires java.io.FileInputStream and java.util.Scanner
// try-with-resources closes the Scanner first, then the underlying stream
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

This solution will iterate through all the lines in the file, allowing each line to be processed without keeping a reference to it and, consequently, without keeping the lines in memory (~150 Mb consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
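
To try the Scanner approach end to end, here is a small self-contained sketch. The class name ScannerLineCount, the method countLines and the temporary sample file are all hypothetical and not part of the original article; the point is only that each line is read, used and dropped before the next one is fetched:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ScannerLineCount {

    // Streams through the file and counts its lines without retaining them.
    static long countLines(String path) throws IOException {
        long count = 0;
        try (FileInputStream in = new FileInputStream(path);
             Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                sc.nextLine(); // the line is eligible for GC once we move on
                count++;
            }
            // Scanner swallows IOExceptions, so surface any it recorded
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demo
        Path sample = Files.createTempFile("sample", ".txt");
        try {
            Files.write(sample, "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(countLines(sample.toString())); // prints 3
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

Because only the running count survives each loop iteration, memory use should stay roughly flat no matter how large the input file is.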

4. Streaming with Apache Commons IO

The same can be achieved with the Commons IO library, by using the custom LineIterator it provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the file is never fully loaded into memory, this also results in fairly conservative memory consumption (about 190 Mb consumed, going by the figures below):

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
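
For readers who do not have Commons IO on the classpath, the same line-at-a-time iteration can be sketched with the JDK alone. Note that this swaps in java.io.BufferedReader for LineIterator, so it is a comparable technique rather than the library API shown above; the class name BufferedLineReader and the method forEachLine are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class BufferedLineReader {

    // Hands each line to the handler, one at a time,
    // and returns how many lines were seen.
    static long forEachLine(Path file, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line); // "do something with line"
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BufferedLineReader <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long[] chars = {0}; // running total, updated from the lambda
        long lines = forEachLine(file, line -> chars[0] += line.length());
        System.out.println(lines + " lines, " + chars[0] + " characters");
    }
}
```

As with LineIterator, only the current line and the reader's internal buffer are held at any moment, and try-with-resources takes care of closing the file.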

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.

The implementation of all these examples and code snippets can be found in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as it is.

Time: 2024-10-05 11:57:36
