Original article: http://www.baeldung.com/java-read-lines-large-file
1. Overview
This tutorial will show how to read all the lines from a large file in Java in an efficient manner.
This article is part of the “Java – Back to Basics” tutorial here on Baeldung.
2. Reading In Memory
The standard way of reading the lines of a file is in memory – both Guava and Apache Commons IO provide a quick way to do just that:
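A minimal sketch of the two one-liners in question, using Guava’s Files.readLines and Commons IO’s FileUtils.readLines (the file path here is a placeholder):

```java
import com.google.common.base.Charsets;
import com.google.common.io.Files;
import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class ReadLinesInMemory {
    public static void main(String[] args) throws IOException {
        File file = new File("src/test/resources/sample.txt"); // placeholder path

        // Guava – reads every line of the file into a List at once
        List<String> guavaLines = Files.readLines(file, Charsets.UTF_8);

        // Apache Commons IO – same idea, also fully in memory
        List<String> commonsLines = FileUtils.readLines(file, "UTF-8");
    }
}
```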
The problem with this approach is that all the file lines are kept in memory – which will quickly lead to an OutOfMemoryError if the file is large enough.
For example – reading a ~1Gb file:
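A sketch of that scenario, again using Guava and a hypothetical path to the large file:

```java
import com.google.common.base.Charsets;
import com.google.common.io.Files;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class ReadLargeFileInMemory {
    public static void main(String[] args) throws IOException {
        // hypothetical path to a ~1 Gb text file
        File largeFile = new File("/tmp/large.txt");

        // every line of the ~1 Gb file is materialized in this list at once
        List<String> lines = Files.readLines(largeFile, Charsets.UTF_8);
        System.out.println("Lines read: " + lines.size());
    }
}
```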
This starts off with a small amount of memory being consumed – roughly 0 Mb.
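Memory consumption figures like these can be obtained with the standard Runtime API; a minimal sketch of that measurement:

```java
// rough measurement of the heap currently in use, in Mb
Runtime runtime = Runtime.getRuntime();
long totalMb = runtime.totalMemory() / (1024 * 1024);
long usedMb = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
System.out.println("Total Memory: " + totalMb + " Mb");
System.out.println("Used Memory: " + usedMb + " Mb");
```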
However, after the full file has been processed, we end up with about 2 Gb consumed.
This means that about 2.1 Gb of memory is consumed by the process – the reason is simple: the lines of the file are all being stored in memory now.
It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory – regardless of how much that actually is.
What’s more, we usually don’t need all of the lines in the file in memory at once – instead, we just need to be able to iterate through each one, do some processing, and throw it away. So this is exactly what we’re going to do – iterate through the lines without holding them in memory.
3. Streaming Through the File
Let’s now look at a solution – we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
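A sketch along those lines, using a hypothetical file path and try-with-resources in place of an explicit finally block:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

public class ScannerLineStreaming {
    public static void main(String[] args) throws IOException {
        String path = "/tmp/large.txt"; // hypothetical path

        try (InputStream inputStream = new FileInputStream(path);
             Scanner sc = new Scanner(inputStream, "UTF-8")) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // process the line, then let it be garbage-collected
            }
            // Scanner suppresses IOExceptions – surface any that occurred
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
    }
}
```

Note that Scanner swallows IOExceptions thrown by the underlying stream, which is why ioException() is checked explicitly at the end.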
This solution will iterate through all the lines in the file – allowing for processing of each line without keeping references to them and, consequently, without keeping them in memory – the run stays at roughly 150 Mb consumed.
4. Streaming with Apache Commons IO
The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:
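A minimal sketch of the LineIterator pattern, again with a hypothetical path:

```java
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

import java.io.File;
import java.io.IOException;

public class LineIteratorStreaming {
    public static void main(String[] args) throws IOException {
        File theFile = new File("/tmp/large.txt"); // hypothetical path

        LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
        try {
            while (it.hasNext()) {
                String line = it.nextLine();
                // process the line
            }
        } finally {
            // close the underlying reader when done (or on failure)
            LineIterator.closeQuietly(it);
        }
    }
}
```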
Since the entire file is never fully in memory, this also results in pretty conservative memory consumption numbers – again roughly 150 Mb.
5. Conclusion
This quick article shows how to process lines in a large file iteratively, without exhausting the available memory – which proves quite useful when working with these large files.
The implementation of all these examples and code snippets can be found in my GitHub project – this is an Eclipse-based project, so it should be easy to import and run as it is.