How to improve Java's I/O performance

Original article: http://www.javaworld.com/article/2077523/build-ci-sdlc/java-tip-26--how-to-improve-java-s-i-o-performance.html

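The java.io package in JDK 1.0.2 exposed a number of I/O performance problems. This article presents one optimization, plus an extra tip on turning off synchronization.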

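Java's I/O performance used to be a bottleneck for many Java applications, mainly because of the poor design and implementation of the java.io package in JDK 1.0.2. The key problem is buffering: most classes in java.io are not buffered at all. In fact, the only buffered classes are BufferedInputStream and BufferedOutputStream, and they offer a limited set of methods. For instance, most file-related applications need to parse a file line by line, but the only class that provides a readLine method is DataInputStream, which has no internal buffer. Its readLine method reads the input stream one character at a time until it reaches "\n" or "\r\n". Every character read involves a file I/O operation, which is extremely inefficient for a large file. Without a buffer, a 5-megabyte file needs at least 5 million character-read file I/O operations.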
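The cost described above can be made visible by counting how many calls actually reach the underlying stream. The sketch below is illustrative only: CountingInputStream and underlyingCalls are hypothetical names, and an in-memory ByteArrayInputStream stands in for a file so the example runs anywhere.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadCountDemo {
    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Consumes the source one byte at a time and returns the number of underlying calls. */
    static long underlyingCalls(int sizeBytes, boolean buffered) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[sizeBytes]));
        InputStream in = buffered ? new BufferedInputStream(source, 8192) : source;
        try {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        // Unbuffered: one underlying call per byte, plus the final call that reports EOF.
        System.out.println("unbuffered calls: " + underlyingCalls(5_000_000, false));
        // Buffered: roughly one underlying call per 8192-byte refill.
        System.out.println("buffered calls:   " + underlyingCalls(5_000_000, true));
    }
}
```

With a real file each underlying call would be a trip to the operating system, which is where the time goes.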

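JDK 1.1 improved I/O performance by adding a set of Reader and Writer classes. On a large file, the readLine method of BufferedReader is at least 10 to 20 times faster than the one in DataInputStream. Unfortunately, JDK 1.1 does not solve every performance problem. For example, RandomAccessFile, which is very useful when you want to parse a large file without reading all of it into memory, is still unbuffered in JDK 1.1, and no comparable Reader class is provided.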
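For reference, line-by-line reading with the JDK 1.1 Reader classes looks roughly like the sketch below. It is a minimal illustration rather than code from the article: the class and method names are made up, and a StringReader stands in for a FileReader so the example needs no file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BufferedLineReader {
    /** Reads every line from the given Reader through a BufferedReader. */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine returns null at end of stream and strips the line terminator.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(new StringReader("alpha\nbeta\r\ngamma")));
        // prints [alpha, beta, gamma]
    }
}
```

This covers sequential reading only. It offers no seek, which is the gap the rest of the article addresses.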

How to tackle the I/O problem

To tackle the problem of inefficient file I/O, we need a buffered RandomAccessFile class. We derive a new class from RandomAccessFile so that all of its existing methods can be reused. The new class is named Braf (short for buffered RandomAccessFile).

 public class Braf extends RandomAccessFile {
  }

For efficiency reasons, we define a byte buffer instead of a char buffer. The variables buf_end, buf_pos, and real_pos record the effective positions in the buffer:

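byte buffer[];       // the read buffer
int BUF_SIZE;        // buffer capacity, set by the constructor below
int buf_end = 0;     // number of valid bytes currently in the buffer
int buf_pos = 0;     // index of the next unread byte in the buffer
long real_pos = 0;   // file offset just past the last byte loaded into the buffer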

A new constructor is added with an additional parameter that specifies the size of the buffer:

public Braf(String filename, String mode, int bufsize)
    throws IOException {
    super(filename, mode);
    invalidate();
    BUF_SIZE = bufsize;
    buffer = new byte[BUF_SIZE];
}

The new read method always reads from the buffer first. It overrides the native read method of the original class, which is not invoked until the buffer has been exhausted. At that point fillBuffer is called, and it in turn invokes the original read to refill the buffer. The private method invalidate marks the buffer as no longer containing valid data. This is necessary when the seek method moves the file pointer outside the buffer.

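// Serve single bytes from the buffer, refilling it only when it is exhausted.
public final int read() throws IOException {
    if (buf_pos >= buf_end) {
        if (fillBuffer() < 0)
            return -1;
    }
    if (buf_end == 0) {
        return -1;
    } else {
        // Mask to 0..255 so that a data byte of 0xFF is not mistaken for EOF (-1).
        return buffer[buf_pos++] & 0xFF;
    }
}

// Load the next block from the file through the original (unbuffered) read.
private int fillBuffer() throws IOException {
    int n = super.read(buffer, 0, BUF_SIZE);
    if (n >= 0) {
        real_pos += n;
        buf_end = n;
        buf_pos = 0;
    }
    return n;
}

// Discard the buffer contents and resynchronize with the real file pointer.
private void invalidate() throws IOException {
    buf_end = 0;
    buf_pos = 0;
    real_pos = super.getFilePointer();
}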
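One detail worth checking in any single-byte read that serves data from a byte array: Java bytes are signed, while read() is documented to return a value from 0 to 255, with -1 reserved for end of file. The small sketch below (the class and method names are illustrative) shows why the stored byte should be masked before it is returned.

```java
class UnsignedByteDemo {
    /** Converts a stored byte to the 0..255 range that read() is expected to return. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xFF;   // a perfectly valid data byte
        int widened = stored;        // plain widening keeps the sign bit

        System.out.println(widened);             // prints -1, indistinguishable from EOF
        System.out.println(toUnsigned(stored));  // prints 255
    }
}
```

Without the mask, a reader looping until read() returns -1 would stop at the first 0xFF byte in the file.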

The parameterized read method is overridden as well; the code is listed below. If enough bytes are left in the buffer, it simply calls System.arraycopy to copy that portion of the buffer directly into the user-provided array. This presents the most significant performance gain because the read method is heavily used in the getNextLine method, which is our replacement for readLine.

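public int read(byte b[], int off, int len) throws IOException {
    int leftover = buf_end - buf_pos;
    if (len <= leftover) {
        // Fast path: the request can be served entirely from the buffer.
        System.arraycopy(buffer, buf_pos, b, off, len);
        buf_pos += len;
        return len;
    }
    // Slow path: go through the buffered single-byte read, which refills as needed.
    for (int i = 0; i < len; i++) {
        int c = this.read();
        if (c != -1)
            b[off + i] = (byte) c;
        else {
            // Report end of file as -1 when nothing was read, as RandomAccessFile.read does.
            return (i == 0) ? -1 : i;
        }
    }
    return len;
}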
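When the request is larger than what is left in the buffer, the listing above falls back to reading one byte at a time. A possible variation, which is not the article's method, is to keep using System.arraycopy and alternate between draining and refilling the buffer. The sketch below shows that idea on a plain InputStream; ChunkedCopyDemo and readOnce are hypothetical names, and the class is a standalone illustration rather than a drop-in part of Braf.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ChunkedCopyDemo {
    private final InputStream in;
    private final byte[] buffer;
    private int bufPos = 0;   // next unread byte in the buffer
    private int bufEnd = 0;   // number of valid bytes in the buffer

    ChunkedCopyDemo(InputStream in, int bufSize) {
        this.in = in;
        this.buffer = new byte[bufSize];
    }

    /** Copies up to len bytes into b, draining and refilling the buffer in bulk. */
    int read(byte[] b, int off, int len) throws IOException {
        int copied = 0;
        while (copied < len) {
            if (bufPos >= bufEnd) {
                int n = in.read(buffer, 0, buffer.length);
                if (n <= 0) {
                    break;   // end of stream
                }
                bufPos = 0;
                bufEnd = n;
            }
            int chunk = Math.min(len - copied, bufEnd - bufPos);
            System.arraycopy(buffer, bufPos, b, off + copied, chunk);
            bufPos += chunk;
            copied += chunk;
        }
        return (copied == 0 && len > 0) ? -1 : copied;
    }

    /** Convenience wrapper for the example: performs one read and returns the bytes obtained. */
    static byte[] readOnce(byte[] data, int bufSize, int requestLen) {
        ChunkedCopyDemo reader = new ChunkedCopyDemo(new ByteArrayInputStream(data), bufSize);
        byte[] target = new byte[requestLen];
        try {
            int n = reader.read(target, 0, requestLen);
            return Arrays.copyOf(target, Math.max(n, 0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // A 7-byte request against a 4-byte buffer is served as one chunk of 4 and one of 3.
        System.out.println(Arrays.toString(readOnce(data, 4, 7)));
        // prints [0, 1, 2, 3, 4, 5, 6]
    }
}
```

Whether this is worth the extra code depends on how often callers ask for more than the buffer holds; for line parsing the single-byte path dominates anyway.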

The original getFilePointer and seek methods need to be overridden as well in order to take advantage of the buffer. Most of the time, both methods simply operate inside the buffer.

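// Logical position = real file position minus the bytes still unread in the buffer.
public long getFilePointer() throws IOException {
    long l = real_pos;
    return (l - buf_end + buf_pos);
}

public void seek(long pos) throws IOException {
    // Keep the distance as a long so that offsets in large files cannot overflow an int.
    long n = real_pos - pos;
    if (n >= 0 && n <= buf_end) {
        // Target lies inside the buffered region: just move the buffer index.
        buf_pos = (int) (buf_end - n);
    } else {
        super.seek(pos);
        invalidate();
    }
}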
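The position arithmetic is easier to follow with concrete numbers. The sketch below models only the three bookkeeping fields, with no file involved; BufferWindow and its method names are hypothetical and exist purely to illustrate the calculation.

```java
class BufferWindow {
    long realPos;   // file offset just past the last byte loaded into the buffer
    int bufEnd;     // number of valid bytes in the buffer
    int bufPos;     // index of the next unread byte in the buffer

    BufferWindow(long realPos, int bufEnd, int bufPos) {
        this.realPos = realPos;
        this.bufEnd = bufEnd;
        this.bufPos = bufPos;
    }

    /** The position the caller sees: the real position minus what is still unread. */
    long logicalPosition() {
        return realPos - bufEnd + bufPos;
    }

    /** Moves bufPos if target lies inside the buffered region; returns false otherwise. */
    boolean seekWithinBuffer(long target) {
        long distanceFromEnd = realPos - target;
        if (distanceFromEnd >= 0 && distanceFromEnd <= bufEnd) {
            bufPos = (int) (bufEnd - distanceFromEnd);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The buffer holds file bytes 4096..8191 and 100 of them have been consumed.
        BufferWindow w = new BufferWindow(8192, 4096, 100);
        System.out.println(w.logicalPosition());      // prints 4196
        System.out.println(w.seekWithinBuffer(5000)); // prints true
        System.out.println(w.logicalPosition());      // prints 5000
        System.out.println(w.seekWithinBuffer(100));  // prints false
    }
}
```

A false result corresponds to the branch where the real seek is issued and the buffer is invalidated.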

Most important, a new method, getNextLine, is added to replace the readLine method. We cannot simply override readLine because it is declared final in the original class. The getNextLine method first checks whether the buffer still contains unread data. If it does not, the buffer is refilled. If the line delimiter can be found in the buffer, the line is read from the buffer and converted into a String. Otherwise, the method simply calls read to fetch the line byte by byte. Although the code of the latter portion is similar to the original readLine, performance is better here because the read method is buffered in the new class.

/**
 * Returns the next line as a String, or null at end of file.
 */
public final String getNextLine() throws IOException {
    String str = null;
    if (buf_end - buf_pos <= 0) {
        if (fillBuffer() < 0) {
            // fillBuffer returns a negative value only at end of file, so report
            // "no more lines" the way readLine does instead of raising an error.
            return null;
        }
    }
    // Look for a line terminator in the unread part of the buffer.
    int lineend = -1;
    for (int i = buf_pos; i < buf_end; i++) {
        if (buffer[i] == '\n') {
            lineend = i;
            break;
        }
    }
    if (lineend < 0) {
        // No terminator in the buffer: read byte by byte through the buffered read().
        StringBuffer input = new StringBuffer(256);
        int c;
        while (((c = read()) != -1) && (c != '\n')) {
            input.append((char) c);
        }
        if ((c == -1) && (input.length() == 0)) {
            return null;
        }
        return input.toString();
    }
    // Drop a '\r' before the '\n', but only if it belongs to the current line.
    if (lineend > buf_pos && buffer[lineend - 1] == '\r')
        str = new String(buffer, 0, buf_pos, lineend - buf_pos - 1);
    else
        str = new String(buffer, 0, buf_pos, lineend - buf_pos);
    buf_pos = lineend + 1;
    return str;
}
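The fast path of getNextLine boils down to two steps: scan the unread bytes for '\n', then decode the bytes before it, dropping a trailing '\r'. The sketch below isolates those two steps on a plain byte array. The helper names are hypothetical, and it decodes with ISO-8859-1 (one byte per character) as an assumed stand-in for the deprecated String constructor used in the listing.

```java
import java.nio.charset.StandardCharsets;

class LineScanDemo {
    /** Returns the index of the first '\n' in buf[from, to), or -1 if there is none. */
    static int findNewline(byte[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Decodes buf[from, lineEnd) as one line, dropping a single trailing '\r' if present. */
    static String decodeLine(byte[] buf, int from, int lineEnd) {
        int len = lineEnd - from;
        if (len > 0 && buf[lineEnd - 1] == '\r') {
            len--;
        }
        return new String(buf, from, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = "first\r\nsecond\nthird".getBytes(StandardCharsets.ISO_8859_1);
        int pos = 0;
        int nl;
        while ((nl = findNewline(buf, pos, buf.length)) >= 0) {
            System.out.println(decodeLine(buf, pos, nl));
            pos = nl + 1;
        }
        // prints "first" and then "second"; "third" has no terminator in the buffer,
        // which is the case the byte-by-byte fallback handles
    }
}
```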

With the new Braf class, we have experienced at least a 25-fold performance improvement over RandomAccessFile when a large file needs to be parsed line by line. The method described here also applies to other places where intensive file I/O operations are involved.
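Numbers like these depend heavily on the JDK, the operating system, and the disk, so it is worth measuring on your own setup. The harness below is a rough sketch, not a rigorous benchmark: it times the unbuffered RandomAccessFile.readLine against BufferedReader.readLine, which is used here as an assumed stand-in for a buffered reader because Braf itself is not part of the JDK. All class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LineReadTiming {
    /** Writes a temporary text file with the given number of short lines. */
    static Path writeSampleFile(int lines) throws IOException {
        Path file = Files.createTempFile("braf-demo", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.ISO_8859_1)) {
            for (int i = 0; i < lines; i++) {
                w.write("line " + i);
                w.newLine();
            }
        }
        return file;
    }

    /** Counts lines with the unbuffered RandomAccessFile.readLine. */
    static long countWithRandomAccessFile(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long lines = 0;
            while (raf.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    /** Counts lines with a BufferedReader. */
    static long countWithBufferedReader(Path file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            long lines = 0;
            while (r.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    /** Runs both readers once, prints elapsed times, and returns the two line counts. */
    static long[] run(int lines) {
        try {
            Path file = writeSampleFile(lines);
            try {
                long t0 = System.nanoTime();
                long unbuffered = countWithRandomAccessFile(file);
                long t1 = System.nanoTime();
                long buffered = countWithBufferedReader(file);
                long t2 = System.nanoTime();
                System.out.printf("RandomAccessFile.readLine: %d lines, %d ms%n",
                        unbuffered, (t1 - t0) / 1_000_000);
                System.out.printf("BufferedReader.readLine:   %d lines, %d ms%n",
                        buffered, (t2 - t1) / 1_000_000);
                return new long[] {unbuffered, buffered};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        run(200_000);
    }
}
```

To measure Braf itself, add a third counting method that opens the file with Braf and loops on getNextLine. A single run like this includes JIT warm-up and file-cache effects, so repeat it a few times before drawing conclusions.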

Synchronization turn-off: An extra tip

Another factor responsible for slowing down Java's performance, besides the I/O problem discussed above, is the synchronized statement. Generally, the overhead of a synchronized method is about 6 times that of a conventional method. If you are writing an application without multithreading, or a part of an application in which you know for sure that only one thread is involved, you don't need anything to be synchronized. Currently, there is no mechanism in Java to turn off synchronization. A simple trick is to get the source code of a class, remove the synchronized statements, and generate a new class. For example, in BufferedInputStream both read methods are synchronized, and all other I/O methods depend on them. You can copy the source code from BufferedInputStream.java provided by JavaSoft's JDK 1.1, rename the class to NewBIS, for example, remove the synchronized statements from NewBIS.java, and recompile NewBIS.
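To show what such a class amounts to, here is a minimal sketch of a buffered input stream with no synchronized methods. It is written from scratch for illustration and is not a copy of the JDK source; UnsyncBufferedInput and sum are hypothetical names, only the two read methods are covered (skip, available, and mark are left out), and it must only be used from a single thread.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the NewBIS idea: buffered reads without any locking. */
class UnsyncBufferedInput extends FilterInputStream {
    private final byte[] buf;
    private int pos = 0;     // next unread byte in buf
    private int count = 0;   // number of valid bytes in buf

    UnsyncBufferedInput(InputStream in, int size) {
        super(in);
        this.buf = new byte[size];
    }

    /** Refills the buffer; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        pos = 0;
        count = Math.max(n, 0);
        return n > 0;
    }

    @Override
    public int read() throws IOException {   // note: not synchronized
        if (pos >= count && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {   // not synchronized
        if (len == 0) {
            return 0;
        }
        if (pos >= count && !fill()) {
            return -1;
        }
        int n = Math.min(len, count - pos);
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Example use: sums all bytes of data (as unsigned values) through the stream. */
    static long sum(byte[] data) {
        try (InputStream in = new UnsyncBufferedInput(new ByteArrayInputStream(data), 4)) {
            long total = 0;
            int c;
            while ((c = in.read()) != -1) {
                total += c;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new byte[] {1, 2, 3, (byte) 200}));   // prints 206
    }
}
```

How much this saves depends on the JVM: the factor quoted above reflects the JDKs of the article's time, so measure before replacing a standard class.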

Time: 2024-10-29 03:09:11
