Exploring Java, Part 1: The IO Class Library / 憋错料

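After a review at work, I suddenly realized that there is a lot I don't understand even in Java, the language I use most, which is one of the great regrets of my life. Today I'm setting aside time to write these bits down and share them with everyone.

The first batch of topics I want to cover:

Java IO: the overall structure and evolution of IO, along with some resumable-transfer code written by a colleague at my company, for study.
Java's exception mechanism: checked (compile-time) exceptions versus runtime exceptions.
The JavaCommon packages, especially some thoughts on the collection package. Containers being containers, no Utils class escapes a handful of basic concerns: storing, retrieving, sorting, safety, validation, and so on.

Enough chatter. On to today's topic: the overall structure of IO.

Architecturally, the IO system splits into two major parts: IO and NIO (New IO, which adds non-blocking channels). Classic IO predates JDK 1.4; NIO arrived in JDK 1.4, and parts of the classic IO code were reworked on top of it. For example, FileInputStream gained a getChannel() method to support NIO, and the decoding behind Reader subclasses such as InputStreamReader and FileReader is delegated to NIO's charset machinery.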
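As a rough illustration of the getChannel() bridge mentioned above, here is a minimal sketch that reads a file through the FileChannel exposed by a plain FileInputStream. The class name, the scratch file, and the 16-byte buffer are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ChannelReadDemo {
    // Reads a whole file through the NIO channel exposed by a classic FileInputStream.
    static String readViaChannel(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file);
             FileChannel channel = in.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(16); // small on purpose, to force several reads
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            while (channel.read(buffer) != -1) {
                buffer.flip();                                      // switch from filling to draining
                collected.write(buffer.array(), 0, buffer.limit());
                buffer.clear();                                     // make room for the next read
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("channel-demo", ".txt"); // hypothetical scratch file
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "read through FileChannel".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaChannel(tmp));
    }
}
```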

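I. Think in IO

In terms of implementation, IO falls roughly into two kinds, byte streams and character streams:

Byte streams. Files are read and written in units of bytes; put plainly, you work with byte and byte arrays. Interpreted as an unsigned integer, a normal return value of read() lies in [0, 255] (with -1 signalling end of stream). A bounded return range has many advantages; a representative one is that a stream can serve as the basis of a simple zip implementation using a Huffman tree. Reading one byte at a time is inefficient, of course, and a buffer improves throughput considerably. Byte streams have one problem, though: when handling text files, decoding takes extra code, as in the example below.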
```
// Assumes BUFFER_SIZE is a constant defined elsewhere (for example 1024)
// and that the enclosing method declares IOException.
try (FileInputStream is = new FileInputStream("F:\\books\\base\\vim常用指令.txt")) {
    byte[] buff = new byte[BUFFER_SIZE];
    int readSize;
    while ((readSize = is.read(buff)) != -1) {
        System.out.println(readSize);
        // Decode only the bytes actually read in this pass. A multi-byte GBK
        // character split across two reads will still come out garbled.
        System.out.print(new String(buff, 0, readSize, "GBK"));
    }
}
```
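One way to avoid hand-decoding each chunk is to let an InputStreamReader do the decoding, since it keeps decoder state between reads. The sketch below is only an illustration: the class and method names are made up, the input comes from memory rather than a file, the non-ASCII text is written with Unicode escapes, and it assumes the GBK charset is available in the running JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class CharsetDecodeDemo {
    // Decodes a byte stream with the given charset, reading a few chars at a time.
    static String decode(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] chunk = new char[3]; // tiny buffer, just to show that chunking is harmless
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is installed in this JDK
        byte[] bytes = "vim\u5e38\u7528\u6307\u4ee4".getBytes(gbk);
        System.out.println(decode(new ByteArrayInputStream(bytes), gbk));
    }
}
```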

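Character streams. These operate on characters; internally a Reader uses a char or char array as its buffer, which makes text files much more convenient to handle. Unless a charset is specified, the platform default encoding is used. It took me a long while to find the code for this +_+; it is buried deep: from Reader to InputStreamReader, then to StreamDecoder, then to Charset in the nio package. In the end the default is taken from the file.encoding system property (not an environment variable), which can also be seen through System.getProperties(). On Chinese-language Windows 7 the value I got was "file.encoding=GB18030".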

```
/**
 * Returns the default charset of this Java virtual machine.
 *
 * <p> The default charset is determined during virtual-machine startup and
 * typically depends upon the locale and charset of the underlying
 * operating system.
 *
 * @return  A charset object for the default charset
 *
 * @since 1.5
 */
public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            java.security.PrivilegedAction pa =
                new GetPropertyAction("file.encoding");
            String csn = (String)AccessController.doPrivileged(pa);
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}
```
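To see what this resolves to on a given machine, a quick check along the following lines may help. It is only a sketch with a made-up class name; the values differ per platform and JDK version (newer JDKs default to UTF-8), so no particular output is implied.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Reports the JVM default charset next to the raw file.encoding system property.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Passing an explicit charset to InputStreamReader sidesteps the default entirely, which is usually the safer choice for files whose encoding is known.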

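Now for byte streams in detail.

InputStream and OutputStream are two abstract classes; every byte-oriented stream extends one of these two base classes.

FileInputStream opens a stream on a local file. It is commonly used and has three constructors:

public FileInputStream(File file)

public FileInputStream(String name)

public FileInputStream(FileDescriptor fdObj) deserves emphasis. This constructor cannot be used on its own: a FileDescriptor is effectively the handle of an already open file, so you use it to create one file stream from another. Two streams created this way are effectively the same stream; once one is closed, the other can no longer be read.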
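A small sketch of the shared-descriptor behaviour: two FileInputStream objects built on one FileDescriptor share a single file position, so a read through one advances the other. The class name and scratch file are assumptions for the demo.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class SharedDescriptorDemo {
    // Reads one byte through each of two streams backed by the same FileDescriptor.
    static int[] readThroughBoth(File file) throws IOException {
        try (FileInputStream first = new FileInputStream(file);
             FileInputStream second = new FileInputStream(first.getFD())) {
            int a = first.read();   // consumes the first byte of the file
            int b = second.read();  // continues where 'first' stopped
            return new int[] {a, b};
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("fd-demo", ".txt"); // hypothetical scratch file
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[] {'A', 'B', 'C'});
        }
        int[] result = readThroughBoth(tmp);
        System.out.println((char) result[0] + " " + (char) result[1]); // prints "A B"
    }
}
```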
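PipedInputStream must be used together with PipedOutputStream, and the two ends should live in different threads, much like a producer-consumer model: PipedOutputStream writes data into a shared buffer array and notifies PipedInputStream to read it.
Two things to watch out for:

a) Be careful with PipedInputStream's read method: if the buffer holds no data it blocks the current thread, so calling it on the main thread makes the program hang.

b) If the thread that owns the PipedOutputStream stops, the resources it used are reclaimed as well, which leaves the pipe "broken", and PipedInputStream's read method then throws an IOException.

“A pipe is said to be broken if a thread that was providing data bytes to the connected piped output stream is no longer alive. ”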
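The producer-consumer arrangement described above might be sketched as follows. The class and method names are made up; the writer closes its end when done, so the reader sees a clean end of stream instead of a broken pipe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PipeDemo {
    // Sends a message from a producer thread to the calling thread through a pipe.
    static String transfer(String message) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out); // connects the two ends

        Thread producer = new Thread(() -> {
            // Closing the output stream is what tells the reader "no more data".
            try (PipedOutputStream o = out) {
                o.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (PipedInputStream i = in) {
            int b;
            while ((b = i.read()) != -1) { // blocks while the pipe is empty
                received.write(b);
            }
        }
        producer.join();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("hello through the pipe"));
    }
}
```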
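FilterInputStream cannot be instantiated directly (its constructor is protected); it is the parent class of BufferedInputStream and others. Its subclasses could have been implemented without it, since nearly every one of its methods simply delegates to the wrapped InputStream. Its reason for existing is more to represent an abstraction: data returned by an InputStream is being re-wrapped or processed on top of the underlying stream. The reasons for processing differ, hence the different subclasses.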
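To make the "wrap and process" idea concrete, here is a hypothetical FilterInputStream subclass that upper-cases ASCII letters as they pass through. It is not a JDK class, just a sketch of how such a subclass overrides the delegating read methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FilterDemo {
    // Hypothetical filter: upper-cases ASCII letters read from the wrapped stream.
    static class UpperCaseInputStream extends FilterInputStream {
        UpperCaseInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len); // -1 at end of stream, so the loop is skipped
            for (int i = off; i < off + n; i++) {
                if (buf[i] >= 'a' && buf[i] <= 'z') {
                    buf[i] -= 'a' - 'A';
                }
            }
            return n;
        }
    }

    static String upper(String ascii) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new UpperCaseInputStream(
                new ByteArrayInputStream(ascii.getBytes(StandardCharsets.US_ASCII)))) {
            byte[] chunk = new byte[4];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(upper("hello, io")); // prints "HELLO, IO"
    }
}
```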
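LineNumberInputStream is a failed by-product of mixing byte streams with character streams and is deprecated. The reason is that it forces line-break detection onto a byte stream without considering encoding. Whether or not the feature can be made to work, the abstraction is wrong to begin with. Moved to the character-stream side, everyone is happy: LineNumberReader is the class to use instead. See the detailed LineNumberReader write-up for more.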
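For comparison, the character-stream replacement is straightforward to use. A minimal sketch (class and method names are assumptions) that prefixes each line with its number:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineNumberDemo {
    // Returns each line of the text prefixed with its 1-based line number.
    static List<String> numbered(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() already counts the line that was just read.
                lines.add(reader.getLineNumber() + ": " + line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(numbered("first\nsecond")); // prints [1: first, 2: second]
    }
}
```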
DataInputStream reads raw bytes from the source and assembles or converts them into other primitive types, as in the following method:
```
public final int readInt() throws IOException {
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
```
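Integer types can be assembled this way. For float and double, native methods in the Float and Double classes are used instead; they reinterpret the bit pattern as an IEEE 754 value rather than doing anything specific to the operating system.
```
public final double readDouble() throws IOException {
    return Double.longBitsToDouble(readLong());
}
```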
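The two conversions can be tried in isolation. The sketch below (hypothetical class name) assembles four byte values big-endian the way readInt() does, and reinterprets a long bit pattern as a double.

```java
class BitsDemo {
    // Assembles four unsigned byte values into an int, big-endian, like DataInputStream.readInt().
    static int toInt(int b1, int b2, int b3, int b4) {
        return (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;
    }

    public static void main(String[] args) {
        System.out.println(toInt(0x00, 0x00, 0x01, 0x02));                  // prints 258
        System.out.println(Double.longBitsToDouble(0x3FF0000000000000L));   // prints 1.0
        System.out.println(Long.toHexString(Double.doubleToLongBits(1.0))); // prints 3ff0000000000000
    }
}
```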
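The only method with a more involved implementation is readUTF: it reads a two-byte length followed by that many bytes, and the data must be in the expected format (modified UTF-8), so it has to be written by the matching DataOutputStream.writeUTF. In practice DataInputStream should be paired with DataOutputStream; otherwise it is of limited use.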
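A round trip through the paired classes could look like this. The record layout (an int, a double, and a string) and all names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataRoundTripDemo {
    // Writes an int, a double and a string in DataOutputStream's binary format.
    static byte[] write(int id, double price, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);       // 4 bytes, big-endian
            out.writeDouble(price); // 8 bytes
            out.writeUTF(name);     // 2-byte length, then modified UTF-8 bytes
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + "|" + price + "|" + name;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(42, 9.5, "vim");
        System.out.println(data.length); // prints 17
        System.out.println(read(data));  // prints 42|9.5|vim
    }
}
```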

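BufferedInputStream allocates an 8192-byte buffer by default to improve efficiency. The API is used exactly as before; the only difference is fewer direct reads from the underlying source. Internally it holds an ordinary InputStream, and only when the buffer has been exhausted does it call that stream's read to refill the buffer, which is why reading through BufferedInputStream is more efficient.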
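The reduction in underlying reads can be observed with a small counting wrapper. This is only a sketch: CountingInputStream is a hypothetical helper written for the demo, and the exact call count depends on the JDK's buffering details, so none is claimed here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadCountDemo {
    // Hypothetical helper: counts how often the wrapped stream is actually asked for data.
    static class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads totalBytes one byte at a time through a BufferedInputStream and
    // returns how many read calls reached the underlying stream.
    static int underlyingCalls(int totalBytes) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(source)) {
            while (in.read() != -1) {
                // discard; only the call count matters here
            }
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying read calls for 20000 bytes: " + underlyingCalls(20000));
    }
}
```

With 20000 single-byte reads on the buffered side, the wrapped stream should see only a handful of bulk reads.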

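One interesting detail: the class uses an AtomicReferenceFieldUpdater to update and replace its volatile byte-array buffer. The updater's compareAndSet method performs the comparison and the update as one atomic operation.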

```
/**
 * Atomic updater to provide compareAndSet for buf. This is
 * necessary because closes can be asynchronous. We use nullness
 * of buf[] as primary indicator that this stream is closed. (The
 * "in" field is also nulled out on close.)
 */
private static final
    AtomicReferenceFieldUpdater<BufferedInputStream, byte[]> bufUpdater =
    AtomicReferenceFieldUpdater.newUpdater
    (BufferedInputStream.class,  byte[].class, "buf"); // create the atomic updater
```
...
```
/**
 * Fills the buffer with more data, taking into account
 * shuffling and other tricks for dealing with marks.
 * Assumes that it is being called by a synchronized method.
 * This method also assumes that all data has already been read in,
 * hence pos > count.
 */
private void fill() throws IOException {
    byte[] buffer = getBufIfOpen();
    if (markpos < 0)
        pos = 0;            /* no mark: throw away the buffer */
    else if (pos >= buffer.length)  /* no room left in buffer */
        if (markpos > 0) {  /* can throw away early part of the buffer */
            int sz = pos - markpos;
            System.arraycopy(buffer, markpos, buffer, 0, sz);
            pos = sz;
            markpos = 0;
        } else if (buffer.length >= marklimit) {
            markpos = -1;   /* buffer got too big, invalidate mark */
            pos = 0;        /* drop buffer contents */
        } else {            /* grow buffer */
            int nsz = pos * 2;
            if (nsz > marklimit)
                nsz = marklimit;
            byte nbuf[] = new byte[nsz];
            System.arraycopy(buffer, 0, nbuf, 0, pos);
            // Compare and swap: buf is replaced with nbuf only if it is still
            // the same array as buffer; otherwise nothing is updated.
            if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                // Can't replace buf if there was an async close.
                // Note: This would need to be changed if fill()
                // is ever made accessible to multiple threads.
                // But for now, the only way CAS can fail is via close.
                // assert buf == null;
                throw new IOException("Stream closed");
            }
            buffer = nbuf;
        }
    count = pos;
    int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
    if (n > 0)
        count = n + pos;
}
```
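The same compare-and-set pattern can be tried on a toy class. The sketch below is not BufferedInputStream's code; CasBufferDemo and its methods are invented to show how a stale reference or a concurrent "close" makes compareAndSet fail.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

class CasBufferDemo {
    // Like BufferedInputStream.buf: volatile, and null means "closed".
    private volatile byte[] buf = new byte[4];

    private static final AtomicReferenceFieldUpdater<CasBufferDemo, byte[]> BUF_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(CasBufferDemo.class, byte[].class, "buf");

    // Doubles the buffer unless it was replaced or cleared in the meantime.
    boolean grow() {
        byte[] current = buf;
        if (current == null) {
            return false; // already closed
        }
        byte[] bigger = Arrays.copyOf(current, current.length * 2);
        return BUF_UPDATER.compareAndSet(this, current, bigger);
    }

    // Swaps in the replacement only if buf is still the exact array the caller saw.
    boolean replaceIfUnchanged(byte[] expected, byte[] replacement) {
        return BUF_UPDATER.compareAndSet(this, expected, replacement);
    }

    byte[] current() {
        return buf;
    }

    void close() {
        buf = null;
    }

    public static void main(String[] args) {
        CasBufferDemo demo = new CasBufferDemo();
        byte[] seen = demo.current();
        System.out.println(demo.grow());                                 // prints true
        System.out.println(demo.replaceIfUnchanged(seen, new byte[16])); // prints false, 'seen' is stale
        demo.close();
        System.out.println(demo.grow());                                 // prints false
    }
}
```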

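PushbackInputStream's distinguishing feature is its unread() method, which pushes a byte or byte array back into the stream during reading so that it is read again next. The random ad URLs inserted into online novels could be implemented this way: drop a URL byte array into the stream mid-read, which is quite convenient.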
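A minimal sketch of unread() in action: peek at one byte, push it back, then push extra bytes in front of it. The names and the "[ad]" marker are invented; the pushback buffer is sized explicitly because the default holds a single byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PushbackDemo {
    // Peeks at the first byte, then inserts extra bytes ahead of the remaining input.
    static String insertAhead(String text, String inserted) throws IOException {
        byte[] extra = inserted.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)),
                extra.length + 1)) {       // room for the peeked byte plus the inserted bytes
            int first = in.read();         // consume one byte ...
            if (first != -1) {
                in.unread(first);          // ... and push it back
            }
            in.unread(extra);              // pushed last, so it is read first
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(insertAhead("hello", "[ad]")); // prints "[ad]hello"
    }
}
```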
ByteArrayInputStream works purely in memory: all the data it reads sits in a backing byte array. Its constructors are:
```
public ByteArrayInputStream(byte buf[])
public ByteArrayInputStream(byte buf[], int offset, int length)
```
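The three-argument constructor exposes only a slice of the array. A short sketch with made-up names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class ByteArraySliceDemo {
    // Reads only the bytes in [offset, offset + length) of the backing array.
    static String slice(byte[] data, int offset, int length) {
        ByteArrayInputStream in = new ByteArrayInputStream(data, offset, length);
        StringBuilder text = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) { // ByteArrayInputStream.read() declares no IOException
            text.append((char) b);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] digits = "0123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(slice(digits, 2, 3)); // prints "234"
    }
}
```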

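SequenceInputStream concatenates several streams given at construction time and reads them one after another. The streams it contains are closed automatically, each one as reading moves on to the next.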

```
public int read() throws IOException {
    if (in == null) {
        return -1;
    }
    int c = in.read();
    if (c == -1) {
        // After one stream is exhausted, switch to the next automatically.
        // This method is not thread-safe; concurrent calls can go badly wrong.
        nextStream();
        return read();
    }
    return c;
}

/**
 *  Continues reading in the next stream if an EOF is reached.
 */
final void nextStream() throws IOException {
    if (in != null) {
        in.close();
    }

    if (e.hasMoreElements()) {
        in = (InputStream) e.nextElement();
        if (in == null)
            throw new NullPointerException();
    }
    else in = null;

}
```
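Usage is simple. The following sketch (hypothetical names, in-memory inputs) joins several streams and reads them as one:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SequenceDemo {
    // Reads several in-memory streams back to back as if they were one stream.
    static String concat(String... parts) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        for (String part : parts) {
            streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.UTF_8)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new SequenceInputStream(Collections.enumeration(streams))) {
            byte[] chunk = new byte[8];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(concat("ab", "cd", "ef")); // prints "abcdef"
    }
}
```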

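StringBufferInputStream is deprecated because it does not properly convert characters into bytes and ignores encoding. Worth noting: nearly all of its methods are synchronized and therefore thread-safe. A Swing class still references it.
ObjectInputStream: there is quite a lot to say about this class.

It implements two interfaces. ObjectInput defines the types that can be read. ObjectStreamConstants defines the constants that mark what comes next in the stream: when readObject deserializes data, it relies on these tag values to tell what kind of object it is reading. The TC_ type tags are bytes, while STREAM_MAGIC and STREAM_VERSION are shorts.
It has an inner class, BlockDataInputStream, which buffers data when reading primitive values to improve efficiency. That buffering also causes a problem, described at http://www.tuicool.com/articles/v6RNNr. Take care with serialization and deserialization: prefer read(byte[], off, len) driven by the returned byte count over a bare read(byte[]). With the latter you may see garbled or incorrect content, and with audio or video in particular, audible noise, because ObjectInputStream decides the data type from single bytes, so every byte has to be exactly right.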
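One way to avoid short reads is readFully, which keeps reading until the array is full. The sketch below is an illustration under assumptions: the class name, the label-plus-payload layout, and the 3000-byte payload are made up, and readFully is offered here as an alternative to looping over read(byte[], off, len).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class ObjectStreamDemo {
    // Writes a label object followed by a length-prefixed raw payload.
    static byte[] write(String label, byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(label);
            out.writeInt(payload.length);
            out.write(payload); // raw bytes go out as block data
        }
        return bytes.toByteArray();
    }

    // Reads the payload back; readFully blocks until the whole array is filled.
    static byte[] readPayload(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject(); // the label object, not needed here
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[3000];
        Arrays.fill(payload, (byte) 7);
        byte[] data = write("audio-chunk", payload);
        System.out.println(Arrays.equals(payload, readPayload(data))); // prints true
    }
}
```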

Time: 2024-12-25 12:57:07
