Java I/O theory

Reference articles:

  • JAVA NIO之浅谈内存映射文件原理与DirectMemory
  • Java NIO 2.0 : Memory-Mapped Files | MappedByteBuffer Tutorial
  • How Java I/O Works Internally at Lower Level?

1. Java I/O theory at the lower system level

Before reading this post, we assume you are familiar with basic Java I/O operations.

Here is the content:

  • Buffer Handling and Kernel vs User Space
  • Virtual Memory
  • Memory Paging
  • File I/O
  • File Locking
  • Stream I/O

1.1 Buffer Handling and Kernel vs User Space

Buffers, and how buffers are handled, are the basis of all I/O. "Input/Output" means nothing more than moving data out of a buffer in user space to somewhere else, or moving data from somewhere else into a buffer in user space.

Commonly, a process sends an I/O request to the OS asking that the data in a user-space buffer be drained to a buffer in kernel space (a write operation), and the OS performs an incredibly complex transfer. Here is the data-flow diagram:

The diagram shows a simplified "logical" view of how block data moves from an external device, such as a hard disk, to user-space memory. First, the process issues the read() system call; the kernel catches the call and issues a command to the disk controller to fetch the data from disk. The disk controller writes the data directly into a kernel memory buffer via DMA. The kernel then copies the data from the temporary buffer in kernel space to the buffer in user space. A write operation is similar, with the data flowing in the opposite direction.

After the first read operation, the kernel caches and/or prefetches data, so the data you request may already be in kernel space. If you read a big file three times, you will find that the second and third reads are much faster than the first. Here is an example:

    // Uses java.io.BufferedInputStream, BufferedOutputStream, FileInputStream, FileOutputStream.
    static void cpFileStreamIO() throws IOException {
        String inFileStr = "***kimchi_v2.pdf";
        String outFileStr = "./kimchi_v2.pdf";
        int bufferSizeKB = 4;
        int bufferSize = bufferSizeKB * 1024;

        int repeatedTimes = 5;
        System.out.println("Using Buffered Stream");
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(inFileStr), bufferSize);
            BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(outFileStr), bufferSize)) {
            // Copy the file repeatedly and time each pass.
            for (int i = 0; i < repeatedTimes; i++) {
                copyFile(in, out);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    private static void copyFile(BufferedInputStream in, BufferedOutputStream out) throws IOException {
        long startTime = System.nanoTime();
        int byteRead; // a single byte of data, or -1 at end of stream
        while ((byteRead = in.read()) != -1) {
            out.write(byteRead);
        }
        long elapsedTime = System.nanoTime() - startTime;
        System.out.println("Elapsed time is " + (elapsedTime / 1000000.0) + " msec");
    }

First we create a BufferedInputStream and a BufferedOutputStream instance, then we copy the file contents from the BufferedInputStream to the BufferedOutputStream five times. Here is the console output:

Using Buffered Stream
Elapsed time is 85.175 msec
Elapsed time is 0.005 msec
Elapsed time is 0.003 msec
Elapsed time is 0.003 msec
Elapsed time is 0.004 msec

In this example, the first pass takes far longer because the kernel needs to read the data from the hard disk, while from the second pass on, user space can fetch the data from buffers in kernel space.

1.2 Virtual Memory

Virtual memory means that artificial (virtual) addresses are used in place of physical memory addresses (RAM or other internal storage). Virtual memory brings two advantages:

1. More than one virtual address can be mapped to the same physical memory location, which can reduce redundant copies of data in memory.

2. The virtual memory space can be larger than the physical memory actually available. For example, a user process can allocate 4 GB of memory even if the machine has only 1 GB of RAM.

So, to transfer data between user space and kernel space, we can map a virtual address in kernel space and a virtual address in user space to the same physical address. DMA hardware (which can access only physical memory addresses) can then fill a buffer that is simultaneously visible to both the kernel and a user-space process. This eliminates copies between kernel and user space, but it requires the kernel and user buffers to share the same page alignment. Buffers must also be a multiple of the block size used by the disk controller (usually 512 bytes). Virtual and physical memory are divided into pages, and the virtual and physical memory page sizes are always the same.
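In Java, the closest you get to this shared mapping is a memory-mapped file, the topic of the reference articles above: FileChannel.map() returns a MappedByteBuffer whose contents are backed by the kernel page cache, so reading it does not first copy the data into a user-space byte array. Below is a minimal sketch under that assumption; the class name and the file name (reused from the earlier example) are placeholders.

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MappedReadSketch {
        public static void main(String[] args) throws IOException {
            Path path = Paths.get("kimchi_v2.pdf");  // placeholder file name
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                // Map the whole file (up to 2 GB) into the process address space; the buffer
                // is backed by pages in the kernel page cache, so no extra user-space copy is made.
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                long sum = 0;
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF;      // touching a page not yet in memory triggers a page fault
                }
                System.out.println("byte sum = " + sum);
            }
        }
    }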

1.3 Memory Paging

Aligning memory page sizes as multiples of the disk block size allows the kernel to issue direct commands to the disk controller to write memory pages back to disk and to reload them from disk, and all disk I/O is done at the page level. Modern CPUs contain a subsystem known as the Memory Management Unit (MMU). This device logically sits between the CPU and physical memory and holds the mapping information needed to translate virtual addresses into physical memory addresses.

1.4 File I/O

File I/O always occurs within the context of a filesystem, which is a quite different concept from the disk. A filesystem is a higher level of abstraction: a particular method of arranging and interpreting data. Our processes always interact with the filesystem, not with the disk directly. The filesystem defines the concepts of names, paths, files, directories, and other abstract objects.

A filesystem organizes the disk into a sequence of uniformly sized data blocks. Some blocks store metadata (such as inodes) describing where to find the file data, and other blocks store the data itself. Filesystem page sizes range from 2 KB to 8 KB and are multiples of the memory page size.

Here is the process a filesystem goes through to satisfy a request for file data:

1. Determine which filesystem pages the request spans (according to the path of the file; for example, a path prefixed with "/root" means the file lives on the disk mounted at the "/root" mountpoint).

2. Allocate enough memory pages in kernel space to hold the identified filesystem pages.

3. Establish mappings between those memory pages and the filesystem pages stored on disk.

4. When instructions running on the CPU touch a virtual address whose page is not yet in memory, the MMU cannot find the mapping, and the CPU raises a page fault for each of those memory pages.

5. The operating system allocates the pages to the process, fills them with data from disk, configures the MMU, and the CPU continues its work.

Filesystem data is cached like other memory pages. On subsequent I/O requests, some or all of the file data may still be present in physical memory and can be reused without rereading it from disk, just like the file-copying example in section 1.1.
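As a rough illustration of block-oriented file I/O at page granularity, the sketch below reads a file through a FileChannel into a 4 KB direct ByteBuffer; the class name, method name, and buffer size are assumptions chosen only to line up with a common memory page size.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    final class PageSizedRead {
        // Read a file in 4 KB chunks; 4 KB matches a common memory page size, so each
        // read lines up with whole filesystem pages the kernel has already cached.
        static long countBytes(Path path) throws IOException {
            long total = 0;
            ByteBuffer buffer = ByteBuffer.allocateDirect(4 * 1024);
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                while (channel.read(buffer) != -1) {
                    buffer.flip();
                    total += buffer.remaining();
                    buffer.clear();
                }
            }
            return total;
        }
    }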

1.5 File Locking

File locking is a scheme by which one process can prevent other processes from accessing a file, or restrict how they may access it.
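Java exposes this through java.nio.channels.FileLock. Below is a minimal sketch, with a helper name of our own; note that on many platforms these locks are advisory, so they only constrain processes that also ask for the lock.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    final class LockedUpdate {
        static void updateExclusively(Path path) throws IOException {
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // lock() blocks until an exclusive lock on the whole file is granted;
                // tryLock() would return null instead of blocking if another process holds it.
                try (FileLock lock = channel.lock()) {
                    System.out.println("holding " + (lock.isShared() ? "shared" : "exclusive") + " lock");
                    // ... read and modify the file while holding the lock ...
                }
            }
        }
    }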

1.6 Stream I/O

Not all I/O is block-oriented; there is also stream I/O, modeled on a pipeline. The bytes of an I/O stream must be accessed sequentially. TTY (console) devices, printer ports, and network connections are common examples of streams.

Streams are generally, but not necessarily, slower than block devices and are often the source of intermittent input. Most operating systems allow streams to be placed into non-blocking mode, which permits a process to check whether input is available on the stream without getting stuck waiting for it.
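For example, a java.nio SocketChannel can be switched into non-blocking mode. The host and port below are placeholders; the point is only that read() returns 0 right away when nothing has arrived yet.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    final class NonBlockingPoll {
        static void pollOnce() throws IOException {
            try (SocketChannel channel = SocketChannel.open(new InetSocketAddress("example.com", 80))) {
                channel.configureBlocking(false);   // switch the stream to non-blocking mode
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                int n = channel.read(buffer);       // 0 means "nothing available yet", -1 means end of stream
                System.out.println("bytes read: " + n);
            }
        }
    }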

Another ability of streams is readiness selection. This is similar to non-blocking mode, but it offloads the check of whether a stream is ready to the operating system. The operating system can be told to watch a collection of streams and return an indication to the process of which streams are ready. This ability permits a process to multiplex many active streams using common code and a single thread by leveraging the readiness information returned by the operating system. It is widely used in network servers to handle large numbers of network connections, and readiness selection is essential for high-volume scaling.
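The Java counterpart of readiness selection is java.nio.channels.Selector. The sketch below is a single-threaded accept/read loop under that assumption; the port number is arbitrary and error handling is omitted for brevity.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class ReadinessSelectionSketch {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));   // arbitrary example port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();                      // ask the OS which streams are ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) == -1) { // peer closed the connection
                            key.cancel();
                            client.close();
                        }
                    }
                }
            }
        }
    }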
