IO的详细解释:It's all about buffers: zero-copy, mmap and Java NIO

There are use cases where data need to be read from source to a sink without modification. In code this might look quite simple: for example in Java, you may read data from one InputStream chunk by chunk into a small buffer (typically 8KB), and feed them into the OutputStream, or even better, you could create a PipedInputStream, which is basically just a util that maintains that buffer for you. However, if low latency is crucial to your software, this might be quite expensive from the OS perspective and I shall explain.

What happens under the hood

Well, here’s what happens when the above code is used:

  1. JVM sends read() syscall.
  2. OS context switches to kernel mode and reads data into the input socket buffer.
  3. OS kernel then copies data into user buffer, and context switches back to user mode. read() returns.
  4. JVM processes code logic and sends write() syscall.
  5. OS context switches to kernel mode and copies data from user buffer to output socket buffer.
  6. OS returns to user mode and logic in JVM continues.

This would be fine if latency and throughput aren’t your service’s concern or bottleneck, but it would be annoying if you do care, say for a static asset server. There are 4 context switches and 2 unnecessary copies for the above example.

OS-level zero copy for the rescue

Clearly in this use case, the copy from/to user space memory is totally unnecessary because we didn’t do anything other than dumping data to a different socket. Zero copy can thus be used here to save the 2 extra copies. The actual implementation doesn’t really have a standard and is up to the OS how to achieve that. Typically *nix systems will offer sendfile(). Its man page can be found here. Some say some operating systems have broken versions of that with one of them being OSX link. Honestly with such low-level feature, I wouldn’t trust Apple’s BSD-like system so never tested there.

With that, the diagram would be like this:

You may say OS still has to make a copy of the data in kernel memory space. Yes but from OS’s perspective this is already zero-copy because there’s no data copied from kernel space to user space. The reason why kernel needs to make a copy is because general hardware DMA access expects consecutive memory space (and hence the buffer). However this is avoidable if the hardware supports scatter-n-gather:

A lot of web servers do support zero-copy such as Tomcat and Apache. For example apache’s related doc can be found here but by default it’s off.

Note: Java’s NIO offers this through transferTo (doc).

mmap

The problem with the above zero-copy approach is that because there’s no user mode actually involved, code cannot do anything other than piping the stream. However, there’s a more expensive yet more useful approach - mmap, short for memory-map.

Mmap allows code to map file to kernel memory and access that directly as if it were in the application user space, thus avoiding the unnecessary copy. As a tradeoff, that will still involve 4 context switches. But since OS maps certain chunk of file into memory, you get all benefits from OS virtual memory management - hot content can be intelligently cached efficiently, and all data are page-aligned thus no buffer copying is needed to write stuff back.

However, nothing comes for free - while mmap does avoid that extra copy, it doesn’t guarantee the code will always be faster - depending on the OS implementation, there may be quite a bit of setup and teardown overhead (since it needs to find the space and maintain it in the TLB and make sure to flush it after unmapping) and page fault gets much more expensive since kernel now needs to read from hardware (like disk) to update the memory space and TLB. Hence, if performance is this critical, benchmark is always needed as abusing mmap() may yield worse performance than simply doing the copy.

The corresponding class in Java is MappedByteBuffer from NIO package. It’s actually a variation of DirectByteBuffer though there’s no direct relationship between classes. The actual usage is out of scope of this post.

NIO DirectByteBuffer

Java NIO introduces ByteBuffer which represents the buffer area used for channels. There are 3 main implementations of ByteBuffer:

  1. HeapByteBuffer

    This is used when ByteBuffer.allocate() is called. It’s called heap because it’s maintained in JVM’s heap space and hence you get all benefits like GC support and caching optimization. However, it’s not page aligned, which means if you need to talk to native code through JNI, JVM would have to make a copy to the aligned buffer space.

  2. DirectByteBuffer

    Used when ByteBuffer.allocateDirect() is called. JVM will allocate memory space outside the heap space using malloc(). Because it’s not managed by JVM, your memory space is page-aligned and not subject to GC, which makes it perfect candidate for working with native code (e.g. when writing OpenGL stuff). However, you are then “deteriorated” to C programmer as you’ll have to allocate and deallocate memory yourself to prevent memory leak.

  3. MappedByteBuffer

    Used when FileChannel.map() is called. Similar to DirectByteBuffer this is also outside of JVM heap. It essentially functions as a wrapper around OS mmap() system call in order for code to directly manipulate mapped physical memory data.

Conclusion

sendfile() and mmap() offer efficient, low-latency low-level solutions to data manipulation across sockets. Again, no code should assume these are silver bullets as real world scenarios may be complex and it might not be worth the effort to switch code to them if this is not the true bottleneck. For software engineering to get the most ROI, in most cases, it’s better to “make it right” and then “make it fast”. Without the guardrails offered by JVM, it’s easy to make software much more vulnerable to crashing (I literally mean crashing, not exceptions) when it comes to complicated logic.

https://xunnanxu.github.io/2016/09/10/It-s-all-about-buffers-zero-copy-mmap-and-Java-NIO/

IO的详细解释:It's all about buffers: zero-copy, mmap and Java NIO

原文地址:https://www.cnblogs.com/feng9exe/p/10737378.html

时间: 2024-10-26 13:29:14

IO的详细解释:It's all about buffers: zero-copy, mmap and Java NIO的相关文章

Java Nio 十四、Java NIO vs. IO

最后更新时间:2014-06-23 当学习Java NIO和IO的API的时候,一个问题很快的就会出现中我们的脑中: 我什么时候应该使用IO,什么时候应该使用NIO? 在这篇文章中我将会尝试着写出中NIO和IO之间不同的地方,他们的使用场景,以及他们怎么影响你的代码设计. Java NIO和IO的主要不同 下面的表格总结了Java NIO和IO的主要不同.针对这个表格中的不同点我将会给予更加详细的说明. IO NIO 基于流的 基于缓冲区的 堵塞IO 非堵塞IO   Selectors(选择器)

Java NIO和IO的区别(转)

原文链接:Java NIO和IO的区别 下表总结了Java NIO和IO之间的主要差别,我会更详细地描述表中每部分的差异. 复制代码代码如下: IO                NIO面向流            面向缓冲阻塞IO            非阻塞IO无                选择器 面向流与面向缓冲 Java NIO和IO之间第一个最大的区别是,IO是面向流的,NIO是面向缓冲区的. Java IO面向流意味着每次从流中读一个或多个字节,直至读取所有字节,它们没有被缓存在

java nio(non-blocking io)简介及和io

在 Java1.4之前的I/O系统中,提供的都是面向流的I/O系统,系统一次一个字节地处理数据,一个输入流产生一个字节的数据,一个输出流消费一个字节 的数据,面向流的I/O速度非常慢,而在Java 1.4中推出了NIO,这是一个面向块的I/O系统,系统以块的方式处理处理,每一个操作在一步中产生或者消费一个数据库,按块处理要比按字节处理数据快 的多. 在NIO中有几个核心对象需要掌握:缓冲区(Buffer).通道(Channel).选择器(Selector). 缓冲区Buffer 缓 冲区实际上是

Android中多线程编程(四)AsyncTask类的详细解释(附源码)

Android中多线程编程中AsyncTask类的详细解释 1.Android单线程模型 2.耗时操作放在非主线程中执行 Android主线程和子线程之间的通信封装类:AsyncTask类 1.子线程中更新UI 2.封装.简化异步操作. 3.AsyncTask机制:底层是通过线程池来工作的,当一个线程没有执行完毕,后边的线程是无法执行的.必须等前边的线程执行完毕后,后边的线程才能执行. AsyncTask类使用注意事项: 1.在UI线程中创建AsyncTask的实例 2.必须在UI线程中调用As

Android客户端请求服务器端的详细解释

Android客户端请求服务器端的详细解释 1. Android客户端与服务器端通信方式: Android与服务器通信通常采用HTTP通信方式和Socket通信方式,而HTTP通信方式又分get和post两种方式. 2. 解析服务器端返回数据的解释: (1).对于服务器端来说,返回给客户端的数据格式一般分为html.xml和json这三种格式. (2). JSON(Javascript Object Notation)是一种轻量级的数据交换格式,相比于xml这种数据交换格式来说,因为解析xml比

我对CONTAINING_RECORD宏的详细解释

宏CONTAINING_RECORD的用处其实还是相当大的, 而且很是方便, 它的主要作用是: 根据结构体中的某成员的指针来推算出该结构体的指针! 下面从一个简单的例子开始说起: 我们定义一个结构体, 同时类型化: typedef struct{ int a; int b; int c; }ss; 这是一个很简单的结构体, 没什么特殊的, 稍微分析下该结构体: 结构体的大小(字节):4+4+4=12字节 成员a的偏移:0 成员b的偏移:4 成员c的偏移:8 我们用ss来定义一个变量: ss s

Atitit .jvm 虚拟机指令详细解释

Atitit .jvm 虚拟机指令详细解释 1. 一.未归类系列A1 2. 数据mov系列2 2.1. 二.const系列2 2.2. 三.push系列2 2.3. ldc系列 该系列命令负责把数值常量或String常量值从常量池中推送至栈顶.3 2.4. 5.1.load系列A 该系列命令负责把本地变量的送到栈顶.3 2.5. 5.2.load系列B 该系列命令负责把数组的某项送到栈顶.4 2.6. 6.1.store系列A 该系列命令负责把栈顶的值存入本地变量.5 2.7. 6.2.stor

Sed命令的使用详细解释

Sed命令的使用详细解释 一:sed命令的简介 sed是一种在线编辑器,它一次处理一行内容.处理时,把当前处理的行存储在临时缓冲区中,称为"模式空间"(pattern space),接着用sed命令处理缓冲区中的内容,处理完成后,把缓冲区的内容送往屏幕.接着处理下一行,这样不断重复,直到文件末尾.文件内容并没有改变,除非你使用重定向存储输出.Sed主要用来自动编辑一个或多个文件:简化对文件的反复操作:编写转换程序等.     二:Sed的用法格式 Sed [options] 'scri

设计模式 - 迭代模式(iterator pattern) Java 迭代器(Iterator) 详细解释

迭代模式(iterator pattern) Java 迭代器(Iterator) 详细解释 本文地址: http://blog.csdn.net/caroline_wendy 參考迭代器模式(iterator pattern): http://blog.csdn.net/caroline_wendy/article/details/35254643 Java的标准库(util)中包括迭代器接口(iterator interface), import java.util.Iterator; 继承