Off-heap Memory in Apache Flink and the curious JIT compiler

https://flink.apache.org/news/2015/09/16/off-heap-memory.html

 

Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, “off-heap” has become almost something like a magic word to solve these problems.

 

In this blog post, we will look at how Flink exploits off-heap memory.
The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior for Java’s JIT compiler for highly optimized methods and loops.

 

Why actually bother with off-heap memory?

Given that Flink has a sophisticated level of managing on-heap memory, why do we even bother with off-heap memory? It is true that “out of memory” has been much less of a problem for Flink because of its heap memory management techniques. Nonetheless, there are a few good reasons to offer the possibility to move Flink’s managed memory out of the JVM heap:

  • Very large JVMs (100s of GBytes heap memory) tend to be tricky. It takes long to start them (allocate and initialize heap) and garbage collection stalls can be huge (minutes). While newer incremental garbage collectors (like G1) mitigate this problem to some extend, an even better solution is to just make the heap much smaller and allocate Flink’s managed memory chunks outside the heap.
  • I/O and network efficiency: In many cases, we write MemorySegments to disk (spilling) or to the network (data transfer). Off-heap memory can be written/transferred with zero copies, while heap memory always incurs an additional memory copy.
  • Off-heap memory can actually be owned by other processes. That way, cached data survives process crashes (due to user code exceptions) and can be used for recovery. Flink does not exploit that, yet, but it is interesting future work.

Flink传统的基于‘on-heap’ 内存管理机制,已经可以解决很多的java关于‘out of memory’或gc的问题,那我们为何还要用 ‘off-heap’的技术,

1. very large的JVM会要很长的启动时间,并且gc的代价也会很大
2. heap在写磁盘或network时,至少要一次copy,而off-heap可以实现zero copy
3. off-heap内存是进程共享的,JVM进程crash不会丢失数据

 

The opposite question is also valid. Why should Flink ever not use off-heap memory?

  • On-heap is easier and interplays better with tools. Some container environments and monitoring tools get confused when the monitored heap size does not remotely reflect the amount of memory used by the process.
  • Short lived memory segments are cheaper on the heap. Flink sometimes needs to allocate some short lived buffers, which works cheaper on the heap than off-heap.
  • Some operations are actually a bit faster on heap memory (or the JIT compiler understands them better).

为何Flink不直接用off-heap memory?

越强大的东西,一般都越麻烦,

所以一般case下,用on-heap就够了

 

The off-heap Memory Implementation

Given that all memory intensive internal algorithms are already implemented against the MemorySegment, our implementation to switch to off-heap memory is actually trivial.
You can compare it to replacing allByteBuffer.allocate(numBytes) calls with ByteBuffer.allocateDirect(numBytes).
In Flink’s case it meant that we made the MemorySegment abstract and added the HeapMemorySegment and OffHeapMemorySegment subclasses.
TheOffHeapMemorySegment takes the off-heap memory pointer from a java.nio.DirectByteBuffer and implements its specialized access methods using sun.misc.Unsafe.
We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, -XX:MaxDirectMemorySize).

使用off-heap在内存管理机制上和使用on-heap并没有太大的区别,

相比于NIO,使用ByteBuffer.allocate(numBytes)来分配heap内存,而用ByteBuffer.allocateDirect(numBytes)来分配off-heap内存

Flink,对MemorySegment,生成两个子类,HeapMemorySegment and OffHeapMemorySegment

其中OffHeapMemorySegment,以java.nio.DirectByteBuffer的形式使用off-heap memory, 通过sun.misc.Unsafe接口来操作这些memory

 

Understanding the JIT and tuning the implementation

The MemorySegment was (before our change) a standalone class, it was final (had no subclasses). Via Class Hierarchy Analysis (CHA), the JIT compiler was able to determine that all of the accessor method calls go to one specific implementation. That way, all method calls can be perfectly de-virtualized and inlined, which is essential to performance, and the basis for all further optimizations (like vectorization of the calling loop).

With two different memory segments loaded at the same time, the JIT compiler cannot perform the same level of optimization any more, which results in a noticeable difference in performance: A slowdown of about 2.7 x in the following example:

 

这里是考虑性能优化问题,

这里提出的一个问题就是,如果MemorySegment是standalone class类,没有之类,那么会比较高效,因为编译的时候,他所调研的method都是确定的,可以提前做优化;
如果具有两个子类,那么只有到真正运行到时候才知道是哪个子类,这样就不能提前做优化;

实际测,性能的差距在2.7倍左右

解决方法:

Approach 1: Make sure that only one memory segment implementation is ever loaded.

We re-structured the code a bit to make sure that all places that produce long-lived and short-lived memory segments instantiate the same MemorySegment subclass (Heap- or Off-Heap segment). Using factories rather than directly instantiating the memory segment classes, this was straightforward.

如果在代码里面只可能实例化其中的一个子类,另一个子类根本就没有被实例化过,那么JIT会意识到,并做优化;我们可以用factories来实例化对象,这样更方便;

Approach 2: Write one segment that handles both heap and off-heap memory

We created a class HybridMemorySegment which handles transparently both heap- and off-heap memory. It can be initialized either with a byte array (heap memory), or with a pointer to a memory region outside the heap (off-heap memory).

第二种方法就是用HybridMemorySegment,同时处理heap和off-heap,这样就不需要子类
并且有tricky的方式,可以做到透明的处理两种memory

细节看原文

时间: 2024-08-26 13:10:00

Off-heap Memory in Apache Flink and the curious JIT compiler的相关文章

Flink监控:Monitoring Apache Flink Applications

This post originally appeared on the Apache Flink blog. It was reproduced here under the Apache License, Version 2.0. This blog post provides an introduction to Apache Flink’s built-in monitoring and metrics system, that allows developers to effectiv

Apache Flink 零基础入门(一):基础概念解析

作者:陈守元.戴资力 一.Apache Flink 的定义.架构及原理 Apache Flink 是一个分布式大数据处理引擎,可对有限数据流和无限数据流进行有状态或无状态的计算,能够部署在各种集群环境,对各种规模大小的数据进行快速计算. 1. Flink Application 了解 Flink 应用开发需要先理解 Flink 的 Streams.State.Time 等基础处理语义以及 Flink 兼顾灵活性和方便性的多层次 API. Streams:流,分为有限数据流与无限数据流,unbou

Apache Flink fault tolerance源码剖析完结篇

这篇文章是对Flinkfault tolerance的一个总结.虽然还有些细节没有涉及到,但是基本的实现要点在这个系列中都已提及. 回顾这个系列,每篇文章都至少涉及一个知识点.我们来挨个总结一下. 恢复机制实现 Flink中通常需要进行状态恢复的对象是operator以及function.它们通过不同的方式来达到状态快照以及状态恢复的能力.其中function通过实现Checkpointed的接口,而operator通过实现StreamOpeator接口.这两个接口的行为是类似的. 当然对于数据

Apache Flink

Flink 剖析 1.概述 在如今数据爆炸的时代,企业的数据量与日俱增,大数据产品层出不穷.今天给大家分享一款产品—— Apache Flink,目前,已是 Apache 顶级项目之一.那么,接下来,笔者为大家介绍Flink 的相关内容. 2.内容 2.1 What's Flink Apache Flink 是一个面向分布式数据流处理和批量数据处理的开源计算平台,它能够基于同一个Flink运行时(Flink Runtime),提供支持流处理和批处理两种类型应用的功能.现有的开源计算方案,会把流处

新一代大数据处理引擎 Apache Flink

https://www.ibm.com/developerworks/cn/opensource/os-cn-apache-flink/index.html 大数据计算引擎的发展 这几年大数据的飞速发展,出现了很多热门的开源社区,其中著名的有 Hadoop.Storm,以及后来的 Spark,他们都有着各自专注的应用场景.Spark 掀开了内存计算的先河,也以内存为赌注,赢得了内存计算的飞速发展.Spark 的火热或多或少的掩盖了其他分布式计算的系统身影.就像 Flink,也就在这个时候默默的发

Apache Flink流分区器剖析

这篇文章介绍Flink的分区器,在流进行转换操作后,Flink通过分区器来精确得控制数据流向. StreamPartitioner StreamPartitioner是Flink流分区器的基类,它只定义了一个抽象方法: public abstract StreamPartitioner<T> copy(); 但这个方法并不是各个分区器之间互相区别的地方,定义不同的分区器的核心在于--各个分区器需要实现channel选择的接口方法: int[] selectChannels(T record,

a堆内存与栈内存异同(Java Heap Memory vs Stack Memory Difference)

--reference Java Heap Memory vs Stack Memory Difference 在数据结构中,堆和栈可以说是两种最基础的数据结构,而Java中的栈内存空间和堆内存空间有什么异同,以及和数据结构中的堆栈有何关系? 一.Java 堆存储空间 堆内存(堆存储空间)会在Java运行时分配给对象(Object)或者JRE的类.只要我们创建了一个对象,那么在堆中肯定会分配一块存储空间给这个对象.而我们熟知的Java垃圾回收就是在堆存储空间上进行的,用以释放那些没有任何引用指向

Apache Flink fault tolerance源码剖析(四)

上篇文章我们探讨了Zookeeper在Flink的fault tolerance中发挥的作用(存储/恢复已完成的检查点以及检查点编号生成器). 这篇文章会谈论一种特殊的检查点,Flink将之命名为--Savepoint(保存点). 因为保存点只不过是一种特殊的检查点,所以在Flink中并没有太多代码实现.但作为一个特性,值得花费一个篇幅来介绍. 检查点VS保存点 使用数据流API编写的程序可以从保存点来恢复执行.保存点允许你在更新程序的同时还能保证Flink集群不丢失任何状态. 保存点是人工触发

Apache Flink fault tolerance源码剖析(一)

因某些童鞋的建议,从这篇文章开始结合源码谈谈Flink Fault Tolerance相关的话题.上篇官方介绍的翻译是理解这个话题的前提,所以如果你想更深入得了解Flink Fault Tolerance的机制,推荐先读一下前篇文章理解它的实现原理.当然原理归原理,原理体现在代码实现里并不是想象中的那么直观.这里的源码剖析也是我学习以及理解的过程. 作为源码解析Flink Fault Tolerance的首篇文章,我们先暂且不谈太有深度的东西,先来了解一下:Flink哪里涉及到检查点/快照机制来