DTrace to Troubleshoot Java Native Memory Problems

How to Use DTrace to Troubleshoot Java Native Memory Problems on Oracle Solaris 11

Hands-On Labs of the System Admin and Developer Community of OTN

by Wang Yu, Vincent Liu, and Gary Wang

This lab will introduce the basic concepts of DTrace and provide exercises for using DTrace to identify common Java native memory problems, such as direct buffer overflow, Inflater/Deflater memory leak, and thread stack overflow.

Published February 2014



Table of Contents
Introduction
Intended Audience
System Requirements
Prerequisites
Notes for Users
Installation and Configuration (15 Minutes)
Exercise 1: Introduction to DTrace (30 Minutes)
Exercise 2: Identifying and Solving "Inflater/Deflater Memory Leak" Problems (30 Minutes)
Exercise 3: Identifying and Solving "Cannot Create Native Threads" Problems (40 Minutes)
Exercise 4: Identifying and Solving "Direct ByteBuffer Memory Leak" Problems (40 Minutes)
Summary
See Also
About the Authors

Introduction

DTrace is a comprehensive dynamic tracing framework for the Oracle Solaris operating system. DTrace provides a powerful infrastructure for troubleshooting kernel and application problems in production environments. It allows administrators and developers to concisely answer arbitrary questions about the behavior of user applications and operating systems.

Java native memory leaks are among the hardest technical issues to solve. JProfiler, VisualVM, heap dumps, and thread dumps have limited capabilities for handling this kind of issue. Google Performance Tools are popular for identifying issues caused by native memory allocation through malloc; they provide a high-performance multithreaded malloc() implementation with checkpoints for collecting the needed information.

The DTrace facility provided by Oracle Solaris 11 is able to detect and troubleshoot various Java native memory problems. Moreover, DTrace is easy to learn and lightweight, and it can be safely used in a production environment.

This lab will introduce the basic concepts of DTrace and provide exercises for using DTrace to identify common Java native memory problems, such as direct buffer overflow, Inflater/Deflater memory leak, and thread stack overflow. It will take you approximately 155 minutes to finish this lab.

Intended Audience

This hands-on lab assumes you have some basic knowledge about the following technologies:

  • Java language programming
  • Oracle Solaris or a similar UNIX or Linux OS

System Requirements

Ensure your system meets the following requirements:

  • Operating system: Solaris 11.1
  • Memory: 2 GB
  • Available disk space: 20 GB

Prerequisites

Please download the following software, which is used in the lab exercises:

  • JDK 7 for Oracle Solaris x86 (jdk-7u45-solaris-i586.gz or later)
  • Apache Ant (apache-ant-1.9.2-bin.zip or later)

Then follow the lab's setup instructions to create the directory structure, .d files, .java files, and build.xml files used in the lab exercises.

Notes for Users

Be aware of the following:

  • For this lab, the GNOME desktop environment of Oracle Solaris 11 (with the desktop packages installed) is preferred.
  • To open a terminal window in GNOME, right-click any point on the background of the desktop, and select Open Terminal in the pop-up menu.
  • The following source code editors are provided by the Oracle Solaris 11 desktop environment:
    • vi (type vi in a terminal window)
    • gedit (type gedit in a terminal window)

Installation and Configuration (15 Minutes)

In this section, we are going to prepare our hands-on lab environment. Before you start, make sure you have downloaded the required software, as described in the Prerequisites section.

Install JDK 7

Use the following procedure to install JDK 7 on Oracle Solaris.

  1. Open a command-line console, and move the jdk-7u45-solaris-i586.gz (or later) file you downloaded to the directory where you want to install the JDK.
  2. Uncompress the file, for example using a command similar to the following:
    $ tar -xzvf jdk-7u45-solaris-i586.gz
    

    This command creates a directory named jdk1.7.0_45 in which the JDK is installed.

  3. Add <JDK_7_Installation>/bin to your PATH variable:

     For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace <JDK_7_Installation> with the directory that corresponds to your environment:

    $ export PATH=<JDK_7_Installation>/bin:$PATH
    
  4. Add <JDK_7_Installation> to your JAVA_HOME variable:

     For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace <JDK_7_Installation> with the directory that corresponds to your environment:

    $ export JAVA_HOME=<JDK_7_Installation>
    

     Note: The variable setting above affects only the current console; putting it in your profile file ($HOME/.profile for the bash shell) makes the variable available in all consoles.

Install Ant

Use the following procedure to install Ant on Oracle Solaris.

  1. Open a command-line console and move the apache-ant-1.9.2-bin.zip (or later) file you downloaded to the directory where you want to install Ant.
  2. Unzip the file, for example, by typing a command similar to the following:
    $ unzip apache-ant-1.9.2-bin.zip
    

    This unzip command creates a directory called apache-ant-1.9.2 into which Ant is installed.

  3. Add <ant_Installation>/bin to your PATH variable, and add <ant_Installation> to your ANT_HOME variable:

     For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace <ant_Installation> with the directory that corresponds to your environment:

    $ export PATH=<ant_Installation>/bin:$PATH
    $ export ANT_HOME=<ant_Installation>
    

     Note: The variable settings above affect only the current console; putting them in your profile file ($HOME/.profile for the bash shell) makes these variables available in all consoles.

Verify the Installations

  1. Open a command-line console, and type the jps command. If you see a Java process list similar to the following, the JDK is installed successfully.

    $ jps
    1935 Jps
    1859 App
    
  2. Open a command-line console, and type the ant command. If you see output similar to the following, Ant is installed successfully.
    $ ant
    Buildfile: build.xml does not exist!
    Build failed
    

Exercise 1: Introduction to DTrace (30 Minutes)

In this exercise, we are going to learn the basic concepts of the D language. The D script files used in this exercise are located at <lab_root>/dtrace_javanative/DScripts.

Background Information: What Is DTrace?

DTrace is a comprehensive dynamic tracing facility that is built into Oracle Solaris and can be used by administrators and developers on live production systems to examine the behavior of both user programs and the operating system itself. DTrace enables you to explore your system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. DTrace lets you create your own custom programs to dynamically instrument the system and provide immediate, concise answers to arbitrary questions you can formulate using the DTrace D programming language.

A D program source file consists of one or more probe clauses that describe the instrumentation to be enabled by DTrace. Each probe clause has the following general form:

probe descriptions
/ predicates /
{
     action statements;
}
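
For example, the following clause is a minimal illustrative sketch (not one of the lab's script files): the probe description matches entry to the read system call, the predicate restricts firing to processes named bash, and the action prints the process name and the file descriptor being read.

syscall::read:entry
/execname == "bash"/
{
    printf("%s read from fd %d\n", execname, arg0);
}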

Exercise 1, Step 1: Write a "Hello, world!" Program

  1. After logging in, click the right mouse button on the desktop and choose Open Terminal to bring up a terminal window.
  2. DTrace execution requires root user privilege, so switch to root before trying DTrace commands:
     gary@solaris:~$ su
     Password:
     root@solaris:~#
    
  3. Type dtrace to print the different options for the DTrace command:
     root@solaris:~# dtrace
    Usage: dtrace [-32|-64] [-aACeFGhHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]
        [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
        [-x opt[=val]] [-X a|c|s|t]
    
        [-P provider [[ predicate ] action ]]
        [-m [ provider: ] module [[ predicate ] action ]]
        [-f [[ provider: ] module: ] func [[ predicate ] action ]]
        [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
        [-i probe-id [[ predicate ] action ]] [ args ... ]
    
        predicate -> '/' D-expression '/'
           action -> '{' D-statements '}'
    
        -32 generate 32-bit D programs and ELF files
        -64 generate 64-bit D programs and ELF files
    
        -a  claim anonymous tracing state
        -A  generate driver.conf(4) directives for anonymous tracing
        -b  set trace buffer size
        -c  run specified command and exit upon its completion
        -C  run cpp(1) preprocessor on script files
        -D  define symbol when invoking preprocessor
        -e  exit after compiling request but prior to enabling probes
        -f  enable or list probes matching the specified function name
        -F  coalesce trace output by function
        -G  generate an ELF file containing embedded dtrace program
        -h  generate a header file with definitions for static probes
        -H  print included files when invoking preprocessor
        -i  enable or list probes matching the specified probe id
        -I  add include directory to preprocessor search path
        -l  list probes matching specified criteria
        -L  add library directory to library search path
        -m  enable or list probes matching the specified module name
        -n  enable or list probes matching the specified probe name
        -o  set output file
        -p  grab specified process-ID and cache its symbol tables
        -P  enable or list probes matching the specified provider name
        -q  set quiet mode (only output explicitly traced data)
        -s  enable or list probes according to the specified D script
        -S  print D compiler intermediate code
        -U  undefine symbol when invoking preprocessor
        -v  set verbose mode (report stability attributes, arguments)
        -V  report DTrace API version
        -w  permit destructive actions
        -x  enable or modify compiler and tracing options
        -X  specify ISO C conformance settings for preprocessor
        -Z  permit probe descriptions that match zero probes
    
  4. Now let's try to write a "Hello, world!" program using the D language:
     root@solaris:~# dtrace -n 'BEGIN {trace("hello,world");exit(0);}'
     dtrace: description 'BEGIN ' matched 1 probe
     CPU     ID                    FUNCTION:NAME
       0      1                           :BEGIN   hello,world
    

    Alternatively, you can compose a script like the <lab_root>/dtrace_javanative/DScripts/ex1-hello.d file:

    BEGIN
    {
        trace("hello, world");
        exit(0);
    }
    

    Then run the script file:

     root@solaris:~# dtrace -s ex1-hello.d
     dtrace: script 'ex1-hello.d' matched 1 probe
     CPU     ID                    FUNCTION:NAME
       0      1                           :BEGIN   hello,world
    

Exercise 1, Step 2: Explore Probes

A probe is a location or activity to which DTrace can bind a request to perform a set of actions, such as recording a stack trace, a time stamp, or the argument to a function. Probes are like programmable sensors scattered all over your Oracle Solaris system in interesting places. If you want to figure out what's going on, you use DTrace to program the appropriate sensors to record the information that is of interest to you.

  1. Let's see what kinds of probes there are in Oracle Solaris 11.1 by running the following command:

     root@solaris:~# dtrace -l | more
       ID   PROVIDER            MODULE                          FUNCTION NAME
        1     dtrace                                                     BEGIN
        2     dtrace                                                     END
        3     dtrace                                                     ERROR
        4 javascript2010         libxul.so __1cCjsHExecute6FpnJJSContext_pnIJSScript_
    rnIJSObject_pnCJSFValue__b_ execute-done
        5 javascript2010         libxul.so __1cCjsNExecuteKernel6FpnJJSContext_pnIJSS
    cript_rnIJSObject_rknCJSFValue_n0ALExecuteType_pn0AKStackFrame_p8_b_ execute-done
        6 javascript2010         libxul.so __1cCjsHExecute6FpnJJSContext_pnIJSScript_
    rnIJSObject_pnCJSFValue__b_ execute-start
        7 javascript2010         libxul.so __1cCjsNExecuteKernel6FpnJJSContext_pnIJSS
    cript_rnIJSObject_rknCJSFValue_n0ALExecuteType_pn0AKStackFrame_p8_b_ execute-start
        8 nfsmapid1010          nfsmapid                  cb_update_domain daemon-domain
        9 kerberos995    mech_krb5.so.1                         k5_mk_rep krb_ap_rep-make
       10 kerberos995    mech_krb5.so.1                       krb5_rd_rep krb_ap_rep-read
       11 kerberos995    mech_krb5.so.1              krb5_mk_req_extended krb_ap_req-make
       12 kerberos995    mech_krb5.so.1                rd_req_decoded_opt krb_ap_req-read
       13 kerberos995    mech_krb5.so.1                     krb5_mk_ncred krb_cred-make
       14 kerberos995    mech_krb5.so.1                krb5_rd_cred_basic krb_cred-read
       15 kerberos995    mech_krb5.so.1                     krb5_mk_error krb_error-make
       16 kerberos995    mech_krb5.so.1                     krb5_rd_error krb_error-read
       17 kerberos995    mech_krb5.so.1               krb5_encode_kdc_rep krb_kdc_rep-make
    --More--
    

    Every probe in DTrace has two names: a unique integer ID and a human-readable string name that is composed of four parts, shown as separate columns in the dtrace output. When writing out the full human-readable name of a probe, write all four parts of the name separated by colons like this:

    provider:module:function:name
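
     For example, to enable a single probe by its full four-part name (an illustrative one-liner, not part of the lab scripts; any part you leave empty acts as a wildcard), you could run:

     root@solaris:~# dtrace -n 'syscall::read:entry { trace(execname); }'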
    
  2. Type the following command to determine how many probes are available:
     root@solaris:~# dtrace -l | wc -l
       81183
    

    Please note that the number of probes varies depending on your OS and the software you have installed.

  3. In order to locate a DTrace probe, we can also use the following options to filter the list: 
    • -P for provider
    • -m for module
    • -f for function
    • -n for name

    For instance, if you want to list all the probes from the syscall provider, run the following command:

     root@solaris:~# dtrace -l -P syscall | more
       ID   PROVIDER            MODULE                          FUNCTION NAME
     7067    syscall                                               nosys entry
     7068    syscall                                               nosys return
     7069    syscall                                               rexit entry
     7070    syscall                                               rexit return
     7071    syscall                                                read entry
     7072    syscall                                                read return
     7073    syscall                                               write entry
     7074    syscall                                               write return
     7075    syscall                                               close entry
     7076    syscall                                               close return
     7077    syscall                                              linkat entry
     7078    syscall                                              linkat return
     7079    syscall                                           symlinkat entry
     7080    syscall                                           symlinkat return
     7081    syscall                                               chdir entry
     7082    syscall                                               chdir return
     7083    syscall                                               gtime entry
     7084    syscall                                               gtime return
     7085    syscall                                                 brk entry
     7086    syscall                                                 brk return
     7087    syscall                                               lseek entry
    --More--
    

Exercise 1, Step 3: Explore Predicates and Actions

Predicates are expressions enclosed between slashes (/ and /) that are evaluated at probe-firing time to determine whether the associated actions should be executed. Predicates are the primary conditional construct used for building more complex control flow in a D program. You can omit the predicate section of the probe clause entirely for any probe, in which case the actions are always executed when the probe fires.

Probe actions are described by a list of statements separated by semicolons (;) and enclosed between braces ({ and }). If you want to note only that a particular probe fired on a particular CPU without tracing any data or performing any additional actions, you can specify an empty set of braces with no statements inside.
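
For example, the following clause is an illustrative sketch (not one of the lab's script files) with no predicate and an empty action: each time an exec-family system call is entered, DTrace simply reports the firing with its default CPU/ID/FUNCTION:NAME output line.

syscall::exec*:entry
{
}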

  1. Now you will use a script to list all system calls, excluding write, for all the processes running on your current system. See the following code in <lab_root>/dtrace_javanative/DScripts/ex1-predicate-1.d:

    syscall:::
    /probefunc!="write"/
    {
        printf("probefunc:%s, pid:%d, execname:%s\n",probefunc, pid, execname);
    }
    
  2. Run ex1-predicate-1.d. You might get a result similar to the following. Press Ctrl-c to stop this program.
     root@solaris:~# dtrace -qs ex1-predicate-1.d
    ^C
    probefunc:ioctl, pid:2084, execname:dtrace
    probefunc:ioctl, pid:2084, execname:dtrace
    probefunc:ioctl, pid:2084, execname:dtrace
    probefunc:ioctl, pid:2084, execname:dtrace
    probefunc:portfs, pid:1829, execname:nautilus
    probefunc:portfs, pid:1829, execname:nautilus
    probefunc:portfs, pid:1829, execname:nautilus
    probefunc:portfs, pid:1829, execname:nautilus
    probefunc:fstatat64, pid:1829, execname:nautilus
    probefunc:fstatat64, pid:1829, execname:nautilus
    probefunc:clock_gettime, pid:1829, execname:nautilus
    probefunc:clock_gettime, pid:1829, execname:nautilus
    probefunc:recv, pid:1829, execname:nautilus
    probefunc:recv, pid:1829, execname:nautilus
    probefunc:clock_gettime, pid:1829, execname:nautilus
    probefunc:clock_gettime, pid:1829, execname:nautilus
    probefunc:pollsys, pid:1829, execname:nautilus
    probefunc:mmap, pid:2084, execname:dtrace
    probefunc:mmap, pid:2084, execname:dtrace
    probefunc:lwp_park, pid:2084, execname:dtrace
    probefunc:recv, pid:1871, execname:VBoxClient
    probefunc:recv, pid:1871, execname:VBoxClient
    probefunc:nanosleep, pid:1871, execname:VBoxClient
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:nanosleep, pid:1871, execname:VBoxClient
    probefunc:recv, pid:1871, execname:VBoxClient
    probefunc:recv, pid:1871, execname:VBoxClient
    probefunc:nanosleep, pid:1871, execname:VBoxClient
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    
  3. To see only the non-write system calls made by the Java process, we need to update the original file to be like ex1-predicate-2.d, as shown below:
    syscall:::
    /probefunc!="write" && execname=="java"/
    {
        printf("probefunc:%s, pid:%d, execname:%s\n",probefunc, pid, execname);
    }
    
  4. Run ex1-predicate-2.d. You will get results similar to this:
     root@solaris:~# dtrace -qs ex1-predicate-2.d
    ^C
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_wait, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    probefunc:lwp_cond_signal, pid:1844, execname:java
    

Exercise 1, Step 4: Explore Aggregations

When instrumenting the system to answer performance-related questions, it is useful to consider how data can be aggregated to answer a specific question rather than thinking in terms of data gathered by individual probes. For example, if you wanted to know the number of system calls by user ID, you would not necessarily care about the data collected at each system call. You simply want to see a table of user IDs and system calls. In DTrace, you will find it is very easy to achieve this by using aggregations.

In the D language, the syntax for an aggregation is the following:

@name[ keys ] = aggfunc ( args );

where:

  • name is the name of the aggregation.
  • keys is a comma-separated list of D expressions.
  • aggfunc is a DTrace aggregating function.
  • args is a comma-separated list of arguments appropriate for the aggregating function.
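
As a concrete illustration of this syntax (a sketch, not one of the lab's script files), the following clause keeps one aggregation named bytes, keyed by process name and file descriptor, and sums the byte count requested by every write system call:

syscall::write:entry
{
    @bytes[execname, arg0] = sum(arg2);
}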
  1. In this example, we'd like to know how many system calls are invoked within five seconds. Here's the code in the <lab_root>/dtrace_javanative/DScripts/ex1-aggregation-1.d file.

    syscall:::entry
    {
       @counts["syscall numbers"]=count();
    }
    tick-5sec
    {
        exit(0);
    }
    
  2. Run ex1-aggregation-1.d, which exits after five seconds; you will see results similar to this:
     root@solaris:~# dtrace -s ex1-aggregation-1.d
     dtrace: script 'ex1-aggregation-1.d' matched 215 probes
     CPU     ID                    FUNCTION:NAME
       0  81183                       :tick-5sec 
    
      syscall numbers                                                3420
    
  3. To take this a step further and check which application makes the most system calls, we need to update the original file to be like ex1-aggregation-2.d, as shown below:
    syscall:::entry
    {
        @counts[execname]=count();
    }
    tick-5sec
    {
        exit(0);
    }
    
  4. Run the ex1-aggregation-2.d file, which will give you results like this:
     root@solaris:~# dtrace -s ex1-aggregation-2.d
     dtrace: script 'ex1-aggregation-2.d' matched 215 probes
     CPU     ID                    FUNCTION:NAME
       0  81183                       :tick-5sec 
    
      automountd                                                        1
      dhcpagent                                                         3
      nwam-manager                                                      4
      devfsadm                                                          6
      gnome-settings-d                                                  7
      isapython2.6                                                      8
      sendmail                                                         10
      metacity                                                         11
      xscreensaver                                                     12
      gnome-power-mana                                                 15
      updatemanagernot                                                 16
      VBoxService                                                      37
      VBoxClient                                                      354
      gnome-terminal                                                  361
      Xorg                                                            393
      java                                                            596
      dtrace                                                         1573
    
  5. The following example, ex1-aggregation-3.d, displays the average time spent in system calls, organized by process name. This example uses the avg aggregating function, specifying timestamp - self->ts as the argument. The example averages the wall clock time spent in the system call:
    syscall:::entry
    {
        self->ts=timestamp;
    }
    syscall:::return
    /self->ts/
    {
        @time[execname]=avg(timestamp-self->ts);
        self->ts=0;
    }
    
    About self->: Thread-Local Variables
    DTrace provides the ability to declare variable storage that is local to each operating system thread, as well as the global variables mentioned previously. Thread-local variables are useful when we want to enable a probe and mark every thread that fires the probe with some tags or data. In a D program, we can use thread-local variables to share a common name but refer to different data storage associated with different threads. Thread-local variables are referenced by applying the -> operator to the special identifier self.
  6. Run the ex1-aggregation-3.d file. You need to wait for several seconds, and then press Ctrl-c. A result similar to Listing 1 will be shown:
     root@solaris:~# dtrace -s ex1-aggregation-3.d
     dtrace: script 'ex1-aggregation-3.d' matched 428 probes
    ^C
    
      fmd                                                           18894
      nscd                                                          25965
      utmpd                                                         28892
      ssh-agent                                                     37801
      gconfd-2                                                      79348
      nwam-manager                                                  79479
      hald                                                          88817
      hald-addon-acpi                                               95590
      dtrace                                                      3307590
      VBoxClient                                                 13981578
      Xorg                                                       15435316
      gnome-terminal                                             20763581
      java                                                       40776641
      gnome-power-mana                                           58499707
      xscreensaver                                              184487447
      updatemanagernot                                          241129100
      sendmail                                                  280901476
      VBoxService                                               358442957
      gnome-settings-d                                          392415577
      metacity                                                  409615593
      isapython2.6                                              502919529
      dhcpagent                                                 800092245
      devfsadm                                                  914232179
    

    Listing 1

    The average time records shown in Listing 1 are displayed in nanoseconds.

  7. If you want to know the distribution of system call durations across different applications, you need to use the quantize aggregating function in ex1-aggregation-4.d:
    syscall:::entry
    {
        self->ts=timestamp;
    }
    syscall:::return
    /self->ts/
    {
        @time[execname]=quantize(timestamp-self->ts);
        self->ts=0;
    }
    
  8. Run ex1-aggregation-4.d and wait for several seconds. You should get a result like Listing 2: 
     root@solaris:~# dtrace -s ex1-aggregation-4.d
     dtrace: script 'ex1-aggregation-4.d' matched 428 probes
    ^C
    
      nwam-manager
               value  ------------- Distribution ------------- count
               16384 |                                         0
               32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3
               65536 |                                         0        
    
      sendmail
               value  ------------- Distribution ------------- count
                 512 |                                         0
                1024 |@@@@                                     1
                2048 |@@@@@@@@@@@@@                            3
                4096 |                                         0
                8192 |@@@@                                     1
               16384 |@@@@                                     1
               32768 |@@@@@@@@@@@@@                            3
               65536 |                                         0        
    
      gnome-power-mana
               value  ------------- Distribution ------------- count
                1024 |                                         0
                2048 |@@@@@@@@@@@                              4
                4096 |@@@@@@@@@                                3
                8192 |@@@@@@@@@                                3
               16384 |                                         0
               32768 |                                         0
               65536 |                                         0
              131072 |@@@                                      1
              262144 |@@@                                      1
              524288 |                                         0
             1048576 |@@@                                      1
             2097152 |                                         0
             4194304 |                                         0
             8388608 |                                         0
            16777216 |                                         0
            33554432 |                                         0
            67108864 |                                         0
           134217728 |                                         0
           268435456 |@@@                                      1
           536870912 |                                         0
        ...
    

    Listing 2

     Note that the rows for the frequency distribution are always power-of-two values. Each row indicates the count of the number of elements greater than or equal to the corresponding value, but less than the next larger row value. For example, the output in Listing 2 shows that sendmail had three system calls that took at least 2,048 nanoseconds but less than 4,096 nanoseconds.

Exercise 2: Identifying and Solving "Inflater/Deflater Memory Leak" Problems (30 Minutes)

Java Native Interface (JNI) is used to call native code from Java. This native code can allocate native OS resources and also create Java objects. The memory allocated by JNI code is not managed by the garbage collection (GC) threads, so it must be freed by the native code itself. Therefore, memory leaks caused by JNI are particularly nasty and hard to track down.

In the JDK, the java.util.zip.Inflater and Deflater classes provide support for general-purpose compression and decompression using the popular zlib library. These classes pass their input and output buffers directly to zlib after acquiring them via the GetPrimitiveArrayCritical JNI call. If they are not used properly, there will be native memory leaks that cannot be found by common profiling tools.

In this exercise, we will learn how to use a simple DTrace script to trace native malloc calls at run time, in order to identify the root cause.

The D-script files used in this exercise are located in <lab_root>/dtrace_javanative/DScripts. The Java source code files used in this exercise are located in <lab_root>/dtrace_javanative/exercises/ex2.

Background Information

Check the following Java classes, which compress and uncompress data by using java.util.zip.Deflater and java.util.zip.Inflater. The source files (Compresser.java and Decompresser.java) are located in <lab_root>/dtrace_javanative/exercises/ex2/src/nativememory/compresser.

package nativememory.compresser;

import java.util.zip.Deflater;

public class Compresser {
    private Deflater deflater;

    public Compresser(){
        deflater = new Deflater();
    }

    public byte[] compress(byte[] input) {
        byte[] output = new byte[8000];
        deflater.setInput(input);
        deflater.finish();
        int compressedDataLength = deflater.deflate(output);
        //deflater.end();
        byte[] result = new byte[compressedDataLength];
        System.arraycopy(output, 0, result, 0, compressedDataLength);
        return result;
    }
}

package nativememory.compresser;

import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class Decompresser {
    private Inflater inflater;

    public Decompresser(){
        inflater = new Inflater();
    }

    public byte[] decompress(byte[] input) throws DataFormatException {
        byte[] output = new byte[8000];
        inflater.setInput(input);
        int uncompressedDataLength = inflater.inflate(output);
        //inflater.end();
        byte[] result = new byte[uncompressedDataLength];
        System.arraycopy(output, 0, result, 0, uncompressedDataLength);
        return result;
    }
}

In this exercise, a background thread repeatedly compresses and decompresses data in a loop. The InflatorWorkingObject.java file is located in <lab_root>/dtrace_javanative/exercises/ex2/src/nativememory.

package nativememory;

import java.util.Random;
import nativememory.compresser.Compresser;
import nativememory.compresser.Decompresser;

public class InflatorWorkingObject extends DefaultWorkingObject{

    @Override
    public void loopWork() {
        try {
            // Encode a long String into bytes
            Random generator = new Random();
            int messageID = generator.nextInt(10000);
            String inputString = "This is the data with mmessageID: "+messageID;
            byte[] input = inputString.getBytes("UTF-8");
            byte[] result_compress;
            byte[] result_uncompress;

            //compress
            Compresser compresser = new Compresser();
            result_compress = compresser.compress(input);
            System.out.println("Data compressed!");
            //decompress
            Decompresser decompresser = new Decompresser();
            result_uncompress = decompresser.decompress(result_compress); 

            // Decode the bytes into a String
            String outputString = new String(result_uncompress,"UTF-8");
            System.out.println("Data Decompressed: "+outputString);
            Thread.sleep(200);

        } catch (InterruptedException ex) {
        } catch (java.io.UnsupportedEncodingException ex) {
        } catch (java.util.zip.DataFormatException ex) {
        }
    }
}

Exercise 2, Step 1: Run a Data Compressing/Decompressing Program that Has Native Memory Leaks

  1. Go to the directory <lab_root>/dtrace_javanative/exercises/ex2, and execute the ant command as follows:

     gary@solaris:~$ cd dtrace_javanative/exercises/ex2
     gary@solaris:~/dtrace_javanative/exercises/ex2$ ant
    
  2. The ant command runs a multithreaded program that simulates multiple users compressing and uncompressing data. After running for a while, the program produces output similar to the following. You can stop this program by pressing the Enter key at any time, and you can rerun it by entering the ant command.

    Note: In order to finish the following steps, you must keep this program running.

    [java] Data Decompressed: This is the data with mmessageID: 8036
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 6029
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 4046
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 7587
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 6213
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 205
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 5249
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 2687
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 4588
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 1971
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 3686
    [java] Data compressed!
    [java] Data Decompressed: This is the data with mmessageID: 3259
    [java] Data compressed!
    

Exercise 2, Step 2: Monitor the Program with JDK Tools

  1. Use the jps tool to find the process ID (pid) for this Java program; you will get output like Listing 3:

     gary@solaris:~$ jps -v
    1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
    2010 OOW -DDescription=OOWDtraceNativeLab_ex2 -Xms256m -Xmx256m
    2011 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
    2008 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
    

    Listing 3

     By using jps -v, we can see each Java process with its process ID and startup arguments. From the output in Listing 3, we can tell that the process ID of the above Java program on my machine is 2010 (it might be different on your machine), because it has the name OOW along with distinctive arguments, which lets us pick it out from the other Java processes.

  2. Use the prstat -p <pid> system command to monitor the memory usage of this process:
    PID  USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
    2010 gary      723M  209M sleep    59    0   0:00:05 4.8% java/11
    

    Listing 4

     From the output of prstat in Listing 4, we can see that the process with PID 2010 is consuming more than 700 MB of virtual memory, and this number keeps increasing.

  3. Use the jmap -heap <pid> command to monitor the Java heap usage of this program. (You need to use your process ID to replace <pid>):
    Heap Usage:
    PS Young Generation
    Eden Space:
       capacity = 71630848 (68.3125MB)
       used     = 45988288 (43.85784912109375MB)
       free     = 25642560 (24.45465087890625MB)
       64.20179194304666% used
    From Space:
       capacity = 8912896 (8.5MB)
       used     = 0 (0.0MB)
       free     = 8912896 (8.5MB)
       0.0% used
    To Space:
       capacity = 8912896 (8.5MB)
       used     = 0 (0.0MB)
       free     = 8912896 (8.5MB)
       0.0% used
    tenured generation:
       capacity = 178978816 (170.6875MB)
       used     = 0 (0.0MB)
       free     = 178978816 (170.6875MB)
       0.0% used
    Perm Generation:
       capacity = 12582912 (12.0MB)
       used     = 1838216 (1.7530593872070312MB)
       free     = 10744696 (10.246940612792969MB)
       14.60882822672526% used
    
    690 interned Strings occupying 41984 bytes.
    

    Listing 5

     From the output of jmap in Listing 5, we can see that the process with PID 2010 has consumed at most 68.3 MB (Eden) + 8.5 MB (From) + 8.5 MB (To) + 170.69 MB (tenured) + 12 MB (Perm) of Java-managed memory, roughly 268 MB in total, which is far less than the 723 MB detected by prstat. So there must be some native memory leaks in this Java process.

Exercise 2, Step 3: Write a DTrace Program to Identify the Problem

  1. We know that malloc is the most common way to allocate memory from the system. For this reason, we can first use DTrace to monitor malloc usage to find a clue. In Oracle Solaris 11, malloc is implemented in the standard C library. So we consider using the pid provider to analyze the behavior of the process.

    About the pid Provider
     The DTrace pid provider allows you to trace the internal execution of processes such as a Java application or a database. It can trace the entry and return of any function in a user process as well as any instruction specified by an absolute address or function offset. The pid provider is so powerful that you're very likely to use it in your daily work.

     The pid provider actually defines a class of providers. Each process can potentially have its own associated pid provider. A process with an ID of 123, for example, would be traced by using the pid123 provider. Usually we use a variable such as $target or $1 as a parameter to define the provider dynamically.
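
     For instance, the following one-liner is a hedged example (not one of the lab's scripts) of using $target, which DTrace fills in from the -p (or -c) option, so the same clause works for whatever process you attach it to:

     root@solaris:~# dtrace -n 'pid$target:libc:malloc:entry { @["malloc calls"] = count(); }' -p <pid>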

     First we will determine how frequently malloc is being called by this Java process. As shown in Listing 6, the script <lab_root>/dtrace_javanative/DScripts/ex2.d calculates the number of malloc and free calls made by the process:

    #!/usr/sbin/dtrace -qs
    
    pid$1::malloc:entry
    {
       printf("process(%d) is calling malloc %d times\n",$1,self->malloc++);
    }
    
    pid$1::free:entry
    {
       printf("process(%d) is calling %d times\n",$1,self->free++);
    }
    

    Listing 6

     Note: $1 is the first argument for running the script, $2 is the second argument, and so on.

  2. Switch to the root user before trying DTrace commands:
     gary@solaris:~$ su
     Password:
     root@solaris:~#
    
  3. Execute ex2.d with the Java PID. You will get a result like the one shown in Listing 7. Type Ctrl-c to stop the script.
     root@solaris:~/DScripts# ./ex2.d 2010
    process(2010) is calling malloc 0 times
    process(2010) is calling malloc 1 times
    process(2010) is calling malloc 2 times
    process(2010) is calling malloc 3 times
    process(2010) is calling malloc 4 times
    process(2010) is calling malloc 5 times
    process(2010) is calling malloc 6 times
    process(2010) is calling malloc 7 times
    process(2010) is calling malloc 8 times
    process(2010) is calling malloc 9 times
    process(2010) is calling malloc 10 times
    process(2010) is calling malloc 11 times
    process(2010) is calling malloc 12 times
    process(2010) is calling malloc 13 times
    

    Listing 7

     We can see in Listing 7 that most of the output is about malloc calls. Even given more time, you might see only a few free calls.

  4. To see this more clearly, we can run ex2.1.d, which uses aggregating functions and the profile provider to report the result every two seconds. Listing 8 is the source code, in which the profile provider is used.
    #!/usr/sbin/dtrace -qs
    pid$1:libc:malloc:entry
    {
        @malloc_count["number of malloc"] = count() ;
    }
    
    pid$1:libc:free:entry
    {
        @free_count["number of free"] = count() ;
    }
    
    profile:::tick-2sec
    {
        printa(@malloc_count) ;
        printa(@free_count) ;
        trunc(@malloc_count) ;
        trunc(@free_count) ;
    }
    

    Listing 8

    About profile-n Probes
     The profile-n probe fires at a fixed interval on every CPU at high interrupt level. The probe's firing interval is denoted by the value of n: the interrupt source fires n times per second. n may also have an optional suffix, in which case n is interpreted in the units denoted by the suffix. Valid suffixes and the units they denote are as follows:

     Suffix       Time Units
     nsec or ns   nanoseconds
     usec or us   microseconds
     msec or ms   milliseconds
     sec or s     seconds
     min or m     minutes
     hour or h    hours
     day or d     days
     hz           hertz (frequency per second)

     In ex2.1.d, the probe profile:::tick-2sec fires every two seconds, prints the number of malloc and free calls recorded during the past two seconds, and clears the buffers using the trunc function.

  5. Run ex2.1.d with the Java PID. You will get a result similar to the output shown in Listing 9:
     root@solaris:~/DScripts# ./ex2.1.d 2010
    
      number of malloc                                                312
    
      number of malloc                                                304
    
      number of malloc                                                304
    
      number of malloc                                                304
    
      number of malloc                                                304
    
        ^C
    

    Listing 9

     The output in Listing 9 shows that the process with PID 2010 is issuing about 300 malloc calls every two seconds, and there is no free call during this period. (Note that if @free_count is 0, dtrace does not allocate a buffer for it, so it is not displayed.)

     Now we can say that this accounts for the increasing virtual memory usage of this Java process. How, then, can we find the cause and fix the problem?

  6. The ustack action in DTrace can be used to trace a process's stack, so we can use ustack to discover the code path that executes the malloc call. Running ex2.2.d prints out the process's thread stack when malloc is called.
    #!/usr/sbin/dtrace -qs
    
    pid$1:libc:malloc:entry
    {
        @stack_count[ustack(4)] = count() ;
    }
    

    Please note that the input argument of ustack denotes stack frame depth. Here we use 4, which is enough to locate the statements that make malloc calls. You can try different numbers in order to get further information.

  7. Run ex2.2.d with the Java PID for a few seconds, and then type Ctrl-c to stop the script. You will see a result similar to Listing 10:
     root@solaris:~/DScripts# ./ex2.2.d 2010
    ^C
    
                  libc.so.1`malloc
                  libzip.so`zcalloc+0x26
                  libzip.so`deflateInit2_+0x19b
                  libzip.so`Java_java_util_zip_Deflater_init+0x5d
                  145
    
                  libc.so.1`malloc
                  libzip.so`zcalloc+0x26
                  libzip.so`deflateInit2_+0x1b5
                  libzip.so`Java_java_util_zip_Deflater_init+0x5d
                  145
    
                  libc.so.1`malloc
                  libzip.so`zcalloc+0x26
                  libzip.so`deflateInit2_+0x1cf
                  libzip.so`Java_java_util_zip_Deflater_init+0x5d
                  145
    
                  libc.so.1`malloc
                  libzip.so`zcalloc+0x26
                  libzip.so`inflateInit2_+0x88
                  libzip.so`Java_java_util_zip_Inflater_init+0x4f
                  145
    

    Listing 10

     In Listing 10, we can tell that the Inflater and Deflater classes in the java.util.zip package are making the malloc calls. This is a well-known problem with the Inflater and Deflater classes. Now we can locate the root cause in the Java source code.

    About Inflater and Deflater Memory Usage
     The Inflater and Deflater classes pass their input and output buffers directly to the zlib compression library after acquiring them via the GetPrimitiveArrayCritical JNI call. Both of these classes have finalize() methods that release the c-heap buffers allocated by native code.

    However, the finalizable objects can be discovered only by GC, which occurs when there is pressure on the Java heap. The c-heap memory allocated by the Inflater and Deflater classes will not trigger GC, so the c-heap may be filled up before GC detects and enqueues unreachable finalizable objects.

     The same buffers can be freed before the Inflater and Deflater objects are finalized by calling end(). So the preferred way to use Deflater and Inflater is to call the end() method after use, in order to avoid native memory leaks.
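
     If you want to confirm this diagnosis with DTrace before touching the Java code, the following sketch compares init and end calls inside libzip.so. The Java_java_util_zip_Deflater_init and Java_java_util_zip_Inflater_init probe names come straight from Listing 10; the corresponding *_end entry points are assumed here from the standard JNI naming convention, so treat this as a sketch rather than one of the lab's scripts. (You can also try replacing ustack() with jstack() in ex2.2.d to see Java stack frames instead of native ones.)

     #!/usr/sbin/dtrace -qs

     /* init calls: these function names appear in Listing 10 */
     pid$1:libzip.so:Java_java_util_zip_Deflater_init:entry
     {
         @calls["Deflater init"] = count();
     }

     pid$1:libzip.so:Java_java_util_zip_Inflater_init:entry
     {
         @calls["Inflater init"] = count();
     }

     /* end calls that should release the zlib buffers;
        these names are assumed from the JNI naming convention */
     pid$1:libzip.so:Java_java_util_zip_Deflater_end:entry
     {
         @calls["Deflater end"] = count();
     }

     pid$1:libzip.so:Java_java_util_zip_Inflater_end:entry
     {
         @calls["Inflater end"] = count();
     }

     If the init counters grow while the end counters stay at zero, end() is never called explicitly, which matches the commented-out calls in the source above.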

Exercise 2, Step 4: Modify the Java Program and Run It Again

  1. Stop the Java program by pressing the Enter key in the console that was started in Exercise 2, Step 1.
  2. Use an editor to open the files named Compresser.java and Decompresser.java, which are located in <lab_root>/dtrace_javanative/exercises/ex2/src/nativememory/compresser.
  3. Add the deflater.end() and inflater.end() statements shown in Listing 11 (they are commented out in the original source files):
    package nativememory.compresser;
    
    import java.util.zip.Deflater;
    
    public class Compresser {
        private Deflater deflater;
    
        public Compresser(){
            deflater = new Deflater();
        }
    
        public byte[] compress(byte[] input) {
            byte[] output = new byte[8000];
            deflater.setInput(input);
            deflater.finish();
            int compressedDataLength = deflater.deflate(output);
            deflater.end();
            byte[] result = new byte[compressedDataLength];
            System.arraycopy(output, 0, result, 0, compressedDataLength);
            return result;
        }
    }
    
    package nativememory.compresser;
    
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;
    
    public class Decompresser {
        private Inflater inflater;
    
        public Decompresser(){
            inflater = new Inflater();
        }
    
        public byte[] decompress(byte[] input) throws DataFormatException {
            byte[] output = new byte[8000];
            inflater.setInput(input);
            int uncompressedDataLength = inflater.inflate(output);
            inflater.end();
            byte[] result = new byte[uncompressedDataLength];
            System.arraycopy(output, 0, result, 0, uncompressedDataLength);
            return result;
        }
    }
    

    Listing 11

  4. Repeat Exercise 2, Step 1 through Step 3 to see the difference.

Exercise 3: Identifying and Solving "Cannot Create Native Threads" Problems (40 Minutes)

One of the common native memory problems of Java EE production systems is OutOfMemoryError: unable to create new native thread, thrown by Java HotSpot VM when it is unable to create new Java threads.

Java HotSpot VM memory is split into three memory spaces:

  • The Java heap
  • The PermGen (permanent generation) space
  • The native memory

These three parts of the memory space compete with each other for the limited amount of memory. If the Java heap and PermGen occupy too much space, there will be little left for the native memory. Please note that the maximum amount of virtual memory space for a 32-bit Java Virtual Machine (JVM) process is 4 GB (perhaps much lower on Windows or Linux).

Java threads require native memory from the OS. If the remaining native memory space is not big enough, the OS will refuse further native thread and memory allocation, and "unable to create new native thread" errors will be thrown. In this exercise, we will use a DTrace script to detect the native memory problems caused by Java thread creation.
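
As a quick way to watch this happening, the following one-liner is a hedged sketch (not one of the lab's scripts); it assumes that thr_create in libc is the entry point the JVM uses to create native threads on Oracle Solaris:

root@solaris:~# dtrace -n 'pid$target:libc:thr_create:entry { @["threads created"] = count(); }' -p <pid>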

The D-script files used in this exercise are located in <lab_root>/dtrace_javanative/DScripts. The Java source code files used in this exercise are located in <lab_root>/dtrace_javanative/exercises/ex3.

Background Information

The ThreadMemoryWorkingObject Java class shown in Listing 12 runs a loop that accepts a business request and processes it.

Refer to <lab_root>/dtrace_javanative/exercises/ex3/src/nativememory/ThreadMemoryWorkingObject.java.

public class ThreadMemoryWorkingObject extends DefaultWorkingObject{

    @Override
    public void loopWork() {
        try {
            BusinessRequest request = RequestAccept.accept();
            RequestProcess.process(request);
            Thread.sleep(200);
        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }
}

Listing 12

The RequestProcess class shown in Listing 13 is a business request processor that processes each request by starting a dedicated thread.

Refer to <lab_root>/dtrace_javanative/exercises/ex3/src/nativememory/threadcreation/RequestProcess.java.

public class RequestProcess {
    static public void process(final BusinessRequest request) {
        new Thread(new Runnable() {
            public void run() {
                System.out.println(request.getRequestString());
                try {
                    Thread.sleep(300000);
                } catch (Exception e) {
                    System.err.println(e);
                }
            }
        }).start();
    }
}

Listing 13

Exercise 3, Step 1: Run a Request Processing Program that Causes Native Memory Leaks

  1. Go to the directory <lab_root>/dtrace_javanative/exercises/ex3, and execute the ant command, as follows:

     gary@solaris:~$ cd dtrace_javanative/exercises/ex3
     gary@solaris:~/dtrace_javanative/exercises/ex3$ ant
    
  2. The ant command runs a program consisting of the Java classes mentioned in Listing 12 and Listing 13. This program accepts and processes incoming requests. The processing logic is to create a dedicated Java thread for each request, which continuously generates new Java threads and releases them later. 

    After running for a while (about one minute, depending on your environment), the program might generate the output shown in Listing 14 or something similar.

    You can stop this program by pressing the Enter key at any time, and you can restart it by running ant again.

    [java] requestID is: 6768
    [java] requestID is: 7425
    [java] requestID is: 2676
    [java] requestID is: 1701
    [java] requestID is: 1624
    [java] requestID is: 7856
    [java] requestID is: 7230
    [java] requestID is: 2313
    [java] requestID is: 3956
    [java] requestID is: 9852
    [java] Exception in thread "pool-1-thread-2" java.lang.OutOfMemoryError: unable to create new native thread
    [java]     at java.lang.Thread.start0(Native Method)
    [java]     at java.lang.Thread.start(Thread.java:693)
    [java]     at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
    [java]     at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
    [java]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
    [java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    [java]     at java.lang.Thread.run(Thread.java:724)
    

    Listing 14

     From the output in Listing 14, you can see the OutOfMemoryError exception thrown by the JVM. Now, let's try to find out what caused the error.

Exercise 3, Step 2: Monitor the Program with JDK Tools

  1. After getting the exception, press the Enter key to stop the program, and then restart it by running ant again. Keep the program running, and use the jps tool to find the process ID for this Java program. You will get output like Listing 15:

     gary@solaris:~$ jps -v
    1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
    3641 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
    3640 OOW -DDescription=OOWDtraceNativeLab_ex3 -Xms128m -Xmx128m -XX:MaxPermSize=128m -XX:PermSize=128m -Xss2m
    3638 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
    

    Listing 15

     By using jps -v, we can see each Java process with its process ID and startup arguments. From the output in Listing 15, we can see that the process ID of the Java program on my machine is 3640 (it might be different on your machine), because it has the name OOW along with other arguments that help us differentiate it from the other Java processes.

  2. Use the prstat -p <pid> system command to monitor the memory usage of this process. You will get output similar to Listing 16:
    PID  USERNAME  SIZE   RSS STATE   PRI NICE   TIME    CPU  PROCESS/NLWP
    3640 ouser    276M   36M sleep    59    0   0:00:39 4.6% java/845
    

    Listing 16

     From the output in Listing 16, we see that the process with PID 3640 is consuming 276 MB of virtual memory, and this number is stable. Since such small memory usage should not cause OutOfMemoryError, we need to do further tracing.

  3. Use the jmap -heap <pid> command to monitor the Java heap usage of this program. (Use your process ID to replace <pid>).
    Heap Usage:
    PS Young Generation
    Eden Space:
       capacity = 34603008 (33.0MB)
       used     = 4815664 (4.5925750732421875MB)
       free     = 29787344 (28.407424926757812MB)
       13.916894161339963% used
    From Space:
       capacity = 5767168 (5.5MB)
       used     = 303200 (0.289154052734375MB)
       free     = 5463968 (5.210845947265625MB)
       5.2573464133522725% used
    To Space:
       capacity = 5767168 (5.5MB)
       used     = 0 (0.0MB)
       free     = 5767168 (5.5MB)
       0.0% used
    PS Old Generation
       capacity = 90177536 (86.0MB)
       used     = 0 (0.0MB)
       free     = 90177536 (86.0MB)
       0.0% used
    PS Perm Generation
       capacity = 134217728 (128.0MB)
       used     = 1881216 (1.7940673828125MB)
       free     = 132336512 (126.2059326171875MB)
       1.4016151428222656% used
    

    Listing 17

    From the output in Listing 17, we see that the process has a Java heap of about 33 MB (Eden) + 5.5 MB (From) + 5.5 MB (To) + 86 MB (PS Old Generation), plus 128 MB of PermGen space, and that only a tiny fraction of it is actually used. There is no reason the program should be hitting the Java heap limit and causing the OutOfMemoryError. Then what on earth is eating the memory in this Java process?

Exercise 3, Step 3: Write a DTrace Program to Identify the Problem

After getting the OutOfMemoryError, we saw from the prstat output in Listing 16 that the values in the SIZE and RSS columns are small, yet there are about 845 threads in the process. Small, stable memory usage cannot explain why we cannot create new native threads.

The memory usage of a process falls into two categories: virtual memory usage and physical memory usage (RSS). The virtual memory size is the amount of virtual address space allocated to the process. The physical memory size is the amount of real memory pages allocated to the process.
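
This distinction is visible to DTrace as well: a process's virtual address space grows either when its heap is extended (the brk system call) or when new regions are mapped in (the mmap system call). As a quick aside, the following two-probe sketch (not one of the lab's script files) counts both kinds of growth for a given PID; we will probe mmap much more closely later in this exercise.

#!/usr/sbin/dtrace -qs

/*
 * Count the two ways this process grows its virtual address space:
 * heap extension via brk and new mappings via mmap.
 */
syscall::brk:entry,
syscall::mmap:entry
/pid == $1/
{
        @growth[probefunc] = count();
}

Run it with the PID as its argument and press Ctrl-C; DTrace prints the remaining aggregation when the script exits.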

  1. Run the ps command to display the process's virtual and physical memory usage (use your own PID number to replace 3640):

    ouser@solaris:~$ ps -eo pid,vsz,rss | grep 3640
     3640 282624 68608
    

    Listing 18

    The output shown in Listing 18 is consistent with prstat: the virtual memory size (vsz, reported in kilobytes) is about 276 MB, and the RSS is about 67 MB.

  2. We use the following ex3.d script to monitor the behavior of malloc and free for the process at a fixed time interval (refer to <lab_root>/dtrace_javanative/DScripts/ex3.d):
    #!/usr/sbin/dtrace -qs

    pid$1::malloc:entry
    {
            @size["malloc size"] = quantize(arg0);
            @heap["total_size"]  = sum(arg0);
            count_time++;
    }

    pid$1::free:entry
    {
            count_time--;
    }

    profile:::tick-2sec
    {
            printf("(malloc-free) times is %d\n", count_time);
            count_time = 0;
            printa(@size);
            printa(@heap);
            trunc(@size);
            trunc(@heap);
    }
    
  3. Switch to the root user before trying DTrace commands:
    ouser@solaris:~$ su
    Password:
    root@solaris:~#
    
  4. Run ex3.d with the PID. You will get output similar to Listing 19:
    root@solaris:~/dtrace_javanative/DScripts# ./ex3.d 3640
    Tracing... Hit Ctrl-C to end.
    (malloc-free) times is 162
    
      malloc size
               value  ------------- Distribution ------------- count
                   8 |                                         0
                  16 |@@@@@@@@@@@                              50
                  32 |@@@@                                     20
                  64 |@@                                       10
                 128 |@@@@@@@                                  30
                 256 |@@@@@@@@@                                40
                 512 |@@@@                                     20
                1024 |@@                                       10
                2048 |                                         0        
    
      total_size                                                    60400
    

    Listing 19

    The output in Listing 19 shows that the amount of memory allocated by malloc every two seconds (total_size) is usually about 60 KB. Therefore, the program allocates only about 60 KB x 300 = 18 MB through malloc every 10 minutes. So we can say that the OutOfMemoryError is not caused by malloc.

  5. Another way for a process to obtain memory is mmap (void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off)). The mmap() function establishes a mapping between a process's address space and a file or a shared memory object. The munmap() function does the reverse; it deletes the mapping and releases the memory resources.

    So we next want to check the use of mmap and munmap for the process. The following script (ex3.1.d) can do this for you:

    #!/usr/sbin/dtrace -qs

    syscall::mmap:entry
    /pid == $1/
    {
       printf("calling mmap, with size %d\n", arg1);
    }

    syscall::munmap:entry
    /pid == $1/
    {
       printf("calling munmap(%d)\n", arg1);
    }
    

    arg1 is the second argument of the mmap system call, which is the memory size.

  6. Run ex3.1.d with <pid>:
    root@solaris:~/DScripts# ./ex3.1.d 3640
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    calling mmap, with size 2097152
    

    Listing 20

    After monitoring this process for a while, we notice in Listing 20 that there are no munmap calls, but there are continuous mmap calls with a size of 2 MB. This might be the cause of OutOfMemoryError.

  7. To verify this, we use the following ex3.2.d script to check the frequency of mmap calls and the mmap call stack every four seconds.
    #!/usr/sbin/dtrace -qs

    syscall::mmap:entry
    /pid == $1/
    {
            @mmap_num["number of mmaps"] = count();
            @mmap_size["size of mmaps"]  = quantize(arg1);
            @mmap_path[ustack()] = count();
    }

    profile:::tick-4sec
    {
            printa(@mmap_num);
            printa(@mmap_size);
            printa(@mmap_path);

            clear(@mmap_num);
            clear(@mmap_size);
            clear(@mmap_path);
    }
    

    The quantize aggregating function shows a power-of-two frequency distribution of the requested memory sizes.
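
    If quantize is new to you, here is a minimal standalone sketch (not one of the lab's script files) that shows the idea in isolation: it builds the same kind of power-of-two histogram for the size requested by every read system call a process makes.

    #!/usr/sbin/dtrace -qs

    /* Histogram the request size (arg2) of every read() this PID makes. */
    syscall::read:entry
    /pid == $1/
    {
            @bytes["read request size"] = quantize(arg2);
    }

    Run it with a PID as the argument and press Ctrl-C; DTrace prints any remaining aggregations when the script exits.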

  8. Run ex3.2.d with <pid>. The output will look like Listing 21:
    root@solaris:~/DScripts# ./ex3.2.d 3640
    
      number of mmaps                                                  28
    
      size of mmaps
               value  ------------- Distribution ------------- count
             1048576 |                                         0
             2097152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 28
             4194304 |                                         0        
    
                  libc.so.1`mmap+0x15
                  libc.so.1`_thrp_create+0xd2
                  libc.so.1`thr_create+0x32
                  libjvm.so`__1cCosNcreate_thread6FpnGThread_n0AKThreadType_I_b_+0x29d
                  libjvm.so`__1cKJavaThread2t5B6MpFp0pnGThread__vI_v_+0x4c1
                  libjvm.so`JVM_StartThread+0x133
                  0xfb4af9d0
                  0xfb4afcb3
                  0xfb403ab9
                  0xfb40353b
                  0xfb403ab9
                  0xfb4003d1
    

    Listing 21

    There are 28 mmap calls every four seconds, and each call maps about 2 MB. According to the user stack, the mmap calls originate from thr_create, which creates a new OS thread. Thus, every time the program creates a new thread, at least 2 MB of memory is mapped into the process's address space. Since this is a 32-bit JVM, the maximum address space is 4 GB, and that must also accommodate kernel space and the address ranges used for code segments, shared libraries, and so on. It seems that this is the cause of the OutOfMemoryError.

  9. Next, we will trace the thr_create library call to check how each call consumes 2 MB of address space. thr_create is defined as follows:
    int thr_create(void *stack_base,
                        size_t stack_size,
                        void *(*start_func) (void*),
                        void *arg,
                        long flags,
                        thread_t *new_thread_ID);
    

    The following script (ex3.3.d) shows the behavior of the thr_create call for a process by checking the distribution of stack_size (arg1) across the newly created threads.

    #!/usr/sbin/dtrace -qs 
    
    pid$1::thr_create:entry
    {
       @thread_num["number of threads"] = count();
       @thread["thread_stack"] = quantize(arg1);
    }
    profile:::tick-4sec
    {
            printa(@thread_num);
            printa(@thread);
            clear(@thread_num);
            clear(@thread);
    }
    
  10. Run ex3.3.d with <pid>:
    root@solaris:~/DScripts# ./ex3.3.d 3640
    
      number of threads                                                 184
    
      thread_stack
               value  ------------- Distribution ------------- count
             1048576 |                                         0
             2097152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 184
             4194304 |                                         0
    

    Listing 22

    From the output in Listing 22, we can see that 2 MB is the stack size reserved for every newly created thread. So the program throws OutOfMemoryError after creating about 900 threads, because the total virtual address space of this 32-bit Java process reaches its 4 GB limit.

  11. So, why didn't the RSS and SIZE fields in the prstat output (Listing 16) reflect the memory usage? Let's turn to mmap for an explanation by using the following script (ex3.4.d) to check the flags argument of mmap:
    #!/usr/sbin/dtrace -qs

    syscall::mmap:entry
    /pid == $1/
    {
       /* arg3 is the 4th argument, which is the flags */
       printf("calling mmap, with flag set to %d\n", arg3);
    }
    

    Run ex3.4.d with <pid>. The output will look like this:

    root@solaris:~/DScripts/ex3# ./ex3.4.d 3640
    calling mmap, with flag set to 322
    calling mmap, with flag set to 322
    calling mmap, with flag set to 322
    calling mmap, with flag set to 322
    calling mmap, with flag set to 322
    calling mmap, with flag set to 322
    ^C
    

    By consulting the man page of mmap, we know that the flag value is the bitwise inclusive OR of the options that are defined in </usr/include/sys/mman.h>. That is, the flag value (322) equals MAP_PRIVATE | MAP_ANON | MAP_NORESERVE.

    Since an mmap with MAP_NORESERVE does not allocate physical memory (or reserve swap) up front, this usage cannot be seen in the prstat output. However, each such mmap call still consumes virtual address space, which is limited (about 4 GB) in a 32-bit process, and that is what produces this native memory leak.
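
    As a hedged variation on ex3.4.d (again, not one of the lab's script files), the sketch below decodes the flag bits inline instead of leaving the man page lookup to you. The constants come from /usr/include/sys/mman.h on Oracle Solaris: MAP_PRIVATE is 0x02, MAP_NORESERVE is 0x40, and MAP_ANON is 0x100, which together give the reported value 322 (0x142).

    #!/usr/sbin/dtrace -qs

    /* arg3 is the flags argument of mmap; 0x40 is MAP_NORESERVE. */
    syscall::mmap:entry
    /pid == $1 && (arg3 & 0x40) != 0/
    {
            printf("mmap(%d bytes) flags 0x%x =%s%s MAP_NORESERVE\n",
                arg1, arg3,
                (arg3 & 0x02) != 0 ? " MAP_PRIVATE |" : "",
                (arg3 & 0x100) != 0 ? " MAP_ANON |" : "");
    }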

    About Java Thread Stack Size
    In a Java application, each thread has its own stack. The stack is used to hold return addresses, function/method call arguments, and so on. As a result, if a thread tends to process larger structures via recursive algorithms, it will need a larger stack. The -Xss JVM option is used to define the limit of the Java thread stack size, which also indirectly limits how deep a stack can get.

    When a Java thread is created, the defined amount of virtual memory is reserved for its stack; that is, only address space is set aside. The actual memory is allocated in pages, and a page is allocated only when it is first used.
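
    You can watch this lazy allocation happen with a one-probe sketch (not part of the lab files) that uses the vminfo provider's as_fault probe, which fires as the process faults in pages on first use:

    #!/usr/sbin/dtrace -qs

    /* Each as_fault firing is a page being faulted in on first touch. */
    vminfo:::as_fault
    /pid == $1/
    {
            @faults["address-space faults (pages touched)"] = count();
    }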

    If the -Xss option is not defined, the default size will be used. This default size is platform-specific:

    • Oracle Solaris SPARC 32-bit JVM: 512 KB
    • Oracle Solaris SPARC 64-bit JVM: 1024 KB
    • Oracle Solaris x86 32-bit JVM: 320 KB
    • Linux AMD64 64-bit JVM: 1024 KB

    In this exercise, we defined the stack size to be 2 MB. Please refer to the build file <lab_root>/dtrace_javanative/exercises/ex3/build.xml, where you will find that this runtime parameter is defined as follows:

    <jvmarg value="-Xss2m"/>

    Please feel free to modify it (for example, to <jvmarg value="-Xss512k"/>) and rerun ex3.3.d to see the effect on the thread_stack distribution.

Exercise 3, Step 4: Modify the Program and Run It Again

  1. Stop the program by pressing Enter in the console that was started in Exercise 3, Step 1.
  2. Use an editor to open the file <lab_root>/dtrace_javanative/exercises/ex3/src/nativememory/threadcreation/RequestProcess.java.
  3. Modify the code line highlighted in bold in Listing 23 (the Thread.sleep call). This change reduces the sleep time from 30 seconds to 30 ms, so each thread finishes, and releases its 2 MB stack, much sooner.
    public class RequestProcess {
        static public void process(final BusinessRequest request) {
            new Thread(new Runnable() {
                public void run() {
                    System.out.println(request.getRequestString());
                    try {
                        Thread.sleep(30);
                    } catch (Exception e) {
                        System.err.println(e);
                    }
                }
            }).start();
    
        }
    }
    

    Listing 23

  4. Repeat Exercise 3, Step 1 through Exercise 3, Step 3 to see the difference.

Exercise 4: Identifying and Solving "Direct ByteBuffer Memory Leak" Problems (40 Minutes)

In Java, almost all objects live in the heap, and the Java heap is managed by the garbage collector. Automatic garbage collection makes Java one of the most productive languages. However, it also causes unpredictable pause times for Java processes; even the latest garbage collection algorithms can pause the JVM for a relatively long time (from several seconds to tens of seconds) when a heap larger than 10 GB is used. Therefore, even though many servers are equipped with dozens of gigabytes (even terabytes) of RAM, we still prefer to use only a few gigabytes of it for the Java heap because of the possible garbage collection performance hits.

With the introduction of the direct ByteBuffer in Java's New I/O (NIO) package, memory can be allocated without being tracked by the garbage collector, so that it can be handed to native code for tasks such as avoiding the copying of data to and from the kernel during I/O. Java HotSpot VM also has a "back door" that provides a number of low-level operations for manipulating memory directly. This back door, sun.misc.Unsafe, is widely used by the JDK itself in packages such as java.nio and java.util.concurrent. Unsafe allows us to allocate and deallocate memory explicitly via the allocateMemory and freeMemory methods. Memory allocated this way is not managed by the GC and is not limited by the maximum JVM heap size.
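
Because allocateMemory enters the VM through a native entry point in libjvm.so (it will appear later in this exercise, in Listing 28, as Unsafe_AllocateMemory), the pid provider can watch these off-heap allocations directly. Here is a minimal sketch under that assumption; it only counts the calls, because reading the jlong size argument reliably on a 32-bit JVM is awkward.

#!/usr/sbin/dtrace -qs

/*
 * Count calls to the native entry point behind
 * sun.misc.Unsafe.allocateMemory (symbol name as seen in Listing 28).
 */
pid$1:libjvm.so:Unsafe_AllocateMemory:entry
{
        @unsafe["Unsafe.allocateMemory calls"] = count();
}

profile:::tick-4sec
{
        printa(@unsafe);
        clear(@unsafe);
}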

Functionality based on NIO's off-heap buffers is widely used in popular products such as Oracle Coherence, Terracotta BigMemory, and Apache DirectMemory. Obviously, the drawback of this approach is that you have to take responsibility for memory management yourself, which might lead to native memory leaks.

In this exercise, we will use a DTrace script to detect the native memory leaks caused by "NIO DirectBuffer" management.

The D-script files used in this exercise are located in <lab_root>/dtrace_javanative/DScripts. The Java source code files used in this exercise are located in <lab_root>/dtrace_javanative/exercises/ex4.

Background Information

The following Java class (QueueWriterWorkingObject) runs a thread that loops, creating a data structure and putting it into a queue. The queue data is stored in a DirectBuffer: the data structure is written to the location pointed to by its memory address. (Refer to <lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/QueueWriterWorkingObject.java.)

public class QueueWriterWorkingObject extends DefaultWorkingObject {

    public static long idIncreatment = 0;
    public static QueueDataAccess qda = new QueueDataAccess();

    @Override
    public void loopWork() {
        try {
            qda.setObjectOffset(NativeMemoryManager.createOneMessageSpace());
            qda.setMessageId(idIncreatment);
            qda.setMessageType('B');
            qda.setReceiverId(idIncreatment+100);
            qda.setSenderId(idIncreatment+300);
            qda.setMessage("hello,OOW");
            idIncreatment++;
            Thread.sleep(20);
        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }

}

The following Java class (QueueReaderWorkingObject) runs a thread that loops, consuming messages from the queue. Each queue message is read from the DirectBuffer by locating the memory address of its data structure. (Refer to <lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/QueueReaderWorkingObject.java.)

public class QueueReaderWorkingObject extends DefaultWorkingObject {

    public static long idIncreatment = 0;
    public static QueueDataAccess qda = new QueueDataAccess();

    @Override
    public void loopWork() {
        try {
            Thread.sleep(20);
            long offset = NativeMemoryManager.getMessageAddress(idIncreatment);
            if (offset >= 0) {
                qda.setObjectOffset(NativeMemoryManager.getMessageAddress(idIncreatment));
                long messageid = qda.getMessageId();
                char messageType = qda.getMessageType();
                long receiverid = qda.getReceiverId();
                long senderid = qda.getSenderId();
                byte[] message = qda.getMessage();
                idIncreatment++;
                System.out.println("we got messageid: " + messageid + " with type " + messageType
                  + " from user "+ senderid + " to user " + receiverid + " with message: "
                  + new String(message));
                NativeMemoryManager.destroyMessage(idIncreatment);
            } else {
                Thread.sleep(10);
            }

        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }

}

Exercise 4, Step 1: Run a Program that Produces and Consumes Queue Messages and Causes Native Memory Leaks

  1. Go to the directory <lab_root>/dtrace_javanative/exercises/ex4, and execute the ant command, as follows:

    ouser@solaris:~$ cd dtrace_javanative/exercises/ex4
    ouser@solaris:~/dtrace_javanative/exercises/ex4$ ant
    
  2. The ant command runs a program that prints the queue message reader's log information. After running for a while, the program might produce output similar to the following. You can stop this program by pressing the Enter key at any time, and you can restart it by running ant again.
    [java] we got messageid: 0 with type B from user 300 to user 100 with message: hello,OOW
    [java] we got messageid: 1 with type B from user 301 to user 101 with message: hello,OOW
    [java] we got messageid: 2 with type B from user 302 to user 102 with message: hello,OOW
    [java] we got messageid: 3 with type B from user 303 to user 103 with message: hello,OOW
    [java] we got messageid: 4 with type B from user 304 to user 104 with message: hello,OOW
    [java] we got messageid: 5 with type B from user 305 to user 105 with message: hello,OOW
    [java] we got messageid: 6 with type B from user 306 to user 106 with message: hello,OOW
    [java] we got messageid: 7 with type B from user 307 to user 107 with message: hello,OOW
    [java] we got messageid: 8 with type B from user 308 to user 108 with message: hello,OOW
    [java] we got messageid: 9 with type B from user 309 to user 109 with message: hello,OOW
    [java] we got messageid: 10 with type B from user 310 to user 110 with message: hello,OOW
    [java] we got messageid: 11 with type B from user 311 to user 111 with message: hello,OOW
    [java] we got messageid: 12 with type B from user 312 to user 112 with message: hello,OOW
    [java] we got messageid: 13 with type B from user 313 to user 113 with message: hello,OOW
    

Exercise 4, Step 2: Monitor the Program with JDK Tools

  1. While running the program started in Exercise 4, Step 1, use the jps tool to find the PID of the program. You should get output similar to Listing 24:

    ouser@solaris:~$ jps -v
    1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
    3952 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
    3955 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
    3954 OOW -DDescription=OOWDtraceNativeLab_ex4 -Xms128m -Xmx128m -XX:+ExtendedDTraceProbes
    

    Listing 24

    By using jps -v, we can list each Java process with its PID and startup arguments. From the output in Listing 24, we can tell that the PID of the Java program on my machine is 3954 (it might be different on your machine), because it is the process named OOW. Note its -XX:+ExtendedDTraceProbes argument; we will rely on it later when we use the hotspot provider's method probes.

  2. Use the prstat -p <pid> system command to monitor the memory usage of this process, as shown in Listing 25.
     PID USERNAME  SIZE    RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
    3954 ouser      201M   29M sleep    59    0   0:00:03 2.5% java/12
    

    Listing 25

    From the output shown in Listing 25, we can see that the process with PID 3954 is consuming about 201 MB of virtual memory and this number is still increasing.

  3. Use the jmap -heap <pid> command to monitor the Java heap usage of this program (you can use your PID to replace <pid>). 
    Heap Usage:
    PS Young Generation
    Eden Space:
       capacity = 34630848 (33.3125MB)
       used     = 4848624 (4.564041137695312MB)
       free     = 29702224 (28.34845886230469MB)
       14.000426185098354% used
    From Space:
       capacity = 5762896 (5.5MB)
       used     = 0 (0.0MB)
       free     = 5762896 (5.5MB)
       0.0% used
    To Space:
       capacity = 5762896 (5.5MB)
       used     = 0 (0.0MB)
       free     = 5762896 (5.5MB)
       0.0% used
    tenured generation:
       capacity = 90178816 (86.0MB)
       used     = 0 (0.0MB)
       free     = 90178816 (86.0MB)
       0.0% used
    Perm Generation:
       capacity = 12582912 (12.0MB)
       used     = 1869608 (1.7829971313476562MB)
       free     = 10713304 (10.217002868652344MB)
       14.858309427897135% used
    
    760 interned Strings occupying 47136 bytes.
    

    Listing 26

    From the output in Listing 26, we see that the process with PID 3954 has committed about 128 MB of Java heap plus about 12 MB of PermGen space, and only a tiny fraction of it is used. The Java-managed memory therefore cannot explain the growing virtual size reported by prstat in Listing 25, so there must be a native memory leak in this Java process.

Exercise 4, Step 3: Use DTrace to Trace the Problem

Now we will explore how to use the Java HotSpot VM provider to analyze a Java problem and quickly locate the root cause in the Java source code.

  1. In the prstat output shown in Listing 25, the values in both the SIZE and RSS fields are increasing. Therefore, we need to observe the malloc and mmap calls of this Java process by using ex4.d:

    #!/usr/sbin/dtrace -qs

    pid$1::malloc:entry
    {
       printf("calling malloc(%d) %d times\n", arg0, malloc++);
    }

    pid$1::free:entry
    {
       printf("calling free %d times\n", free++);
    }

    syscall::mmap:entry,
    syscall::munmap:entry
    /pid == $1/
    {
       printf("calling %s with size %d\n", probefunc, arg1);
    }
    
  2. Switch to the root user before trying any DTrace commands:
    ouser@solaris:~$ su
    Password:
    root@solaris:~#
    
  3. Run ex4.d with the PID. You will get output similar to Listing 27:
    root@solaris:~/dtrace_javanative/DScripts# ./ex4.d 3954
    Tracing... Hit Ctrl-C to end.
    calling malloc(2003000) 0 times
    calling malloc(2003000) 1 times
    calling malloc(2003000) 2 times
    calling malloc(2003000) 3 times
    calling malloc(2003000) 4 times
    calling malloc(2003000) 5 times
    calling malloc(2003000) 6 times
    calling malloc(2003000) 7 times
    calling malloc(2003000) 8 times
    calling malloc(2003000) 9 times
    calling malloc(2003000) 10 times
    ^C
    

    Listing 27

    As you can see in Listing 27, there is no mmap call at all. This process uses only malloc for memory allocation, with a fixed size of about 2 MB per call, and it never calls free. Therefore, it is the malloc calls that continuously eat up memory.

  4. Now we will use the ustack function to check the user-level stack when malloc is being called. Refer to the following script (ex4.1.d):
    #!/usr/sbin/dtrace -qs

    pid$1:libc:malloc:entry
    {
       @stack_count[ustack(), arg0] = count();
    }
    
  5. Run ex4.1.d with the PID. You will get output similar to Listing 28:
    root@solaris:~/dtrace_javanative/DScripts# ./ex4.1.d 3954
    ^C
    
                  libc.so.1`malloc
                  libjvm.so`__1cCosGmalloc6FI_pv_+0x2d
                  libjvm.so`Unsafe_AllocateMemory+0x17e
                  0xfa20ab12
                  0xfa262960
              2003000                12
    

    Listing 28

    As you can see in Listing 28, all of the malloc calls have a size of about 2 MB and occurred 12 times during the running time of this script. All of them come from Unsafe_AllocateMemory. Therefore, we can easily locate the root cause in the Java source code.

    However, the real world is not that simple. Usually there are lots of malloc calls from different segments of source code. As a result, the output of running ex4.1.d will be unreadable and it will be difficult for us to locate the cause.
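
    One way to tame such output, sketched below rather than shipped with the lab, is to keep only the most frequent allocation stacks by truncating the aggregation before printing it:

    #!/usr/sbin/dtrace -qs

    pid$1:libc:malloc:entry
    {
            @stack_count[ustack(), arg0] = count();
    }

    dtrace:::END
    {
            /* Keep only the five most frequent allocation stacks. */
            trunc(@stack_count, 5);
            printa(@stack_count);
    }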

  6. An even better choice is the DTrace hotspot provider, which provides probes to track the lifespan of the VM, thread start and stop events, GC and memory pool statistics, method compilations, and monitor activity. With a startup flag (the -XX:+ExtendedDTraceProbes argument we saw in Listing 24), additional probes, such as the method-entry and method-return probes, are enabled; these can be used to monitor the running Java program and its object allocations.

    Now we will use the method-entry and method-return probes to track this Java process. These probes permit a fine-grained examination of Java thread execution: they fire every time a method is entered or returns. The method-entry and method-return probes take the following arguments:

    • args[0]: The Java thread ID of the thread that is entering or leaving the method
    • args[1]: A pointer to UTF-8 string data that contains the name of the class of the method
    • args[2]: The length of the class name data (in bytes)
    • args[3]: A pointer to UTF-8 string data that contains the name of the method
    • args[4]: The length of the method name data (in bytes)
    • args[5]: A pointer to UTF-8 string data that contains the signature of the method
    • args[6]: The length of the signature (in bytes)

    The following script (ex4.2.d) uses the probes mentioned above to find out which classes and methods make malloc calls.

    #!/usr/sbin/dtrace -qs

    hotspot$1:::method-entry
    / stringof(copyin(arg1, 6)) == "native" /
    {
            /* Copy the class and method names in, null-terminating each. */
            this->class_ptr = (char *)copyin(arg1, arg2 + 1);
            this->class_ptr[arg2] = '\0';
            self->class = stringof(this->class_ptr);
            this->method_ptr = (char *)copyin(arg3, arg4 + 1);
            this->method_ptr[arg4] = '\0';
            self->method = stringof(this->method_ptr);
    }

    hotspot$1:::method-return
    / self->class != NULL /
    {
       self->class = 0;
       self->method = 0;
    }

    pid$1::malloc:entry
    / self->class != NULL /
    {
       printf("calling malloc(%d) in %s.%s\n", arg0, self->class, self->method);
       @malloc_size[self->class, self->method] = sum(arg0);
       @malloc_dist[self->class, self->method] = quantize(arg0);
    }

    dtrace:::END
    {
            printf("\nJava malloc byte distributions by class and method,\n\n");
            printa("   %s, %s, bytes total = %@d %@d\n", @malloc_size,
                @malloc_dist);
    }
    

    We added the predicate / stringof(copyin(arg1, 6)) == "native" / in order to match only classes whose names begin with the six characters "native" (all of the classes in this lab belong to the nativememory package), so that we can focus on the classes of interest without being distracted by other, irrelevant pieces of code.

  7. Run ex4.2.d with the PID. You will get output similar to Listing 29:
    root@solaris:~/dtrace_javanative/DScripts# ./ex4.2.d 3954
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
    ^C
    
    Java malloc byte distributions by class and method,
    
       nativememory/DataAccess/NativeMemoryManager, createOneMessageSpace, bytes total = 26039000
               value  ------------- Distribution ------------- count
              524288 |                                         0
             1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
             2097152 |                                         0
    

    Listing 29

    As you can see in Listing 29, all of the malloc calls come from the createOneMessageSpace method of the NativeMemoryManager class.

Exercise 4, Step 4: Modify the Program and Run It Again

  1. Stop the program by pressing the Enter key in the console that was started in Exercise 4, Step 1.
  2. Use an editor to open the file <lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/DataAccess/NativeMemoryManager.java.

    In this file, we can find that NativeMemoryManager allocates one page of native memory for every 100 records: requiredHeap = recordPerPage * recordsize, which, at 20030 bytes per record, is exactly the 2003000-byte malloc we kept seeing in Listing 27.
  3. Add the code lines highlighted in bold in Listing 30 (the freeMemory logic in the destroyMessage method). This code frees a native memory page once all 100 records in it have been consumed.
    public class NativeMemoryManager {
    
        public static int recordPerPage = 100;
        public static volatile long nextMessageid;
        public static long[] memoryPagesOffset = new long[1000];
        public static final long recordsize = QueueDataAccess.getObjectSize();
        public static final long requiredHeap = recordPerPage * QueueDataAccess.getObjectSize();
    
        public static long getMessageAddress(final long messageid){
            if (messageid>nextMessageid-2) return -1;
            long reminder = messageid%recordPerPage;
            int index = (int) ((messageid-reminder)/recordPerPage);
            return memoryPagesOffset[index]+reminder*recordsize;
        }
    
        synchronized public static long createOneMessageSpace(){
            long reminder = nextMessageid%recordPerPage;
            int index = (int) ((nextMessageid-reminder)/recordPerPage);
            if(reminder ==0) {
                long address = Util.unsafe.allocateMemory(requiredHeap);
                memoryPagesOffset[index]=address;
            }
    
            nextMessageid++;
            return memoryPagesOffset[index]+reminder*recordsize;
    
        }
    
        public static void destroyMessage(final long messageid) {
            long reminder = messageid%recordPerPage;
            int index = (int) ((messageid-reminder)/recordPerPage);
            if (reminder == 0 && index >1) Util.unsafe.freeMemory(memoryPagesOffset[index-1]);
    
        }
    
    }
    

    Listing 30

  4. Repeat the process from Exercise 4, Step 1 through Exercise 4, Step 3 to see the difference; this time, the output of ex4.d should show free calls keeping pace with the malloc calls.

Summary

You have successfully completed the "How to Use DTrace to Troubleshoot Java Native Memory Problems on Oracle Solaris 11" hands-on lab! Starting from the basic concepts, we used a set of powerful DTrace scripts to solve different Java native memory problems, and we demonstrated the use of predicates, aggregations, and the pid, profile, and hotspot providers along the way.

See Also

Here are some additional resources:

About the Authors

Wang Yu presently works for Oracle's ISV Engineering group as an Oracle Solaris (Java) technology engineer and technology architecture consultant. His duties include supporting local independent software vendors (ISVs) and evangelizing about and consulting on important Oracle Solaris and Java technologies.

Wensheng (Vincent) Liu is a technical consultant in Oracle's ISV Engineering group. His responsibilities include providing technical support to local ISVs and evangelizing about and consulting on important Oracle Solaris and system-related technologies.

Gang (Gary) Wang, manager of Oracle's ISV Engineering group, leads ISV Engineering teams in Beijing helping ISVs in China, Japan, and Korea to make their applications run best on Oracle Solaris, Oracle servers, and Oracle storage.

Revision 1.0, 02/25/2014

