Tag Archives: hadoop

Taking memory dumps of Hadoop tasks

This post originally appeared in my older blog. I am carrying over a few of those posts which seem to be popular.


On the Hadoop users mailing list, I was recently working with a user on a recurring problem faced by users of Hadoop. The user’s tasks were running out of memory and he wanted to know why. The typical approach to debug this problem is to use a diagnostic tool like jmap, or hook up a profiler to the running task. However, for most Hadoop users, this is not feasible. In real applications, the tasks run on a cluster to which users do not have login access and they cannot debug the task as it runs. I was aware of an option that is provided by the JVM – -XX:+HeapDumpOnOutOfMemoryError that can be passed to the task’s Java command line. This option makes the JVM to take a dump when the task runs out of memory. However, the dump is saved by default to the current working directory of the task, which in Hadoop’s case is a temporary scratch space managed by the framework and it would become inaccessible once the task completes.

At this point, when we were pretty much giving up on options, Koji Noguchi, an ex-colleague of mine and a brilliant systems engineer and Hadoop debugger, responded with a way out. You can read about it here. The few lines mentioned feel almost like magic, so I thought I would write an explanation about how it is working, for people who are interested.

The requirement is to save the dump to a location which is accessible by the user running the task. Such a location, in Hadoop’s case, is … HDFS. So, what we want is to be able to save the dump to HDFS when the scenario happens.

To get there, the first observation is that the JVM offers hooks that can be passed to the task’s command line, for us to act on an OutOfMemory scenario. The options Koji used are:

These options instruct the JVM to take a dump on OutOfMemory, save it with a name of our choice, and importantly, run a custom script . It should be obvious now that the custom script can copy the generated dump to DFS, using regular DFS commands.

There are a couple of gaps to plug though – as the devil is in the details. How does the script (copy-dump.sh, in the example above) get to the cluster nodes ? For that, we use Hadoop’s feature – Distributed Cache. This feature allows arbitrary files to be packaged and distributed to cluster nodes where tasks require them – using HDFS as an intermediate store. The cached files are usually available to tasks via a Java API. However, since we need this outside the task’s context, we use a powerful option of the Distributed Cache – creating symlinks. This option not only makes the files available to the tasks via the API, but also creates a symbolic link of that file into the current working directory of the task. Hence, when we refer to the script in the task’s command line, we can refer to it relative to the current working directory of the task, i.e. ‘.’.

The specific options to set up all of the above are as follows:

The last detail in the whole solution is about how to save the dump on HDFS with a name that is unique. Because, realize that multiple tasks are running together at the same time, and more than one of them could run out of memory. Koji’s scripting brilliance for solving this problem was to use the following script:

The expression ${PWD//\//_} takes the current working directory of the task (from the environment), and replaces every occurrence of ‘/’ with an ‘_’. Nice !

So, using these options, features and diagnostics of Hadoop and the JVM, users can now get memory dumps of their tasks to locations that they can easily access and analyse. Thanks a lot to Koji for sharing this technique, and happy debugging !!!