Wednesday, February 22, 2012

Figure out why is JAVA eating CPU?

In a production environment profiling is not an option, we have seen several times that our CPU has reached almost 100%, we are running tomcat on Redhat 5.5, so what is happening?

Fortunately, java comes with some great debugging tools, cooperating those tools with Linux built-in tools will let you know what is happening.

Here is what i am going to explain:
  • An introduction about java threads and its relation to what-so-called Linux LWP
  • A step-by-step
1- An introduction

as you may know, any java program starts when the JVM calls the main method, this creates a thread called the main thread and any thread you create using java code will be derived from the main thread (out-of-focus: it is always a good practice to give your threads a name, it is very useful in debugging, however, it is not required here) the same exact behavior occurs on the Linux level, the main thread for java means a process for the OS, and every thread you create using java the OS will create a Light-weight-process or LWP.
to cut it short, Java main thread = Linux process and Java thread = Linux LWP.

the idea here:
  • Ask Linux which LWP is eating the CPU.
  • Ask Java for a Thread Dump.
  • Map this LWP to a Java thread.
  • Get the part of code causing the issue.

2- A Step-By-Step:
  • Get the PID: the very first step is to know what is the Java process ID, we will use Linux tools ps and grep
    ps -A|grep java
    if you are running multiple java processes we can execute
    ps -ef|grep java

  • the next step is to get CPU usage per each LWP related to the main process, again we will use  ps  and  grep
    ps -eLo pid,lwp,nlwp,ruser,pcpu,stime,etime,args|grep {pid} > lwpthread.txt
    the file lwpthread.txt will contain some thing similar:
    PID NLWP SZ RUSER %CPU STIME ELAPSED COMMAND
    8234 8234 1110 admin 0.3 08:11 30:15 /usr/java/jdk1.6.0_24/bin/java
    8234 8245 1110  admin 99.0 08:45 10:15 /usr/java/jdk1.6.0_24/bin/java 

    as you can see, we have an LWP(mapped to a java thread) eating the CPU, get the NLWP, that will be our lead in the next step.

  • the next step is to generate a java thread dump, there are two main ways to create a thread dump, the first one is to use the JDK tool jstack and pass the PID to it, the second way is to send a kill signal to the JVM, however in the first way you will have control over where you want to save the thread dump, while on the second way you will have the thread dump written on the java process standard output stream.
     as said, we are using tomcat, so the thread dump will be in catalina.out
    kill -3 {pid}
    the thread dump will be printed to the file with full stack trace. the file will contain the thread you are after, but first convert your LWP id from DEC to HEX so 8245 would be 2035, now open the thread dump with text editor and search for 2035, you will find something similar:
    "TP-Processor234786" daemon prio=10 tid=0x00002aaad8024800 nid=0x2035 runnable [0x00002aaadef29000]
    java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.get(HashMap.java:303)
    at ......

and you are done!