What is HAManager

WebSphere V6 introduces a new concept for advanced failover and thus higher availability, called the High Availability Manager (HAManager). The HAManager enhances the availability of WebSphere singleton services such as transaction services or JMS message services. It runs as a service within each application server process and monitors the health of WebSphere clusters. In the event of a server failure, the HAManager fails over the singleton service and recovers any in-flight transactions.

What is failover

The proposition to have multiple servers (potentially on multiple independent machines) naturally leads to the potential for the system to provide failover. That is, if any one machine or server in the system were to fail for any reason, the system should continue to operate with the remaining servers. The load balancing property should ensure that the client load gets redistributed to the remaining servers, each of which will take on a proportionately slightly higher percentage of the total load. Of course, such an arrangement assumes that the system is designed with some degree of overcapacity, so that the remaining servers are indeed sufficient to process the total expected client load.

Ideally, the failover aspect should be totally transparent to clients of the system. When a server fails, any client that is currently interacting with that server should be automatically redirected to one of the remaining servers, without any interruption of service and without requiring any special action on the part of that client. In practice, however, most failover solutions may not be completely transparent. For example, a client that is currently in the middle of an operation when a server fails may receive an error from that operation, and may be required to retry (at which point the client would be connected to another, still available server). Or the client may observe a pause or delay in processing, before the processing of its requests resumes automatically with a different server. The important point in failover is that each client, and the set of clients as a whole, is able to eventually continue to take advantage of the system and receive service, even if some of the servers fail and become unavailable.

Conversely, when a previously failed server is repaired and again becomes available, the system may transparently start using that server again to process a portion of the total client load.

What is availability or resiliency

Availability is the description of the system’s ability to respond to requests no matter the circumstances. Availability requires that the topology provide some degree of process redundancy in order to eliminate single points of failure. While vertical scalability can provide this by creating multiple processes, the physical machine then becomes a single point of failure. For this reason, a high availability topology typically involves horizontal scaling across multiple machines.

Using a WebSphere Application Server multiple machine configuration eliminates a given application server process as a single point of failure. In WebSphere Application Server V5.0 and higher, the removal of the application dependencies on the administrative server process for security, naming and transactions further reduces the potential that a single process failure can disrupt processing on a given node. In fact, the only single point of failure in a WebSphere cell is the Deployment Manager, where all central administration is performed. However, a failure at the Deployment Manager only impacts the ability to change the cell configuration and to run the Tivoli Performance Viewer which is now included in the Administrative Console; application servers are more self-sufficient in WebSphere Application Server V5.0 and higher compared to WebSphere Application Server V4.x.

What is workload management (WLM)

Workload management is a WebSphere facility that provides load balancing and affinity between application servers in a WebSphere clustered environment. WebSphere uses workload management to send requests to alternate members of the cluster. WebSphere also routes the concurrent requests from a user to the application server that serviced the first request, because EJB calls and session state are held in the memory of that application server.

The proposed configuration should ensure that each machine or server in the configuration processes a fair share of the overall client load that is being processed by the system as a whole. In other words, it is not efficient to have one machine overloaded while another machine is mostly idle. If all machines have roughly the same capacity (for example, CPU power), each should process a roughly equal share of the load. Otherwise, there likely needs to be a provision for workload to be distributed in proportion to the processing power available on each machine.

Furthermore, if the total load changes over time, the system should automatically adapt itself; for example, all machines may use 50% of their capacity, or all machines may use 100% of their capacity. What should not happen is one machine using 100% of its capacity while the rest use 15% of theirs.
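The idea of distributing load in proportion to each machine's capacity can be sketched in a few lines of Python. This is only an illustration of the principle, not how WebSphere WLM is implemented; the function name distribute_requests, the server names and the weight values are all made up for the example.

```python
def distribute_requests(total_requests, weights):
    """Split total_requests across servers in proportion to their weights.

    weights maps a (hypothetical) server name to its relative capacity.
    """
    total_weight = sum(weights.values())
    shares = {}
    assigned = 0
    for name in sorted(weights):
        share = total_requests * weights[name] // total_weight
        shares[name] = share
        assigned += share
    # Integer division can leave a few requests unassigned; hand them out
    # one at a time, starting with the highest-capacity servers.
    leftover = total_requests - assigned
    for name in sorted(weights, key=lambda n: (-weights[n], n)):
        if leftover == 0:
            break
        shares[name] += 1
        leftover -= 1
    return shares


if __name__ == "__main__":
    # server1 is assumed to have twice the capacity of the other two.
    print(distribute_requests(1000, {"server1": 2, "server2": 1, "server3": 1}))
    # If server3 fails, its share is redistributed to the remaining members.
    print(distribute_requests(1000, {"server1": 2, "server2": 1}))
```

Removing a failed server from the weights and calling the function again shows the failover effect described earlier: the remaining servers each take on a proportionately higher share of the same total load.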

What is scalability

Scalability defines how easily a site will expand. Web sites must expand, sometimes with little warning, and grow to support an increased load. The increased load may come from many sources:


    
  1. New markets

  2. Normal growth

  3. Extreme peaks



An application and infrastructure that is architected for good scalability makes site growth possible and easy.

Most often, one achieves scalability by adding hardware resources to improve throughput. A more complex configuration, employing additional hardware, should allow one to service a higher client load than that provided by the simple basic configuration. Ideally, it should be possible to service any given load simply by adding additional servers or machines (or upgrading existing resources).

However, adding new systems or processing power does not always provide a linear increase in throughput. For example, doubling the number of processors in your system will not necessarily result in twice the processing capacity. Nor will adding an additional horizontal server in the Application Server tier necessarily result in twice the request serving capacity. Adding additional resources introduces additional overhead for resource management and request distribution. While the overhead and corresponding degradation may be small, you need to remember that adding n additional machines does not always result in n times the throughput.
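One simple way to see how far a configuration is from linear scaling is to compare the measured throughput with the ideal of n times the single-server throughput. The sketch below does that; the function name scaling_efficiency and the throughput numbers are hypothetical and are not measurements from any real WebSphere system.

```python
def scaling_efficiency(baseline_throughput, servers, measured_throughput):
    """Return measured throughput as a fraction of ideal linear scaling."""
    ideal = baseline_throughput * servers
    return measured_throughput / ideal


if __name__ == "__main__":
    # Hypothetical measurements: requests per second for 1, 2 and 4 servers.
    measurements = {1: 100.0, 2: 190.0, 4: 340.0}
    for servers in sorted(measurements):
        throughput = measurements[servers]
        efficiency = scaling_efficiency(measurements[1], servers, throughput)
        print("%d server(s): %.0f req/s, %.0f%% of linear"
              % (servers, throughput, efficiency * 100))
```

An efficiency that keeps dropping as servers are added is a hint that the overhead of resource management and request distribution, or some shared bottleneck, is limiting the gain from additional hardware.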

Also, you should not simply add hardware without first doing some investigation and possible software tuning to identify potential bottlenecks in your application or in any other performance-related software configuration. Adding more hardware may not improve performance if the software is badly designed or not tuned correctly. Once the software optimization has been done, hardware resources should be considered as the next step for improving performance.

IBM Support Assistant - Collect Data

You can also use the IBM Support Assistant tool to collect data; it offers the same functionality as IBM Support Assistant Lite. Collecting data with ISA asks for the same inputs as ISA Lite, but the look and feel of the ISA Collector tool is much better.

You can reach the Collect tool by clicking Launch Activity -> Collect and Send Data. In my sample screen I am collecting General data.

This is the view that I get after the data is collected.


As you can see the collected data can be related to a case and uploaded from here.

IBM Support Assistant - Find Information

The IBM Support Assistant provides Find Information functionality that you can use to find more information about your product or the problem that you are facing.

Search Information


This view lets you search for a particular keyword across different sources. For example, I was working on a problem where WAS was throwing an HTTP 409 error code, so I could search for information on that error code on the IBM Support site, the developerWorks site, Google, and so on.



Media Viewer



The Media Viewer tab lets you find the available media for that product, such as WAS 6.1 performance-related information from the Education Assistant, related .pdf and Flash files, and Redbooks.



Product Information



The Product Information tab shows product-related information such as recently recommended fixes, flashes, and APARs.

Using ISALite in GUI Mode

Use the following steps to run ISA Lite in GUI mode:


  • Execute the runISALite.bat file to open the GUI tool.


  • Now select the type of problem for which you want to collect data. Let's assume that I am having a problem starting the server, so I selected Start problem as the problem type, and I want to store the collected data on the desktop in serverStartProblem.zip.

    Now click on the Collect Data button to start the actual data collection process.

  • I got a warning message that ISA Lite should be run on the DMgr (Deployment Manager) machine. In my case it is a standalone environment, so that is OK.


  • Next it asked me for the root directory where WebSphere Application Server is installed.


  • After that it asked me for the name of the server.


  • On the next screen I had to enter my WAS admin user ID and password.


  • I had to answer a few more questions about the server startup.








  • Once the data collection process is started, it will run for a few minutes, and at the end it will ask whether you want to transfer the collected data to the IBM Support site.

    I selected Do Not Transfer data here.

  • When I opened the serverStartProblem.zip file I could see that it had collected lots of data related to the problem, such as log files, the ffdc folder, and some important properties files.


    It also turned trace on, as these log entries show:

    [7/14/09 8:22:48:920 PDT] 00000041 ManagerAdmin I TRAS0018I: The trace state has changed. The new trace state is *=info.
    [7/14/09 8:23:11:669 PDT] 00000041 ManagerAdmin I TRAS0018I: The trace state has changed. The new trace state is *=info:com.ibm.websphere.*=finest.

IBM Support Assistant Lite Quick Data Collection

The IBM Support Assistant Lite (Quick Data Collection) tool lets you collect all the necessary data that support might need to debug your problem.

You can download the WebSphere Application Server specific version of ISA Lite from the Download IBM Support Assistant (ISA) Lite for WebSphere Application Server page. After downloading the .zip file you can extract it into a directory, and you will see something like this.




You can run ISA Lite either in console mode by executing runISALiteConsole.bat or in GUI mode by executing runISALite.bat.

IBM Support Assistant

The IBM Support Assistant product comes in three flavors:

  1. IBM Support Assistant Lite: This product is just for collecting all the necessary data, such as configuration, logs and traces, and version and history information. When you open a support request with IBM, they might ask you to collect diagnostic data from your system using IBM Support Assistant Lite.

  2. Serviceability Workbench: This product is also known as IBM Support Assistant. It is an Eclipse-based product that can not only collect the necessary data but also provides troubleshooting tools such as Thread Dump Analyzer and Heap Dump Analyzer.

  3. Enterprise Solution: Deploy the IBM Support Assistant Agent onto systems in your enterprise to enable remote problem determination. Use Agents to run remote data collections, take inventory of remote systems, filter and collect remote log files, and perform other remote problem determination tasks. The Agent component is optional and works in conjunction with the IBM Support Assistant Workbench.



You can download any of the three products from the ISA home page.

Common base event model

The Log Analyzer maps the currently supported proprietary log formats into a common event model called Common Base Event. This allows the analyzer to use a common format for any log records from any supported proprietary log files. The parsers provided with the Log Analyzer map the log records from their current output format to this common model.

For example it will take this message from trace.log file

[7/10/09 19:46:19:007 EDT] 00000014 WSChannelFram A CHFW0019I: The Transport Channel Service has started chain chain_0.


and convert it into

<?xml version="1.0" encoding="UTF-8"?>
<CommonBaseEvents xmlns="http://www.ibm.com/AC/commonbaseevent1_0_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/AC/commonbaseevent1_0_1 commonbaseevent1_0_1.xsd">
<CommonBaseEvent creationTime="2009-07-10T16:46:19.007000-07:00" extensionName="CBECommonBaseEvent" globalInstanceId="CEA1DE700FE6279E34CF32E43733313566" msg="CHFW0019I: The Transport Channel Service has started chain chain_0." elapsedTime="0" priority="0" repeatCount="0" sequenceNumber="0" severity="10" version="1.0.1">
<extendedDataElements name="category" type="string">
<values>AUDIT</values>
</extendedDataElements>
<sourceComponentId component="IBM WebSphere Application Server Platform 6.0 [ND 6.0.2.25 cf250801.02] [XD 6.0.2.1 cf10726.21183]" componentIdType="ProductName" location="10.21.127.197" locationType="IPV4" processId="13381" subComponent="WSChannelFram" threadId="00000014" componentType="WebSphereApplicationServer"/>
<msgDataElement msgLocale="en-US">
<msgId>CHFW0019I</msgId>
<msgIdType>IBM4.4.1</msgIdType>
</msgDataElement>
<situation categoryName="ReportSituation">
<situationType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ReportSituation" reasoningScope="INTERNAL" reportCategory="LOG"/>
</situation>
</CommonBaseEvent>
</CommonBaseEvents>
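To make the mapping a bit more concrete, here is a small Python sketch that takes the trace.log line above and builds a heavily trimmed CommonBaseEvent element from it. This is not the Log Analyzer's actual parser: the regular expression, the function name to_common_base_event and the choice of attributes are my own assumptions, and a real Common Base Event carries many more fields (creationTime, severity, situation and so on).

```python
import re
import xml.etree.ElementTree as ET

# Assumed line layout: [timestamp] threadId shortName eventType message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<thread>\S+)\s+(?P<component>\S+)"
    r"\s+(?P<type>\S)\s+(?P<msg>.*)$"
)


def to_common_base_event(line):
    """Build a minimal CommonBaseEvent element from one log line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized log line: %r" % line)
    event = ET.Element("CommonBaseEvent", {"msg": match.group("msg")})
    ET.SubElement(event, "sourceComponentId", {
        "subComponent": match.group("component"),
        "threadId": match.group("thread"),
    })
    # Assumption: the message starts with an ID followed by a colon,
    # as in "CHFW0019I: ...". Lines without a message ID need other handling.
    msg_id = match.group("msg").split(":", 1)[0]
    data = ET.SubElement(event, "msgDataElement")
    ET.SubElement(data, "msgId").text = msg_id
    return event


if __name__ == "__main__":
    sample = ("[7/10/09 19:46:19:007 EDT] 00000014 WSChannelFram A "
              "CHFW0019I: The Transport Channel Service has started chain chain_0.")
    print(ET.tostring(to_common_base_event(sample)).decode("utf-8"))
```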

What is Log Analyzer

Determining the root cause of a problem in a system that consists of a collection of products can be difficult. All products produce problem determination data, such as trace records, log records and messages. However, the problem determination data cannot be easily correlated across different products and products on different servers. Each product's problem determination data can only provide a view through a small window into the overall system problem. Timestamps are not sufficient: they are not granular enough, and clocks are often not sufficiently synchronized between servers. All of these problems make the job of problem isolation (that is, determining which server, which product, and what the root cause of the problem was) very difficult, and this complexity increases with the complexity and size of a system.

The Log Analyzer, which enables you to import various log files as well as symptom catalogs against which log files can be analyzed, decreases this complexity. The core issue in problem isolation in today's solutions is that problem determination data between products is not correlated, that is, you cannot easily determine the relationship of events captured by one product with the events captured by another. The Log Analyzer addresses this problem by allowing you to import and analyze log files from multiple products, as well as to determine the relationship between the events captured by these products (correlation).

You have two different options for the Log Analyzer:
1) WebSphere Application Server Toolkit: You can switch to the Profiling and Monitoring view in WebSphere Application Server Toolkit to access the Log Analyzer.



2) IBM Support Assistant also includes the Log Analyzer.

Enabling trace using WSAdmin Script

You can use the following WSAdmin script to enable Runtime and Server startup time trace

Runtime trace




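def enableRunTimeTrace(serverName, traceString):
    # Look up the TraceService MBean of the running server
    ts = AdminControl.queryNames("type=TraceService,process="+serverName+",*")
    # Change the trace specification in memory; it takes effect immediately
    AdminControl.setAttribute(ts,"traceSpecification",traceString)

enableRunTimeTrace("server1","com.ibm.websphere.*=finest")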


Configuration/ Server Startup Time trace




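def enableConfigurationTimeTrace(serverId, traceString):
    # The cell and node names are from my machine; replace them with your own
    server = AdminConfig.getid('/Cell:sunpatil-wxp02Node01Cell/Node:sunpatil-wxp02Node01/Server:'+serverId+'/')
    print server
    # Find the TraceService configuration object of that server
    ts = AdminConfig.list("TraceService",server)
    print ts
    # Persist the startup trace specification; it is used at the next server start
    AdminConfig.modify(ts,[['startupTraceSpecification',traceString]])
    AdminConfig.save()

enableConfigurationTimeTrace("server1","com.ibm.websphere.*=finest")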

Startup time vs. Run time trace


After enabling trace, you will have to decide the Java packages for which you want to turn the trace on. You can do that from the WAS Admin Console by going to TroubleShooting -> Logging and Tracing -> servername. Then on the Logging and Tracing page click on Change Log Details Level.



To change the log detail level, you can use either the Configuration tab or the Runtime tab:


  • Configuration: The log detail level that you set here is persisted in the server.xml file, and the changes will not take effect until you restart the server. Because the changes are persisted in server.xml, they remain effective across server restarts until you change them.

  • Runtime: The changes that you make on the Runtime tab take effect immediately, but they are not saved to the server.xml configuration file. Instead they are kept in memory, so your changes are lost when you restart the server.

Trace Output Format

On the Diagnostic Trace Services page there is a Trace Format select box that lets you choose the format used for generating trace.



As you can see, you have three options for the format:

  1. Basic: The events displayed in the basic format use the following format

    <timestamp><threadId><shortName><eventType>[className][methodName]<textmessage>
    [parameter 1]
    [parameter 2]

    This is sample output generated for basic format

    [7/5/09 21:04:13:984 PDT] 00000025 HttpRequestMe 3 setMethod(v): GET
    [7/5/09 21:04:13:984 PDT] 00000025 HttpRequestMe 3 setRequestURL input [/ibm/console/configSpecDetail.do?EditAction=true&perspective=tab.runtime]
    [7/5/09 21:04:13:984 PDT] 00000025 HttpRequestMe 3 setQueryString(byte[]): set query to [EditAction=true&perspective=tab.runtime]
    [7/5/09 21:04:13:984 PDT] 00000025 HttpRequestMe 3 setRequestURL: set URI to /ibm/console/configSpecDetail.do
    [7/5/09 21:04:13:984 PDT] 00000025 HttpBaseMessa 3 Called setVersion(b): HTTP/1.1
    values

    The Basic format is the recommended format because it makes the trace messages easier to read. When you are submitting trace to the IBM Support team you should use the Basic format.

  2. Advanced: The Advanced format is a little more complex than the Basic format. This is the layout used for the Advanced format

    <timestamp><threadId><eventType><UOW><source=longName>[className][methodName]
    <Organization><Product><Component>[thread=threadName]
    <textMessage>[parameter 1=parameterValue][parameter 2=parameterValue]

    These are a couple of sample trace messages generated in the Advanced format

    [7/5/09 21:50:49:031 PDT] 00000029 I UOW=null source=com.ibm.ws.webcontainer.servlet.ServletWrapper org=IBM prod=WebSphere component=Application Server thread=[WebContainer : 1]
    SRVE0242I: [ConnectionLeakEAR] [/connleak] [FormPostServlet]: Initialization successful.
    [7/5/09 21:50:49:046 PDT] 00000029 O UOW= source=SystemOut org=IBM prod=WebSphere component=Application Server thread=[WebContainer : 1]
    Inside FormPostServlet.doPost() Exit


  3. Log Analyzer: If you select the Log Analyzer format, the trace is generated in the same form as the output of the showlog tool when it converts the binary activity.log file to text format.


    ---------------------------------------------------------------
    ComponentId: Application Server
    ProcessId: 5368
    ThreadId: 00000029
    ThreadName: WebContainer : 3
    SourceId: com.ibm.ws.http.channel.impl.HttpResponseMessageImpl
    ClassName:
    MethodName:
    Manufacturer: IBM
    Product: WebSphere
    Version: Platform 6.1 [BASE 6.1.0.25 cf250922.06]
    ServerName: sunpatil-wxp02Node01Cell\sunpatil-wxp02Node01\server1
    TimeStamp: 2009-07-05 21:11:24.062000005
    UnitOfWork:
    Severity: 3
    Category: FINEST
    PrimaryMessage: setReasonPhrase(byte[]): set to [OK]
    ExtendedMessage:
    ---------------------------------------------------------------
    ComponentId: Application Server
    ProcessId: 5368
    ThreadId: 00000029
    ThreadName: WebContainer : 3
    SourceId: com.ibm.ws.webcontainer.servlet.ServletWrapper
    ClassName:
    MethodName:
    Manufacturer: IBM
    Product: WebSphere
    Version: Platform 6.1 [BASE 6.1.0.25 cf250922.06]
    ServerName: sunpatil-wxp02Node01Cell\sunpatil-wxp02Node01\server1
    TimeStamp: 2009-07-05 21:11:24.078000000
    UnitOfWork:
    Severity: 3
    Category: INFO
    PrimaryMessage: SRVE0242I: [ConnectionLeakEAR] [/connleak] [FormPostServlet]: Initialization successful.
    ExtendedMessage:
    ---------------------------------------------------------------




Please note that you can read trace in any of these formats using the Log Analyzer tool.

Basic and Advanced Formats use many of the same fields and formatting techniques. The fields that can be used in these formats include:

  • TimeStamp: The timestamp is formatted using the locale of the process where it is formatted. It includes a fully qualified date (YYMMDD), 24 hour time with millisecond precision and the time zone.

  • ThreadId: An 8 character hexadecimal value generated from the hash code of the thread that issued the trace event.

  • ThreadName: The name of the Java thread that issued the message or trace event.

  • ShortName: The abbreviated name of the logging component that issued the trace event. This is typically the class name for WebSphere Application Server internal components, but may be some other identifier for user applications.

  • LongName: The full name of the logging component that issued the trace event. This is typically the fully qualified class name for WebSphere Application Server internal components, but may be some other identifier for user applications.

  • EventType: A one character field that indicates the type of the trace event. Trace types are in lower case. Possible values include:

    >
    a trace entry of type method entry.
    <
    a trace entry of type method exit.
    1
    a trace entry of type fine or event.
    2
    a trace entry of type finer.
    3
    a trace entry of type finest, debug or dump.
    Z
    a placeholder to indicate that the trace type was not recognized.

  • ClassName: The class that issued the message or trace event.

  • MethodName: The method that issued the message or trace event.

  • Organization: The organization that owns the application that issued the message or trace event.

  • Product: The product that issued the message or trace event.

  • Component: The component within the product that issued the message or trace event.
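If you want to scan a Basic format trace file yourself, the EventType codes listed above can be decoded with a small lookup table. The sketch below is just my own illustration; the regular expression assumes the Basic layout shown earlier ([timestamp] threadId shortName eventType ...), and the names EVENT_TYPES and describe_event_type are made up for the example.

```python
import re

# Trace event type codes, as listed in the field descriptions.
EVENT_TYPES = {
    ">": "method entry",
    "<": "method exit",
    "1": "fine or event",
    "2": "finer",
    "3": "finest, debug or dump",
    "Z": "unrecognized trace type",
}

# Assumed Basic format prefix: [timestamp] threadId shortName eventType ...
BASIC_PREFIX = re.compile(r"^\[[^\]]+\]\s+[0-9a-fA-F]{8}\s+\S+\s+(\S)\s")


def describe_event_type(line):
    """Return a description of the event type of one Basic format line."""
    match = BASIC_PREFIX.match(line)
    if match is None:
        return None  # continuation line or a different format
    code = match.group(1)
    # Codes not in the table (for example message types) are reported as-is.
    return EVENT_TYPES.get(code, "other (%s)" % code)


if __name__ == "__main__":
    line = "[7/5/09 21:04:13:984 PDT] 00000025 HttpRequestMe 3 setMethod(v): GET"
    print(describe_event_type(line))
```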

What is diagnostic trace

WebSphere Application Server components, as well as most enterprise-level applications, use a logging framework to generate detailed trace of their execution. By default this trace is disabled because it causes performance overhead, but you can turn the trace on for either your application or a particular WebSphere component when you want to debug an issue.

When you open a support request with IBM support, most of the time they will ask you to enable the trace for a particular area and send the generated trace to them for further analysis of the problem.

You can turn the trace on using the WAS Admin Console GUI by going to TroubleShooting -> Logs and Trace -> servername. On the Logging and Tracing screen, select Diagnostic Trace.



On the Diagnostic Trace Services screen you can enable the trace by checking the Enable Log checkbox.



On this screen you can also define where the output of the trace should go. You have the following two options:


  • Memory Buffer: If you select the Memory Buffer option, your messages are not written to any file; instead they are kept in memory. To view the messages you have to dump the memory buffer to a file and then view that file. This option is not recommended.

  • File: If you decide to send the log messages to the file system, you can configure the following properties:

    • Maximum File Size: The maximum size of the trace file. Once the trace file reaches this limit, the trace.log file is renamed to trace+timestamp.log and a new trace.log file is created.

    • Maximum Number of Historical Files: The maximum number of historical files to keep. When trace.log reaches its maximum size, WAS first checks whether the historical file limit has been reached and, if so, deletes the oldest trace file.

    • File Name: The fully qualified path name where the trace file should be generated.





Whatever changes you make on the Configuration tab are persisted in server.xml and require a server restart to take effect. If you want to change the trace file location for a running server, make those changes on the Runtime tab.
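The interaction between the Maximum File Size and Maximum Number of Historical Files settings is easier to follow with a tiny simulation. The Python sketch below only models the bookkeeping as I described it above; it is not WebSphere code, and the function name rotate and the file names are placeholders.

```python
def rotate(historical, new_name, max_historical):
    """Return the list of historical trace files after one rollover.

    historical is ordered oldest first and is not modified in place.
    new_name is the name given to the trace.log that just filled up.
    """
    updated = list(historical)
    # If the limit is already reached, the oldest file is deleted first.
    while max_historical > 0 and len(updated) >= max_historical:
        updated.pop(0)
    if max_historical > 0:
        updated.append(new_name)
    return updated


if __name__ == "__main__":
    files = []
    # Simulate trace.log filling up four times with a limit of two old files.
    for i in range(1, 5):
        files = rotate(files, "trace_rollover_%d.log" % i, 2)
        print(files)
```

With a limit of two, only the two most recent rolled-over files survive, so pick the file size and the number of historical files so that together they cover the time window you need to capture.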

Generating thread dump/java core using WSAdmin script

You can generate Thread Dump or Java core manually using the following WSAdmin Script

def generateThreadDump(serverName):
    # Look up the JVM MBean of the running server
    serverJVM = AdminControl.queryNames("type=JVM,process="+serverName+",*")
    # Ask the JVM to write a thread dump (javacore)
    AdminControl.invoke(serverJVM,"dumpThreads")

generateThreadDump("server1")


The generateThreadDump() function takes a server name and generates a thread dump for that server. Once the thread dump is generated, you can find its location in the native_stderr.log file. This is a sample of the messages from my native_stderr.log


JVMDUMP007I JVM Requesting Java Dump using 'C:\Cert\WebSphere\AppServer\profiles\AppSrv01\javacore.20090705.212408.5480.0001.txt'
JVMDUMP010I Java Dump written to C:\Cert\WebSphere\AppServer\profiles\AppSrv01\javacore.20090705.212408.5480.0001.txt
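If you generate dumps often, you can pull the dump locations out of native_stderr.log with a short script instead of reading the file by hand. This is a minimal sketch that assumes the JVMDUMP010I message looks exactly like the sample above; the function name find_dump_files is my own, and other JVM versions may word the message differently.

```python
import re

# Matches lines such as "JVMDUMP010I Java Dump written to <path>".
DUMP_WRITTEN = re.compile(
    r"^JVMDUMP010I (?P<kind>Java|Heap) Dump written to (?P<path>.+)$"
)


def find_dump_files(lines):
    """Return (kind, path) tuples for every dump reported as written."""
    dumps = []
    for line in lines:
        match = DUMP_WRITTEN.match(line.strip())
        if match:
            dumps.append((match.group("kind"), match.group("path")))
    return dumps


if __name__ == "__main__":
    sample = [
        r"JVMDUMP007I JVM Requesting Java Dump using 'C:\Cert\WebSphere\AppServer\profiles\AppSrv01\javacore.20090705.212408.5480.0001.txt'",
        r"JVMDUMP010I Java Dump written to C:\Cert\WebSphere\AppServer\profiles\AppSrv01\javacore.20090705.212408.5480.0001.txt",
    ]
    for kind, path in find_dump_files(sample):
        print(kind, path)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.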

Manually generating heap dump

You can use the WSAdmin script to generate heapdump for the JVM. Use this Jython script to do that

def generateHeapDump(serverName):
    # Look up the JVM MBean of the running server
    serverJVM = AdminControl.queryNames("type=JVM,process="+serverName+",*")
    print serverJVM
    # Ask the JVM to write a heap dump
    AdminControl.invoke(serverJVM,"generateHeapDump")


generateHeapDump("server1")


Just call the generateHeapDump() function with the name of the server for which you want to generate a heap dump.

Once this script has run, open the native_stderr.log file to find the location of the heap dump file. In my case I see these lines in the native_stderr.log file

JVMDUMP007I JVM Requesting Heap Dump using 'C:\Cert\WebSphere\AppServer\profiles\AppSrv01\heapdump.20090705.140924.8660.0001.phd'
JVMDUMP010I Heap Dump written to C:\Cert\WebSphere\AppServer\profiles\AppSrv01\heapdump.20090705.140924.8660.0001.phd


You can open the file whose name ends with .phd in the HeapAnalyzer tool.

IBM HeapAnalyzer

If you get an OutOfMemory exception, or your server is performing slowly and the JVM memory utilization keeps going up, then you might want to take a look at the JVM heap. This is a two-step process: first generate a heap dump on your server, and then analyze which objects are on the heap and whether there is a memory leak.

The IBM HeapAnalyzer tool lets you analyze the heap dump generated by WebSphere Application Server. This tool is shipped with IBM Support Assistant, or you can download it from the alphaWorks site. I tried using the HeapAnalyzer that is part of ISA 4.1, but it kept crashing, so I had to download it from the alphaWorks site.

I wanted to learn how to use the HeapAnalyzer tool to identify memory leaks, so I started by creating a MemoryLeakServlet and then used the JMeter tool to generate load on the server. The load test ran for a few minutes and then WAS started throwing OutOfMemory exceptions. Please note that when WAS throws an OutOfMemory exception it generates a heap dump.

Since I am using the standalone version of HeapAnalyzer, I had to start it from the command line like this


The HeapAnalyzer tool is very memory intensive, so you should not run it on a production machine; run it on a machine that has a lot of memory.

Once the tool is open, select the .phd file by clicking on Open and then selecting the heap dump file like this


The tool will take a few minutes to analyze the heap data, and then it will generate a view like this.




As you can see, the Reference Tree section shows the objects on the heap in tree format. If you click on the Subpoena Leak Suspect button you will see that there is one suspected memory leak. Select it and the tool will take you to the object that is suspected of leaking memory.



As you can see, it shows the ArrayList object inside MemoryLeakServlet as the memory leak.

IBM Pattern modeling and Analysis tool for Java

The IBM Pattern Modeling and Analysis Tool (PMAT) for Java lets you analyze garbage collection data, and in case of problems such as heap exhaustion (an OutOfMemory error) it also gives you recommendations. Please note one important point: the GC analysis tool alone cannot help you locate a memory leak and fix it. You will have to use IBM HeapAnalyzer for that.

The PMAT tool is part of IBM Support Assistant 4.1. You can launch it by going to Launch Activities -> Analyze Problems. Then in the Tools Catalog select the IBM Pattern Modeling and Analysis Tool and click on Launch.



It will ask you for the location of the native_stderr.log file for your server. Select the file and click on OK.


Once the file is opened, it will show you a summary of the GC analysis like this



  • File name : Location and file name of verbosegc trace

  • Number of verboseGC cycles : Number of JVM restart

  • Number of Garbage Collections : GC frequency

  • Number of Allocation failures : AF frequency

  • First Garbage Collection : Timestamp of the first GC

  • Last Garbage Collection : Timestamp of the last GC

  • Number of Java heap exhaustion : Number of OutOfMemoryError

  • Maximum AF overhead : Ratio of time spent in AF and time between AFs

  • Number of 100% overhead : Number of AF overhead 100%

  • Maximum size of Large Object Request : The largest object request and timestamp

  • Number of Large Object Requests : Number of object request (>900KB)

  • List of Java heap failure : Timestamp, Requested Java heap size,Type of failure and available Java heap size.



During the first few minutes of running the memory leak servlet, it did not throw an OutOfMemory exception, so the recommendation says that there is no memory exhaustion but that there seems to be an increase in Java heap size. This can be taken as an early warning sign of a memory leak in the application.

After a few minutes, when the memory was exhausted and WAS started throwing OutOfMemory exceptions, I looked at the GC analysis again in the PMAT tool, and this is what I saw


As you can see, there are three errors related to Java heap exhaustion, which means that GC is not able to free up memory and the heap is full. Also, whenever there is an OutOfMemory error, WAS generates a heap dump automatically. You can get the location of the heap dump file from the native_stderr.log file.


You can also look at a graphical memory analysis of the data by right-clicking on the file and clicking Graphical view all


It will show you a graph of memory usage like this


As you can see, memory usage rose sharply some time after 12:33:35. The red line shows the memory used and the blue line shows the heap freed after GC. The red line is near 100% and the blue line is at 0%, which means the memory is in use and GC cannot free up space. The vertical black dotted line where the blue and red lines terminate shows the server restart.

Memory Leak Servlet

I wanted to learn how to identify memory leaks in WAS and how to analyze the problem using
1) The IBM Pattern Modeling and Analysis Tool for Java
2) The IBM HeapAnalyzer tool

So I started by creating a sample MemoryLeakServlet like this.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MemoryLeakServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    private static final String BIG_STRING = "This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string This is a big string ";

    // Deliberate leak: this static list is never cleared, so everything
    // added to it stays reachable for as long as the class is loaded.
    private static final List<List<String>> st = new ArrayList<List<String>>();

    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // Build a fresh list with 23 entries for every request.
        List<String> st1 = new ArrayList<String>();
        for (int i = 0; i < 23; i++) {
            st1.add(BIG_STRING);
        }

        // ArrayList is not thread-safe and requests arrive concurrently,
        // so guard the shared list while reading and updating it.
        synchronized (st) {
            System.out.println("Inside MemoryLeakServlet.doGet() " + st.size());
            st.add(st1);
        }

        response.setContentType("text/html");
        response.getWriter().println("Hello from memory leak servlet");
    }
}


As you can see, the MemoryLeakServlet has one static ArrayList st, and I keep adding data to it for every HTTP GET request.

Once that was done, I used JMeter to create a simple performance test which kept making HTTP GET requests to MemoryLeakServlet until I got an OutOfMemoryError. During that time I kept looking at the native_stderr.log file using the PMAT tool to see what type of data I would see.

Verbose garbage collection

When you turn verbose garbage collection on, WAS will start printing garbage collection related information in your native_stderr.log file. You can turn verbose garbage collection on using the WAS Admin Console by going to Application servers > servername > Process Definition > Java Virtual Machine



On this screen, check the verbose garbage collection checkbox and restart the server for your changes to take effect. Once verbose garbage collection is enabled, the WAS server will write messages into the native_stderr.log file every time it executes a garbage collection.

These are a couple of entries from my native_stderr.log file.


<af type="tenured" id="49" timestamp="Jul 05 13:13:17 2009" intervalms="32.872">
<minimum requested_bytes="16776" />
<time exclusiveaccessms="0.044" />
<tenured freebytes="3624584" totalbytes="116873216" percent="3" >
<soa freebytes="3624584" totalbytes="116873216" percent="3" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<gc type="global" id="49" totalid="49" intervalms="34.359">
<refs_cleared soft="0" threshold="32" weak="0" phantom="0" />
<finalization objectsqueued="0" />
<timesms mark="107.399" sweep="1.583" compact="0.000" total="109.117" />
<tenured freebytes="43894704" totalbytes="116873216" percent="37" >
<soa freebytes="43894704" totalbytes="116873216" percent="37" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
</gc>
<tenured freebytes="43877928" totalbytes="116873216" percent="37" >
<soa freebytes="43877928" totalbytes="116873216" percent="37" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<time totalms="110.648" />
</af>

<af type="tenured" id="50" timestamp="Jul 05 13:13:18 2009" intervalms="232.708">
<minimum requested_bytes="32" />
<time exclusiveaccessms="0.031" />
<tenured freebytes="0" totalbytes="116873216" percent="0" >
<soa freebytes="0" totalbytes="116873216" percent="0" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<gc type="global" id="50" totalid="50" intervalms="234.140">
<classloadersunloaded count="13" timetakenms="48.694" />
<expansion type="tenured" amount="19868672" newsize="136741888" timetaken="0.152" reason="excessive time being spent in gc" gctimepercent="49" />
<refs_cleared soft="0" threshold="32" weak="3" phantom="0" />
<finalization objectsqueued="0" />
<timesms mark="118.982" sweep="2.639" compact="0.000" total="170.899" />
<tenured freebytes="55392816" totalbytes="136741888" percent="40" >
<soa freebytes="55392816" totalbytes="136741888" percent="40" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
</gc>
<tenured freebytes="55392176" totalbytes="136741888" percent="40" >
<soa freebytes="55392176" totalbytes="136741888" percent="40" />
<loa freebytes="0" totalbytes="0" percent="0" />
</tenured>
<time totalms="172.525" />
</af>


The way the IBM JDK works is that if it is not able to allocate memory, it executes a garbage collection to free some up. The J9 VM used in WAS 6.1 writes one <af> (allocation failure) element every time this happens.

The af element has the following attributes

  • type: The heap area in which the allocation failed (tenured in the entries above)

  • id: How many times the GC has been executed

  • intervalms: The time in ms since the last time the GC was executed

  • timestamp: The time of the GC



The minimum element gives the number of bytes that were requested. The JVM could not allocate them, so it had to trigger a garbage collection cycle.

The af element has 3 main child elements. The first tenured element has data about the state of the tenured memory before GC. The gc element has data about what happened during GC, such as the time spent in the mark, sweep and compact phases. The second tenured element shows the state of the tenured memory after GC.
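To make the structure concrete, here is a minimal sketch that pulls the requested bytes and the tenured free space before and after GC out of one af entry. It assumes the attribute layout shown in the entries above and uses plain regular expressions. The class and method names are hypothetical, and it takes the first and the last tenured elements in the entry as the "before" and "after" readings (the gc element carries a tenured element of its own in between).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Summary of one <af> entry from a verbose GC log (illustrative sketch only). */
final class AfEntrySummary {
    // Matches only the <tenured ...> opening tags; <soa> and <loa> are ignored.
    private static final Pattern TENURED =
            Pattern.compile("<tenured freebytes=\"(\\d+)\" totalbytes=\"(\\d+)\"");
    private static final Pattern REQUESTED =
            Pattern.compile("<minimum requested_bytes=\"(\\d+)\"");

    final long requestedBytes;
    final long freeBeforeGc;
    final long freeAfterGc;
    final long totalAfterGc;

    private AfEntrySummary(long requested, long freeBefore, long freeAfter, long totalAfter) {
        this.requestedBytes = requested;
        this.freeBeforeGc = freeBefore;
        this.freeAfterGc = freeAfter;
        this.totalAfterGc = totalAfter;
    }

    /** Parses the text of a single <af>...</af> element. */
    static AfEntrySummary parse(String afText) {
        Matcher req = REQUESTED.matcher(afText);
        if (!req.find()) {
            throw new IllegalArgumentException("no <minimum> element found");
        }
        List<long[]> tenured = new ArrayList<long[]>();
        Matcher m = TENURED.matcher(afText);
        while (m.find()) {
            tenured.add(new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) });
        }
        if (tenured.size() < 2) {
            throw new IllegalArgumentException("expected at least two <tenured> elements");
        }
        long[] before = tenured.get(0);
        long[] after = tenured.get(tenured.size() - 1);
        return new AfEntrySummary(Long.parseLong(req.group(1)), before[0], after[0], after[1]);
    }

    /** Free tenured bytes gained across the whole allocation failure. */
    long bytesFreed() {
        return freeAfterGc - freeBeforeGc;
    }

    /** Free space after GC as a whole-number percentage of the tenured area. */
    long percentFreeAfterGc() {
        return freeAfterGc * 100 / totalAfterGc;
    }

    public static void main(String[] args) {
        // Abridged copy of the first entry shown above.
        String af = "<af type=\"tenured\" id=\"49\">\n"
                + "<minimum requested_bytes=\"16776\" />\n"
                + "<tenured freebytes=\"3624584\" totalbytes=\"116873216\" percent=\"3\" >\n"
                + "</tenured>\n"
                + "<tenured freebytes=\"43877928\" totalbytes=\"116873216\" percent=\"37\" >\n"
                + "</tenured>\n"
                + "</af>";
        AfEntrySummary s = AfEntrySummary.parse(af);
        System.out.println("requested=" + s.requestedBytes
                + " freed=" + s.bytesFreed()
                + " freeAfter%=" + s.percentFreeAfterGc());
    }
}
```

I used regular expressions rather than an XML parser because native_stderr.log can contain other output mixed in with the GC entries, so the file as a whole is not guaranteed to be a well-formed XML document.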


IBM Support Assistant includes the IBM Pattern Modeling and Analysis Tool for Java Garbage Collector, which can be used to analyze the garbage collection.

Removing WAS Service

When I installed WebSphere Application Server 6.1, it created a Windows Service on my machine (I forgot to uncheck the checkbox during installation). As a result, whenever I started my machine it would start the service, and I had to change its startup type to Manual.

Another disadvantage of the service was that whenever I tried executing the startServer.sh command, it tried to start the service.


So I decided to remove the service from my machine. I tried executing the WASService -remove servicename command, but it took some time to figure out what the service name is.

You can find the service name using either of these options. Take a look at the profiles\AppSrv01\logs folder. There will be an xxxx service.log file, where xxxx is the name of the service. In my case the name of the log file is sunpatil-wxp02Node01 Service.log, so the service name is sunpatil-wxp02Node01.

The other way to find the service name is using the Services tool. Open the tool and find the service whose name starts with IBM WebSphere Application Server V6.1. The latter part of the name is the service name.


Once you know the service name, you can remove it by executing WASService.exe -remove servicename
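The log-file naming convention described above is easy to apply in code. The small sketch below lists a logs folder and prints the service name for every file that ends in "service.log". The class and method names are hypothetical and the default folder is just the path from my setup, so pass your own logs directory as the first argument.

```java
import java.io.File;

/** Derives a WAS Windows service name from its "xxxx service.log" file name. */
final class WasServiceName {
    private static final String SUFFIX = "service.log";

    /** Returns the service name, or null if the file name does not fit the pattern. */
    static String fromLogFileName(String fileName) {
        int start = fileName.length() - SUFFIX.length();
        // Case-insensitive check, because the file may be "Service.log" or "service.log".
        if (!fileName.regionMatches(true, start, SUFFIX, 0, SUFFIX.length())) {
            return null;
        }
        String name = fileName.substring(0, start).trim();
        return name.length() == 0 ? null : name;
    }

    public static void main(String[] args) {
        File logsDir = new File(args.length > 0 ? args[0] : "profiles/AppSrv01/logs");
        File[] files = logsDir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + logsDir);
            return;
        }
        for (File f : files) {
            String name = fromLogFileName(f.getName());
            if (name != null) {
                System.out.println("Service name: " + name);
            }
        }
    }
}
```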

Process (native) logs

The process logs are created by redirecting the STDOUT and STDERR streams of the process to independent log files. Native code, including the Java virtual machine (JVM) itself, writes to these files. As a general rule, WebSphere Application Server does not write to these files. However, these logs can contain information relating to problems in native code or diagnostic information written by the JVM.

As with JVM logs, there is a set of process logs for each application server, since each JVM is an operating system process. In a WebSphere Application Server Network Deployment configuration, a set of process logs is also created for the deployment manager and each node agent.

The only configuration that is possible for the process logs is changing the directory location or file names for the logs. You can do this in the WAS Admin Console by going to Troubleshooting -> Logs and Trace and clicking on the process name. On the Logging and Tracing screen, select Process Logs



Then on the Process Logs screen you can change the location of the native_stdout.log or native_stderr.log file.



You can view the native_stderr.log and native_stdout.log files using any text editor, or you can view them in the WAS Admin Console (even for a remote server) by going to the Runtime tab and selecting the log file that you want to see.

View JVM Logs using Log Analyzer

You can view JVM logs in the Log Analyzer tool, which is part of the WebSphere Application Server Toolkit. The Log Analyzer will format and display the log in an easy-to-read format.


  • Start the WebSphere Application Server Toolkit. Once it is started, switch to the Logging and Performance perspective.

  • Right-click in the Log Navigator view and click on Import. In the Import dialog, select Profiling and Logging -> Log Files


  • Click Next. On the Import Log Files dialog that follows, click on the Add button to get the Add Log File dialog like this


  • On this dialog, select the type of log as IBM -> WebSphere Application Server -> IBM WebSphere Application Server System Out. In the details section, select the location of the SystemOut.log file as well as the rules that you want to apply

  • The Log Analyzer tool will take a couple of minutes to parse the log file, and finally it will show a screen like this. As you can see, it marks the error messages in red and the warning messages in yellow


  • You can get more information about each message by right-clicking on the message and executing Analyze -> Run. It will take the message identifier and try to find more information in the symptom database, then display the detailed message, possible recommendations and so on in the next screen.



View JVM Logs using WAS Admin Console

If you don't have access to the file system where the server is installed, for example because you want to view SystemOut.log for WAS on a remote system, then you can use the WAS Admin Console. Go to the Runtime tab for JVM Logs and you will get a screen like this



On this screen, select the log that you want to view by clicking on the View button, and it will display the JVM log on the next screen like this.