Performance impact of JVM settings
One of the more annoying experiences for users with less
than ideal memory configurations is garbage collection pauses,
particularly major "stop-the-world" collections in
which the JVM appears to hang for a period of time.
There are a number of settings that affect the way the
JVM allocates memory and the behavior of garbage collection
in J2SE 1.4.1. I have attempted to ascertain the impact of
various settings on NetBeans.
What was tested
A script copied a given set of settings to the NetBeans ide.cfg
file and then ran NetBeans with the argument
-Dnetbeans.close=true, which shuts the IDE down after the main
window appears. Each set of settings was run 10 times and the
variation determined (the bracket at the top of each graph bar
indicates the standard deviation).
These tests only assess the impact on NetBeans' startup performance.
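The test procedure described above could be scripted roughly as follows. This is a minimal Python sketch, not the script that was actually used: the function names, the `launch` callback, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
import statistics
import time
from pathlib import Path


def write_settings(cfg_path, settings):
    """Overwrite the IDE config file with one set of JVM settings.

    The exact ide.cfg syntax is not described in the text, so the
    settings string is written verbatim (an assumption).
    """
    Path(cfg_path).write_text(settings + "\n")


def time_launch(launch):
    """Return the wall-clock seconds taken by one call to launch()."""
    start = time.perf_counter()
    launch()
    return time.perf_counter() - start


def summarize(durations):
    """Return (mean, sample standard deviation) of the run durations."""
    return statistics.mean(durations), statistics.stdev(durations)


def run_trials(launch, runs=10):
    """Time `runs` launches and summarize them.

    `launch` is a hypothetical callable that starts the IDE with
    -Dnetbeans.close=true and returns once the process has exited.
    """
    return summarize([time_launch(launch) for _ in range(runs)])
```

In such a harness, the second value returned by `summarize` would play the role of the bracket drawn at the top of each graph bar.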
Attempts were made to run NetBeans using the qa-functional tests
that drive the NetBeans UI. After several runs, the Ant process
(which is lightweight - all processing is done after the
tests have completed, on log files generated by NetBeans)
failed with OutOfMemoryErrors on all platforms. Further
investigation is needed.
These tests were run on four configurations: a Dell machine with
384MB RAM and an 800MHz processor (running both Linux and Windows),
a Sun Ultra 60 running Solaris with a 450MHz processor and 512MB RAM,
and a Sony VAIO PictureBook running Linux with 128MB RAM. Note that
the results on the low-end laptop are quite different from
those elsewhere. Further testing on underpowered machines
is indicated.
The IDE version tested was a checkout of the NetBeans trunk
from shortly after the 3.4 release.
What was measured and its reliability
A set of 34 different combinations of JVM settings was
developed. Only a subset of them is presented in the reports, since
those with extremely poor performance results were culled.
Since it was not possible to test runtime performance during
a sustained run of the IDE, all settings which produced a
full garbage collection event during startup were culled
from the results (with the exception of the set "baseline"
which is NetBeans 3.4's out-of-the-box settings).
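The culling rule can be sketched as a simple filter. This sketch assumes that each run's GC activity was captured with -verbose:gc, whose output marks major collections with a "[Full GC" prefix. The text does not say how full collections were actually detected, and the function and test names used here are hypothetical.

```python
def had_full_gc(log_lines):
    """True if any -verbose:gc line reports a full (major) collection."""
    return any(line.lstrip().startswith("[Full GC") for line in log_lines)


def cull(logs_by_test, keep_always=("baseline",)):
    """Drop every settings set whose startup log shows a full GC.

    logs_by_test maps a test name to the GC log lines of its run.
    Names in keep_always (the "baseline" set) are kept regardless.
    """
    return {
        name: lines
        for name, lines in logs_by_test.items()
        if name in keep_always or not had_full_gc(lines)
    }
```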
The following metrics were measured (all averaged across 8 runs of
each set of settings):
- Seconds spent in GC - the total number of seconds that
the process was doing garbage collection
- Number of GC events - the total number of garbage collection
events during the run.
- Average garbage cycle duration - the average duration of
each garbage collection cycle
- Session duration - The total time, from startup to shutdown
of the session
Each of these metrics was graphed and the results charted
and detailed here.
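The three GC metrics listed above can be derived from a GC log along these lines. This is a sketch only: it assumes the log was produced with -verbose:gc in the "[GC before->after(total), N secs]" format, which the text does not confirm, and the function and key names are invented for illustration. Session duration is not in the GC log and would have to come from the harness that starts and stops the IDE.

```python
import re

# Matches lines such as "[GC 8128K->2345K(97280K), 0.0123456 secs]"
# and "[Full GC 8128K->2345K(97280K), 0.1234567 secs]".
GC_LINE = re.compile(r"\[(Full GC|GC) .*?, ([0-9.]+) secs\]")


def gc_metrics(log_lines):
    """Summarize one session's GC log.

    Returns a dict with the total seconds spent in GC, the number of
    GC events, and the average duration of a GC cycle.
    """
    pauses = []
    for line in log_lines:
        match = GC_LINE.search(line)
        if match:
            pauses.append(float(match.group(2)))
    total = sum(pauses)
    count = len(pauses)
    return {
        "seconds_in_gc": total,
        "gc_events": count,
        "avg_cycle_seconds": total / count if count else 0.0,
    }
```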
What numbers are good numbers?
Session duration is probably the least interesting metric,
for two reasons: First, it has the highest standard deviation,
across the board, of any of the numbers measured. This is
in part because it is logged by the Ant script driving the tests,
which has its own inherent jitter due to garbage collection.
The primary goals of tuning garbage collection settings are
to:
- Minimize intrusive pauses caused by GC cycles
- Distribute garbage collection work across time, such that
cycles are imperceptibly short
- Eliminate cycles that involve intensive memory copying
(such as growing the permanent area to accommodate more classes)
by starting the JVM with appropriately sized memory areas for
things of known size
- Reduce the frequency of major garbage collection events
without increasing their duration. Being a GUI application,
NetBeans produces a fairly large number of medium- and long-lived
objects, often using weak references as the means of disposal.
By tuning survivor ratio and threshold, we can raise the
barrier to these objects ever getting tenured, so lightweight
garbage collection cycles are shorter.
Probably the most interesting metric is average GC cycle
duration. Keeping that number down means fewer user-perceptible
pauses.
What information do we still need to make informed
recommendations?
Note that absent from these tests is any information on
old-generation collection - without more memory-intensive tests
of NetBeans, it is impossible to collect such information. Once
a tool is available to drive the NetBeans UI in memory-intensive
activities such as code completion and popup javadoc (without causing
the Ant process to run out of memory), the
recommendations can be refined to cover which settings produce the
best results for minimizing and distributing old-generation collections.
Summary of Results
- Windows - Dell Precision 220, 800MHz, 384MB RAM, JDK 1.4.1
- Linux - same machine as Windows, JDK 1.4.1
- Linux - Sony VAIO PictureBook, 400MHz, 128MB RAM, JDK 1.4.1
- Solaris - Sun Ultra 60, 450MHz, 512MB RAM, JDK 1.4.1
All tests were done using the same build of the NetBeans IDE,
from the trunk shortly after the NetBeans 3.4 release.
Naturally, there is no magic bullet across diverse systems -
even Linux and Windows on the same machine give slightly different
results in terms of best performance (test17 below repeatedly
came in first by session time on Linux and second on Windows,
but produces a dramatically lower average garbage collection
duration).
Across the board, one consistently useful setting is
-XX:PermSize=20M - this sets the initial size of the permanent
area (where classes are stored) when NetBeans starts
and prevents that area from being grown during startup. Simulating
the promoteall modifier appears to help as well - using
-XX:MaxTenuringThreshold=0 reduces garbage
collection cycles by causing promotion of objects directly
from the new area to the old area (without using the two
survivor areas), thus eliminating two memory copies. Whether
this has a deleterious effect on old-generation GCs remains
to be seen - as long as this is accompanied by raising the
barrier to an object being copied to the old generation in
the first place, it should provide a performance improvement.
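The effect of the tenuring threshold on copying can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of the JVM's actual implementation: it assumes a fixed threshold, ignores survivor-space overflow and any adaptive sizing, and the function name is invented for illustration.

```python
def copies_before_tenure(collections_survived, max_tenuring_threshold):
    """Count how often a live object is copied by minor collections.

    Toy model: at each minor collection an object younger than the
    threshold is copied into a survivor space and ages by one; once
    its age reaches the threshold it is copied to the old generation
    and is no longer touched by minor collections.
    """
    age = 0
    copies = 0
    for _ in range(collections_survived):
        copies += 1  # either a survivor copy or the promotion copy
        if age >= max_tenuring_threshold:
            break  # promoted: later minor collections skip it
        age += 1
    return copies
```

In this model a threshold of 0 means a surviving object is copied exactly once, straight into the old generation, whereas a higher threshold adds one survivor copy per collection until the object is tenured. The saved copies come at the price of more objects landing in the old generation, which is the open question raised above.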
What are thus far indicated to be the most effective settings
are listed below. Follow the links to read the generated report
and resulting charts.
Detailed results
The following were the general results - follow the links below for
details and charts. Note that in most cases, there was more than
one set of settings that was nearly equally effective, with
statistically insignificant differences in impact. The entry that produced good
results most consistently across platforms was:
- test17:
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:SurvivorRatio=2
-XX:+UseParallelGC -XX:PermSize=20M -XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M
-XX:NewSize=32M -Xmx96m -Xms96m
As noted below, the survivor ratio settings should be no-ops. Also,
on a single-processor machine, using the parallel garbage collector
should be either a no-op or not produce an improvement, but the
numbers seem fairly persistent.
The survivor ratio settings can be deleted - the result is an increase
in how much the various measurements vary between runs. This could be
an artifact of the testing process, though it is difficult to see how.
See the anomalies section below for further discussion.
At the same time, there is not a huge difference between the numbers
for a number of the configurations used.
So, discarding anomalous results, the two most sensible candidates for recommended
settings are:
- test17_nosurvivorflags:
-Xverify:none -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxNewSize=32M
-XX:NewSize=32M -Xmx96m -Xms96m
- test28:
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:SurvivorRatio=2
-XX:PermSize=20M -XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M -XX:NewSize=32M
-Xmx96m -Xms96m
The second set avoids the possibly paradoxical advantage that
+UseParallelGC imparts, but produces average minor garbage collection
times almost twice as long - 128ms +/-3%.
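To make the sizes in these settings concrete, the following sketch works out how a fixed heap is divided under the usual generational layout rule, where SurvivorRatio is the ratio of eden to one survivor space. The function name is hypothetical, the JVM may round the real sizes for alignment, and the permanent area set by -XX:PermSize is allocated outside the -Xmx heap, so it is not part of this arithmetic. Since the survivor settings are expected to be no-ops under the parallel collector, the split is only meaningful for a set such as test28.

```python
def heap_layout_mb(xmx, new_size, survivor_ratio):
    """Split a fixed-size heap into old generation, eden and survivors.

    SurvivorRatio is the ratio of eden to ONE survivor space, and the
    young generation holds eden plus two survivor spaces, so each
    survivor space is new_size / (survivor_ratio + 2).
    """
    survivor = new_size / (survivor_ratio + 2)
    return {
        "old": xmx - new_size,
        "eden": survivor * survivor_ratio,
        "survivor_each": survivor,
    }
```

With -Xmx96m, -XX:NewSize=32M and -XX:SurvivorRatio=2, `heap_layout_mb(96, 32, 2)` gives a 64MB old generation, a 16MB eden and two 8MB survivor spaces.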
Below are the full reports and graphs, generated by the testing
infrastructure:
Open questions
Testing is still needed to assess the impact of using these
settings in real life - particularly regarding the impact on
old generation collections. A set of settings that minimizes the
duration of old generation collections is needed. So if the above
settings produce their improvement by handing more work to the
old generation collector, then they are not desirable.
Anomalies in the results
There were a few surprises in the results that are awaiting
explanation: All of the testing machines
were single-processor machines, but in most cases showed benefits
to running with the -XX:+UseParallelGC setting. This
garbage collector is described as being designed for gigabyte
heaps and multiple processors, which makes this result surprising.
Further, the Survivor Ratio and Target Survivor Ratio settings
should be no-ops when using Parallel GC. Indeed, the resulting
numbers are very similar. What is interesting is that when these
numbers are not set or set to higher values (compare test31 and
test17), the standard deviation for all of the numbers
measured increases by about six times - the range of values
dramatically increases. It is unclear why this should be the
case.
Also note that the numbers for Windows may not be reliable.
There is a bug which only appears there: when the IDE is shut
down with the -Dnetbeans.close=true flag, all of the
components in the main window dock themselves into new frames a
split second before the process exits. Since there is a real cost
to acquiring these frames from the operating system, and it does
not happen with all settings, this may be affecting the results.