One of the more annoying experiences for users with less than ideal memory configurations is garbage collection pauses, particularly major "stop-the-world" collections in which the JVM appears to hang for a period of time.
There are a number of settings that affect the way the JVM allocates memory and the behavior of garbage collection in J2SE 1.4.1. I have attempted to ascertain the impact of various settings on NetBeans.
What was tested
A script copied a given set of settings to NetBeans' ide.cfg file and then ran NetBeans with the argument -Dnetbeans.close=true, which shuts the IDE down after the main window appears. Each set of settings was run 10 times and the variation determined (the bracket at the top of each graph bar indicates the standard deviation).
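The procedure can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the `runner` parameter, and the launcher command are hypothetical, and how the launcher consumes ide.cfg is left to NetBeans itself.

```python
import shutil
import subprocess
import time


def run_session(launcher_cmd, settings_file, cfg_file, runner=subprocess.run):
    """Install one settings file as ide.cfg, start the IDE once, and time it.

    launcher_cmd is the command list that starts NetBeans (hypothetical here).
    -Dnetbeans.close=true makes the IDE exit once the main window appears.
    """
    shutil.copyfile(settings_file, cfg_file)
    command = list(launcher_cmd) + ["-Dnetbeans.close=true"]
    start = time.perf_counter()
    runner(command, check=True)
    return time.perf_counter() - start


def run_settings(launcher_cmd, settings_file, cfg_file, runs=10,
                 runner=subprocess.run):
    """Repeat one set of settings `runs` times; return the session durations."""
    return [
        run_session(launcher_cmd, settings_file, cfg_file, runner=runner)
        for _ in range(runs)
    ]
```

Passing the process runner in as a parameter keeps the sketch easy to exercise without a NetBeans installation; a real harness would simply use the default.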
These tests only assess the impact on NetBeans' startup performance. Attempts were made to run NetBeans using the qa-functional tests that drive the NetBeans UI. After several runs, the Ant process (which is lightweight - all processing is done after the tests have been completed, on log files generated by NetBeans) failed with OutOfMemoryErrors on all platforms. Further investigation is needed.
These tests were run on four configurations across three machines - a Dell machine with 384Mb RAM and an 800Mhz processor (running both Linux and Windows), a Sun Ultra 60 running Solaris (450Mhz, 512Mb RAM), and a Sony VAIO PictureBook running Linux with 128Mb RAM. Note that the results on the low-end laptop are quite different from those elsewhere. Further testing on underpowered machines is indicated.
The IDE version tested was a checkout of the NetBeans trunk from shortly after the 3.4 release.
What was measured and its reliability
A set of 34 different combinations of JVM settings was developed. Only a subset of them is presented in the reports, since those with extremely poor performance results were culled. Since it was not possible to test runtime performance during a sustained run of the IDE, all settings which produced a full garbage collection event during startup were also culled from the results (with the exception of the set "baseline", which is NetBeans 3.4's out-of-the-box settings).
The following metrics were measured (all averaged across 8 runs of each set of settings):
- Seconds spent in GC - the total number of seconds that the process spent doing garbage collection
- Number of GC events - the total number of garbage collection events during the run
- Average garbage cycle duration - the average duration of each garbage collection cycle
- Session duration - the total time, from startup to shutdown of the session
Each of these metrics was graphed and the results charted and detailed here.
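As an illustration of how the GC metrics can be derived, here is a minimal sketch that computes them from a GC log. It assumes the log contains lines in the classic -verbose:gc form, such as "[GC 8128K->2311K(97920K), 0.0256418 secs]" and "[Full GC ...]"; the original tests may have collected their numbers differently, and the function and key names are hypothetical.

```python
import re

# Matches lines such as "[GC 8128K->2311K(97920K), 0.0256418 secs]" and
# "[Full GC 2311K->2100K(97920K), 0.1234567 secs]" (assumed log format).
GC_LINE = re.compile(
    r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), (\d+(?:\.\d+)?) secs\]"
)


def gc_metrics(log_text):
    """Compute the GC metrics for one run from the text of its GC log."""
    pauses = []
    full_collections = 0
    for match in GC_LINE.finditer(log_text):
        pauses.append(float(match.group(5)))
        if match.group(1) == "Full GC":
            full_collections += 1
    total = sum(pauses)
    return {
        "seconds_in_gc": total,
        "gc_events": len(pauses),
        "average_cycle_seconds": total / len(pauses) if pauses else 0.0,
        "full_gc_events": full_collections,
    }
```

A non-zero `full_gc_events` count is what would mark a set of settings for culling under the rule described earlier.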
What numbers are good numbers?
Session duration is probably the least interesting metric: it has the highest standard deviation, across the board, of any of the numbers measured. This is in part because it is logged by the Ant script driving the tests, which has its own inherent jitter due to garbage collection.
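For reference, the spread of a metric across repeated runs can be summarized as in the sketch below. It uses the sample standard deviation, which is an assumption; the reports do not say which estimator was used, and the function name is hypothetical.

```python
import statistics


def summarize(samples):
    """Return the mean, sample standard deviation, and deviation as a
    percentage of the mean for one metric measured over several runs."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    relative = 100.0 * stdev / mean if mean else 0.0
    return {"mean": mean, "stdev": stdev, "relative_percent": relative}
```

The relative figure makes it easier to compare the jitter of metrics with very different magnitudes, such as session duration in seconds and cycle duration in milliseconds.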
The primary goals of tuning garbage collection settings are to:
- Minimize intrusive pauses caused by GC cycles
- Distribute garbage collection work across time, such that cycles are imperceptibly short
- Eliminate cycles that involve intensive memory copying (such as growing the permanent area to accommodate more classes) by starting the JVM with appropriately sized memory areas for things of known size
- Reduce the frequency of major garbage collection events without increasing their duration. Being a GUI application, NetBeans produces a fairly large number of medium- and long-lived objects, often using weak references as the means of disposal. By tuning survivor ratio and threshold, we can raise the barrier to these objects ever getting tenured, so lightweight garbage collection cycles are shorter.
Probably the most interesting metric is average GC cycle duration. Keeping that number down means fewer user-perceptible pauses.
What information do we still need to make informed
Note that absent from these tests is any information on old-generation collection - without more memory-intensive tests of NetBeans, it is impossible to collect such information. Once a tool is available to drive the NetBeans UI in memory-intensive activities such as code completion and popup-javadoc (without causing the Ant process to run out of memory), the recommendations can be refined to cover what settings produce the best results for minimizing and distributing old-generation collections.
Summary of Results
- Windows - Dell Precision 220: 800Mhz, 384Mb RAM. JDK 1.4.1
- Linux - same machine as Windows, JDK 1.4.1
- Linux - Sony VAIO PictureBook, 400Mhz, 128Mb, JDK 1.4.1
- Solaris - Sun Ultra 60, 450Mhz, 512Mb RAM, JDK 1.4.1
All tests were done using the same build of the NetBeans IDE, from the trunk shortly after NetBeans 3.4 release.
Naturally, there is no magic bullet across diverse systems - even Linux and Windows on the same machine give slightly different results in terms of best performance (test 17 below repeatedly came in first by session time on Linux and second on Windows, but produces a dramatically lower average garbage collection duration).
Across the board, one consistently useful setting is -XX:PermSize=20M - this sets the size of the permanent area (where classes are stored) at NetBeans' startup and eliminates the need to grow that area during startup. Simulating the promoteall modifier appears to help as well - using -XX:MaxTenuringThreshold=0 reduces garbage collection cycles by causing promotion of objects directly from the new area to the old area (without using the two survivor areas), thus eliminating two memory copies. Whether this has a deleterious effect on old generation GCs remains to be seen - as long as this is accompanied by raising the barrier to an object being copied to the old generation in the first place, it should provide a performance improvement.
The settings indicated thus far to be the most effective are listed below; follow the links below to read the generated reports and resulting charts. Note that in most cases, more than one set of settings was nearly equally effective, with statistically insignificant differences in impact. The entry that produced good results most consistently across platforms was:
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:SurvivorRatio=2 -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M -XX:NewSize=32M -Xmx96m -Xms96m
As noted below, the survivor ratio settings should be no-ops. Also, on a single-processor machine, using the parallel garbage collector should be either a no-op or not produce an improvement, but the numbers seem fairly persistent. The survivor ratio settings can be deleted - the result is an increase in how much the various measurements vary between runs. This could be an artifact of the testing process, though it is difficult to see how. See the anomalies section below for further discussion.
At the same time, there is not a huge difference between the numbers for a number of the configurations used.
So, discarding anomalous results, the two most sensible candidates for recommended settings are:
-Xverify:none -XX:+UseParallelGC -XX:PermSize=20M -XX:MaxNewSize=32M -XX:NewSize=32M -Xmx96m -Xms96m (test17_nosurvivorflags) or
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:SurvivorRatio=2 -XX:PermSize=20M -XX:MaxTenuringThreshold=0 -XX:MaxNewSize=32M -XX:NewSize=32M -Xmx96m -Xms96m (test28) - this avoids the possibly paradoxical advantage that +UseParallelGC imparts, but produces average minor garbage collection times almost twice as long - 128ms +/-3%.
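To make the sizes implied by such a flag set concrete, the sketch below estimates the generation layout from -Xmx, -XX:NewSize and -XX:SurvivorRatio. It relies on the usual definition of SurvivorRatio as the ratio of eden to one survivor space; the helper names are hypothetical, and a real JVM may round these sizes for alignment, so treat the output as an approximation.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '96m' or '32M' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError("not a JVM size: " + text)
    return int(match.group(1)) * UNITS[match.group(2).lower()]


def heap_layout(flags):
    """Estimate generation sizes from -Xmx, -XX:NewSize and -XX:SurvivorRatio."""
    heap = new = ratio = None
    for flag in flags.split():
        if flag.startswith("-Xmx"):
            heap = parse_size(flag[len("-Xmx"):])
        elif flag.startswith("-XX:NewSize="):
            new = parse_size(flag.split("=", 1)[1])
        elif flag.startswith("-XX:SurvivorRatio="):
            ratio = int(flag.split("=", 1)[1])
    if heap is None or new is None or ratio is None:
        raise ValueError("need -Xmx, -XX:NewSize and -XX:SurvivorRatio")
    # The new generation is eden plus two survivor spaces, and
    # SurvivorRatio is eden divided by the size of one survivor space.
    survivor = new // (ratio + 2)
    return {
        "eden": new - 2 * survivor,
        "survivor_each": survivor,
        "old": heap - new,
    }
```

For the test28 flags this gives a 16Mb eden, two 8Mb survivor spaces and a 64Mb old generation. The permanent area is sized separately by -XX:PermSize and is not part of the 96Mb heap, and with -XX:MaxTenuringThreshold=0 the survivor spaces are reserved but effectively bypassed.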
Below are the full reports and graphs, generated by the testing infrastructure:
- The Windows report* - 800Mhz, 384Mb PC
- The Linux report - 800Mhz, 384Mb PC
- The low end Linux report - Sony VAIO PictureBook, 400Mhz Mobile PII, 128Mb
Here, the most effective results differed a little bit - some of the settings above seem to produce poorer results in 128Mb.
-XX:TargetSurvivorRatio=1 -Xverify:none -XX:PermSize=20M -Xmx96m -Xms96m. The results from test17 were not hugely worse.
- The Solaris report - 450Mhz, 512Mb Sun Ultra60
Testing is still needed to assess the impact of using these settings in real life - particularly regarding the impact on old generation collections. A set of settings that minimizes the duration of old generation collections is needed. So if the above settings produce their improvement by handing more work to the old generation collector, then they are not desirable.
Anomalies in the results
There were a few surprises in the results that are awaiting explanation. All of the testing machines were single-processor machines, but in most cases showed benefits from running with the -XX:+UseParallelGC setting. This garbage collector is described as being designed for gigabyte heaps and multiple processors, which makes this result surprising.
Further, the Survivor Ratio and Target Survivor Ratio settings should be no-ops when using Parallel GC. Indeed, the resulting numbers are very similar. What is interesting is that when these numbers are not set or set to higher values (compare test31 and test17), the standard deviation for all of the numbers measured increases by about six times - the range of values dramatically increases. It is unclear why this should be the case.
Also note that the numbers for Windows may not be reliable. There is a bug, which only appears there, in which, when the IDE is shut down with the -Dnetbeans.close=true flag, all of the components in the main window dock themselves into new frames a split second before the process exits. Since there is a real cost to acquiring these frames from the operating system, and it does not happen with all settings, this may be impacting the results.