Measuring Java Object Sizes

Probably pure coincidence, but I stumbled upon the blog post “Estimating Java Object Sizes with Instrumentation”. Given that we had just released Ehcache 2.5 with Automatic Resource Control (ARC), I had to read through it. Among other things, we use an instance of java.lang.instrument.Instrumentation to measure object sizes ourselves. Yet we found some shortcomings to that approach:

Getting a reference to an Instrumentation instance!

As the blog mentions, you need an agent to get to that instance. Yet forcing every Ehcache user who wants ARC to add a -javaagent: flag just didn’t feel like a great idea. Trying to work around this, it turns out Java 6 introduced the Attach API. Now we can try to attach to the VM while it’s running and load the agent. And when I say try, I do mean try! This can fail for all kinds of reasons: we’re running on JDK 5, so there is no Attach API for us to use; or attaching to the VM itself fails, which again can happen for different reasons. One particularly weird one is a bug on OS X that shows up when the java.io.tmpdir system property is set!

And even if we do manage to attach and load the agent, the Ehcache code still needs to get a reference to that Instrumentation instance. The agent classes are loaded by the system class loader, but the Ehcache classes aren’t necessarily, and we might not have direct access to the system class loader. We don’t necessarily need it, but we do try to avoid picking up another instance of the agent class, loaded by some other class loader. That generally shouldn’t be possible anyway, as we hide the Java agent jar within the Ehcache-core jar, so the classes it contains can’t be present multiple times…
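To make the “get hold of the Instrumentation instance” step a bit more concrete, here is a minimal sketch of an agent class that simply stashes whatever the VM hands it. The class name InstrumentationHolderSketch is made up for this post and is not Ehcache’s actual agent; it assumes the jar’s manifest names it as Premain-Class (for -javaagent:) and Agent-Class (for loading through the Attach API).

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class, for illustration only (not Ehcache's real agent).
final class InstrumentationHolderSketch {

    // Set by the VM through premain/agentmain; stays null if no agent was loaded.
    private static volatile Instrumentation instrumentation;

    // Called by the VM when the jar is passed with -javaagent:
    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // Called by the VM when the agent is loaded into a running VM (Attach API)
    public static void agentmain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // Lets callers decide whether they have to fall back to another sizing strategy
    static boolean isAvailable() {
        return instrumentation != null;
    }

    // Shallow size of a single object, as reported by the VM
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("No agent loaded, use a fallback sizing strategy");
        }
        return inst.getObjectSize(o);
    }
}
```

Note that the static field only helps the caller if caller and agent see the same copy of the class, which is exactly the class loader concern described above.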

What if we can’t access an Instrumentation instance?

Ehcache’s sizeOf engine, as we named it, falls back to other mechanisms to size POJOs. We’ve added two other methods to fall back to, should we be unable to access the Instrumentation instance: the Unsafe-based one and, finally, the reflection-based one. The UnsafeSizeOf will try to get a reference to sun.misc.Unsafe#theUnsafe. Using that reference, we can query for the offset in memory of an object’s last non-static field using Unsafe.objectFieldOffset and do some math to calculate the object’s size in memory. I’ll come back to the “some math” part later… And finally, should we be unable to gain access to theUnsafe, we use reflection-based sizing. This measures all primitives and references within an object and sums up the size these use in memory. Dr. Heinz Kabutz published more details on that approach in his Java Specialists Newsletter #78, “MemoryCounter for Java 1.4”, back in 2003.
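As a rough illustration of the reflection-based idea (sum up what the primitives and references of an object take), here is a small sketch. It is not Ehcache’s ReflectionSizeOf: the class and method names are invented for this post, the reference size is passed in as a parameter because it depends on the VM, the per-primitive sizes (1 byte for a boolean, for instance) are an assumption about typical HotSpot layouts, and object header and padding are deliberately left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch: raw bytes taken by the instance fields of a class.
final class ReflectionFieldSizer {

    // Assumed sizes of primitive fields, in bytes
    static int primitiveSize(Class<?> type) {
        if (type == boolean.class || type == byte.class) return 1;
        if (type == char.class || type == short.class) return 2;
        if (type == int.class || type == float.class) return 4;
        if (type == long.class || type == double.class) return 8;
        throw new IllegalArgumentException("Not a primitive field type: " + type);
    }

    // Walks the class hierarchy and sums the sizes of all non-static fields
    static long instanceFieldBytes(Class<?> type, int referenceSize) {
        long total = 0;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue; // statics live with the class, not with each instance
                }
                Class<?> fieldType = field.getType();
                total += fieldType.isPrimitive() ? primitiveSize(fieldType) : referenceSize;
            }
        }
        return total;
    }

    // Sample types used to try the sketch out
    static class Base {
        static int counter; // ignored: static
        long id;            // 8 bytes
    }

    static final class Sample extends Base {
        int count;          // 4 bytes
        boolean flag;       // 1 byte (assumed)
        Object ref;         // referenceSize bytes
    }
}
```

With 4-byte references, Sample adds up to 8 + 4 + 1 + 4 = 17 bytes of field data; with 8-byte references it is 21. Turning that into an actual object size still requires the header, alignment and minimum size handling discussed further down.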

Now that’s all very simple, isn’t it?

Well… Sadly it isn’t. But luckily, we’ve mostly sorted it all out for you! We had just finished the agent-based implementation (which didn’t auto-attach yet) and started testing. Obviously, since this calls into the VM’s internals, it would all magically figure itself out. Well, no. CMS wasn’t properly accounted for. CMS needs a certain minimal amount of memory to store information when an object is garbage collected and its memory allocation is “freed”. That affects the minimal size an object will use on heap. And that was HotSpot only… We then moved on to testing on JRockit, which required some finer adjustments, but I won’t get into these here. CMS, compressed OOPs and minimum object size were just some of the things we needed to account for. The “some math” part in the other implementations also has to deal with pointer sizes (32- vs. 64-bit VMs), object alignment, field offset adjustment (on JRockit) and “object header” size. All of this required us to gather that information about the VM the sizing happens in, in order to measure object sizes properly, even when measuring with the Instrumentation instance.
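The “some math” part boils down to something like the sketch below: add the header to the field data, round up to the VM’s object alignment, and never go below the VM’s minimum object size. The names are made up for this post, and all the numbers you would feed in (header size, alignment, minimum size) are assumptions about a given VM configuration, not values taken from Ehcache. The sketch also ignores any padding the VM inserts between fields.

```java
// Hypothetical sketch of the alignment and minimum-size arithmetic.
final class ObjectSizeMath {

    // Rounds size up to the next multiple of alignment (alignment must be > 0)
    static long roundUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    // Header plus field data, aligned, but never smaller than the VM's minimum object size
    static long shallowSize(long headerBytes, long fieldBytes, long alignment, long minObjectSize) {
        return Math.max(minObjectSize, roundUp(headerBytes + fieldBytes, alignment));
    }
}
```

For example, assuming a 12-byte header and 8-byte alignment, 17 bytes of field data give 29 bytes, rounded up to 32. With an assumed 8-byte header, no fields and a minimum object size of 16, the result is 16 rather than 8, which is the kind of effect the CMS minimum has.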

Know what to measure!

As you could read in Heinz’s newsletter, there are some objects you probably don’t want to account for, especially when measuring the size cached entries use on heap. There are all the obvious ones: statics, classes and other “Flyweight type objects”. These can all be discarded automatically by the sizing engine. But at other times, you also don’t want every cached entry to account for a particular part of an object graph, simply because, for instance, every cached entry references that particular bit. Hibernate’s 2nd level cache is a good example of that. For that particular case, we’ve added a “resource” file that describes the fields and types to be discarded when measuring a cache entry’s size on heap. For application types though (ones not going into the cache through Hibernate, but from applications using the Ehcache API directly), we’ve added the @IgnoreSizeOf annotation. Annotating a field, a type or even an entire package with it results in the sizing engine skipping that part of the graph (that field, those types or the types in those packages, respectively) while doing the sizing.
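To show how such an annotation-driven filter can work, here is a small self-contained sketch. It deliberately uses a stand-in annotation, @SkipWhenSizing, defined right in the example instead of Ehcache’s @IgnoreSizeOf, and it only handles fields and types (not packages). All names are invented for this post; Ehcache’s real filtering may well be implemented differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an annotation-based filter for a sizing engine.
final class SizingFilterSketch {

    // Stand-in for an "ignore this when sizing" marker (not Ehcache's @IgnoreSizeOf)
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.TYPE})
    @interface SkipWhenSizing {}

    // A field is skipped if it, or its declared type, carries the marker
    static boolean isSkipped(Field field) {
        return field.isAnnotationPresent(SkipWhenSizing.class)
            || field.getType().isAnnotationPresent(SkipWhenSizing.class);
    }

    // Names of the instance fields a sizing engine would still walk, sorted for stable output
    static List<String> sizedFieldNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || isSkipped(field)) {
                continue;
            }
            names.add(field.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Shared by every cache entry, so it should not count towards each entry's size
    @SkipWhenSizing
    static final class SharedConfig {}

    static final class Entry {
        String key;                     // sized
        @SkipWhenSizing Object session; // skipped: field is annotated
        SharedConfig config;            // skipped: type is annotated
    }
}
```

Here sizedFieldNames(Entry.class) only returns key: the session field is dropped because of the annotation on the field itself, and config because its type is annotated.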

Try it now!

Ehcache 2.5 is out now and available for direct download or through Maven Central. With Ehcache ARC, you can size your caches simply by giving values in bytes. You can read more about cache sizing on the ehcache.org website.