
Several performance optimizations, primarily aimed at reducing garbage object creation in common cases #258

Open
wants to merge 11 commits into base: master
Conversation

@JoshRosen (Contributor) commented Jan 4, 2025

This PR bundles together several small optimizations, most aimed at reducing garbage object creation. Collectively, these changes result in a large performance improvement for some of our largest jsonnet inputs.

Optimizations

These are split fairly coherently into multiple smaller intermediate commits, which can help when navigating this change. I'll describe each optimization in more detail below.

Optimizing Obj key lookup methods for objects without super

The hasKeys, containsKey, containsVisibleKey, allKeyNames, and visibleKeyNames methods can be optimized in the common case of objects that don't have super objects.

We already perform an optimization for static objects, pre-populating allKeys, but for non-static objects we had to run gatherAllKeys to populate a LinkedHashMap of keys and a boolean indicating visibility.

If a non-static object has no super then we can just use value0 to compute the keys: this avoids an additional LinkedHashMap allocation and also lets us pre-size the resulting string arrays, avoiding wasteful array copies from resizes or trims.
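A minimal sketch of this idea (the name `value0` follows the PR; the `Member` type and map shape here are simplified assumptions, not sjsonnet's actual internals):

```scala
// Simplified sketch: value0 maps field name -> member, and each member knows
// its visibility. With no `super`, value0 alone determines the key set, so we
// can pre-size the output array and skip the intermediate LinkedHashMap.
final case class Member(hidden: Boolean)

def visibleKeyNames(value0: java.util.LinkedHashMap[String, Member]): Array[String] = {
  // Pre-size assuming the common case of no hidden keys; trim only if needed.
  val out = new Array[String](value0.size())
  var n = 0
  val it = value0.entrySet().iterator()
  while (it.hasNext) {
    val e = it.next()
    if (!e.getValue.hidden) { out(n) = e.getKey; n += 1 }
  }
  if (n == out.length) out else java.util.Arrays.copyOf(out, n)
}
```

When no keys are hidden, the pre-sized array is returned as-is, with no resizes, trims, or extra copies.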

In visibleKeyNames, I chose to pre-size the output builder based on the total key count: this is based on the common-case assumption that most objects don't have hidden keys.

This optimization makes a huge difference in std.foldl(std.mergePatch, listOfPatches, {}), where the intermediate merge targets' visible keys are repeatedly recomputed. In these cases, the intermediate objects contain only visible keys, allowing this optimization to maximally avoid unnecessary array allocations.

Pre-size various hash maps

This builds on an idea from #197: there are multiple places where we construct hashmaps that are either over- or under-sized: an over-sized map wastes space (I saw >90% of backing array slots wasted in some heap dumps) and an under-sized map wastes time and space re-sizing upwards during construction.

Here, I've generalized that PR's pre-sizing to apply in more contexts.

One notable special case is the valueCache: if an object inherits fields then it's not free to determine the map size. As a result, I've only special-sized this for super-free objects. This map is a little bit different from value0 or allFields because its final size is a function of whether or not object field values are actually computed. Given this, I've chosen to start pretty conservatively by avoiding changing the size in cases where it's not an obvious win; I may revisit this further in a future followup.

Change valueCache from a Scala map to a Java map

This was originally necessary because the Scala 2.12 version of mutable.HashMap does not support capacity / load factor configuration, which got in the way of the pre-sizing work described above.

But a nice secondary benefit is that Java maps let me avoid closure / anonfun allocation in map.getOrElse(k, default) calls: even if we don't invoke default, we still end up doing some allocations for the lambda / closure / thunk. I had noticed this overhead previously in Obj.value and this optimization should fix it.
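The allocation difference can be seen in a small sketch (the `lookup` helper is illustrative, not code from the PR):

```scala
// Scala's Map.getOrElse takes its default by-name, so the compiler allocates
// a thunk for the default at every call site, even when the key is present.
// With a Java map, a plain get + null check (or getOrDefault with an
// already-constructed default) allocates nothing on the hit path.
val jmap = new java.util.HashMap[String, Integer]()
jmap.put("x", 1)

def lookup(k: String): Int = {
  val v = jmap.get(k) // no closure/thunk allocated here
  if (v ne null) v.intValue() else -1
}
```

The null check relies on the map never storing null values, which holds for a cache keyed by computed results.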

Remove most Scala sugar in std.mergePatch, plus other optimizations

The recMerge and recSingle methods used by std.mergePatch contained big Scala for comprehensions and used Option for handling nulls. This improves readability but comes at a surprising performance cost. I would have naively assumed that most of those overheads would have been optimized out but empirically this was not the case in my benchmarks.

Here, I've rewritten this with Java-style imperative while loops and explicit null checks.
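The shape of the rewrite, on a made-up example (none of these names are from the PR):

```scala
// Option-based style allocates a Some/None per element plus closures for
// map/flatMap; the imperative equivalent is a while loop with explicit
// null checks and an early return.
def firstNonNull(keys: Array[String], lookup: String => String): String = {
  var i = 0
  while (i < keys.length) {
    val v = lookup(keys(i))
    if (v != null) return v // explicit null check instead of Option(v)
    i += 1
  }
  null
}
```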

Optimize std.mergePatch's distinct key computation

After fixing other bottlenecks, I noticed that the

val keys: Array[String] = (l.visibleKeyNames ++ r.visibleKeyNames).distinct

step in std.mergePatch was very expensive. Under the hood, this constructs a combined array, allocates an ArrayBuilder, and uses an intermediate HashSet for detecting already-seen keys.

Here, I've added an optimized fast implementation for the cases where r.visibleKeyNames.length < 8. I think it's much more common for the LHS of a merge to be large and the RHS to be small, in which case we're conceptually better off by building a hash set on the RHS and removing RHS elements as they're seen during the LHS traversal. But if the RHS is small enough then the cost of hashing and probing will be higher than a simple linear scan of a small RHS array.

Here, 8 is a somewhat arbitrarily chosen threshold based on some local microbenchmarking.
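A self-contained sketch of the two-strategy approach (the real implementation in the PR may differ in details; this assumes each input array is already internally duplicate-free, as visibleKeyNames arrays are):

```scala
// Distinct union of lKeys ++ rKeys, preserving order. For a small RHS a
// linear scan beats building a hash set; 8 mirrors the PR's threshold.
def distinctKeys(lKeys: Array[String], rKeys: Array[String]): Array[String] = {
  val out = new Array[String](lKeys.length + rKeys.length)
  System.arraycopy(lKeys, 0, out, 0, lKeys.length)
  var n = lKeys.length
  if (rKeys.length < 8) {
    // Small RHS: for each RHS key, linearly scan the LHS for a duplicate.
    var i = 0
    while (i < rKeys.length) {
      var seen = false
      var j = 0
      while (j < lKeys.length && !seen) { seen = lKeys(j) == rKeys(i); j += 1 }
      if (!seen) { out(n) = rKeys(i); n += 1 }
      i += 1
    }
  } else {
    // Large RHS: hash the RHS, remove LHS keys as they are seen, then append
    // whatever survives, in RHS order.
    val set = new java.util.HashSet[String](rKeys.length * 2)
    var i = 0
    while (i < rKeys.length) { set.add(rKeys(i)); i += 1 }
    var j = 0
    while (j < lKeys.length) { set.remove(lKeys(j)); j += 1 }
    i = 0
    while (i < rKeys.length) {
      if (set.contains(rKeys(i))) { out(n) = rKeys(i); n += 1 }
      i += 1
    }
  }
  if (n == out.length) out else java.util.Arrays.copyOf(out, n)
}
```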

Special overload of Val.Obj.mk to skip an array copy

Pretty self-explanatory: we often have an Array[(String, Val.Member)] and we can avoid a copy by defining a Val.Obj.mk overload which accepts the array directly.
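The overload shape, on a simplified stand-in type (the real Val.Obj.mk and Val.Member are more involved; `Int` stands in for Val.Member here):

```scala
// A varargs signature forces a defensive copy when you already hold an
// array; an overload taking the array directly can keep the reference
// (safe as long as callers don't mutate the array afterwards).
final class Obj(val members: Array[(String, Int)])

object Obj {
  def mk(members: (String, Int)*): Obj = new Obj(members.toArray) // copies
  def mk(members: Array[(String, Int)]): Obj = new Obj(members)   // no copy
}
```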

Make PrettyNamed implicits into constants

This is pretty straightforward, just changing a def to a val, but it makes a huge difference in reducing ephemeral garbage in some parts of the evaluator.
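The difference, on a simplified stand-in for PrettyNamed (the real class has more to it; this just shows the allocation behavior):

```scala
// An `implicit def` manufactures a fresh instance every time the implicit
// is resolved; an `implicit val` is materialized once and reused.
final class PrettyNamed[T](val name: String)

object Names {
  implicit def exprNameDef: PrettyNamed[Int] = new PrettyNamed[Int]("expr") // new object per use
  implicit val exprNameVal: PrettyNamed[Long] = new PrettyNamed[Long]("expr") // one constant
}
```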

Other changes

I also added Error.fail calls in a couple of case _ => matches which should never be hit. We weren't actually hitting these, but it felt like a potentially dangerous pitfall to silently ignore those cases.

```scala
val valueCache = if (sup == null) {
  Val.Obj.getEmptyValueCacheForObjWithoutSuper(builder.size())
} else {
  new java.util.HashMap[Any, Val]()
}
```
A reviewer (Contributor) commented:

better with a size

@JoshRosen (Author) replied:
See the PR description and the Scaladoc of getEmptyValueCacheForObjWithoutSuper for more details, but this is deliberate:

Because this object has a super object, computing an upper bound on the field count is not free. In this PR I'm aiming to be conservative and only down-size in cases where we have a cheap tight bound.

Also, an object might have a large number of fields but many of them might not end up being computed in a particular evaluation or materialization. In those cases, we want to avoid sizing the map larger than the previous default.

```scala
// Before:
val cache = mutable.HashMap.empty[Any, Val]
val allKeys = new util.LinkedHashMap[String, java.lang.Boolean](capacity, hashMapDefaultLoadFactor)

// After:
val cache = Util.preSizedJavaHashMap[Any, Val](fields.length)
val allKeys = Util.preSizedJavaLinkedHashMap[String, java.lang.Boolean](fields.length)
```
@He-Pin (Contributor) commented Jan 4, 2025:
The capacity calculation is gone now; how about extracting it to a helper method? Otherwise the map will resize.

@JoshRosen (Author) replied:
The calculation was moved into Util.preSizedJavaLinkedHashMap, which is defined as

```scala
def preSizedJavaLinkedHashMap[K, V](expectedElems: Int): java.util.LinkedHashMap[K, V] = {
  // Set the initial capacity to the number of elems divided by the default load factor + 1:
  // this ensures that we can fill up the map to the total number of fields without resizing.
  // From the JavaDoc - true for both Scala & Java HashMaps.
  val hashMapDefaultLoadFactor = 0.75f
  val capacity = (expectedElems / hashMapDefaultLoadFactor).toInt + 1
  new java.util.LinkedHashMap[K, V](capacity, hashMapDefaultLoadFactor)
}
```
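For a concrete sense of the sizing math, replicating the capacity calculation for 12 expected elements:

```scala
// 12 expected elements -> capacity (12 / 0.75).toInt + 1 = 17, and the
// resize threshold 17 * 0.75 = 12.75 > 12, so 12 insertions never resize.
val expectedElems = 12
val capacity = (expectedElems / 0.75f).toInt + 1
```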

```scala
val rValue = if (r.containsVisibleKey(key)) r.valueRaw(key, r, pos)(ev) else null
if (rValue == null || !rValue.isInstanceOf[Val.Null]) {
  if (lValue != null && rValue == null) {
    kvs(kvsIdx) = (key, new Val.Obj.ConstMember(false, Visibility.Normal, lValue))
```
@JoshRosen (Author) commented:
This change to use ConstMember for the case where we have an already-computed value which doesn't need cleaning (which must be done lazily) ends up reducing allocations because we don't need to generate a lambda / thunk and capture state from the closure.

I'm considering adding a branch in the createMember(recSingle(rValue)) path (both here and in recSingle) to do a similar optimization for leaf non-Obj values that come in from the RHS.


```scala
  case _ => v
}

def distinctKeys(lKeys: Array[String], rKeys: Array[String]): Array[String] = {
```
@JoshRosen (Author) commented:
I plan to add code comments here and throughout the PR before it merges, BTW. I'll revisit this with fresh eyes tomorrow.

```diff
@@ -221,37 +244,65 @@ object Val{
     allKeys
   }
 
-  @inline def hasKeys = !getAllKeys.isEmpty
+  @inline def hasKeys: Boolean = {
+    val m = if (static || `super` != null) getAllKeys else value0
```
@JoshRosen (Author) commented:

Notably, I don't call getValue0 here or in the other new branches added below: it turns out that value0 is guaranteed to be defined for non-static objects, so getValue0 is only effective for static objects.

When I started working on sjsonnet, I found the invariants on some of the Obj internals to be somewhat confusing. I'm considering adding a big block comment at the top of this class to document my current understanding. I think we especially need documentation for the invariants / constraints on static objects.

@He-Pin (Contributor) commented Jan 4, 2025:

Since this is a performance optimization, are there any JMH numbers you can share? Thanks! I'm planning to learn some jsonnet today.
