Commit 6f4e54b

Optimization: std.sort should only evaluate keyF once per array element (#245)
This PR improves the performance of `std.sort` and related functions when `keyF` is used.

The existing implementation evaluates `keyF` multiple times per input element because:

- Prior to sorting, it evaluates `keyF` on all elements to check whether all keys are of the same type.
- During sorting, it re-evaluates `keyF` on every pair of compared elements.

In the best case (an already-sorted array), this performs ~3x more evaluations than needed because each element participates in up to two extra unnecessary comparisons. In the worst case, we have to do additional comparisons during sorting and the unnecessary work will be even higher.

### The fix

The fix:

- Precompute all keys up front (which we already do for type-checking purposes).
- Sort an array of indices using a comparator which fetches their corresponding precomputed keys.
- Use the sorted indices to project out the array values in the correct order.

I also made a few other small improvements:

- Explicitly error out when trying to sort arrays of booleans: neither jsonnet nor go-jsonnet supports this. The existing sjsonnet code didn't either, but failed with a confusing `"Cannot sort with key values that are not all the same type"` error because `Val.True` and `Val.False` are different classes. The existing code which did class equality checks on `Val.Bool` would never match because `Val.Bool` is an abstract class.
- Avoid allocating a `keyTypes` set: we can simply check that all other elements match the first element's type. This saves some garbage allocations in the common case.
- Move `.force` calls earlier so that we don't have to call them in the sort comparator.
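The three steps listed under "The fix" can be sketched in isolation roughly as follows. This is a minimal illustration on plain Scala collections, not the sjsonnet implementation: `SortByPrecomputedKeys` and `sortByKey` are hypothetical names, and the real code works on lazily evaluated `Val` elements rather than a generic `Vector[A]`.

```scala
object SortByPrecomputedKeys {
  /** Sorts `values` by `keyF`, calling `keyF` exactly once per element. */
  def sortByKey[A, K](values: Vector[A])(keyF: A => K)(implicit ord: Ordering[K]): Vector[A] = {
    // Step 1: precompute every key up front.
    val keys: Vector[K] = values.map(keyF)
    // Step 2: sort the indices; the comparator only looks up precomputed keys.
    val sortedIndices: Vector[Int] = values.indices.toVector.sortBy(i => keys(i))
    // Step 3: project the original values out in sorted order.
    sortedIndices.map(i => values(i))
  }

  def main(args: Array[String]): Unit = {
    val words = Vector("pear", "fig", "banana")
    println(sortByKey(words)(_.length)) // prints Vector(fig, pear, banana)
  }
}
```

Sorting indices instead of (key, value) pairs avoids allocating a tuple per element, at the cost of one extra lookup when projecting the result.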
### Benchmarking results

Consider the following toy benchmark case:

```jsonnet
local largeArr = [
  {
    complexKey: { a: i, b: i+1, c: i+2 },
    value: "val" + i
  }
  for i in std.range(0, 9999)
];

local sortedArr = std.sort(
  largeArr,
  keyF=function(x) std.toString(x.complexKey)
);

{
  sortedArrSample: sortedArr[0:5]
}
```

With the `RunProfiler` we can see an enormous difference in the number of `std.toString` invocations via the key function: with the standard 5 benchmark runs, we expect to see only 50,000 hits but the old code ran it `241,210` times!

I also measured performance on one of our real-world jsonnet bundles, where this PR's optimization cut one expensive target's runtime by 25%.
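The effect on key-function invocations can be reproduced outside sjsonnet with a small counting experiment. The sketch below is an assumption-laden stand-in for the profiler run above: `KeyEvaluationCount` and `countKeyCalls` are hypothetical names, and `x.toString` merely plays the role of an expensive key function.

```scala
import scala.util.Random

object KeyEvaluationCount {
  /** Returns (key calls when sorting with the key function directly, key calls with precomputed keys). */
  def countKeyCalls(input: Vector[Int]): (Int, Int) = {
    var directCalls = 0
    // sortBy compares through the key function, so it is re-applied on every comparison.
    val direct = input.sortBy { x => directCalls += 1; x.toString }

    var precomputedCalls = 0
    // Evaluate the key once per element, then sort indices by the stored keys.
    val keys = input.map { x => precomputedCalls += 1; x.toString }
    val viaIndices = input.indices.sortBy(i => keys(i)).map(i => input(i))

    require(direct == viaIndices, "both strategies should produce the same order")
    (directCalls, precomputedCalls)
  }

  def main(args: Array[String]): Unit = {
    val input = new Random(42).shuffle((0 until 10000).toVector)
    val (directCalls, precomputedCalls) = countKeyCalls(input)
    println(s"direct sortBy: $directCalls key calls, precomputed: $precomputedCalls key calls")
  }
}
```

This sketch omits the separate type-checking pass that the old sjsonnet code performed before sorting, so the old code's total would be higher still by one call per element.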
1 parent 89c04a0 commit 6f4e54b

2 files changed

Lines changed: 28 additions & 18 deletions

sjsonnet/src/sjsonnet/Std.scala

Lines changed: 23 additions & 18 deletions
@@ -1700,34 +1700,39 @@ class Std(private val additionalNativeFunctions: Map[String, Val.Builtin] = Map.
       } else {
         val keyFFunc = if (keyF == null || keyF.isInstanceOf[Val.False]) null else keyF.asInstanceOf[Val.Func]
         new Val.Arr(pos, if (keyFFunc != null) {
-          val keys = new Val.Arr(pos.noOffset, vs.map(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev)))
-          val keyTypes = keys.iterator.map(_.force.getClass).toSet
-          if (keyTypes.size != 1) {
+          val keys: Array[Val] = vs.map(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).force)
+          val keyType = keys(0).getClass
+          if (classOf[Val.Bool].isAssignableFrom(keyType)) {
+            Error.fail("Cannot sort with key values that are booleans")
+          }
+          if (!keys.forall(_.getClass == keyType)) {
             Error.fail("Cannot sort with key values that are not all the same type")
           }
 
-          if (keyTypes.contains(classOf[Val.Str])) {
-            vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Str].asString)
-          } else if (keyTypes.contains(classOf[Val.Num])) {
-            vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Num].asDouble)
-          } else if (keyTypes.contains(classOf[Val.Bool])) {
-            vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Bool].asBoolean)
+          val indices = Array.range(0, vs.length)
+
+          val sortedIndices = if (keyType == classOf[Val.Str]) {
+            indices.sortBy(i => keys(i).cast[Val.Str].asString)
+          } else if (keyType == classOf[Val.Num]) {
+            indices.sortBy(i => keys(i).cast[Val.Num].asDouble)
           } else {
-            Error.fail("Cannot sort with key values that are " + keys.force(0).prettyName + "s")
+            Error.fail("Cannot sort with key values that are " + keys(0).prettyName + "s")
           }
+
+          sortedIndices.map(i => vs(i))
         } else {
-          val keyTypes = vs.map(_.force.getClass).toSet
-          if (keyTypes.size != 1) {
-            Error.fail("Cannot sort with values that are not all the same type")
+          val keyType = vs(0).force.getClass
+          if (classOf[Val.Bool].isAssignableFrom(keyType)) {
+            Error.fail("Cannot sort with values that are booleans")
           }
+          if (!vs.forall(_.force.getClass == keyType))
+            Error.fail("Cannot sort with values that are not all the same type")
 
-          if (keyTypes.contains(classOf[Val.Str])) {
+          if (keyType == classOf[Val.Str]) {
             vs.map(_.force.cast[Val.Str]).sortBy(_.asString)
-          } else if (keyTypes.contains(classOf[Val.Num])) {
+          } else if (keyType == classOf[Val.Num]) {
             vs.map(_.force.cast[Val.Num]).sortBy(_.asDouble)
-          } else if (keyTypes.contains(classOf[Val.Bool])) {
-            vs.map(_.force.cast[Val.Bool]).sortBy(_.asBoolean)
-          } else if (keyTypes.contains(classOf[Val.Obj])) {
+          } else if (keyType == classOf[Val.Obj]) {
             Error.fail("Unable to sort array of objects without key function")
           } else {
             Error.fail("Cannot sort array of " + vs(0).force.prettyName)
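The boolean handling in this hunk relies on the difference between exact class equality and a subtype-aware check. The sketch below illustrates that difference with hypothetical stand-in classes (`Val`, `Bool`, `True`, `False`, `Num` defined locally); they mirror the names in the commit message but are not the real sjsonnet types, which carry positions and other fields.

```scala
object BoolClassCheck {
  // Hypothetical stand-ins for the Val hierarchy described in the commit message.
  sealed abstract class Val
  sealed abstract class Bool extends Val
  final class True extends Bool
  final class False extends Bool
  final class Num(val value: Double) extends Val

  /** Exact class equality: never true when the target class is abstract. */
  def isBoolByEquality(v: Val): Boolean = v.getClass == classOf[Bool]

  /** Subtype-aware check, in the style of the new isAssignableFrom guard. */
  def isBoolByAssignability(v: Val): Boolean = classOf[Bool].isAssignableFrom(v.getClass)

  /** True when every element has the first element's runtime class (no Set allocated). */
  def sameRuntimeClass(vs: Seq[Val]): Boolean =
    vs.isEmpty || {
      val first = vs.head.getClass
      vs.forall(_.getClass == first)
    }

  def main(args: Array[String]): Unit = {
    println(isBoolByEquality(new True))                 // prints false
    println(isBoolByAssignability(new True))            // prints true
    println(sameRuntimeClass(Seq(new True, new False))) // prints false
  }
}
```

Because `True` and `False` are distinct runtime classes, a same-class check alone rejects `[false, true]` with the generic "not all the same type" message, which is why the dedicated boolean guard has to run first.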

sjsonnet/test/src/sjsonnet/StdWithKeyFTests.scala

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,11 @@ object StdWithKeyFTests extends TestSuite {
         evalErr("""std.sort([1,2, error "foo"])""").startsWith("sjsonnet.Error: foo"))
       assert(
         evalErr("""std.sort([1, [error "foo"]])""").startsWith("sjsonnet.Error: Cannot sort with values that are not all the same type"))
+      // google/go-jsonnet and google/jsonnet also error on sorting of booleans:
+      assert(
+        evalErr("""std.sort([false, true])""").startsWith("sjsonnet.Error: Cannot sort with values that are booleans"))
+      assert(
+        evalErr("""std.sort([1, 2], keyF=function(x) x == 1)""").startsWith("sjsonnet.Error: Cannot sort with key values that are booleans"))
 
       eval(
         """local arr = [
