fix performance #39

Merged: 4 commits into JuliaData:main on Dec 10, 2024

Conversation

@mbauman (Collaborator) commented Dec 3, 2024

Previously, we had been plopping the return values of both iterate(I.skips) and iterate(I.picks) into a single tuple to use as the iteration state. This meant that we had a type instability of the flavor:

(skipitr, pickitr)::Tuple{Union{Nothing, Tuple{_, _}}, Union{Nothing, Tuple{_, _}}}

Julia's inference does not like this. In particular, this isn't a small splittable union. This change refactors iteration with two dramatic simplifications:

  1. We no longer carry around the previously returned index from the pickitr. Changing this to only worry about the state from the picks gets rid of one union split and simplifies things such that the iteration of picks is lock-step with the iteration itself; we're no longer one ahead.
  2. We do still need to carry around the previously returned skipper — it's the next value we need to skip! But here we can introduce a branch to either return a (nothing, pickstate) or a ((skipvalue, skipstate), pickstate), which is exactly the magic we need to get Julia to normalize the Tuple{Union} to a Union{Tuple, Tuple}.

fixes #14, fixes #27
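
To make the shape of the fix concrete, here is a self-contained toy (a sketch only; `SkipSorted` and its fields are invented for illustration and are not this package's actual iterator) that uses the same branch-and-return-two-tuples trick for its iteration state:

```julia
# Toy example only: a hypothetical iterator that yields 1:n while skipping a
# sorted collection of indices, keeping its state as a union of two concrete tuples.
struct SkipSorted{S}
    n::Int
    skips::S   # sorted, unique indices to skip
end
Base.IteratorSize(::Type{<:SkipSorted}) = Base.SizeUnknown()
Base.eltype(::Type{<:SkipSorted}) = Int

function Base.iterate(itr::SkipSorted, state=(iterate(itr.skips), 1))
    skip, i = state
    while i <= itr.n
        if skip !== nothing && i == skip[1]
            skip = iterate(itr.skips, skip[2])  # advance to the next index to skip
            i += 1
            continue
        end
        # Branch so the returned state is either Tuple{Nothing, Int} or
        # Tuple{Tuple{Int, Int}, Int}, rather than a single tuple holding a
        # Union{Nothing, Tuple{Int, Int}} field.
        return skip === nothing ? (i, (nothing, i + 1)) : (i, (skip, i + 1))
    end
    return nothing
end

collect(SkipSorted(10, [3, 7]))  # [1, 2, 4, 5, 6, 8, 9, 10]
```

The ternary on `skip === nothing` is the key move: inference then propagates two concrete tuple types instead of one tuple wrapping a Union.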

@mbauman (Collaborator, Author) commented Dec 3, 2024

Using the benchmark suite from #27, this is now far and away the fastest in all but one of the cases, by 5-50x. The odd one out is rand(10000)[Not(1:10000)] — the second-to-last test here:

julia> using BenchmarkTools, InvertedIndices
Precompiling InvertedIndices
  1 dependency successfully precompiled in 1 seconds

julia> function benchindexing(a, not)
           print("indexing by `!in(not)`:     ")
           @btime $a[map(!in($not), axes($a, 1))]
           print("indexing by `!in(Set(not))`:")
           @btime $a[map(!in(Set($not)), axes($a, 1))]
           print("indexing by `Not(not)`:     ")
           @btime $a[Not($not)]
           return nothing
       end
benchindexing (generic function with 1 method)

julia> benchindexing(rand(100), 1)
indexing by `!in(not)`:       144.704 ns (2 allocations: 1.03 KiB)
indexing by `!in(Set(not))`:  504.510 ns (6 allocations: 1.42 KiB)
indexing by `Not(not)`:       76.090 ns (1 allocation: 896 bytes)

julia> benchindexing(rand(100), 1:10)
indexing by `!in(not)`:       137.191 ns (2 allocations: 976 bytes)
indexing by `!in(Set(not))`:  521.597 ns (6 allocations: 1.34 KiB)
indexing by `Not(not)`:       84.454 ns (1 allocation: 816 bytes)

julia> benchindexing(rand(100), collect(1:10))
indexing by `!in(not)`:       530.263 ns (2 allocations: 976 bytes)
indexing by `!in(Set(not))`:  528.947 ns (6 allocations: 1.34 KiB)
indexing by `Not(not)`:       90.097 ns (1 allocation: 816 bytes)

julia> benchindexing(rand(100), collect(1:50))
indexing by `!in(not)`:       1.442 μs (2 allocations: 656 bytes)
indexing by `!in(Set(not))`:  829.392 ns (9 allocations: 2.24 KiB)
indexing by `Not(not)`:       99.657 ns (1 allocation: 496 bytes)

julia> benchindexing(rand(100), collect(1:100))
indexing by `!in(not)`:       2.167 μs (2 allocations: 224 bytes)
indexing by `!in(Set(not))`:  1.125 μs (9 allocations: 3.01 KiB)
indexing by `Not(not)`:       126.206 ns (1 allocation: 64 bytes)

julia> benchindexing(rand(10000), collect(1:5000))
indexing by `!in(not)`:       11.734 ms (3 allocations: 49.17 KiB)
indexing by `!in(Set(not))`:  169.917 μs (10 allocations: 121.82 KiB)
indexing by `Not(not)`:       8.472 μs (2 allocations: 39.17 KiB)

julia> benchindexing(rand(10000), collect(1:8000))
indexing by `!in(not)`:       14.982 ms (2 allocations: 25.88 KiB)
indexing by `!in(Set(not))`:  154.333 μs (9 allocations: 170.52 KiB)
indexing by `Not(not)`:       8.820 μs (1 allocation: 15.88 KiB)

julia> benchindexing(rand(10000), collect(1:10000))
indexing by `!in(not)`:       15.615 ms (2 allocations: 10.06 KiB)
indexing by `!in(Set(not))`:  194.083 μs (9 allocations: 154.71 KiB)
indexing by `Not(not)`:       8.958 μs (1 allocation: 64 bytes)

julia> @btime collect(itr) setup=(itr=(i for i in to_indices(rand(100), (Not(1:50),))[1]));
  58.038 ns (1 allocation: 496 bytes)

julia> benchindexing(rand(10000), 1:10000)
indexing by `!in(not)`:       2.287 μs (2 allocations: 10.06 KiB)
indexing by `!in(Set(not))`:  182.000 μs (9 allocations: 154.71 KiB)
indexing by `Not(not)`:       6.217 μs (1 allocation: 64 bytes)

julia> benchindexing(rand(10000), 1:2:10000)
indexing by `!in(not)`:       31.792 μs (3 allocations: 49.17 KiB)
indexing by `!in(Set(not))`:  163.916 μs (10 allocations: 121.82 KiB)
indexing by `Not(not)`:       6.275 μs (2 allocations: 39.17 KiB)

@mbauman (Collaborator, Author) commented Dec 3, 2024

This fixes iterate, but interestingly it's still one step too complicated for collect, which is why #14 is still an order of magnitude away here:

julia> using InvertedIndices, Statistics, BenchmarkTools

julia> thing1(f, x) = map(i->f(view(x, Not(i))), eachindex(x))
thing1 (generic function with 1 method)

julia> thing2(f, x) = (is = eachindex(x); map(i->f(view(x, filter(!isequal(i), is))), is))
thing2 (generic function with 1 method)

julia> x = randn(2000);

julia> @btime thing1(mean, $x);
  277.239 ms (14971031 allocations: 411.97 MiB)

julia> @btime thing2(mean, $x);
  9.928 ms (4001 allocations: 62.03 MiB)

All its time is being spent in collect, because:

julia> @code_warntype collect(I)
MethodInstance for collect(::InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}})
  from collect(III::InvertedIndices.InvertedIndexIterator) @ InvertedIndices ~/Projects/InvertedIndices.jl/src/InvertedIndices.jl:138
Arguments
  #self#::Core.Const(collect)
  III::InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}}
Body::Vector
1 ─ %1 = Base.Generator(Base.identity, III)::Base.Generator{InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}}, typeof(identity)}
│   %2 = Base.collect(%1)::Vector
└──      return %2

@mbauman (Collaborator, Author) commented Dec 3, 2024

OK, the remaining trouble with collect is that the iteration state is still marginally unstable — it's a small, splittable union, but a union nonetheless. At first order, this works great. But when it's composed with other iterators (like the Generator in collect), that unstable state ends up getting stashed into a tuple where it's no longer splittable — just like the original problem.
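
Concretely, the composition problem looks roughly like this (a paraphrase of how Base's Generator iterates, not an exact quote of its source):

```julia
# Paraphrase of Base's Generator iteration (named differently here to avoid
# redefining Base methods): the inner iterator's state y[2] is handed back
# inside the outer (value, state) tuple, so a small Union-typed inner state
# ends up wrapped in a tuple again.
function iterate_generator(g::Base.Generator, s...)
    y = iterate(g.iter, s...)
    y === nothing && return nothing
    return (g.f(y[1]), y[2])
end
```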

@mbauman (Collaborator, Author) commented Dec 3, 2024

There we go, this now fixes #14, too. It's worth noting that I tried changing iteration to use Iterators.Stateful in mbauman:InvertedIndices.jl:mb/perf...mbauman:InvertedIndices.jl:mb/stateful, but that regressed some of the performance numbers above and didn't resolve the performance problem in #14, even though it fixed the type instability. So instead, I just pushed a simple collect optimization here.
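
Roughly, a collect specialization along these lines (a hedged sketch, not the actual committed diff; it assumes the iterator's eltype is concrete) sidesteps the Generator wrapper entirely:

```julia
# Sketch only: collect by pushing into a concretely-typed vector, so the
# Union-typed iteration state never has to round-trip through a Generator.
function Base.collect(itr::InvertedIndices.InvertedIndexIterator)
    out = Vector{eltype(itr)}()
    for i in itr
        push!(out, i)
    end
    return out
end
```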

julia> @btime thing1(mean, $x);
  6.495 ms (2001 allocations: 31.02 MiB)

julia> @btime thing2(mean, $x);
  10.911 ms (4001 allocations: 62.03 MiB)

Comment on lines +113 to +114:

# This is a little silly, but splitting the tuple here allows inference to normalize
# Tuple{Union{Nothing, Tuple}, Tuple} to Union{Tuple{Nothing, Tuple}, Tuple{Tuple, Tuple}}

Member:

Out of curiosity, have you noticed this across all supported versions, or just in e.g. 1.11?

@mbauman (Collaborator, Author):

The type stability unit tests fail without this manual union-split ternary, but pass with it. It's the difference between putting a type-unstable value in a tuple vs. returning two different tuples... and those fundamentals hold true going back to 1.0:

▶ julia +1.0 -q
julia> function f()
           x = rand([nothing, 1.0])
           return (x,)
       end
f (generic function with 1 method)

julia> @code_warntype f()
Body::Tuple{Union{Nothing, Float64}}
# ...

julia> function g()
           x = rand([nothing, 1.0])
           return x===nothing ? (nothing,) : (x,)
       end
g (generic function with 1 method)

julia> @code_warntype g()
Body::Union{Tuple{Nothing}, Tuple{Float64}}

Member:

Oh interesting, I don't think I knew that. Thanks, I appreciate the explanation!

@ararslan (Member) commented:

Want to rebase for a clean CI run now that #40 has been merged?

test/runtests.jl: review comments (outdated, resolved)
@ararslan merged commit 2efe37d into JuliaData:main on Dec 10, 2024 (6 checks passed).
@mbauman deleted the mb/perf branch on December 10, 2024 at 21:29, then restored it.

Successfully merging this pull request may close these issues:

- Bad performance caused by type unstable iterate
- Significant performance cost over naively filtering indices