fix performance #39

Merged: 4 commits into JuliaData:main on Dec 10, 2024

Conversation

@mbauman (Collaborator) commented Dec 3, 2024

Previously, we had been plopping the return values of both iterate(I.skips) and iterate(I.picks) into a single tuple to use as the iteration state. This meant that we had a type instability of the flavor:

(skipitr, pickitr)::Tuple{Union{Nothing, Tuple{_, _}}, Union{Nothing, Tuple{_, _}}}

Julia's inference does not like this. In particular, this isn't a small splittable union. This change refactors iteration with two dramatic simplifications:

  1. We no longer carry around the previously returned index from the pickitr. Changing this to only worry about the state from the picks gets rid of one union split and simplifies things such that the iteration of picks is lock-step with the iteration itself; we're no longer one ahead.
  2. We do still need to carry around the previously returned skipper — it's the next value we need to skip! But here we can introduce a branch to either return a (nothing, pickstate) or a ((skipvalue, skipstate), pickstate), which is exactly the magic we need to get Julia to normalize the Tuple{Union} to a Union{Tuple, Tuple}.

fixes #14, fixes #27
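
To make the shape of the fix concrete, here is a self-contained toy (a sketch only; `SkipSorted` and its fields are invented for illustration and are not this package's actual iterator) that uses the same branch-and-return-two-tuples trick for its iteration state:

```julia
# Toy example only: a hypothetical iterator that yields 1:n while skipping a
# sorted collection of indices, keeping its state as a union of two concrete tuples.
struct SkipSorted{S}
    n::Int
    skips::S   # sorted, unique indices to skip
end
Base.IteratorSize(::Type{<:SkipSorted}) = Base.SizeUnknown()
Base.eltype(::Type{<:SkipSorted}) = Int

function Base.iterate(itr::SkipSorted, state=(iterate(itr.skips), 1))
    skip, i = state
    while i <= itr.n
        if skip !== nothing && i == skip[1]
            skip = iterate(itr.skips, skip[2])  # advance to the next index to skip
            i += 1
            continue
        end
        # Branch so the returned state is either Tuple{Nothing, Int} or
        # Tuple{Tuple{Int, Int}, Int}, rather than a single tuple holding a
        # Union{Nothing, Tuple{Int, Int}} field.
        return skip === nothing ? (i, (nothing, i + 1)) : (i, (skip, i + 1))
    end
    return nothing
end

collect(SkipSorted(10, [3, 7]))  # [1, 2, 4, 5, 6, 8, 9, 10]
```

The ternary on `skip === nothing` is the key move: inference then propagates two concrete tuple types instead of one tuple wrapping a Union.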

@mbauman (Collaborator, Author) commented Dec 3, 2024

Using the benchmark suite from #27, this is now far and away the fastest in all but one of the cases, by 5-50x. The odd one out is rand(10000)[Not(1:10000)] — the second-to-last test here:

julia> using BenchmarkTools, InvertedIndices
Precompiling InvertedIndices
  1 dependency successfully precompiled in 1 seconds

julia> function benchindexing(a, not)
           print("indexing by `!in(not)`:     ")
           @btime $a[map(!in($not), axes($a, 1))]
           print("indexing by `!in(Set(not))`:")
           @btime $a[map(!in(Set($not)), axes($a, 1))]
           print("indexing by `Not(not)`:     ")
           @btime $a[Not($not)]
           return nothing
       end
benchindexing (generic function with 1 method)

julia> benchindexing(rand(100), 1)
indexing by `!in(not)`:       144.704 ns (2 allocations: 1.03 KiB)
indexing by `!in(Set(not))`:  504.510 ns (6 allocations: 1.42 KiB)
indexing by `Not(not)`:       76.090 ns (1 allocation: 896 bytes)

julia> benchindexing(rand(100), 1:10)
indexing by `!in(not)`:       137.191 ns (2 allocations: 976 bytes)
indexing by `!in(Set(not))`:  521.597 ns (6 allocations: 1.34 KiB)
indexing by `Not(not)`:       84.454 ns (1 allocation: 816 bytes)

julia> benchindexing(rand(100), collect(1:10))
indexing by `!in(not)`:       530.263 ns (2 allocations: 976 bytes)
indexing by `!in(Set(not))`:  528.947 ns (6 allocations: 1.34 KiB)
indexing by `Not(not)`:       90.097 ns (1 allocation: 816 bytes)

julia> benchindexing(rand(100), collect(1:50))
indexing by `!in(not)`:       1.442 μs (2 allocations: 656 bytes)
indexing by `!in(Set(not))`:  829.392 ns (9 allocations: 2.24 KiB)
indexing by `Not(not)`:       99.657 ns (1 allocation: 496 bytes)

julia> benchindexing(rand(100), collect(1:100))
indexing by `!in(not)`:       2.167 μs (2 allocations: 224 bytes)
indexing by `!in(Set(not))`:  1.125 μs (9 allocations: 3.01 KiB)
indexing by `Not(not)`:       126.206 ns (1 allocation: 64 bytes)

julia> benchindexing(rand(10000), collect(1:5000))
indexing by `!in(not)`:       11.734 ms (3 allocations: 49.17 KiB)
indexing by `!in(Set(not))`:  169.917 μs (10 allocations: 121.82 KiB)
indexing by `Not(not)`:       8.472 μs (2 allocations: 39.17 KiB)

julia> benchindexing(rand(10000), collect(1:8000))
indexing by `!in(not)`:       14.982 ms (2 allocations: 25.88 KiB)
indexing by `!in(Set(not))`:  154.333 μs (9 allocations: 170.52 KiB)
indexing by `Not(not)`:       8.820 μs (1 allocation: 15.88 KiB)

julia> benchindexing(rand(10000), collect(1:10000))
indexing by `!in(not)`:       15.615 ms (2 allocations: 10.06 KiB)
indexing by `!in(Set(not))`:  194.083 μs (9 allocations: 154.71 KiB)
indexing by `Not(not)`:       8.958 μs (1 allocation: 64 bytes)

julia> @btime collect(itr) setup=(itr=(i for i in to_indices(rand(100), (Not(1:50),))[1]));
  58.038 ns (1 allocation: 496 bytes)

julia> benchindexing(rand(10000), 1:10000)
indexing by `!in(not)`:       2.287 μs (2 allocations: 10.06 KiB)
indexing by `!in(Set(not))`:  182.000 μs (9 allocations: 154.71 KiB)
indexing by `Not(not)`:       6.217 μs (1 allocation: 64 bytes)

julia> benchindexing(rand(10000), 1:2:10000)
indexing by `!in(not)`:       31.792 μs (3 allocations: 49.17 KiB)
indexing by `!in(Set(not))`:  163.916 μs (10 allocations: 121.82 KiB)
indexing by `Not(not)`:       6.275 μs (2 allocations: 39.17 KiB)

@mbauman (Collaborator, Author) commented Dec 3, 2024

This fixes iterate, but interestingly it's still one step too complicated for collect, which is why #14 is still an order of magnitude away here:

julia> using InvertedIndices, Statistics, BenchmarkTools

julia> thing1(f, x) = map(i->f(view(x, Not(i))), eachindex(x))
thing1 (generic function with 1 method)

julia> thing2(f, x) = (is = eachindex(x); map(i->f(view(x, filter(!isequal(i), is))), is))
thing2 (generic function with 1 method)

julia> x = randn(2000);

julia> @btime thing1(mean, $x);
  277.239 ms (14971031 allocations: 411.97 MiB)

julia> @btime thing2(mean, $x);
  9.928 ms (4001 allocations: 62.03 MiB)

All its time is being spent in collect, because:

julia> @code_warntype collect(I)
MethodInstance for collect(::InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}})
  from collect(III::InvertedIndices.InvertedIndexIterator) @ InvertedIndices ~/Projects/InvertedIndices.jl/src/InvertedIndices.jl:138
Arguments
  #self#::Core.Const(collect)
  III::InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}}
Body::Vector
1 ─ %1 = Base.Generator(Base.identity, III)::Base.Generator{InvertedIndices.InvertedIndexIterator{Int64, Int64, Base.OneTo{Int64}}, typeof(identity)}
│   %2 = Base.collect(%1)::Vector
└──      return %2

@mbauman (Collaborator, Author) commented Dec 3, 2024

OK, the remaining trouble with collect is that the iteration state is still marginally unstable — it's a small, splittable union, but a union nonetheless. At first order, this works great. But when it's composed with other iterators (like the Generator in collect), that unstable state ends up getting stashed into a tuple where it's no longer splittable — just like the original problem.
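
Concretely, the composition problem looks roughly like this (a paraphrase of how Base's Generator iterates, not an exact quote of its source):

```julia
# Paraphrase of Base's Generator iteration (named differently here to avoid
# redefining Base methods): the inner iterator's state y[2] is handed back
# inside the outer (value, state) tuple, so a small Union-typed inner state
# ends up wrapped in a tuple again.
function iterate_generator(g::Base.Generator, s...)
    y = iterate(g.iter, s...)
    y === nothing && return nothing
    return (g.f(y[1]), y[2])
end
```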

@mbauman (Collaborator, Author) commented Dec 3, 2024

There we go, this now fixes #14, too. It's worth noting that I tried changing iteration to use Iterators.Stateful in mbauman:InvertedIndices.jl:mb/perf...mbauman:InvertedIndices.jl:mb/stateful, but that regressed some of the performance numbers above and didn't resolve the performance problem in #14, even though it fixed the type instability. So instead, I just pushed a simple collect optimization here.
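
Roughly, a collect specialization along these lines (a hedged sketch, not the actual committed diff; it assumes the iterator's eltype is concrete) sidesteps the Generator wrapper entirely:

```julia
# Sketch only: collect by pushing into a concretely-typed vector, so the
# Union-typed iteration state never has to round-trip through a Generator.
function Base.collect(itr::InvertedIndices.InvertedIndexIterator)
    out = Vector{eltype(itr)}()
    for i in itr
        push!(out, i)
    end
    return out
end
```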

julia> @btime thing1(mean, $x);
  6.495 ms (2001 allocations: 31.02 MiB)

julia> @btime thing2(mean, $x);
  10.911 ms (4001 allocations: 62.03 MiB)

Comment on lines +113 to +114:

# This is a little silly, but splitting the tuple here allows inference to normalize
# Tuple{Union{Nothing, Tuple}, Tuple} to Union{Tuple{Nothing, Tuple}, Tuple{Tuple, Tuple}}

Member:

Out of curiosity, have you noticed this across all supported versions, or just in e.g. 1.11?

@mbauman (Collaborator, Author):

The type stability unit tests fail without this manual union-split ternary, but pass with it. It's the difference between putting a type-unstable value in a tuple vs. returning two different tuples... and those fundamentals hold true going back to 1.0:

▶ julia +1.0 -q
julia> function f()
           x = rand([nothing, 1.0])
           return (x,)
       end
f (generic function with 1 method)

julia> @code_warntype f()
Body::Tuple{Union{Nothing, Float64}}
# ...

julia> function g()
           x = rand([nothing, 1.0])
           return x===nothing ? (nothing,) : (x,)
       end
g (generic function with 1 method)

julia> @code_warntype g()
Body::Union{Tuple{Nothing}, Tuple{Float64}}

Member:

Oh interesting, I don't think I knew that. Thanks, I appreciate the explanation!

@ararslan (Member) commented:

Want to rebase for a clean CI run now that #40 has been merged?

test/runtests.jl: review comments (outdated, resolved)
@ararslan merged commit 2efe37d into JuliaData:main on Dec 10, 2024 (6 checks passed).
@mbauman deleted the mb/perf branch on December 10, 2024 at 21:29, then restored it.

Successfully merging this pull request may close these issues:

- Bad performance caused by type unstable iterate
- Significant performance cost over naively filtering indices