We can thread ufuncs even when we do not understand what they compute.
For a binary_reduce on a large array, we can divide the work into chunks and assign each chunk to a thread.
Each work item writes its result to its own slot in a separate output array (allocated on the fly).
That output array can then be sent back through the binary_reduce loop for the final calculation. For example, each thread calculates the sum of its chunk, and the final calculation is the sum of those sums.
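The chunked reduce described above can be sketched at the Python level. This is only an illustration under stated assumptions, not the proposed C implementation: `threaded_reduce` and `n_chunks` are hypothetical names, the input is assumed to be 1-D, and the ufunc is assumed to be associative (for example `np.add` or `np.maximum`), since splitting changes the order of operations.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_reduce(ufunc, arr, n_chunks=4):
    """Reduce a 1-D array with a binary ufunc, one chunk per work item.

    Hypothetical helper: assumes `ufunc` is associative, so reducing the
    per-chunk results gives the same answer as reducing the whole array.
    """
    arr = np.asarray(arr)
    # Split the work; drop empty chunks (some ufuncs have no identity).
    chunks = [c for c in np.array_split(arr, n_chunks) if c.size]
    if not chunks:
        return ufunc.reduce(arr)

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Slot i of `partials` holds the reduction of chunk i.
        partials = np.array(list(pool.map(ufunc.reduce, chunks)))

    # Final pass: feed the partial results back through the same reduce.
    return ufunc.reduce(partials)


data = np.arange(1000)
total = threaded_reduce(np.add, data)        # sum of sums
largest = threaded_reduce(np.maximum, data)  # max of maxes
```

With floating-point input the result of `np.add` may differ from a single-pass reduce in the last few bits, because the additions are grouped differently. Whether the threads give a real speedup depends on the inner loop running without the GIL.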
For non-reduce calls on large arrays, we can divide up the work as normal; this applies to both binary and unary ufuncs.
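For the non-reduce case, a minimal Python-level sketch of dividing the work might look like the following. The helper name `threaded_apply` and the `n_chunks` parameter are hypothetical, and the sketch assumes 1-D inputs of equal length (no broadcasting) and a ufunc with a single output.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_apply(ufunc, *inputs, n_chunks=4):
    """Apply a unary or binary ufunc elementwise, one slice per work item.

    Hypothetical helper: assumes 1-D inputs of equal length and a ufunc
    that produces exactly one output array.
    """
    inputs = [np.asarray(a) for a in inputs]
    n = inputs[0].shape[0]
    # Probe a one-element slice to learn the output dtype (e.g. sqrt of
    # integers yields floats) before allocating the full output.
    out_dtype = ufunc(*(a[:1] for a in inputs)).dtype
    out = np.empty(n, dtype=out_dtype)

    # Chunk boundaries: bounds[i]..bounds[i + 1] is the i-th work item.
    bounds = [n * i // n_chunks for i in range(n_chunks + 1)]

    def work(lo, hi):
        # Each work item writes directly into its own slice of `out`.
        ufunc(*(a[lo:hi] for a in inputs), out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, bounds[:-1], bounds[1:]))
    return out


x = np.arange(1000.0)
y = np.ones(1000)
binary_result = threaded_apply(np.add, x, y)
unary_result = threaded_apply(np.sqrt, x)
```

Because the slices do not overlap, the work items need no locking; each one reads and writes only its own range.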
We can reuse the original loop pointer that PyUFunc_ReplaceLoopBySignature hands back and call it from inside our loop override. This lets us use NumPy's SSE/AVX-optimized loops without doing our own CPU detection, since NumPy already uses AVX for many ufuncs.