Create 'benchmarking' section of documentation #110

Open
jack-pappas opened this issue Dec 17, 2020 · 10 comments


Per @mattip, create a 'benchmarking' page in the documentation. The page should include the following information:

  • instructions on how to set up and run the benchmark suite (via asv)
  • parameterization of the benchmark suite: which performance aspects are we trying to understand through the benchmarks, and how are the benchmarks parameterized to gather that information? (see the sketch after this list)
  • wish list: what would we like the benchmark suite to cover that it currently does not?
  • references / links to external materials: Fix threading support in some benchmarks #107 (comment)
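
For reference, a minimal sketch of what a parameterized asv benchmark for this suite could look like (hypothetical class and parameter names; the real definitions live in the benchmarks directory, e.g. bench_ufunc.py):

import numpy as np
import pnumpy as pn

class TimeUFuncs:
    # asv runs each time_* method once per combination of params.
    params = [[np.int32, np.int64, np.float32, np.float64],
              [10_000, 1_000_000]]
    param_names = ["dtype", "size"]

    def setup(self, dtype, size):
        pn.init()  # assumption: threading must be enabled before timing
        self.a = np.arange(size, dtype=dtype)
        self.b = np.ones(size, dtype=dtype)

    def time_add(self, dtype, size):
        self.a + self.b

    def time_equal(self, dtype, size):
        self.a == self.b
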
jack-pappas self-assigned this Dec 17, 2020
jack-pappas added the documentation label Dec 17, 2020

mattip commented Dec 18, 2020

I ran the benchmarks on an Intel machine after running sudo pyperf system tune, but did not see any improvement when activating multiple threads. Here is the machine.json and the compressed .asv/results directory.

{
    "arch": "x86_64",
    "cpu": "Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz",
    "machine": "benchmarker",
    "num_cpu": "8",
    "os": "Linux 4.15.0-74-generic",
    "ram": "65748452",
    "version": 1
}

benchmarker.tar.gz


mattip commented Dec 18, 2020

The benchmarks ran for 2 hours on this machine.


mattip commented Jan 6, 2021

@jack-pappas @tdimitri: any thoughts on why I do not see an improvement?


tdimitri commented Jan 6, 2021

Matti, did you do...

pn.init()
pn.benchmark()

What are the numbers returned?
Also, there are now a parallel lexsort and a parallel sort.


mattip commented Jan 6, 2021

No, I followed the instructions in the benchmarks README:

asv run

Here is my result for pn.benchmark():

>>> pn.benchmark()
1000000 rows, bool, int8, int16, int32, int64, float32, float64
a==b,  1.00, 1.00, 1.00, 1.15, 1.01, 1.15, 1.02
a==5,  1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.02
a+b,   1.01, 1.00, 1.00, 1.06, 1.01, 0.97, 1.00
a+5,   1.13, 1.00, 1.01, 1.00, 1.07, 1.02, 1.05
a/5,   1.00, 1.00, 1.00, 0.99, 1.00, 1.00, 1.00
abs,   1.00, 1.00, 1.00, 0.93, 0.98, 1.00, 1.08
isnan, 1.00, 1.01, 1.01, 1.00, 1.01, 1.02, 0.99
sin,   1.00, 0.99, 1.00, 1.00, 1.00, 0.98, 1.00
log,   1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00
sum,   1.00, 1.00, 1.00, 1.00, 1.02, 1.00, 1.02
min,   1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00


mattip commented Jan 6, 2021

Ahh, hang on; after pn.init() it gets better:

>>> pn.init()
>>> pn.benchmark()
1000000 rows, bool, int8, int16, int32, int64, float32, float64
a==b,  6.79,  2.58,  2.59,  3.29,  6.67,  2.45,  6.14
a==5,  4.71,  1.81,  1.87,  3.00,  4.69,  1.97,  2.64
a+b,   9.37,  2.31,  2.46,  3.14,  9.44,  2.89,  9.20
a+5,   4.12,  2.33,  2.16,  2.75,  4.23,  1.85,  4.78
a/5,   0.72,  0.86,  0.87,  0.91,  0.70,  4.08,  6.99
abs,   4.02,  5.83,  6.53,  3.16,  4.00,  9.85, 11.18
isnan, 0.79,  0.70,  0.80,  0.74,  0.80,  1.96,  2.73
sin,   4.30,  3.88,  3.95,  8.81,  5.32, 21.15, 60.16
log,   1.25,  2.13,  2.17,  1.30,  1.58,  6.39,  3.05
sum,   8.28,  1.01,  1.04,  1.00,  9.61,  6.45,  5.44
min,   3.65, 41.85, 41.73, 31.00,  3.66,  1.93,  2.64


mattip commented Jan 6, 2021

Why isn't that reflected in the ASV results?


tdimitri commented Jan 6, 2021

I will check with Jack and review his benchmark; I did not work with him on it, and I apologize for any confusion.
The benchmarks are hard because we have not yet hooked the "initialization" functions (like ones, zeros, arange, etc.), the copy functions (copy, copy with mask, etc.), or the conversion functions.
I spent the last 10 hours trying to figure out how to hook the conversion functions by calling PyArray_RegisterCastFunc, but it does not seem to work yet.

Your numbers above look good and as expected. The one dip is in division of integers, because it converts from int to float64 and does so on the main thread, leaving the other cores idle... which is why I am trying to hook more functions.

Ideally, divide would "convert and divide" on the fly... but we cannot hook that right now either.
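
For illustration, the cast behavior is easy to see in plain numpy (nothing pnumpy-specific here): true division of an integer array produces float64, so the input has to be converted before the divide loop ever runs, and today that conversion is single-threaded.

import numpy as np

a = np.arange(1_000_000, dtype=np.int64)

# True division of an integer array casts it to float64 first;
# that cast runs on the main thread, so the parallel divide
# loop never gets to use the other cores.
out = a / 5
print(out.dtype)  # float64
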

On a good note... there is pn.getitem(), which acts like a[b] when a is an array and b is a boolean or fancy-index array; it runs in parallel (a sketch follows below). On another good note... I have reviewed so much of numpy's low-level internal code that I understand it better and can at least suggest hooks.
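
A minimal sketch of pn.getitem() usage (assuming the signature is pn.getitem(a, b) and that it mirrors a[b]; the exact API may differ):

import numpy as np
import pnumpy as pn

pn.init()  # enable the pnumpy thread pool

a = np.arange(1_000_000)
mask = (a % 2) == 0             # boolean index
fancy = np.array([3, 14, 159])  # fancy (integer) index

# Assumed parallel equivalents of a[mask] and a[fancy]:
evens = pn.getitem(a, mask)
picks = pn.getitem(a, fancy)
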


jack-pappas commented Jan 6, 2021

@mattip One thing that could be causing this: I ran the latest benchmark code on Windows, and you're running it on Linux. asv supports running benchmarks in individual subprocesses, and (I'm speculating) it may be doing that by default on Linux but not on Windows, or it may default to a different subprocess strategy on each platform. If that's the case, maybe we need to move the pn.initialize() call to the top of the bench_ufunc.py file, or e.g. have pnumpy auto-initialize when imported, or detect when it has been forked (after pn.initialize() has been called) and re-initialize (see the sketch below).
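
A minimal sketch of that last idea, assuming pn.initialize() is safe to call again in a forked child; os.register_at_fork is the stdlib hook for this (Unix-only, Python 3.7+):

import os
import pnumpy as pn

def _reinit_after_fork():
    # A forked child inherits pnumpy's module state but not its
    # worker threads, so rebuild the thread pool in the child.
    pn.initialize()

# Run _reinit_after_fork() in every child created by os.fork().
os.register_at_fork(after_in_child=_reinit_after_fork)

This would cover asv workers forked from a parent that already imported pnumpy; spawned (non-forked) subprocesses would still need the import-time auto-initialize.
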
