-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamline/speed up addc/subc by using in/out carry parameter #267
base: master
Are you sure you want to change the base?
Conversation
I believe this version is significantly faster! |
I benchmarked GCC 11. In general we would need to split these into smaller changes (I can do this myself) for better analysis.
We should also consult Clang. It can optimize add/sub perfectly so as long it stays this way we can go ahead for changes making it easier for GCC.
|
Thanks for looking into it @chfast ! I'm not sure how to read the numbers you posted, but I am surprised that you found that add/sub got slower. In my tests it was faster. I noticed that sometimes running the same code twice gave different numbers though! Benchmarks are hard. |
@@ -182,6 +182,11 @@ inline constexpr uint64_t addc( | |||
} | |||
#endif | |||
|
|||
if (((x | y) & (uint64_t(1) << 63)) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ineffective for compilers we care about. GCC and Clang should be using builtins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we use the builtin how do you explain that my version is slower in the benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I had removed this change. Are you looking at my latest version?
see https://github.com/greg7mdp/intx/blob/master/include/intx/intx.hpp
Also why diff between add and inline_add? |
This is comparison between master and your changes. Numbers are:
This comes from a tool comparing outputs of |
Because the benchmark run in a loop the "inline_add" can be vectorized. This is not relevant for EVM use case. I probably could review all benchmark cases and remove half of these. |
Thanks!
vectorized or inlined? Why shouldn't it be relevant for EVM? |
The main difference is that the |
@chfast if you are using gcc 11 and therefore the builtin, I am puzzled that my version would be slower. |
return subc(x, y).carry; | ||
for (size_t i = uint<N>::num_words; i-- > 1; ) { | ||
if (x[i] != y[i]) | ||
return x[i] < y[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This single change looks interesting. Can you submit it as a separate PR for easier verification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kudos, SonarCloud Quality Gate passed! |
No description provided.