Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamline/speed up addc/subc by using in/out carry parameter #267

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

greg7mdp
Copy link

@greg7mdp greg7mdp commented Apr 2, 2022

No description provided.

@greg7mdp
Copy link
Author

greg7mdp commented Apr 3, 2022

I believe this version is significantly faster!

@chfast
Copy link
Owner

chfast commented Apr 5, 2022

I benchmarked GCC 11. In general we would need to split these into smaller changes (I can do this myself) for better analysis.
Some quick takeaways:

  • comparison optimization looks promising. If we can show it is always faster than subtraction this can land.
  • multiplication looks faster,
  • the basic add/sub got slower, I plan to inspect the assembly to see what happen.

We should also consult Clang. It can optimize add/sub perfectly so as long it stays this way we can go ahead for changes making it easier for GCC.

clz<uint32_t, intx::clz>_mean                                        -0.0031         -0.0031             0             0             0             0                                                                              
clz<uint32_t, intx::clz_generic>_mean                                +0.0005         +0.0005             1             1             1             1                                                                              
clz<uint64_t, intx::clz>_mean                                        +0.3459         +0.3459             0             1             0             1                                                                              
clz<uint64_t, intx::clz_generic>_mean                                +0.0008         +0.0008             3             3             3             3                                                                              
div_normalize<internal::normalize>_mean                              -0.0001         -0.0001            11            11            11            11                                                                              
reciprocal<uint64_t, neg>_mean                                       +0.0014         +0.0014             0             0             0             0                                                                              
reciprocal<uint64_t, reciprocal_naive>_mean                          +0.0012         +0.0012            19            19            19            19                                                                              
reciprocal<uint64_t, reciprocal_2by1>_mean                           -0.0002         -0.0002             4             4             4             4                                                                              
reciprocal<uint64_t, reciprocal_2by1_noinline>_mean                  +0.0010         +0.0010             4             4             4             4                                                                              
reciprocal<uint128, reciprocal_3by2>_mean                            +0.0012         +0.0012             7             7             7             7                                                                              
reciprocal<uint128, reciprocal_3by2_noinline>_mean                   +0.0072         +0.0072             6             6             6             6                                                                              
udiv64<nop>_mean                                                     +0.0018         +0.0018             1             1             1             1                                                                              
udiv64<udiv_by_reciprocal>_mean                                      +0.0015         +0.0015             7             7             7             7                                                                              
udiv64<udiv_native>_mean                                             -0.0002         -0.0002             7             7             7             7                                                                              
udiv64<soft_div_unr>_mean                                            +0.0137         +0.0137             7             7             7             7                                                                              
udiv64<soft_div_unr_unrolled>_mean                                   +0.0190         +0.0190             8             8             8             8                                                                              
udiv128<gcc_>/0_mean                                                 -0.0005         -0.0005            25            25            25            25                                                                              
udiv128<gcc_>/1_mean                                                 -0.0017         -0.0017            24            24            24            24                                                                              
udiv128<gcc_>/2_mean                                                 -0.0007         -0.0007            29            29            29            29                                                                              
udiv128<gcc_>/3_mean                                                 +0.0027         +0.0027            21            21            21            21                                                                              
udiv128<intx_>/0_mean                                                +0.0047         +0.0047            16            16            16            16                                                                              
udiv128<intx_>/1_mean                                                +0.0035         +0.0035            16            16            16            16                                                                              
udiv128<intx_>/2_mean                                                +0.0023         +0.0023            13            13            13            13                                                                              
udiv128<intx_>/3_mean                                                -0.0006         -0.0006            13            13            13            13                                                                              
udiv128<gmp_>/0_mean                                                 +0.0010         +0.0010            24            24            24            24                                                                              
udiv128<gmp_>/1_mean                                                 +0.0046         +0.0045            25            25            25            25                                                                              
udiv128<gmp_>/2_mean                                                 -0.0007         -0.0007            18            18            18            18                                                                              
udiv128<gmp_>/3_mean                                                 -0.0004         -0.0004            14            14            14            14                                                                              
umul128<intx::uint128, intx::umul>_mean                              -0.0021         -0.0022             0             0             0             0                                                                              
div<uint256, udivrem>/64_mean                                        +0.0221         +0.0221            22            23            22            23                                                                              
div<uint256, udivrem>/128_mean                                       -0.0029         -0.0029            28            28            28            28                                                                              
div<uint256, udivrem>/192_mean                                       -0.0097         -0.0097            33            33            33            33                                                                              
div<uint256, udivrem>/256_mean                                       -0.0031         -0.0031            29            29            29            29                                                                              
div<uint256, gmp::udivrem>/64_mean                                   +0.0238         +0.0238            24            24            24            24                                                                              
div<uint256, gmp::udivrem>/128_mean                                  +0.0119         +0.0119            31            32            31            32                                                                              
div<uint256, gmp::udivrem>/192_mean                                  +0.0037         +0.0037            52            52            52            52                                                                              
div<uint256, gmp::udivrem>/256_mean                                  +0.0109         +0.0109            52            52            52            52                                                                              
div<uint512, udivrem>/64_mean                                        +0.0008         +0.0008            37            37            37            37                                                                              
div<uint512, udivrem>/128_mean                                       -0.0010         -0.0010            47            47            47            47                                                                              
div<uint512, udivrem>/192_mean                                       -0.0039         -0.0039            67            67            67            67                                                                              
div<uint512, udivrem>/256_mean                                       +0.0019         +0.0019            65            65            65            65                                                                              
div<uint512, gmp::udivrem>/64_mean                                   +0.0206         +0.0206            38            38            38            38                                                                              
div<uint512, gmp::udivrem>/128_mean                                  -0.0005         -0.0005            54            54            54            54                                                                              
div<uint512, gmp::udivrem>/192_mean                                  -0.0024         -0.0024            78            78            78            78                                                                              
div<uint512, gmp::udivrem>/256_mean                                  -0.0073         -0.0073            82            81            82            81                                                                              
mod<addmod>/64_mean                                                  -0.0037         -0.0037            29            29            29            29                                                                              
mod<addmod>/128_mean                                                 -0.0069         -0.0069            32            32            32            32                                                                              
mod<addmod>/192_mean                                                 -0.0064         -0.0064            39            38            39            38                                                                              
mod<addmod>/256_mean                                                 -0.0048         -0.0047            35            35            35            35                                                                              
mod<addmod_public>/64_mean                                           -0.0019         -0.0019            32            32            32            32                                                                              
mod<addmod_public>/128_mean                                          +0.0039         +0.0038            34            34            34            34                                                                              
mod<addmod_public>/192_mean                                          +0.0016         +0.0016            40            40            40            40
mod<addmod_public>/256_mean                                          +0.0032         +0.0032            35            35            35            35
mod<addmod_simple>/64_mean                                           +0.0078         +0.0078            31            31            31            31
mod<addmod_simple>/128_mean                                          +0.0012         +0.0012            33            33            33            33
mod<addmod_simple>/192_mean                                          -0.0048         -0.0048            39            39            39            39
mod<addmod_simple>/256_mean                                          +0.0075         +0.0075            34            34            34            34
mod<addmod_prenormalize>/64_mean                                     -0.0509         -0.0509            60            57            60            57
mod<addmod_prenormalize>/128_mean                                    -0.0022         -0.0022            63            63            63            63
mod<addmod_prenormalize>/192_mean                                    -0.0193         -0.0193            74            73            74            73
mod<addmod_prenormalize>/256_mean                                    -0.0155         -0.0155            57            56            57            56
mod<addmod_daosvik_v1>/64_mean                                       -0.0062         -0.0062            32            31            32            31
mod<addmod_daosvik_v1>/128_mean                                      +0.0032         +0.0032            33            33            33            33
mod<addmod_daosvik_v1>/192_mean                                      -0.0040         -0.0040            39            39            39            39
mod<addmod_daosvik_v1>/256_mean                                      +0.0014         +0.0014            34            34            34            34
mod<addmod_daosvik_v2>/64_mean                                       +0.0077         +0.0077            31            32            31            32
mod<addmod_daosvik_v2>/128_mean                                      -0.0024         -0.0024            33            33            33            33
mod<addmod_daosvik_v2>/192_mean                                      +0.0014         +0.0014            39            39            39            39
mod<addmod_daosvik_v2>/256_mean                                      -0.0003         -0.0003            34            34            34            34
mod<mulmod>/64_mean                                                  -0.0283         -0.0283            51            50            51            50
mod<mulmod>/128_mean                                                 -0.0324         -0.0324            61            59            61            59
mod<mulmod>/192_mean                                                 -0.0067         -0.0067            79            78            79            78
mod<mulmod>/256_mean                                                 -0.0123         -0.0123            78            77            78            77
ecmod<addmod_public>_mean                                            +0.2047         +0.2047             9            11             9            11
ecmod<addmod_simple>_mean                                            +0.0066         +0.0066            25            25            25            25
ecmod<addmod_prenormalize>_mean                                      +0.3527         +0.3527             8            10             8            10
ecmod<addmod_daosvik_v1>_mean                                        -0.3898         -0.3899            33            20            33            20
ecmod<addmod_daosvik_v2>_mean                                        +0.3402         +0.3402             8            11             8            11
ecmod<mulmod>_mean                                                   -0.0154         -0.0154            75            73            75            73
binop<uint256, uint256, add>_mean                                    +1.2626         +1.2626             2             5             2             5
binop<uint256, uint256, inline_add>_mean                             +0.0012         +0.0012             2             2             2             2
binop<uint256, uint256, sub>_mean                                    +1.0722         +1.0722             3             5             3             5
binop<uint256, uint256, inline_sub>_mean                             +0.0018         +0.0018             2             2             2             2
binop<uint256, uint256, public_mul>_mean                             +0.0194         +0.0194             6             6             6             6
binop<uint256, uint256, gmp::mul>_mean                               -0.0003         -0.0003            14            14            14            14
binop<uint512, uint256, umul_>_mean                                  -0.1252         -0.1252            14            12            14            12
binop<uint512, uint256, gmp::mul_full>_mean                          +0.0012         +0.0012            15            15            15            15
binop<uint512, uint512, add>_mean                                    +0.4870         +0.4869             4             6             4             6
binop<uint512, uint512, inline_add>_mean                             +0.0043         +0.0043             4             4             4             4
binop<uint512, uint512, sub>_mean                                    +0.4274         +0.4274             4             6             4             6
binop<uint512, uint512, inline_sub>_mean                             +0.0037         +0.0037             4             4             4             4
binop<uint512, uint512, public_mul>_mean                             -0.1096         -0.1096            35            31            35            31
binop<uint512, uint512, gmp::mul>_mean                               -0.0054         -0.0054            52            52            52            52
shift<uint256, uint256, shl_public>/-1_mean                          +0.0120         +0.0120             3             3             3             3
shift<uint256, uint256, shl_public>/0_mean                           -0.0199         -0.0199             4             4             4             4
shift<uint256, uint256, shl_public>/1_mean                           -0.0525         -0.0525             4             4             4             4
shift<uint256, uint256, shl_public>/2_mean                           +0.0003         +0.0002             3             3             3             3
shift<uint256, uint256, shl_public>/3_mean                           +0.0689         +0.0689             2             2             2             2
shift<uint256, uint256, shl_halves>/-1_mean                          -0.0404         -0.0404             3             3             3             3
shift<uint256, uint256, shl_halves>/0_mean                           -0.0309         -0.0309             4             4             4             4
shift<uint256, uint256, shl_halves>/1_mean                           -0.0002         -0.0002             4             4             4             4
shift<uint256, uint256, shl_halves>/2_mean                           -0.1520         -0.1520             3             3             3             3
shift<uint256, uint256, shl_halves>/3_mean                           -0.0950         -0.0950             2             2             2             2
shift<uint256, uint64_t, shl_public>/-1_mean                         -0.0402         -0.0402             3             3             3             3
shift<uint256, uint64_t, shl_public>/0_mean                          -0.1448         -0.1448             4             4             4             4
shift<uint256, uint64_t, shl_public>/1_mean                          -0.1506         -0.1506             4             4             4             4
shift<uint256, uint64_t, shl_public>/2_mean                          -0.0001         -0.0001             3             3             3             3
shift<uint256, uint64_t, shl_public>/3_mean                          -0.0084         -0.0084             2             2             2             2
shift<uint256, uint64_t, shl_halves>/-1_mean                         +0.0003         +0.0003             3             3             3             3
shift<uint256, uint64_t, shl_halves>/0_mean                          -0.0059         -0.0059             4             4             4             4
shift<uint256, uint64_t, shl_halves>/1_mean                          -0.0348         -0.0348             4             4             4             4
shift<uint256, uint64_t, shl_halves>/2_mean                          -0.0160         -0.0160             3             3             3             3
shift<uint256, uint64_t, shl_halves>/3_mean                          +0.0018         +0.0018             2             2             2             2
shift<uint512, uint512, shl_public>/-1_mean                          -0.0194         -0.0194            10            10            10            10
shift<uint512, uint512, shl_public>/0_mean                           -0.0185         -0.0185            10            10            10            10
shift<uint512, uint512, shl_public>/1_mean                           -0.0192         -0.0192            10            10            10            10
shift<uint512, uint512, shl_public>/2_mean                           -0.0183         -0.0183            10            10            10            10
shift<uint512, uint512, shl_public>/3_mean                           -0.0184         -0.0184             9             9             9             9
shift<uint512, uint64_t, shl_public>/-1_mean                         -0.0019         -0.0019             9             9             9             9
shift<uint512, uint64_t, shl_public>/0_mean                          -0.0034         -0.0034             9             9             9             9
shift<uint512, uint64_t, shl_public>/1_mean                          -0.0007         -0.0007             9             9             9             9
shift<uint512, uint64_t, shl_public>/2_mean                          +0.0009         +0.0009             9             9             9             9
shift<uint512, uint64_t, shl_public>/3_mean                          +0.0006         +0.0006             8             8             8             8
compare<lt_public>/64_mean                                           +0.0058         +0.0058             2             2             2             2
compare<lt_public>/128_mean                                          -0.1167         -0.1167             2             2             2             2
compare<lt_public>/192_mean                                          -0.0361         -0.0361             2             2             2             2
compare<lt_public>/256_mean                                          -0.2078         -0.2078             2             1             2             1
compare<lt_sub>/64_mean                                              -0.0066         -0.0066             2             2             2             2
compare<lt_sub>/128_mean                                             -0.0063         -0.0063             2             2             2             2
compare<lt_sub>/192_mean                                             -0.0059         -0.0059             2             2             2             2
compare<lt_sub>/256_mean                                             -0.0066         -0.0066             2             2             2             2
compare<lt_wordcmp>/64_mean                                          -0.0029         -0.0029             2             2             2             2
compare<lt_wordcmp>/128_mean                                         +0.2282         +0.2282             4             5             4             5
compare<lt_wordcmp>/192_mean                                         +0.4004         +0.4004             4             5             4             5
compare<lt_wordcmp>/256_mean                                         +0.4286         +0.4286             4             5             4             5
compare<lt_halves>/64_mean                                           +0.0044         +0.0044             3             3             3             3
compare<lt_halves>/128_mean                                          +0.0032         +0.0032             3             3             3             3
compare<lt_halves>/192_mean                                          +0.0051         +0.0051             3             3             3             3
compare<lt_halves>/256_mean                                          +0.0046         +0.0046             3             3             3             3
exponentiation/64_mean                                               -0.0100         -0.0100           729           722           729           722
exponentiation/128_mean                                              -0.0073         -0.0073          1457          1447          1457          1447
exponentiation/192_mean                                              -0.0125         -0.0125          2190          2163          2190          2163
exponentiation/256_mean                                              -0.0088         -0.0088          2914          2888          2914          2888
exponentiation2/0_mean                                               -0.0201         -0.0201            11            11            11            11
exponentiation2/64_mean                                              -0.0217         -0.0217            11            11            11            11
exponentiation2/128_mean                                             -0.0781         -0.0781            12            11            12            11
exponentiation2/256_mean                                             -0.0216         -0.0216            10            10            10            10
exponentiation2/512_mean                                             -0.0211         -0.0211            10            10            10            10
count_sigificant_words_256/0_mean                                    -0.0005         -0.0005             0             0             0             0
count_sigificant_words_256/1_mean                                    +0.0010         +0.0010             0             0             0             0
count_sigificant_words_256/2_mean                                    +0.0024         +0.0024             0             0             0             0
count_sigificant_words_256/3_mean                                    +0.0009         +0.0009             0             0             0             0
count_sigificant_words_256/4_mean                                    -0.0000         -0.0000             0             0             0             0
count_sigificant_words_256/5_mean                                    -0.0009         -0.0009             0             0             0             0
count_sigificant_words_256/6_mean                                    -0.0011         -0.0011             0             0             0             0
count_sigificant_words_256/7_mean                                    +0.0005         +0.0005             0             0             0             0
count_sigificant_words_256/8_mean                                    +0.0005         +0.0005             0             0             0             0
to_string<uint128>_mean                                              -0.0043         -0.0043           443           441           443           441
to_string<uint256>_mean                                              +0.0029         +0.0029          2428          2434          2428          2434
to_string<uint512>_mean                                              -0.0102         -0.0102          7608          7530          7608          7530

@greg7mdp
Copy link
Author

greg7mdp commented Apr 5, 2022

Thanks for looking into it @chfast ! I'm not sure how to read the numbers you posted, but I am surprised that you found that add/sub got slower. In my tests it was faster. I noticed that sometimes running the same code twice gave different numbers though! Benchmarks are hard.

@@ -182,6 +182,11 @@ inline constexpr uint64_t addc(
}
#endif

if (((x | y) & (uint64_t(1) << 63)) == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ineffective for compilers we care about. GCC and Clang should be using builtins.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we use the builtin how do you explain that my version is slower in the benchmark?

Copy link
Author

@greg7mdp greg7mdp Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I had removed this change. Are you looking at my latest version?
see https://github.com/greg7mdp/intx/blob/master/include/intx/intx.hpp

@greg7mdp
Copy link
Author

greg7mdp commented Apr 5, 2022

Also why diff between add and inline_add?

@chfast
Copy link
Owner

chfast commented Apr 5, 2022

I'm not sure how to read the numbers

This is comparison between master and your changes. Numbers are:

  • CPU time diff,
  • wall-clock time diff,
  • master CPU time in ns,
  • your CPU time in ns,
  • master wall-clock time in ns,
  • your wall-clock time in ns.

This comes from a tool comparing outputs of test/intx-bench.

@chfast
Copy link
Owner

chfast commented Apr 5, 2022

Also why diff between add and inline_add?

Because the benchmark run in a loop the "inline_add" can be vectorized. This is not relevant for EVM use case. I probably could review all benchmark cases and remove half of these.

@greg7mdp
Copy link
Author

greg7mdp commented Apr 5, 2022

Thanks!

the "inline_add" can be vectorized

vectorized or inlined? Why shouldn't it be relevant for EVM?

@chfast
Copy link
Owner

chfast commented Apr 5, 2022

The main difference is that the addc implementation tries to use compiler builtins via __has_builtin() macro. This macro is not available until GCC 10 (although the builtin is supported). This can be "fixed" by adding more preprocessor conditions, but I try to stick the latest compiler versions if performance matters.

https://godbolt.org/z/1cxc6aTG3

@greg7mdp
Copy link
Author

greg7mdp commented Apr 5, 2022

@chfast if you are using gcc 11 and therefore the builtin, I am puzzled that my version would be slower.

return subc(x, y).carry;
for (size_t i = uint<N>::num_words; i-- > 1; ) {
if (x[i] != y[i])
return x[i] < y[i];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This single change looks interesting. Can you submit it as a separate PR for easier verification?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chfast, I did it in a separate PR (#269). I also commented out t he special version for uint256 above.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants