streamline/speed up addc/subc by using in/out carry parameter #267

greg7mdp · 2022-04-02T22:54:54Z

No description provided.

… subtraction

greg7mdp · 2022-04-03T23:05:14Z

I believe this version is significantly faster!

chfast · 2022-04-05T12:59:42Z

I benchmarked GCC 11. In general we would need to split these into smaller changes (I can do this myself) for better analysis.
Some quick takeaways:

comparison optimization looks promising. If we can show it is always faster than subtraction this can land.
multiplication looks faster,
the basic add/sub got slower, I plan to inspect the assembly to see what happen.

We should also consult Clang. It can optimize add/sub perfectly so as long it stays this way we can go ahead for changes making it easier for GCC.

clz<uint32_t, intx::clz>_mean                                        -0.0031         -0.0031             0             0             0             0                                                                              
clz<uint32_t, intx::clz_generic>_mean                                +0.0005         +0.0005             1             1             1             1                                                                              
clz<uint64_t, intx::clz>_mean                                        +0.3459         +0.3459             0             1             0             1                                                                              
clz<uint64_t, intx::clz_generic>_mean                                +0.0008         +0.0008             3             3             3             3                                                                              
div_normalize<internal::normalize>_mean                              -0.0001         -0.0001            11            11            11            11                                                                              
reciprocal<uint64_t, neg>_mean                                       +0.0014         +0.0014             0             0             0             0                                                                              
reciprocal<uint64_t, reciprocal_naive>_mean                          +0.0012         +0.0012            19            19            19            19                                                                              
reciprocal<uint64_t, reciprocal_2by1>_mean                           -0.0002         -0.0002             4             4             4             4                                                                              
reciprocal<uint64_t, reciprocal_2by1_noinline>_mean                  +0.0010         +0.0010             4             4             4             4                                                                              
reciprocal<uint128, reciprocal_3by2>_mean                            +0.0012         +0.0012             7             7             7             7                                                                              
reciprocal<uint128, reciprocal_3by2_noinline>_mean                   +0.0072         +0.0072             6             6             6             6                                                                              
udiv64<nop>_mean                                                     +0.0018         +0.0018             1             1             1             1                                                                              
udiv64<udiv_by_reciprocal>_mean                                      +0.0015         +0.0015             7             7             7             7                                                                              
udiv64<udiv_native>_mean                                             -0.0002         -0.0002             7             7             7             7                                                                              
udiv64<soft_div_unr>_mean                                            +0.0137         +0.0137             7             7             7             7                                                                              
udiv64<soft_div_unr_unrolled>_mean                                   +0.0190         +0.0190             8             8             8             8                                                                              
udiv128<gcc_>/0_mean                                                 -0.0005         -0.0005            25            25            25            25                                                                              
udiv128<gcc_>/1_mean                                                 -0.0017         -0.0017            24            24            24            24                                                                              
udiv128<gcc_>/2_mean                                                 -0.0007         -0.0007            29            29            29            29                                                                              
udiv128<gcc_>/3_mean                                                 +0.0027         +0.0027            21            21            21            21                                                                              
udiv128<intx_>/0_mean                                                +0.0047         +0.0047            16            16            16            16                                                                              
udiv128<intx_>/1_mean                                                +0.0035         +0.0035            16            16            16            16                                                                              
udiv128<intx_>/2_mean                                                +0.0023         +0.0023            13            13            13            13                                                                              
udiv128<intx_>/3_mean                                                -0.0006         -0.0006            13            13            13            13                                                                              
udiv128<gmp_>/0_mean                                                 +0.0010         +0.0010            24            24            24            24                                                                              
udiv128<gmp_>/1_mean                                                 +0.0046         +0.0045            25            25            25            25                                                                              
udiv128<gmp_>/2_mean                                                 -0.0007         -0.0007            18            18            18            18                                                                              
udiv128<gmp_>/3_mean                                                 -0.0004         -0.0004            14            14            14            14                                                                              
umul128<intx::uint128, intx::umul>_mean                              -0.0021         -0.0022             0             0             0             0                                                                              
div<uint256, udivrem>/64_mean                                        +0.0221         +0.0221            22            23            22            23                                                                              
div<uint256, udivrem>/128_mean                                       -0.0029         -0.0029            28            28            28            28                                                                              
div<uint256, udivrem>/192_mean                                       -0.0097         -0.0097            33            33            33            33                                                                              
div<uint256, udivrem>/256_mean                                       -0.0031         -0.0031            29            29            29            29                                                                              
div<uint256, gmp::udivrem>/64_mean                                   +0.0238         +0.0238            24            24            24            24                                                                              
div<uint256, gmp::udivrem>/128_mean                                  +0.0119         +0.0119            31            32            31            32                                                                              
div<uint256, gmp::udivrem>/192_mean                                  +0.0037         +0.0037            52            52            52            52                                                                              
div<uint256, gmp::udivrem>/256_mean                                  +0.0109         +0.0109            52            52            52            52                                                                              
div<uint512, udivrem>/64_mean                                        +0.0008         +0.0008            37            37            37            37                                                                              
div<uint512, udivrem>/128_mean                                       -0.0010         -0.0010            47            47            47            47                                                                              
div<uint512, udivrem>/192_mean                                       -0.0039         -0.0039            67            67            67            67                                                                              
div<uint512, udivrem>/256_mean                                       +0.0019         +0.0019            65            65            65            65                                                                              
div<uint512, gmp::udivrem>/64_mean                                   +0.0206         +0.0206            38            38            38            38                                                                              
div<uint512, gmp::udivrem>/128_mean                                  -0.0005         -0.0005            54            54            54            54                                                                              
div<uint512, gmp::udivrem>/192_mean                                  -0.0024         -0.0024            78            78            78            78                                                                              
div<uint512, gmp::udivrem>/256_mean                                  -0.0073         -0.0073            82            81            82            81                                                                              
mod<addmod>/64_mean                                                  -0.0037         -0.0037            29            29            29            29                                                                              
mod<addmod>/128_mean                                                 -0.0069         -0.0069            32            32            32            32                                                                              
mod<addmod>/192_mean                                                 -0.0064         -0.0064            39            38            39            38                                                                              
mod<addmod>/256_mean                                                 -0.0048         -0.0047            35            35            35            35                                                                              
mod<addmod_public>/64_mean                                           -0.0019         -0.0019            32            32            32            32                                                                              
mod<addmod_public>/128_mean                                          +0.0039         +0.0038            34            34            34            34                                                                              
mod<addmod_public>/192_mean                                          +0.0016         +0.0016            40            40            40            40
mod<addmod_public>/256_mean                                          +0.0032         +0.0032            35            35            35            35
mod<addmod_simple>/64_mean                                           +0.0078         +0.0078            31            31            31            31
mod<addmod_simple>/128_mean                                          +0.0012         +0.0012            33            33            33            33
mod<addmod_simple>/192_mean                                          -0.0048         -0.0048            39            39            39            39
mod<addmod_simple>/256_mean                                          +0.0075         +0.0075            34            34            34            34
mod<addmod_prenormalize>/64_mean                                     -0.0509         -0.0509            60            57            60            57
mod<addmod_prenormalize>/128_mean                                    -0.0022         -0.0022            63            63            63            63
mod<addmod_prenormalize>/192_mean                                    -0.0193         -0.0193            74            73            74            73
mod<addmod_prenormalize>/256_mean                                    -0.0155         -0.0155            57            56            57            56
mod<addmod_daosvik_v1>/64_mean                                       -0.0062         -0.0062            32            31            32            31
mod<addmod_daosvik_v1>/128_mean                                      +0.0032         +0.0032            33            33            33            33
mod<addmod_daosvik_v1>/192_mean                                      -0.0040         -0.0040            39            39            39            39
mod<addmod_daosvik_v1>/256_mean                                      +0.0014         +0.0014            34            34            34            34
mod<addmod_daosvik_v2>/64_mean                                       +0.0077         +0.0077            31            32            31            32
mod<addmod_daosvik_v2>/128_mean                                      -0.0024         -0.0024            33            33            33            33
mod<addmod_daosvik_v2>/192_mean                                      +0.0014         +0.0014            39            39            39            39
mod<addmod_daosvik_v2>/256_mean                                      -0.0003         -0.0003            34            34            34            34
mod<mulmod>/64_mean                                                  -0.0283         -0.0283            51            50            51            50
mod<mulmod>/128_mean                                                 -0.0324         -0.0324            61            59            61            59
mod<mulmod>/192_mean                                                 -0.0067         -0.0067            79            78            79            78
mod<mulmod>/256_mean                                                 -0.0123         -0.0123            78            77            78            77
ecmod<addmod_public>_mean                                            +0.2047         +0.2047             9            11             9            11
ecmod<addmod_simple>_mean                                            +0.0066         +0.0066            25            25            25            25
ecmod<addmod_prenormalize>_mean                                      +0.3527         +0.3527             8            10             8            10
ecmod<addmod_daosvik_v1>_mean                                        -0.3898         -0.3899            33            20            33            20
ecmod<addmod_daosvik_v2>_mean                                        +0.3402         +0.3402             8            11             8            11
ecmod<mulmod>_mean                                                   -0.0154         -0.0154            75            73            75            73
binop<uint256, uint256, add>_mean                                    +1.2626         +1.2626             2             5             2             5
binop<uint256, uint256, inline_add>_mean                             +0.0012         +0.0012             2             2             2             2
binop<uint256, uint256, sub>_mean                                    +1.0722         +1.0722             3             5             3             5
binop<uint256, uint256, inline_sub>_mean                             +0.0018         +0.0018             2             2             2             2
binop<uint256, uint256, public_mul>_mean                             +0.0194         +0.0194             6             6             6             6
binop<uint256, uint256, gmp::mul>_mean                               -0.0003         -0.0003            14            14            14            14
binop<uint512, uint256, umul_>_mean                                  -0.1252         -0.1252            14            12            14            12
binop<uint512, uint256, gmp::mul_full>_mean                          +0.0012         +0.0012            15            15            15            15
binop<uint512, uint512, add>_mean                                    +0.4870         +0.4869             4             6             4             6
binop<uint512, uint512, inline_add>_mean                             +0.0043         +0.0043             4             4             4             4
binop<uint512, uint512, sub>_mean                                    +0.4274         +0.4274             4             6             4             6
binop<uint512, uint512, inline_sub>_mean                             +0.0037         +0.0037             4             4             4             4
binop<uint512, uint512, public_mul>_mean                             -0.1096         -0.1096            35            31            35            31
binop<uint512, uint512, gmp::mul>_mean                               -0.0054         -0.0054            52            52            52            52
shift<uint256, uint256, shl_public>/-1_mean                          +0.0120         +0.0120             3             3             3             3
shift<uint256, uint256, shl_public>/0_mean                           -0.0199         -0.0199             4             4             4             4
shift<uint256, uint256, shl_public>/1_mean                           -0.0525         -0.0525             4             4             4             4
shift<uint256, uint256, shl_public>/2_mean                           +0.0003         +0.0002             3             3             3             3
shift<uint256, uint256, shl_public>/3_mean                           +0.0689         +0.0689             2             2             2             2
shift<uint256, uint256, shl_halves>/-1_mean                          -0.0404         -0.0404             3             3             3             3
shift<uint256, uint256, shl_halves>/0_mean                           -0.0309         -0.0309             4             4             4             4
shift<uint256, uint256, shl_halves>/1_mean                           -0.0002         -0.0002             4             4             4             4
shift<uint256, uint256, shl_halves>/2_mean                           -0.1520         -0.1520             3             3             3             3
shift<uint256, uint256, shl_halves>/3_mean                           -0.0950         -0.0950             2             2             2             2
shift<uint256, uint64_t, shl_public>/-1_mean                         -0.0402         -0.0402             3             3             3             3
shift<uint256, uint64_t, shl_public>/0_mean                          -0.1448         -0.1448             4             4             4             4
shift<uint256, uint64_t, shl_public>/1_mean                          -0.1506         -0.1506             4             4             4             4
shift<uint256, uint64_t, shl_public>/2_mean                          -0.0001         -0.0001             3             3             3             3
shift<uint256, uint64_t, shl_public>/3_mean                          -0.0084         -0.0084             2             2             2             2
shift<uint256, uint64_t, shl_halves>/-1_mean                         +0.0003         +0.0003             3             3             3             3
shift<uint256, uint64_t, shl_halves>/0_mean                          -0.0059         -0.0059             4             4             4             4
shift<uint256, uint64_t, shl_halves>/1_mean                          -0.0348         -0.0348             4             4             4             4
shift<uint256, uint64_t, shl_halves>/2_mean                          -0.0160         -0.0160             3             3             3             3
shift<uint256, uint64_t, shl_halves>/3_mean                          +0.0018         +0.0018             2             2             2             2
shift<uint512, uint512, shl_public>/-1_mean                          -0.0194         -0.0194            10            10            10            10
shift<uint512, uint512, shl_public>/0_mean                           -0.0185         -0.0185            10            10            10            10
shift<uint512, uint512, shl_public>/1_mean                           -0.0192         -0.0192            10            10            10            10
shift<uint512, uint512, shl_public>/2_mean                           -0.0183         -0.0183            10            10            10            10
shift<uint512, uint512, shl_public>/3_mean                           -0.0184         -0.0184             9             9             9             9
shift<uint512, uint64_t, shl_public>/-1_mean                         -0.0019         -0.0019             9             9             9             9
shift<uint512, uint64_t, shl_public>/0_mean                          -0.0034         -0.0034             9             9             9             9
shift<uint512, uint64_t, shl_public>/1_mean                          -0.0007         -0.0007             9             9             9             9
shift<uint512, uint64_t, shl_public>/2_mean                          +0.0009         +0.0009             9             9             9             9
shift<uint512, uint64_t, shl_public>/3_mean                          +0.0006         +0.0006             8             8             8             8
compare<lt_public>/64_mean                                           +0.0058         +0.0058             2             2             2             2
compare<lt_public>/128_mean                                          -0.1167         -0.1167             2             2             2             2
compare<lt_public>/192_mean                                          -0.0361         -0.0361             2             2             2             2
compare<lt_public>/256_mean                                          -0.2078         -0.2078             2             1             2             1
compare<lt_sub>/64_mean                                              -0.0066         -0.0066             2             2             2             2
compare<lt_sub>/128_mean                                             -0.0063         -0.0063             2             2             2             2
compare<lt_sub>/192_mean                                             -0.0059         -0.0059             2             2             2             2
compare<lt_sub>/256_mean                                             -0.0066         -0.0066             2             2             2             2
compare<lt_wordcmp>/64_mean                                          -0.0029         -0.0029             2             2             2             2
compare<lt_wordcmp>/128_mean                                         +0.2282         +0.2282             4             5             4             5
compare<lt_wordcmp>/192_mean                                         +0.4004         +0.4004             4             5             4             5
compare<lt_wordcmp>/256_mean                                         +0.4286         +0.4286             4             5             4             5
compare<lt_halves>/64_mean                                           +0.0044         +0.0044             3             3             3             3
compare<lt_halves>/128_mean                                          +0.0032         +0.0032             3             3             3             3
compare<lt_halves>/192_mean                                          +0.0051         +0.0051             3             3             3             3
compare<lt_halves>/256_mean                                          +0.0046         +0.0046             3             3             3             3
exponentiation/64_mean                                               -0.0100         -0.0100           729           722           729           722
exponentiation/128_mean                                              -0.0073         -0.0073          1457          1447          1457          1447
exponentiation/192_mean                                              -0.0125         -0.0125          2190          2163          2190          2163
exponentiation/256_mean                                              -0.0088         -0.0088          2914          2888          2914          2888
exponentiation2/0_mean                                               -0.0201         -0.0201            11            11            11            11
exponentiation2/64_mean                                              -0.0217         -0.0217            11            11            11            11
exponentiation2/128_mean                                             -0.0781         -0.0781            12            11            12            11
exponentiation2/256_mean                                             -0.0216         -0.0216            10            10            10            10
exponentiation2/512_mean                                             -0.0211         -0.0211            10            10            10            10
count_sigificant_words_256/0_mean                                    -0.0005         -0.0005             0             0             0             0
count_sigificant_words_256/1_mean                                    +0.0010         +0.0010             0             0             0             0
count_sigificant_words_256/2_mean                                    +0.0024         +0.0024             0             0             0             0
count_sigificant_words_256/3_mean                                    +0.0009         +0.0009             0             0             0             0
count_sigificant_words_256/4_mean                                    -0.0000         -0.0000             0             0             0             0
count_sigificant_words_256/5_mean                                    -0.0009         -0.0009             0             0             0             0
count_sigificant_words_256/6_mean                                    -0.0011         -0.0011             0             0             0             0
count_sigificant_words_256/7_mean                                    +0.0005         +0.0005             0             0             0             0
count_sigificant_words_256/8_mean                                    +0.0005         +0.0005             0             0             0             0
to_string<uint128>_mean                                              -0.0043         -0.0043           443           441           443           441
to_string<uint256>_mean                                              +0.0029         +0.0029          2428          2434          2428          2434
to_string<uint512>_mean                                              -0.0102         -0.0102          7608          7530          7608          7530

greg7mdp · 2022-04-05T13:30:06Z

Thanks for looking into it @chfast ! I'm not sure how to read the numbers you posted, but I am surprised that you found that add/sub got slower. In my tests it was faster. I noticed that sometimes running the same code twice gave different numbers though! Benchmarks are hard.

chfast · 2022-04-05T13:02:28Z

include/intx/intx.hpp

@@ -182,6 +182,11 @@ inline constexpr uint64_t addc(
    }
 #endif

+    if (((x | y) & (uint64_t(1) << 63)) == 0) {


This is ineffective for compilers we care about. GCC and Clang should be using builtins.

So if we use the builtin how do you explain that my version is slower in the benchmark?

Actually I had removed this change. Are you looking at my latest version?
see https://github.com/greg7mdp/intx/blob/master/include/intx/intx.hpp

greg7mdp · 2022-04-05T13:34:00Z

Also why diff between add and inline_add?

chfast · 2022-04-05T13:39:36Z

I'm not sure how to read the numbers

This is comparison between master and your changes. Numbers are:

CPU time diff,
wall-clock time diff,
master CPU time in ns,
your CPU time in ns,
master wall-clock time in ns,
your wall-clock time in ns.

This comes from a tool comparing outputs of test/intx-bench.

chfast · 2022-04-05T13:41:49Z

Also why diff between add and inline_add?

Because the benchmark run in a loop the "inline_add" can be vectorized. This is not relevant for EVM use case. I probably could review all benchmark cases and remove half of these.

greg7mdp · 2022-04-05T13:43:17Z

Thanks!

the "inline_add" can be vectorized

vectorized or inlined? Why shouldn't it be relevant for EVM?

chfast · 2022-04-05T13:56:05Z

The main difference is that the addc implementation tries to use compiler builtins via __has_builtin() macro. This macro is not available until GCC 10 (although the builtin is supported). This can be "fixed" by adding more preprocessor conditions, but I try to stick the latest compiler versions if performance matters.

https://godbolt.org/z/1cxc6aTG3

greg7mdp · 2022-04-05T14:34:13Z

@chfast if you are using gcc 11 and therefore the builtin, I am puzzled that my version would be slower.

chfast · 2022-04-12T09:57:09Z

include/intx/intx.hpp

-    return subc(x, y).carry;
+    for (size_t i = uint<N>::num_words; i-- > 1; ) {
+        if (x[i] != y[i])
+            return x[i] < y[i];


This single change looks interesting. Can you submit it as a separate PR for easier verification?

Thanks @chfast, I did it in a separate PR (#269). I also commented out t he special version for uint256 above.

sonarqubecloud · 2023-09-20T12:10:29Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

greg7mdp added 15 commits April 2, 2022 18:53

streamline/speed up addc/subc by using in/out carry parameter

905f86e

add cast for clang build.

ebe54df

implement some code formatting suggestions

9074a9e

Fix a couple typos

ba889d0

fix clang build issues

183afb2

fix code formatting

4ae0f6f

Fix code formatting

fe61916

lint issues

98d704f

add another optimization in addc

1dbe06d

more optimizations. Use shortcuts in comparison instead of doing full…

ef4fa8d

… subtraction

small speedups.

7fab2e0

fix typo

e8528f4

faster comparison

cf27392

make addc/subc branchless

7f33f76

revert previous change so it remains branchless

9ec3554

chfast reviewed Apr 5, 2022

View reviewed changes

chfast reviewed Apr 12, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

streamline/speed up addc/subc by using in/out carry parameter #267

streamline/speed up addc/subc by using in/out carry parameter #267

greg7mdp commented Apr 2, 2022

greg7mdp commented Apr 3, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast Apr 5, 2022

greg7mdp Apr 5, 2022

greg7mdp Apr 5, 2022 •

edited

Loading

greg7mdp commented Apr 5, 2022

chfast commented Apr 5, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast Apr 12, 2022

greg7mdp Apr 12, 2022

sonarqubecloud bot commented Sep 20, 2023

streamline/speed up addc/subc by using in/out carry parameter #267

Are you sure you want to change the base?

streamline/speed up addc/subc by using in/out carry parameter #267

Conversation

greg7mdp commented Apr 2, 2022

greg7mdp commented Apr 3, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast Apr 5, 2022

Choose a reason for hiding this comment

greg7mdp Apr 5, 2022

Choose a reason for hiding this comment

greg7mdp Apr 5, 2022 • edited Loading

Choose a reason for hiding this comment

greg7mdp commented Apr 5, 2022

chfast commented Apr 5, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast commented Apr 5, 2022

greg7mdp commented Apr 5, 2022

chfast Apr 12, 2022

Choose a reason for hiding this comment

greg7mdp Apr 12, 2022

Choose a reason for hiding this comment

sonarqubecloud bot commented Sep 20, 2023

greg7mdp Apr 5, 2022 •

edited

Loading