Ryzen (CPU model unknown)
travisdowns edited this page May 20, 2018 · 2 revisions
Welcome to uarch-bench (e1d92fb-dirty)
Median CPU speed: 1.499 GHz
Running benchmarks groups using timer clock
** Running benchmark group Default Group **
Benchmark Cycles Nanos
Dependent add chain 1.00 0.67
Independent add chain 0.25 0.17
Dependent imul 64->128 3.00 2.00
Dependent imul 64->64 3.00 2.00
Independent imul 64->128 2.00 1.33
Same location stores 1.00 0.67
Disjoint location stores 1.00 0.67
Dependent push/pop chain 7.00 4.67
Independent push/pop chain 1.00 0.67
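The Cycles and Nanos columns above appear to be related by the clock frequency reported in the header (1.499 GHz). A minimal sketch of that conversion follows, assuming the two columns differ only by that factor; the helper name `cycles_to_nanos` is made up for illustration and is not part of uarch-bench.

```python
# Clock frequency taken from the "Median CPU speed" line of the log (assumed constant).
CPU_GHZ = 1.499

def cycles_to_nanos(cycles, ghz=CPU_GHZ):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / ghz

# (benchmark name, cycles, nanos) rows copied from the Default Group table.
rows = [
    ("Dependent add chain", 1.00, 0.67),
    ("Independent add chain", 0.25, 0.17),
    ("Dependent imul 64->128", 3.00, 2.00),
    ("Independent imul 64->128", 2.00, 1.33),
    ("Dependent push/pop chain", 7.00, 4.67),
]

for name, cycles, logged_nanos in rows:
    computed = cycles_to_nanos(cycles)
    print(f"{name:26s} {cycles:5.2f} cycles -> {computed:.2f} ns (log: {logged_nanos:.2f})")
```

Each computed value should agree with the logged Nanos column to within rounding.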
** Inverse throughput for load/16-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/32-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0
** Inverse throughput for load/64-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/128-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
16 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
32 : 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
48 : 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
** Inverse throughput for load/256-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
16 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
32 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
48 : 1.0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
** Inverse throughput for store/16-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0
** Inverse throughput for store/32-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0
** Inverse throughput for store/64-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
16 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
32 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
48 : 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
** Inverse throughput for store/128-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
16 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
32 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
48 : 1.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0 2.0 5.0 5.0 5.0
** Inverse throughput for store/256-bit **
offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
16 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
32 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
48 : 2.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0 4.0 7.0 7.0 7.0
** Running benchmark group Parallel load/prefetches from fixed-size regions **
Benchmark Cycles Nanos
16-KiB parallel-loads 0.53 0.35
16-KiB parallel-prefetcht0 0.50 0.33
16-KiB parallel-prefetcht1 0.50 0.33
16-KiB parallel-prefetcht2 0.50 0.33
16-KiB parallel-prefetchnta 0.50 0.33
32-KiB parallel-loads 0.53 0.35
32-KiB parallel-prefetcht0 0.50 0.33
32-KiB parallel-prefetcht1 0.50 0.33
32-KiB parallel-prefetcht2 0.50 0.33
32-KiB parallel-prefetchnta 0.50 0.33
64-KiB parallel-loads 2.00 1.33
64-KiB parallel-prefetcht0 0.50 0.33
64-KiB parallel-prefetcht1 0.50 0.33
64-KiB parallel-prefetcht2 0.50 0.33
64-KiB parallel-prefetchnta 0.50 0.33
128-KiB parallel-loads 2.00 1.33
128-KiB parallel-prefetcht0 0.50 0.33
128-KiB parallel-prefetcht1 0.50 0.33
128-KiB parallel-prefetcht2 0.50 0.33
128-KiB parallel-prefetchnta 0.50 0.33
256-KiB parallel-loads 2.00 1.33
256-KiB parallel-prefetcht0 0.50 0.33
256-KiB parallel-prefetcht1 0.50 0.33
256-KiB parallel-prefetcht2 0.50 0.33
256-KiB parallel-prefetchnta 0.50 0.33
512-KiB parallel-loads 2.01 1.34
512-KiB parallel-prefetcht0 0.50 0.33
512-KiB parallel-prefetcht1 0.50 0.33
512-KiB parallel-prefetcht2 0.50 0.34
512-KiB parallel-prefetchnta 0.50 0.33
2048-KiB parallel-loads 2.06 1.37
2048-KiB parallel-prefetcht0 0.51 0.34
2048-KiB parallel-prefetcht1 0.51 0.34
2048-KiB parallel-prefetcht2 0.51 0.34
2048-KiB parallel-prefetchnta 0.50 0.33
** Running benchmark group Serial loads from fixed-size regions **
Benchmark Cycles Nanos
16-KiB serial loads 4.00 2.67
24-KiB serial loads 4.00 2.67
30-KiB serial loads 4.00 2.67
31-KiB serial loads 4.00 2.67
32-KiB serial loads 4.00 2.67
33-KiB serial loads 5.11 3.41
34-KiB serial loads 5.76 3.84
35-KiB serial loads 8.53 5.69
40-KiB serial loads 12.06 8.04
48-KiB serial loads 12.16 8.11
56-KiB serial loads 12.06 8.05
64-KiB serial loads 12.05 8.03
80-KiB serial loads 12.06 8.04
96-KiB serial loads 12.07 8.05
112-KiB serial loads 12.08 8.06
128-KiB serial loads 12.05 8.04
196-KiB serial loads 12.06 8.05
252-KiB serial loads 12.06 8.04
256-KiB serial loads 12.06 8.04
260-KiB serial loads 12.19 8.13
384-KiB serial loads 17.28 11.53
512-KiB serial loads 21.46 14.31
1024-KiB serial loads 35.88 23.93
2048-KiB serial loads 39.74 26.51
** Running benchmark group Store forwarding latency and throughput **
Benchmark Cycles Nanos
Store forward latency delay 0 6.99 4.66
Store forward latency delay 1 6.99 4.66
Store forward latency delay 2 6.99 4.66
Store forward latency delay 3 6.99 4.66
Store forward latency delay 4 6.99 4.66
Store forward latency delay 5 6.31 4.21
Store fwd tput concurrency 1 6.99 4.66
Store fwd tput concurrency 2 3.50 2.33
Store fwd tput concurrency 3 2.33 1.55
Store fwd tput concurrency 4 1.75 1.17
Store fwd tput concurrency 5 1.40 0.93
Store fwd tput concurrency 6 1.17 0.78
Store fwd tput concurrency 7 1.06 0.71
Store fwd tput concurrency 8 1.00 0.67
Store fwd tput concurrency 9 1.00 0.67
Store fwd tput concurrency 10 1.00 0.67
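The throughput rows above look roughly like the forwarding latency divided by the number of concurrent chains, until a floor of about one cycle is reached. The sketch below checks that reading against the logged numbers. The model `max(latency / n, floor)` is an assumption used for illustration, not something stated by the benchmark itself.

```python
FORWARD_LATENCY = 6.99   # cycles, from "Store forward latency delay 0" in the log
THROUGHPUT_FLOOR = 1.00  # cycles, the plateau seen at concurrency 8 to 10

def modeled_cycles(concurrency):
    """Assumed model: latency is overlapped across chains until a throughput floor."""
    return max(FORWARD_LATENCY / concurrency, THROUGHPUT_FLOOR)

# Measured cycles per "Store fwd tput concurrency N" row, copied from the log.
measured = {1: 6.99, 2: 3.50, 3: 2.33, 4: 1.75, 5: 1.40,
            6: 1.17, 7: 1.06, 8: 1.00, 9: 1.00, 10: 1.00}

for n, cycles in measured.items():
    model = modeled_cycles(n)
    print(f"concurrency {n:2d}: measured {cycles:.2f}, model {model:.2f}, "
          f"diff {cycles - model:+.2f}")
```

Under this model every row matches to within rounding except concurrency 7, where the measured value sits slightly above the modeled floor.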
** Running benchmark group Store forwarding latency and throughput **
---------- Oneshot calibration start --------------
Benchmark Cycles Nanos
Oneshot overhead min 89.96 60.00
Oneshot overhead median (used) 104.95 70.00
Oneshot overhead max 104.95 70.00
---------- Oneshot calibration end --------------
oneshot-dummy @ 0x0x494100
Benchmark Sample Cycles Nanos
Empty oneshot bench 1 0.00 0.00
Empty oneshot bench 2 0.00 0.00
Empty oneshot bench 3 0.00 0.00
Empty oneshot bench 4 -14.99 -10.00
Empty oneshot bench 5 -14.99 -10.00
Empty oneshot bench 6 0.00 0.00
Empty oneshot bench 7 -14.99 -10.00
Empty oneshot bench 8 -14.99 -10.00
Empty oneshot bench 9 -14.99 -10.00
Empty oneshot bench 10 0.00 0.00
Empty oneshot bench 11 -14.99 -10.00
Empty oneshot bench 12 -14.99 -10.00
Empty oneshot bench 13 0.00 0.00
Empty oneshot bench 14 0.00 0.00
Empty oneshot bench 15 -14.99 -10.00
Empty oneshot bench 16 -14.99 -10.00
Empty oneshot bench 17 0.00 0.00
Empty oneshot bench 18 0.00 0.00
Empty oneshot bench 19 -14.99 -10.00
Empty oneshot bench 20 -14.99 -10.00
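The zero and negative results for the empty benchmark are consistent with a fixed overhead being subtracted from a raw timing that has roughly 10 ns granularity: the calibration block reports a minimum of 60 ns and a median of 70 ns, with the median marked as the value used. The sketch below reproduces that arithmetic. The subtraction model and the `hypothetical_raw_ns` samples are assumptions for illustration and are not taken from the uarch-bench source.

```python
CPU_GHZ = 1.499              # from the "Median CPU speed" line of the log
OVERHEAD_MEDIAN_NS = 70.00   # "Oneshot overhead median (used)" from the calibration block

def adjusted_nanos(raw_ns, overhead_ns=OVERHEAD_MEDIAN_NS):
    """Subtract the calibrated overhead from a raw timing (assumed model)."""
    return raw_ns - overhead_ns

# Hypothetical raw timings for an empty benchmark: the calibration median and minimum.
hypothetical_raw_ns = [70.0, 60.0]

for raw in hypothetical_raw_ns:
    nanos = adjusted_nanos(raw)
    cycles = nanos * CPU_GHZ
    print(f"raw {raw:.2f} ns -> adjusted {nanos:.2f} ns ({cycles:.2f} cycles)")
```

With these inputs the adjusted values are 0 ns and -10 ns, which correspond to the 0.00 and -14.99 cycle entries in the table above.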
oneshot-latency-2 @ 0x0x4a11c0
Benchmark Sample Cycles Nanos
StFwd oneshot lat (delay 2) 1 144767.62 96560.00
StFwd oneshot lat (delay 2) 2 144482.76 96370.00
StFwd oneshot lat (delay 2) 3 144481.26 96369.00
StFwd oneshot lat (delay 2) 4 144482.76 96370.00
StFwd oneshot lat (delay 2) 5 144467.77 96360.00
StFwd oneshot lat (delay 2) 6 144481.26 96369.00
StFwd oneshot lat (delay 2) 7 144482.76 96370.00
StFwd oneshot lat (delay 2) 8 144481.26 96369.00
StFwd oneshot lat (delay 2) 9 144482.76 96370.00
StFwd oneshot lat (delay 2) 10 144467.77 96360.00
StFwd oneshot lat (delay 2) 11 144481.26 96369.00
StFwd oneshot lat (delay 2) 12 144482.76 96370.00
StFwd oneshot lat (delay 2) 13 144482.76 96370.00
StFwd oneshot lat (delay 2) 14 144481.26 96369.00
StFwd oneshot lat (delay 2) 15 144482.76 96370.00
StFwd oneshot lat (delay 2) 16 151587.71 101109.00
StFwd oneshot lat (delay 2) 17 144482.76 96370.00
StFwd oneshot lat (delay 2) 18 144482.76 96370.00
StFwd oneshot lat (delay 2) 19 144481.26 96369.00
StFwd oneshot lat (delay 2) 20 144482.76 96370.00
oneshot-latency-1 @ 0x0x4a0e40
Benchmark Sample Cycles Nanos
StFwd oneshot lat (delay 1) 1 144767.62 96560.00
StFwd oneshot lat (delay 1) 2 144482.76 96370.00
StFwd oneshot lat (delay 1) 3 144481.26 96369.00
StFwd oneshot lat (delay 1) 4 144482.76 96370.00
StFwd oneshot lat (delay 1) 5 144482.76 96370.00
StFwd oneshot lat (delay 1) 6 144481.26 96369.00
StFwd oneshot lat (delay 1) 7 144482.76 96370.00
StFwd oneshot lat (delay 1) 8 144482.76 96370.00
StFwd oneshot lat (delay 1) 9 144481.26 96369.00
StFwd oneshot lat (delay 1) 10 144482.76 96370.00
StFwd oneshot lat (delay 1) 11 144466.27 96359.00
StFwd oneshot lat (delay 1) 12 144467.77 96360.00
StFwd oneshot lat (delay 1) 13 144482.76 96370.00
StFwd oneshot lat (delay 1) 14 144481.26 96369.00
StFwd oneshot lat (delay 1) 15 144482.76 96370.00
StFwd oneshot lat (delay 1) 16 144482.76 96370.00
StFwd oneshot lat (delay 1) 17 144481.26 96369.00
StFwd oneshot lat (delay 1) 18 144482.76 96370.00
StFwd oneshot lat (delay 1) 19 144482.76 96370.00
StFwd oneshot lat (delay 1) 20 144481.26 96369.00
oneshot-latency-0 @ 0x0x4a0ac0
Benchmark Sample Cycles Nanos
StFwd oneshot lat (delay 0) 1 144691.15 96509.00
StFwd oneshot lat (delay 0) 2 144482.76 96370.00
StFwd oneshot lat (delay 0) 3 144482.76 96370.00
StFwd oneshot lat (delay 0) 4 144481.26 96369.00
StFwd oneshot lat (delay 0) 5 144482.76 96370.00
StFwd oneshot lat (delay 0) 6 144482.76 96370.00
StFwd oneshot lat (delay 0) 7 144481.26 96369.00
StFwd oneshot lat (delay 0) 8 144467.77 96360.00
StFwd oneshot lat (delay 0) 9 152052.47 101419.00
StFwd oneshot lat (delay 0) 10 144482.76 96370.00
StFwd oneshot lat (delay 0) 11 144482.76 96370.00
StFwd oneshot lat (delay 0) 12 144481.26 96369.00
StFwd oneshot lat (delay 0) 13 144482.76 96370.00
StFwd oneshot lat (delay 0) 14 144482.76 96370.00
StFwd oneshot lat (delay 0) 15 144481.26 96369.00
StFwd oneshot lat (delay 0) 16 144482.76 96370.00
StFwd oneshot lat (delay 0) 17 144466.27 96359.00
StFwd oneshot lat (delay 0) 18 144467.77 96360.00
StFwd oneshot lat (delay 0) 19 144467.77 96360.00
StFwd oneshot lat (delay 0) 20 144466.27 96359.00
** Running benchmark group Store forward attempts **
oneshot-dummy @ 0x0x494100
Benchmark Sample Cycles Nanos
Empty oneshot bench 1 -14.99 -10.00
Empty oneshot bench 2 -14.99 -10.00
Empty oneshot bench 3 -14.99 -10.00
Empty oneshot bench 4 -14.99 -10.00
Empty oneshot bench 5 0.00 0.00
Empty oneshot bench 6 0.00 0.00
Empty oneshot bench 7 -14.99 -10.00
Empty oneshot bench 8 -14.99 -10.00
Empty oneshot bench 9 -14.99 -10.00
Empty oneshot bench 10 0.00 0.00
Empty oneshot bench 11 -14.99 -10.00
Empty oneshot bench 12 -14.99 -10.00
Empty oneshot bench 13 0.00 0.00
Empty oneshot bench 14 0.00 0.00
Empty oneshot bench 15 -14.99 -10.00
Empty oneshot bench 16 -14.99 -10.00
Empty oneshot bench 17 0.00 0.00
Empty oneshot bench 18 -14.99 -10.00
Empty oneshot bench 19 -14.99 -10.00
Empty oneshot bench 20 -14.99 -10.00
stfwd-try1 @ 0x0x4a0780
Benchmark Sample Cycles Nanos
stfwd-try1 1 674.66 450.00
stfwd-try1 2 89.96 60.00
stfwd-try1 3 89.96 60.00
stfwd-try1 4 89.96 60.00
stfwd-try1 5 89.96 60.00
stfwd-try1 6 74.96 50.00
stfwd-try1 7 89.96 60.00
stfwd-try1 8 74.96 50.00
stfwd-try1 9 89.96 60.00
stfwd-try1 10 74.96 50.00
stfwd-try1 11 74.96 50.00
stfwd-try1 12 74.96 50.00
stfwd-try1 13 74.96 50.00
stfwd-try1 14 74.96 50.00
stfwd-try1 15 89.96 60.00
stfwd-try1 16 74.96 50.00
stfwd-try1 17 89.96 60.00
stfwd-try1 18 74.96 50.00
stfwd-try1 19 89.96 60.00
stfwd-try1 20 89.96 60.00
stfwd-try2 @ 0x0x4a02c0
Benchmark Sample Cycles Nanos
stfwd-try2 100 loads 1 614.69 410.00
stfwd-try2 100 loads 2 3658.17 2440.00
stfwd-try2 100 loads 3 254.87 170.00
stfwd-try2 100 loads 4 254.87 170.00
stfwd-try2 100 loads 5 254.87 170.00
stfwd-try2 100 loads 6 254.87 170.00
stfwd-try2 100 loads 7 269.87 180.00
stfwd-try2 100 loads 8 254.87 170.00
stfwd-try2 100 loads 9 254.87 170.00
stfwd-try2 100 loads 10 254.87 170.00
stfwd-try2 100 loads 11 254.87 170.00
stfwd-try2 100 loads 12 269.87 180.00
stfwd-try2 100 loads 13 254.87 170.00
stfwd-try2 100 loads 14 254.87 170.00
stfwd-try2 100 loads 15 254.87 170.00
stfwd-try2 100 loads 16 254.87 170.00
stfwd-try2 100 loads 17 269.87 180.00
stfwd-try2 100 loads 18 254.87 170.00
stfwd-try2 100 loads 19 254.87 170.00
stfwd-try2 100 loads 20 254.87 170.00
stfwd-try2-4 @ 0x0x49d200
Benchmark Sample Cycles Nanos
stfwd-try2 4 loads 1 74.96 50.00
stfwd-try2 4 loads 2 164.92 110.00
stfwd-try2 4 loads 3 -14.99 -10.00
stfwd-try2 4 loads 4 0.00 0.00
stfwd-try2 4 loads 5 0.00 0.00
stfwd-try2 4 loads 6 -14.99 -10.00
stfwd-try2 4 loads 7 -14.99 -10.00
stfwd-try2 4 loads 8 -14.99 -10.00
stfwd-try2 4 loads 9 0.00 0.00
stfwd-try2 4 loads 10 0.00 0.00
stfwd-try2 4 loads 11 -14.99 -10.00
stfwd-try2 4 loads 12 -14.99 -10.00
stfwd-try2 4 loads 13 -14.99 -10.00
stfwd-try2 4 loads 14 0.00 0.00
stfwd-try2 4 loads 15 0.00 0.00
stfwd-try2 4 loads 16 -14.99 -10.00
stfwd-try2 4 loads 17 -14.99 -10.00
stfwd-try2 4 loads 18 -14.99 -10.00
stfwd-try2 4 loads 19 0.00 0.00
stfwd-try2 4 loads 20 0.00 0.00
stfwd-try2-10 @ 0x0x49d240
Benchmark Sample Cycles Nanos
stfwd-try2 10 loads 1 44.98 30.00
stfwd-try2 10 loads 2 389.81 260.00
stfwd-try2 10 loads 3 0.00 0.00
stfwd-try2 10 loads 4 0.00 0.00
stfwd-try2 10 loads 5 0.00 0.00
stfwd-try2 10 loads 6 0.00 0.00
stfwd-try2 10 loads 7 -14.99 -10.00
stfwd-try2 10 loads 8 -14.99 -10.00
stfwd-try2 10 loads 9 -14.99 -10.00
stfwd-try2 10 loads 10 0.00 0.00
stfwd-try2 10 loads 11 0.00 0.00
stfwd-try2 10 loads 12 0.00 0.00
stfwd-try2 10 loads 13 0.00 0.00
stfwd-try2 10 loads 14 0.00 0.00
stfwd-try2 10 loads 15 -14.99 -10.00
stfwd-try2 10 loads 16 -14.99 -10.00
stfwd-try2 10 loads 17 0.00 0.00
stfwd-try2 10 loads 18 0.00 0.00
stfwd-try2 10 loads 19 0.00 0.00
stfwd-try2 10 loads 20 0.00 0.00
stfwd-try2-20 @ 0x0x4a01c0
Benchmark Sample Cycles Nanos
stfwd-try2 20 loads 1 509.75 340.00
stfwd-try2 20 loads 2 734.63 490.00
stfwd-try2 20 loads 3 14.99 10.00
stfwd-try2 20 loads 4 29.99 20.00
stfwd-try2 20 loads 5 14.99 10.00
stfwd-try2 20 loads 6 29.99 20.00
stfwd-try2 20 loads 7 14.99 10.00
stfwd-try2 20 loads 8 14.99 10.00
stfwd-try2 20 loads 9 14.99 10.00
stfwd-try2 20 loads 10 14.99 10.00
stfwd-try2 20 loads 11 14.99 10.00
stfwd-try2 20 loads 12 14.99 10.00
stfwd-try2 20 loads 13 14.99 10.00
stfwd-try2 20 loads 14 29.99 20.00
stfwd-try2 20 loads 15 14.99 10.00
stfwd-try2 20 loads 16 29.99 20.00
stfwd-try2 20 loads 17 14.99 10.00
stfwd-try2 20 loads 18 14.99 10.00
stfwd-try2 20 loads 19 14.99 10.00
stfwd-try2 20 loads 20 14.99 10.00
stfwd-try2-1000 @ 0x0x49d2c0
Benchmark Sample Cycles Nanos
stfwd-try2 1000 loads 1 32188.91 21470.00
stfwd-try2 1000 loads 2 36236.88 24170.00
stfwd-try2 1000 loads 3 2968.52 1980.00
stfwd-try2 1000 loads 4 2968.52 1980.00
stfwd-try2 1000 loads 5 2953.52 1970.00
stfwd-try2 1000 loads 6 2953.52 1970.00
stfwd-try2 1000 loads 7 2968.52 1980.00
stfwd-try2 1000 loads 8 2968.52 1980.00
stfwd-try2 1000 loads 9 2968.52 1980.00
stfwd-try2 1000 loads 10 2953.52 1970.00
stfwd-try2 1000 loads 11 2953.52 1970.00
stfwd-try2 1000 loads 12 2968.52 1980.00
stfwd-try2 1000 loads 13 2968.52 1980.00
stfwd-try2 1000 loads 14 2968.52 1980.00
stfwd-try2 1000 loads 15 2953.52 1970.00
stfwd-try2 1000 loads 16 2953.52 1970.00
stfwd-try2 1000 loads 17 2968.52 1980.00
stfwd-try2 1000 loads 18 2968.52 1980.00
stfwd-try2 1000 loads 19 2968.52 1980.00
stfwd-try2 1000 loads 20 2953.52 1970.00
stfwd-try2-1000w @ 0x0x49d2c0
Benchmark Sample Cycles Nanos
stfwd-try2 1000 loads warm 1 2983.51 1990.00
stfwd-try2 1000 loads warm 2 36236.88 24170.00
stfwd-try2 1000 loads warm 3 36236.88 24170.00
stfwd-try2 1000 loads warm 4 36236.88 24170.00
stfwd-try2 1000 loads warm 5 36251.87 24180.00
stfwd-try2 1000 loads warm 6 36236.88 24170.00
stfwd-try2 1000 loads warm 7 36236.88 24170.00
stfwd-try2 1000 loads warm 8 36236.88 24170.00
stfwd-try2 1000 loads warm 9 36236.88 24170.00
stfwd-try2 1000 loads warm 10 36235.38 24169.00
stfwd-try2 1000 loads warm 11 36236.88 24170.00
stfwd-try2 1000 loads warm 12 36236.88 24170.00
stfwd-try2 1000 loads warm 13 36236.88 24170.00
stfwd-try2 1000 loads warm 14 36251.87 24180.00
stfwd-try2 1000 loads warm 15 36236.88 24170.00
stfwd-try2 1000 loads warm 16 36236.88 24170.00
stfwd-try2 1000 loads warm 17 36236.88 24170.00
stfwd-try2 1000 loads warm 18 36236.88 24170.00
stfwd-try2 1000 loads warm 19 36236.88 24170.00
stfwd-try2 1000 loads warm 20 36235.38 24169.00
stfwd-try2b @ 0x0x4a02c0
Benchmark Sample Cycles Nanos
stfwd-try2 100 loads 1 254.87 170.00
stfwd-try2 100 loads 2 254.87 170.00
stfwd-try2 100 loads 3 254.87 170.00
stfwd-try2 100 loads 4 269.87 180.00
stfwd-try2 100 loads 5 254.87 170.00
stfwd-try2 100 loads 6 254.87 170.00
stfwd-try2 100 loads 7 254.87 170.00
stfwd-try2 100 loads 8 254.87 170.00
stfwd-try2 100 loads 9 269.87 180.00
stfwd-try2 100 loads 10 254.87 170.00
stfwd-try2 100 loads 11 254.87 170.00
stfwd-try2 100 loads 12 254.87 170.00
stfwd-try2 100 loads 13 254.87 170.00
stfwd-try2 100 loads 14 269.87 180.00
stfwd-try2 100 loads 15 254.87 170.00
stfwd-try2 100 loads 16 254.87 170.00
stfwd-try2 100 loads 17 254.87 170.00
stfwd-try2 100 loads 18 254.87 170.00
stfwd-try2 100 loads 19 269.87 180.00
stfwd-try2 100 loads 20 254.87 170.00
stfwd-try2c @ 0x0x49d180
Benchmark Sample Cycles Nanos
trained loads 1 209.90 140.00
trained loads 2 209.90 140.00
trained loads 3 149.93 100.00
trained loads 4 149.93 100.00
trained loads 5 74.96 50.00
trained loads 6 74.96 50.00
trained loads 7 74.96 50.00
trained loads 8 59.97 40.00
trained loads 9 59.97 40.00
trained loads 10 59.97 40.00
trained loads 11 29.99 20.00
trained loads 12 44.98 30.00
trained loads 13 29.99 20.00
trained loads 14 44.98 30.00
trained loads 15 44.98 30.00
trained loads 16 44.98 30.00
trained loads 17 44.98 30.00
trained loads 18 29.99 20.00
trained loads 19 44.98 30.00
trained loads 20 44.98 30.00
** Running benchmark group Miscellaneous tests **
Benchmark Cycles Nanos
32-bit add-loop 2.50 1.67
64-bit add-loop 2.50 1.67
Can port7 be used by loads 1.50 1.00
Test micro-fused add 1.00 0.67
Add-JO fusion 1.00 0.67
Flag merge 1 1.24 0.83
Flag merge 2 1.17 0.78
Flag merge 3 1.24 0.83
Loop weirdness fast 6.99 4.66
** Running benchmark group Fusion tests from dendibakh blog **
Benchmark Cycles Nanos
Crosses 64-byte i-boundary 300.83 200.65
No cross 64-byte i-boundary 173.95 116.02
Fused (original) 1.38 0.92
Fused (simple addr) 1.36 0.91
Fused (add [reg + reg * 4], 1) 1.38 0.92
Fused (add [reg], 1) 1.36 0.91
Unfused (original) 1.61 1.07
Fused summation 2.15 1.44
Unfused summation 1.63 1.08
** Running benchmark group BMI false-dependency tests **
Benchmark Cycles Nanos
dest-dependent tzcnt 0.50 0.34
dest-dependent lzcnt 0.25 0.17
dest-dependent popcnt 0.25 0.17
** Running benchmark group retpoline tests **
Benchmark Cycles Nanos
Dense retpoline call pause 55.60 37.08
Dense retpoline call lfence 55.48 37.01
Dense indirect pred calls 4.15 2.77
Dense indirect unpred calls 21.38 14.26
Sparse retpo indep call pause 13.69 9.13
Sparse retpo indep call lfence 15.43 10.29
Sparse retpo dep call pause 46.79 31.21
Sparse retpo dep call lfence 47.29 31.54
** Running benchmark group Tests written in C++ **
Benchmark Cycles Nanos
Dependent inline divisions 16.99 11.33
Dependent 64-bit divisions 16.99 11.33
Independent inline divisions 14.53 9.69
Independent divisions 14.53 9.69
Linked-list w/ sentinel 9.74 6.49
Linked-list w/ count 10.14 6.76
** Running benchmark group Vector unit bypass latency **
Benchmark Cycles Nanos
movdqa [mem] -> paddb latency 10.99 7.33
movdqu [mem] -> paddb latency 10.99 7.33
movups [mem] -> paddb latency 10.99 7.33
movupd [mem] -> paddb latency 10.99 7.33
movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
movq rax,xmm0 -> xmm0,rax lat 6.00 4.00
** Running benchmark group Vector load-load latency **
Benchmark Cycles Nanos
aligned movdqu load lat 9.99 6.67
aligned vmovdqu load lat 9.99 6.67
aligned lddqu load lat 9.99 6.67
aligned vlddqu load lat 9.99 6.67
misaligned movdqu load lat 10.99 7.33
misaligned vmovdqu load lat 10.99 7.33
misaligned lddqu load lat 10.99 7.33
misaligned vlddqu load lat 10.99 7.33
** Running benchmark group Call/ret benchmarks **
Benchmark Cycles Nanos
calls sparsed by 0 4.12 2.75
calls sparsed by 1 4.19 2.79
calls sparsed by 2 4.12 2.75
calls sparsed by 3 4.25 2.83
calls sparsed by 4 4.31 2.87
calls sparsed by 5 5.00 3.33
calls sparsed by 6 6.00 4.00
calls sparsed by 7 7.00 4.67
calls chained by 0 4.06 2.71
calls chained by 1 4.06 2.71
calls chained by 2 4.06 2.71
calls chained by 3 4.06 2.71
calls to pushpop fn 7.00 4.67
calls to addrsp0 fn 13.99 9.33
calls to addrsp8 fn 13.99 9.33
** Running benchmark group Oneshot Group **
dep-add-oneshot @ 0x0x494380
Benchmark Sample Cycles Nanos
Oneshot dep add chain 1 1.51 1.01
Oneshot dep add chain 2 0.70 0.47
Oneshot dep add chain 3 0.70 0.47
Oneshot dep add chain 4 0.70 0.47
Oneshot dep add chain 5 0.70 0.47
Oneshot dep add chain 6 0.70 0.47
Oneshot dep add chain 7 0.70 0.47
Oneshot dep add chain 8 0.70 0.47
Oneshot dep add chain 9 0.70 0.47
Oneshot dep add chain 10 0.70 0.47
Oneshot dep add chain 11 0.70 0.47
Oneshot dep add chain 12 0.70 0.47
Oneshot dep add chain 13 0.70 0.47
Oneshot dep add chain 14 0.70 0.47
Oneshot dep add chain 15 0.70 0.47
Oneshot dep add chain 16 0.70 0.47
Oneshot dep add chain 17 0.70 0.47
Oneshot dep add chain 18 0.70 0.47
Oneshot dep add chain 19 0.70 0.47
Oneshot dep add chain 20 0.70 0.47
indep-add-oneshot @ 0x0x495ac0
Benchmark Sample Cycles Nanos
Oneshot indep add chain 1 2.51 1.68
Oneshot indep add chain 2 0.19 0.12
Oneshot indep add chain 3 0.26 0.18
Oneshot indep add chain 4 0.22 0.15
Oneshot indep add chain 5 0.22 0.15
Oneshot indep add chain 6 0.22 0.15
Oneshot indep add chain 7 0.22 0.15
Oneshot indep add chain 8 0.26 0.18
Oneshot indep add chain 9 0.22 0.15
Oneshot indep add chain 10 0.22 0.15
Oneshot indep add chain 11 0.22 0.15
Oneshot indep add chain 12 0.22 0.15
Oneshot indep add chain 13 0.26 0.18
Oneshot indep add chain 14 0.22 0.15
Oneshot indep add chain 15 0.22 0.15
Oneshot indep add chain 16 0.22 0.15
Oneshot indep add chain 17 0.22 0.15
Oneshot indep add chain 18 0.26 0.18
Oneshot indep add chain 19 0.22 0.15
Oneshot indep add chain 20 0.22 0.15
dep-add128 @ 0x0x4941c0
Benchmark Sample Cycles Nanos
128 dependent add instructions 1 3.98 2.66
128 dependent add instructions 2 0.59 0.39
128 dependent add instructions 3 0.59 0.39
128 dependent add instructions 4 0.59 0.39
128 dependent add instructions 5 0.59 0.39
128 dependent add instructions 6 0.59 0.39
128 dependent add instructions 7 0.59 0.39
128 dependent add instructions 8 0.59 0.39
128 dependent add instructions 9 0.70 0.47
128 dependent add instructions 10 0.70 0.47
128 dependent add instructions 11 0.70 0.47
128 dependent add instructions 12 0.70 0.47
128 dependent add instructions 13 0.70 0.47
128 dependent add instructions 14 0.70 0.47
128 dependent add instructions 15 0.70 0.47
128 dependent add instructions 16 0.70 0.47
128 dependent add instructions 17 0.59 0.39
128 dependent add instructions 18 0.59 0.39
128 dependent add instructions 19 0.59 0.39
128 dependent add instructions 20 0.59 0.39
oneshot-dummy-touch @ 0x0x494180
Benchmark Sample Cycles Nanos
Empty touched oneshot bench 1 44.98 30.00
Empty touched oneshot bench 2 -14.99 -10.00
Empty touched oneshot bench 3 -14.99 -10.00
Empty touched oneshot bench 4 0.00 0.00
Empty touched oneshot bench 5 0.00 0.00
Empty touched oneshot bench 6 -14.99 -10.00
Empty touched oneshot bench 7 -14.99 -10.00
Empty touched oneshot bench 8 0.00 0.00
Empty touched oneshot bench 9 0.00 0.00
Empty touched oneshot bench 10 -14.99 -10.00
Empty touched oneshot bench 11 -14.99 -10.00
Empty touched oneshot bench 12 0.00 0.00
Empty touched oneshot bench 13 0.00 0.00
Empty touched oneshot bench 14 -14.99 -10.00
Empty touched oneshot bench 15 -14.99 -10.00
Empty touched oneshot bench 16 0.00 0.00
Empty touched oneshot bench 17 -14.99 -10.00
Empty touched oneshot bench 18 -14.99 -10.00
Empty touched oneshot bench 19 -14.99 -10.00
Empty touched oneshot bench 20 0.00 0.00
oneshot-dummy-notouch @ 0x0x494140
Benchmark Sample Cycles Nanos
Empty untouched oneshot bench 1 74.96 50.00
Empty untouched oneshot bench 2 -14.99 -10.00
Empty untouched oneshot bench 3 0.00 0.00
Empty untouched oneshot bench 4 0.00 0.00
Empty untouched oneshot bench 5 -14.99 -10.00
Empty untouched oneshot bench 6 -14.99 -10.00
Empty untouched oneshot bench 7 0.00 0.00
Empty untouched oneshot bench 8 0.00 0.00
Empty untouched oneshot bench 9 -14.99 -10.00
Empty untouched oneshot bench 10 -14.99 -10.00
Empty untouched oneshot bench 11 0.00 0.00
Empty untouched oneshot bench 12 -14.99 -10.00
Empty untouched oneshot bench 13 -14.99 -10.00
Empty untouched oneshot bench 14 -14.99 -10.00
Empty untouched oneshot bench 15 0.00 0.00
Empty untouched oneshot bench 16 -14.99 -10.00
Empty untouched oneshot bench 17 -14.99 -10.00
Empty untouched oneshot bench 18 0.00 0.00
Empty untouched oneshot bench 19 0.00 0.00
Empty untouched oneshot bench 20 -14.99 -10.00