
Some benchmarks #4

Closed
dangershony opened this issue Feb 25, 2020 · 5 comments
Comments

@dangershony

This was posted in the Discord discussions:

(benchmark screenshots)

@MithrilMan

(benchmark screenshot)

MithrilShard's parallelized version is 5x faster than NBitcoin's current implementation, with almost half the allocations (I'm looking at the 2000-header results, last 4 lines).
SequentialNBitcoin is how hash generation currently performs (one by one in a for loop); I added ParallelizedNBitcoin to show the performance improvement using Parallel.ForEach.
Pretty straightforward:

[Benchmark]
public object SequentialNBitcoin()
{
   foreach (var header in headersNBitcoin)
   {
      header.GetHash();
   }
   return null;
}

[Benchmark]
public object ParallelizedNBitcoin()
{
   return Parallel.ForEach(headersNBitcoin, header => header.GetHash());
}
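As a stand-alone illustration of the same pattern, here is my own sketch using plain double-SHA256 over dummy 80-byte payloads (real block headers are 80 bytes); this is not the repository's benchmark code, just a demonstration that the sequential and parallel variants produce identical hashes:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Double-SHA256, as used for Bitcoin block header hashes.
static byte[] DoubleSha256(byte[] data)
{
    using var sha = SHA256.Create();
    return sha.ComputeHash(sha.ComputeHash(data));
}

// 2000 dummy 80-byte "headers", each made unique by its index.
byte[][] headers = Enumerable.Range(0, 2000)
    .Select(i => BitConverter.GetBytes((long)i).Concat(new byte[72]).ToArray())
    .ToArray();

// Sequential, as in SequentialNBitcoin.
byte[][] sequential = headers.Select(DoubleSha256).ToArray();

// Parallel, as in ParallelizedNBitcoin (each call creates its own SHA256
// instance, so there is no shared mutable state between threads).
byte[][] parallel = new byte[headers.Length][];
Parallel.For(0, headers.Length, i => parallel[i] = DoubleSha256(headers[i]));

// Both strategies must produce identical hashes.
Console.WriteLine(sequential.Zip(parallel, (a, b) => a.SequenceEqual(b)).All(x => x)); // prints True
```

Since hashing each header is independent work, `Parallel.ForEach`/`Parallel.For` parallelizes it safely; the only requirement is that no hasher instance is shared across threads.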

@MithrilMan

This is a proper double-SHA256 comparison against NBitcoin:
(benchmark screenshot)

@MithrilMan

MithrilMan commented Feb 25, 2020

(benchmark screenshot)

This one is quite interesting. I was testing different ways to serialize bytes into hex strings and came up with the green ones. I already knew which one to pick (the one with Span) but wanted some benchmarks anyway. It would actually save one more allocation: for this test I passed a byte array in, but in my case I already have a ReadOnlySpan, so I save even more.
The interesting thing here is that this is a generic hex encoder producing the hex string that's needed, if I understand the protocol correctly, to generate the block header hash, basically the block id (that's my next step), so it will be used a lot.
As you can see, I get better performance and even better allocation.
Something more to put in the bank :slight_smile:
Let me push the benchmark in case someone is interested:
9d9aacb
The candidates I'm going to merge into my codebase are ConvertSpan and ConvertSpanReverse (they will be renamed, of course).
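For reference, a minimal Span-based hex encoder in the spirit of ConvertSpan / ConvertSpanReverse might look like the sketch below. The name `ToHex` and the implementation details are my own, not the repository code; the `reverse` flag reflects the convention that block ids are rendered with the byte order reversed:

```csharp
using System;

// Sketch of a Span-based hex encoder; a stackalloc'd char buffer avoids a
// heap allocation for the intermediate characters. Fine for hash-sized
// inputs; a real encoder would guard against large lengths.
static string ToHex(ReadOnlySpan<byte> bytes, bool reverse = false)
{
    const string alphabet = "0123456789abcdef";
    Span<char> chars = stackalloc char[bytes.Length * 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        byte b = reverse ? bytes[bytes.Length - 1 - i] : bytes[i];
        chars[2 * i] = alphabet[b >> 4];
        chars[2 * i + 1] = alphabet[b & 0xF];
    }
    return new string(chars);
}

Console.WriteLine(ToHex(new byte[] { 0x12, 0xAB }));        // prints 12ab
Console.WriteLine(ToHex(new byte[] { 0x12, 0xAB }, true));  // prints ab12
```

Taking a `ReadOnlySpan<byte>` parameter is what lets a caller who already holds a span (as described above) avoid the extra array allocation entirely.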

@MithrilMan

Another one, about the block locator:
https://media.discordapp.net/attachments/664593513769336875/665402291691716609/unknown.png

The one named Log is my implementation; Loop is NBitcoin's.
It's not complete; this benchmark just computes the indexes to fetch. I'd expect even more improvement on allocation because I won't do the final ToArray (copy) when populating the BlockLocator.
4759f4e
Basically, I compute ahead of time the number of hashes to send, using log base 2:

int itemsToAdd = this.top_height <= 10 ? (this.top_height + 1) : (10 + (int)Math.Ceiling(Math.Log2(this.top_height)));
Span<int> indexes = new Span<int>(ArrayPool<int>.Shared.Rent(itemsToAdd), 0, itemsToAdd);

// the last 10 heights, one by one
int index = 0;
int current = this.top_height;
while (index < 10 && current > 0)
{
   indexes[index++] = current--;
}

// then exponentially spaced heights back toward genesis
int step = 1;
while (current > 0)
{
   indexes[index++] = current;
   step *= 2;
   current -= step;
}

// genesis is always the last entry
indexes[itemsToAdd - 1] = 0;
return indexes;

and then I adapt the loops to do the job.

The Loop method instead is the one used by NBitcoin (which simply ports the algorithm stated in the protocol documentation: https://en.bitcoin.it/wiki/Protocol_documentation#getblocks)
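For anyone following along, the protocol-style loop can be sketched like this (my own compact rendition of the getblocks back-off, not the NBitcoin source): the locator is dense for the most recent 10 blocks, then the step doubles each iteration, which is why the entry count is roughly 10 + log2(height):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the protocol-style block locator loop: dense for the last 10
// blocks, then exponentially sparser back to genesis.
static List<int> LocatorIndexes(int topHeight)
{
    var indexes = new List<int>();
    int step = 1;
    for (int i = topHeight; i > 0; i -= step)
    {
        indexes.Add(i);
        if (indexes.Count >= 10)
            step *= 2; // start doubling after the first 10 entries
    }
    indexes.Add(0); // always include genesis
    return indexes;
}

Console.WriteLine(string.Join(", ", LocatorIndexes(100)));
// prints 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 89, 85, 77, 61, 29, 0
```

Because the number of doubling steps grows logarithmically, the total entry count can be bounded up front, which is what makes the single rented-array approach above possible instead of growing a list.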

@MithrilMan

(benchmark screenshot)

I'm very happy with my optimization on allocation (which matters more than CPU in our scenario).
I also improved bitcoin core's shift-operator code; I could open an issue or PR about that. Basically I spare some passes in <<= and >>=.
To recap: all tests passed, 1/4 of NBitcoin's memory allocation, and more than 100% gain in speed. Enough for the Target type!

The code is currently on a branch; this is the shift-operator code I ported from bitcoin core and improved:

private void ShiftLeft(int shiftAmount)
{
   const int ELEMENTS = EXPECTED_SIZE / sizeof(ulong);
   const int SIZE = sizeof(ulong) * 8;

   Span<ulong> data = MemoryMarshal.CreateSpan(ref Unsafe.As<ulong, ulong>(ref this.part1), ELEMENTS);
   Span<ulong> result = stackalloc ulong[ELEMENTS];
   result.Clear();

   int k = shiftAmount / SIZE;   // whole-element shift
   shiftAmount %= SIZE;          // remaining bit shift

   for (int i = 0; i < ELEMENTS - k; i++)
   {
      if (i + k + 1 < ELEMENTS && shiftAmount != 0)
      {
         // carry the bits that overflow into the next element
         result[i + k + 1] |= data[i] >> (SIZE - shiftAmount);
      }

      result[i + k] |= data[i] << shiftAmount;
   }

   result.CopyTo(data);
}

private uint GetCompactMantissa(int shiftAmount)
{
   const int ELEMENTS = EXPECTED_SIZE / sizeof(uint);
   const int SIZE = sizeof(uint) * 8;

   ReadOnlySpan<uint> leftBytes = MemoryMarshal.CreateReadOnlySpan(ref Unsafe.As<ulong, uint>(ref this.part1), ELEMENTS);

   int k = shiftAmount / SIZE;   // whole-element shift
   shiftAmount %= SIZE;          // remaining bit shift

   Span<uint> result = stackalloc uint[ELEMENTS - k];
   result.Clear();

   for (int i = k; i < ELEMENTS; i++)
   {
      if (i - k - 1 >= 0 && shiftAmount != 0)
      {
         // carry the bits shifted out into the previous element
         result[i - k - 1] |= leftBytes[i] << (SIZE - shiftAmount);
      }

      result[i - k] |= (leftBytes[i] >> shiftAmount);
   }

   return result[0];
}
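The carry trick in ShiftLeft can be shown in isolation on a plain little-endian array of 64-bit limbs. This is a simplified stand-alone sketch, not the branch code above (which works over the struct's fields via MemoryMarshal):

```csharp
using System;

// Stand-alone sketch of the carry-between-limbs trick used by ShiftLeft,
// operating on a little-endian array of 64-bit limbs (4 limbs = 256 bits).
static void ShiftLeft(ulong[] data, int shiftAmount)
{
    const int SIZE = 64;
    int n = data.Length;
    ulong[] result = new ulong[n];

    int k = shiftAmount / SIZE;  // whole-limb shift
    shiftAmount %= SIZE;         // remaining bit shift

    for (int i = 0; i < n - k; i++)
    {
        if (i + k + 1 < n && shiftAmount != 0)
        {
            // the top bits of data[i] overflow into the next limb
            result[i + k + 1] |= data[i] >> (SIZE - shiftAmount);
        }
        result[i + k] |= data[i] << shiftAmount;
    }

    result.CopyTo(data, 0);
}

// Shifting left by 1 carries the top bit of limb 0 into limb 1.
ulong[] v = { 0x8000_0000_0000_0001UL, 0, 0, 0 };
ShiftLeft(v, 1);
Console.WriteLine($"{v[1]:x} {v[0]:x}"); // prints 1 2
```

Splitting the shift into a whole-limb part (`k`) and a residual bit part is what lets the loop do one pass instead of shifting bit by bit.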
