Update syncing logic to fix duplicate block requests #3410
LGTM with some nits
Co-authored-by: ljedrz <[email protected]>
LGTM! I think I'll feel a bit more comfortable with the change after we run our gamut of sync tests and malice tests, but otherwise the logic looks good.
Note that the PR description mentions sync tests were run on a deployed network. We don't have any malicious sync tests yet, but we will one day!
Motivation
As issue #3404 described, validators send out duplicate block requests when syncing. This PR updates the syncing logic to make sure we only remove a block response after it was added to the ledger or if we encountered an error.
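The invariant stated above (a response is removed only once the block is in the ledger or processing has failed) can be sketched with a toy model. This is an illustration under assumptions, not the actual snarkOS code: the `Block` struct, the `canon_height` field, `insert_response`, `needs_request`, and `try_advance` are hypothetical, and only the names `peek_next_block` and `remove_block_response` come from this PR's description (their real signatures may differ).

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a block; only the height matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub height: u32,
}

/// Toy model of `BlockSync`: buffered responses plus its view of the canon height.
#[derive(Default)]
pub struct BlockSync {
    responses: BTreeMap<u32, Block>,
    canon_height: u32, // hypothetical simplification of the `canon` view
}

impl BlockSync {
    /// Stores a block response received from a peer.
    pub fn insert_response(&mut self, block: Block) {
        self.responses.insert(block.height, block);
    }

    /// READS the response at `next_height` without removing it.
    pub fn peek_next_block(&self, next_height: u32) -> Option<Block> {
        self.responses.get(&next_height).cloned()
    }

    /// REMOVES the response at `height`; call once advancing completed or failed.
    pub fn remove_block_response(&mut self, height: u32) {
        self.responses.remove(&height);
    }

    /// A height needs a request only if it is beyond the canon
    /// and no response for it is currently buffered.
    pub fn needs_request(&self, height: u32) -> bool {
        height > self.canon_height && !self.responses.contains_key(&height)
    }
}

/// Toy model of `Sync` advancing the ledger by one block.
pub fn try_advance(block_sync: &mut BlockSync, ledger: &mut Vec<Block>) -> bool {
    let next_height = block_sync.canon_height + 1;
    let Some(block) = block_sync.peek_next_block(next_height) else {
        return false;
    };
    // While the block is being checked, its response is still buffered,
    // so `needs_request(next_height)` stays false and nothing is re-requested.
    let valid = block.height == next_height; // placeholder for the real checks
    if valid {
        ledger.push(block);
        block_sync.canon_height = next_height;
    }
    // Remove only now, after advancing completed or failed.
    block_sync.remove_block_response(next_height);
    valid
}
```

In this model there is no moment at which a height is both absent from `responses` and beyond the canon while its block is still being validated, which is the window that produced the duplicate requests.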
For background, syncing currently works as follows:
1. `BlockSync` requests the blocks and stores them in the internal `responses` map.
2. `Sync` calls `block_sync.process_next_block(current_height)` to get the next block to process. This removes it from `BlockSync`'s `responses` and returns it to `Sync`.
3. `Sync` performs the checks and, if successful, adds the block to the `ledger`. This also updates `BlockSync`'s `canon`.

The issue is that `BlockSync`'s view of the `canon` is only updated after `Sync` is done processing blocks (and adding them to the ledger). There's a window where `BlockSync` is unaware of blocks that are pending validation in `Sync`. Thus, it re-requests them.

This PR adjusts the block processing interface to the `Sync` module as follows:
- `process_next_block` READS a block from a particular height
- `remove_block_response` REMOVES a block from a particular height, which MUST be called after advancing completed or failed

Note: in the latest commit, `process_next_block` was renamed to `peek_next_block`, following the reviewer discussion.

Test Plan
Ran it locally and verified it removes the duplicate block request.
Also ran it with light load in a cloud devnet, and verified syncing works for validators and clients.
The following figure shows a syncing process before the fix, with a longer pause until a duplicate request is generated:
The following figure shows the syncing process with the fix (note there is no double request and no delay, so syncing is much faster):
Closes #3404