Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for reorganizing shuffling code #6386

Closed
matthewkeil opened this issue Feb 2, 2024 · 5 comments
Closed

Proposal for reorganizing shuffling code #6386

matthewkeil opened this issue Feb 2, 2024 · 5 comments
Labels
meta-discussion Indicates a topic that requires input from various developers. meta-feature-request Issues to track feature requests. scope-performance Performance issue and ideas to improve performance.

Comments

@matthewkeil
Copy link
Member

Problem description

Currently we are calculating the shuffling for the next epoch during the state transition of the current epoch.

https://github.com/ChainSafe/lodestar/blob/00dfa63413e9b2a57dbaea70ac7151ef134b0c05/packages/state-transition/src/cache/epochCache.ts#L513-517

This takes about a second.

Screenshot 2024-02-02 at 11 13 35 AM (2)

We propose to offload that work to idle time not on critical path during epoch transition which should shrink the transition duration by about a second.

Screenshot 2024-02-02 at 11 14 00 AM (2)

The nextShuffling is not necessary for epoch transition but is useful for calculating the next epoch proposers and sync committee duties so pre-computation is still recommended but it just doesnt need to be on critical path.

Solution description

Currently the shuffling code is housed in state-transition and the shuffling results are available, on the EpochCache. We propose to move that all to the ShufflingCache in `beacon-node.

This will facilitate a couple of things. We will still have access to the results for service via API, the current consumers of the nextShuffling and the shuffling results can be passed into the state-transition pre-computed. It will also open up the ability to do the shuffling on a separate thread or async in main thread idle times.

As a result of moving the shuffling to beacon-node the getBeaconProposersNextEpoch and similar getters would also need to be moved out of state-transition as they rely on the nextShuffling that will only be available in beacon-node after a period of time (so they cannot be precomputed until the nextShuffling is finished).

I am looking either moving those onto the ShufflingCache but that feels like mixing of concerns so potentially I will create a PrecomputationCache or similar to house that type of convenience information that gets served by the API.

Additional context

What are everyones thoughts on the approach and process?

Step 1) refactor code to move shuffling fully out of state-transition and into beacon-node. This will not change epoch transition times but will set the groundwork for the next steps
Step 2) create async processing code to move shuffling off critical path
Step 3) potentially create worker for precomputation tasks and implement with napi-nice to make sure kernel scheduling prioritizes main, network and bls threads

@matthewkeil matthewkeil added meta-feature-request Issues to track feature requests. meta-discussion Indicates a topic that requires input from various developers. scope-performance Performance issue and ideas to improve performance. labels Feb 2, 2024
@twoeths
Copy link
Contributor

twoeths commented Feb 2, 2024

yes I strongly support the idea because having shuffling inside both EpochCache and ShufflingCache is not great. Some more contexts:

  • the cache of shuffling inside state-transition is not for itself, it's only for beacon node to consume later
  • it does not look right for beacon-node to call state-transition for a cache, state-transition should be for state transition only, beacon-node should use the ShufflingCache
  • n-historical state work would limit the states due to its size but we can cache way more EpochShuffling because it's so lightweight
  • now when loading state from disk the api has to provide a ShufflingGetter function. It should be able to load state without it

@wemeetagain
Copy link
Member

@matthewkeil this approach sounds good to me

@wemeetagain
Copy link
Member

@twoeths
Copy link
Contributor

twoeths commented Apr 3, 2024

it's not possible to get rid of shuffling inside CachedBeaconState, it's needed in at least 2 places:

  • upgradeStateToAltair()
  • processSyncCommitteeUpdates()

this is required in the processSlots() where it spans through multiple epoch transitions

update: we don't need epoch shuffling for the above apis as long as we cache previous/current/next active validator indices

  • upgradeStateToAltair: can rebuild the committees/shuffling based on active validator indices and cached randao mixes inside the state
  • processSyncCommitteeUpdates: it only needs next epoch validator indices which is available after beforeProcessEpoch
  • but for processAttestations we need it
    const committeeIndices = epochCtx.getBeaconCommittee(data.slot, data.index);

@wemeetagain
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-discussion Indicates a topic that requires input from various developers. meta-feature-request Issues to track feature requests. scope-performance Performance issue and ideas to improve performance.
Projects
None yet
Development

No branches or pull requests

3 participants