-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix] Selection of transition ID in finalize. #2575
base: staging
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very knowledgeable about this setup, but I don't see any engineering issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments about the migration
process + seeding the nonce
synthesizer/process/src/finalize.rs
Outdated
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT { | ||
Default::default() | ||
} else { | ||
self.construct_call_graph(execution)? | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a bit tangential to what's being fixed, but my fear is this approach of a CONSENSUS_VX_HEIGHT
is quickly creating a situation where we could have double digit consensus versions and migratory logic like this in many places within the the SnarkVM codebase. This could easily lead to nontrivial errors.
At some point soon, the community should come up with some kind of migration model and spec which lays out a standard format for migration.
I haven't thought this out in enough depth yet, but I imagine we might have some kind of migration decorator
-style pattern (perhaps a macro? Enums that apply migration functions, etc.?) that clearly denotes a change in consensus rules, specifies the logic at each and does a standard set of checks.
After the higher priority fixes mainnet
the community should do some active design to keep future updates and fixes maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good call out.
Your macro suggestion reminded me of the malice
approach, maybe we can borrow the design there.
We could also enforce stricter standards around migration logic, requiring that a README is maintained in this repo, which defines the invariants around each new CONSENSUS_V*_HEIGHT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 very much something worth thinking about and documenting.
One design principle I'm confident about (which also went wrong in the first design of the Fee lowering PR) is that wherever consensus changes are introduced, they should be independent full self-contained functionality, and not complex derivations of each other's functionality.
A second one: whether we should always use <
or >=
or a match for the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iamalwaysuncomfortable this is a wonderful point. Thank you for bringing it up.
I also have a great dislike for CONSENSUS_VX_HEIGHT
as it will quickly grow out of control. It's definitely worth exploring cleaner solutions. There should be plenty of examples of other chains that have managed quick bug fixes, hard forks, etc. that we can learn from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provable protocol team will hereby promise to do a small lit review in january.
}; | ||
|
||
// Increment the nonce. | ||
nonce += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this random enough? It seems like this might be in danger of creating aborted transitions if say, the user accidentally broadcasts twice. Maybe some kind of replay shenanigans are possible? Nothing immediately comes to mind but it's worth thinking out scenarios.
Would it be worth initializing the nonce not as 0
but as an prn
within a range that couldn't reach saturation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to this change, the on-chain RNG for any given transition was seeded by the bits of
registers.state().random_seed(),
**registers.transition_id(),
stack.program_id(),
registers.function_name(),
self.destination.locator(),
self.destination_type.type_id(),
seeds
In the new model, we do
registers.state().random_seed(),
**registers.transition_id(),
stack.program_id(),
registers.function_name(),
registers.nonce(),
self.destination.locator(),
self.destination_type.type_id(),
seeds
The subtle difference is that in the new model, we use the transition ID of the root transition in a transaction. The nonce is incremented for each child transition, ensuring uniqueness in the seed without requiring the need to determine the child transition ID.
In general, the seed should be deterministic but unique for each on-chain execution of a transition.
synthesizer/process/src/finalize.rs
Outdated
@@ -104,7 +104,11 @@ impl<N: Network> Process<N> { | |||
lap!(timer, "Verify the number of transitions"); | |||
|
|||
// Construct the call graph. | |||
let call_graph = self.construct_call_graph(execution)?; | |||
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT { | |||
Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would None
be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value here should be an empty hash map.
Are you suggesting to define call_graph
on L188 as an Option<HashMap<N::TransitionID, Vec<N::TransitionID>>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is indeed the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I feel like adding the Option
requires changing the interface and destructuring later.
No strong opinions so I'll defer to reviewers recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think
Default::default()
is fine here to reduce LoC changed. - I would prefer a match case instead of a
let else
structure. Just to be consistent with how we did it withCONSENSUS_V2_HEIGHT
. Example below:
match block_height < N::CONSENSUS_V2_HEIGHT {
true => coinbase_reward_v1(...),
false => coinbase_reward_v2(...),
}
synthesizer/process/src/finalize.rs
Outdated
// Insert the fee transition. | ||
call_graph.insert(*fee.transition_id(), Vec::new()); | ||
let call_graph = if state.block_height() >= N::CONSENSUS_V3_HEIGHT { | ||
Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would None
be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
synthesizer/tests/tests/vm/execute_and_finalize/interleave_async_and_non_async.aleo
Outdated
Show resolved
Hide resolved
self.destination_type.type_id(), | ||
seeds | ||
]; | ||
let preimage = if registers.state().block_height() >= N::CONSENSUS_V3_HEIGHT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto above: I think a match
case is preferred for clarity.
Overall, logic changes seem fairly sound to me. I only had a few nits. I think if we can run sufficient testing on dev/iso-nets, we can ship this out with the next migration (V3). |
Ran some tests with light load and validators resetting ledgers and reconnecting, seems stable so far 👍 |
Motivation
The motivating issue for this PR is here.
From the original discussion:
This PR proposes a solution to this issue by introducing a nonce to
FinalizeRegisters
. This nonce is used to seed therand
commands. Furthermore, instead of attempting to determine thechild_transition_id
that corresponds to an awaitedFuture
from the call graph, this approach uses the main transition ID to initialize allFinalizeRegisters
for a given transaction. The main transition ID along with the nonce ensures that each finalize context uses a unique seed for the RNG, while removing the need to correctly determine the transition ID for a givenFuture
(a complicated process).Migration
This proposal has been written to migrate at
N::CONSENSUS_V2_HEIGHT
.Given timelines, it's more likely that a new
N::CONSENSUS_V3_HEIGHT
would need to be introduced. The migration would follow the process introduced in ARC-0042.Test Plan
This PR includes a test, whose expected output demonstrates the the failure and the fix after
CONSENSUS_V2_HEIGHT
is reached.Included is the CI branch and the CI diff.
Impact
As stated above, there are two classes of programs that are impacted by this issue:
rand.chacha
but await the futures in a different order that the async functions were called.In scanning all Aleo programs deployed on Mainnet as of 11/12/24 5PM PT:
rand.chacha
command and 0 programs that import, and consequently, call them.This was confirmed by a static analyzer on all programs on mainnet at the above date and time and double-checked by manual audit.
Related PRs
StaticAnalysis
pass and add checks on usage of async code for safety. leo#28443To Reviewers
An important function of this PR is to provide an understanding of how async execution mechanism works and why this PR is needed. If reviewers need context or clarification, please feel to post your questions in this thread. I am also happy to do a call explaining the original and proposed design/code.