EIP-7594: PeerDAS open questions #3652

Open
ralexstokes opened this issue Apr 5, 2024 · 8 comments
Labels
EIP-7594 PeerDAS

Comments

@ralexstokes
Member

ralexstokes commented Apr 5, 2024

Context

General background for PeerDAS design and goals:

https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541

https://ethresear.ch/t/from-4844-to-danksharding-a-path-to-scaling-ethereum-da/18046

Open questions

Parameterization

Determine final parameters for a robust and secure network.

Availability look-behind

One particular parameter is how tight the sampling has to be with respect to block/blob processing and fork choice. For example, nodes could sample in the same slot as a block and not consider a block valid until the sampling completes. In the event this requirement is too strict (e.g. because of network performance), we could relax the requirement to only complete sampling within some number of trailing slots from the head. If we go with a trailing approach, are there additional complications in the regime of long-range forks or network partitions? Does working in this "optimistic" setting cause undue complexity in implementations?
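A minimal sketch of the two options. It assumes a hypothetical AVAILABILITY_LOOKBEHIND_SLOTS parameter and a three-state sampling status, neither of which is in the spec. A block whose sampling is still pending counts as available only while it is inside the trailing window, and a window of 0 reduces to the strict same-slot rule.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical parameter (not in the spec): how many slots sampling may lag the block.
AVAILABILITY_LOOKBEHIND_SLOTS = 2


class SamplingStatus(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class BlockView:
    slot: int
    sampling: SamplingStatus


def is_treated_as_available(block: BlockView, current_slot: int,
                            lookbehind: int = AVAILABILITY_LOOKBEHIND_SLOTS) -> bool:
    """Whether fork choice may treat the block's data as available right now."""
    if block.sampling is SamplingStatus.SUCCEEDED:
        return True
    if block.sampling is SamplingStatus.FAILED:
        return False
    # Sampling still pending: accept optimistically only inside the trailing window.
    # With lookbehind = 0 this is the strict rule: no acceptance before sampling completes.
    return current_slot - block.slot < lookbehind


pending = BlockView(slot=10, sampling=SamplingStatus.PENDING)
print(is_treated_as_available(pending, current_slot=11))                # True
print(is_treated_as_available(pending, current_slot=12))                # False
print(is_treated_as_available(pending, current_slot=10, lookbehind=0))  # False
```

A block accepted optimistically can later move to FAILED after other blocks have been built on top of it. That transition is where the long-range fork, partition, and implementation-complexity questions come from.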

Syncing

Some questions around syncing relating to PeerDAS and also the possible deprecation of EIP-4844 style sampling.

Deprecate blob_sidecars_by_root and blob_sidecars_by_range?

Can we deprecate these RPC methods? Note that you would still sample anything inside the blob retention window.

DataColumnSidecarsByRoot and DataColumnSidecarsByRange

The spec is currently missing a ByRange method, which is required for syncing in the regime where clients are expected to retain samples.
What is the exact layout of the RPC method? Multiple columns or just one? See thread: #3574 (comment)
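One possible layout, sketched with hypothetical field names: a single request carries a slot range and a list of column indices, and the responder serves one sidecar per (slot, column) pair for slots that contain blobs. A single-column RPC is the special case of a one-element list.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class DataColumnSidecarsByRangeRequest:
    # Hypothetical layout: a slot range plus the column indices wanted in it.
    start_slot: int
    count: int
    columns: List[int] = field(default_factory=list)


def served_pairs(req: DataColumnSidecarsByRangeRequest,
                 slots_with_blobs: Set[int]) -> Iterator[Tuple[int, int]]:
    """Yield (slot, column_index) pairs a responder would return, in slot order."""
    for slot in range(req.start_slot, req.start_slot + req.count):
        if slot not in slots_with_blobs:
            continue  # empty or blob-free slot: no column sidecars exist
        for column in sorted(set(req.columns)):
            yield slot, column


req = DataColumnSidecarsByRangeRequest(start_slot=100, count=4, columns=[3, 1])
print(list(served_pairs(req, slots_with_blobs={100, 102})))
# [(100, 1), (100, 3), (102, 1), (102, 3)]
```

Requesting several columns at once lets a node that custodies multiple columns sync a range with one round trip per peer instead of one per column. The cost is a larger, variable-size response that has to be bounded and rate-limited.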

Peer scoring

How to downscore a peer who should custody some sample but can’t respond with it?

Network shards design

See here for more context on the proposal: #3623
Likely a good simplification. Would touch some of the PeerDAS details around mapping a given peer to their sample subnets.
Some additional implications: #3574 (comment)

Subnet design

Map one column per subnet, unless we need to do otherwise, see #3574 (comment)
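A sketch of the column-to-subnet mapping. It assumes columns are assigned to subnets by index modulo the subnet count and that there are 128 columns; both are illustrative assumptions here. With 32 subnets, each subnet carries 4 columns; one column per subnet means setting the subnet count equal to the column count.

```python
# Illustrative assumption: 128 columns in the extended blob matrix.
NUMBER_OF_COLUMNS = 128


def compute_subnet_for_column(column_index: int, subnet_count: int) -> int:
    """Assumed mapping: a column's subnet is its index modulo the subnet count."""
    if not 0 <= column_index < NUMBER_OF_COLUMNS:
        raise ValueError(f"column index {column_index} out of range")
    return column_index % subnet_count


def columns_in_subnet(subnet_id: int, subnet_count: int) -> list:
    """All columns that share `subnet_id` under this mapping."""
    return [c for c in range(NUMBER_OF_COLUMNS)
            if compute_subnet_for_column(c, subnet_count) == subnet_id]


print(columns_in_subnet(1, subnet_count=32))   # [1, 33, 65, 97]
print(columns_in_subnet(1, subnet_count=128))  # [1]
```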

ENR semantics

#3574 (comment)

Spec refactoring

Misc. refactoring to align with the general spec style:

#3574 (comment)
#3574 (comment)
#3574 (comment)
Ensure all comments referencing Deneb or 4844 now refer to EIP-7594
#3574 (comment)
#3574 (comment)

@dapplion
Member

dapplion commented Apr 6, 2024

Does working in this "optimistic" setting cause undue complexity in implementations?

Big yes, but note that a similar gadget is required by inclusion lists (ILs) in their current design.

Deprecate blob_sidecars_by_root and blob_sidecars_by_range?

They don't appear necessary, as the proposer should distribute columns directly.

DataColumnSidecarsByRange

Useful for column custodians to fetch all columns for a given subnet and epoch, like we do now for blobs.

@fradamt
Contributor

fradamt commented Apr 8, 2024

If we go with a trailing approach, are there additional complications in the regime of long-range forks or network partitions? Does working in this "optimistic" setting cause undue complexity in implementations?

Imho we should avoid having the whole validator set operating in an optimistic setting, even if we were to ignore implementation complexity and just worry about consensus security. One attack that this enables is:

  • A proposer or a builder (importantly, not someone controlling much stake) proposes an unavailable block B, in particular available only in 15 out of 32 subnets.
  • Everyone in the 15 subnets where it is available votes for B because sampling is not required yet
  • Though B has a lot of votes, the next proposer does not build on it because sampling fails
  • Data is meanwhile made fully available. Sampling now succeeds for everyone.
  • No one votes for the new proposal because B has weight > proposer boost and the proposal does not extend it

This can perhaps be fixed by requiring the attesters to have their sampling done by 10s into the previous slot, while the proposer has a bit more time. More complexity, more timing assumptions. Also, this is just one attack, and it's not clear what the entire attack surface looks like.

There is a clear solution: the custody requirement needs to be high enough to provide strong guarantees even before we get to sampling (see here as well). High enough here means somewhere between 4 and 8, depending on the adversarial model we want to work with. With that, an attacker that does not control a lot of validators would fail at accruing many votes for a < 50% available block, and so it would be easily reorgable through proposer boost.
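The attack and the proposed fix can be checked with a little arithmetic. A minimal sketch, assuming: each node custodies CUSTODY_REQUIREMENT distinct subnets chosen uniformly at random; an honest validator votes for a block only if all of its custody subnets are available; and proposer boost is 40% of committee weight (PROPOSER_SCORE_BOOST = 40 in the phase0 fork choice). It also simplifies by comparing a single slot's votes against the boost.

```python
from math import comb

# Assumed: proposer boost worth 40% of a slot's committee weight.
PROPOSER_SCORE_BOOST = 0.40


def fraction_voting_for(available: int, subnets: int, custody: int) -> float:
    """Fraction of honest validators whose custody subnets are all available.

    Assumes each node custodies `custody` distinct subnets chosen uniformly at
    random and votes for the block iff every one of them is available.
    """
    return comb(available, custody) / comb(subnets, custody)


for custody in (1, 4, 8):
    share = fraction_voting_for(15, 32, custody)
    verdict = "beats" if share > PROPOSER_SCORE_BOOST else "loses to"
    print(f"custody={custody}: {share:.4f} of votes, {verdict} proposer boost")
```

With the block available in 15 of 32 subnets, a custody requirement of 1 gives B about 47% of the votes, which is more than the boost. A requirement of 4 drops B below 4%, so the next proposer can reorg it through proposer boost.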

Some related things to keep in mind:

  • The efficiency gain we get in the distribution phase of PeerDAS compared to 4844 is DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT / 2, because nodes are required to custody CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT of the whole data, which is extended by 2x. For example, with current parameters PeerDAS would be 16x more efficient than 4844 (ignoring sampling): everyone downloads 1/32 of the 2x extended data, so an average throughput of 48 blobs (16 times the 3-blob target of 4844) would require the equivalent of the 4844 bandwidth for distribution. Even a much more modest ratio of 5x lets us move to 16/32 blobs with hardly any bandwidth increase (just a little bit for sampling).
  • By increasing the number of subnets, we can increase CUSTODY_REQUIREMENT without affecting the above mentioned ratio, or we can at least recover some of the lost efficiency. If we want to stick with 32 subnets, we could for example set the CUSTODY_REQUIREMENT to 4, which gives a 4x gain. In the initial rollout, we could even be more conservative, even if it does not allow much of a blob count increase. If we are ok with having 64 subnets like we do for attestations (and possibly all fitting together in the network shard paradigm?), then reasonable values could be 4/64 (8x), 6/64 (~5x), 8/64 (4x). Since in the short term we're likely not going to want to go past a max of 32 blobs, there might not be much reason to go beyond these values, e.g., up to 128 subnets.
  • A higher CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT ratio also means that we don't need as many honest peers in order to have good guarantees about being able to get our samples. Peer sampling can be generally more robust, and less dependent on there being many nodes with a high advertised custody.
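The ratio from the first bullet can be tabulated directly. A minimal sketch that reproduces the values quoted above (distribution only, with sampling ignored):

```python
def distribution_gain(subnet_count: int, custody_requirement: int) -> float:
    """Bandwidth gain of PeerDAS distribution over 4844, ignoring sampling.

    Each node custodies custody_requirement / subnet_count of the 2x-extended
    data, i.e. 2 * custody_requirement / subnet_count of the original blobs.
    """
    return subnet_count / custody_requirement / 2


for subnets, custody in [(32, 1), (32, 4), (64, 4), (64, 6), (64, 8)]:
    print(f"{custody}/{subnets}: {distribution_gain(subnets, custody):.2f}x")
# 1/32: 16.00x, 4/32: 4.00x, 4/64: 8.00x, 6/64: 5.33x, 8/64: 4.00x
```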

Imo it makes a lot of sense to move from 4844 to PeerDAS gradually. We can do this not only by slowly increasing the blob count, but also by slowly decreasing the minimum proportion of data custodied by each node, i.e., the CUSTODY_REQUIREMENT / DATA_COLUMN_SIDECAR_SUBNET_COUNT ratio. For example, we could start with 3/6 blobs, 32 subnets, a custody requirement of 16, i.e., unchanged throughput and everyone still downloads the whole data, just changing the networking. At this point, we wouldn't even need sampling yet, and we could introduce it without it actually doing anything, just to test the behavior on mainnet. We could then fully introduce sampling while moving to 6/12 blobs and a custody requirement of 8, then 12/24 blobs and custody requirement of 4. From there, we can increase the subnet count to 64 etc...
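A sketch of the proposed rollout that treats the 4844 target of 3 blobs as one unit of bandwidth and counts only distribution (no sampling). Each stage downloads 2 × custody / subnets of the original data. The blob target grows by the same factor that the downloaded share shrinks, so per-node distribution bandwidth stays at the 4844 level throughout.

```python
from fractions import Fraction

BASELINE_TARGET_BLOBS = 3  # EIP-4844 target, used as the unit of bandwidth

# ((target, max) blobs, subnet count, custody requirement) for each stage in the comment
STAGES = [((3, 6), 32, 16), ((6, 12), 32, 8), ((12, 24), 32, 4)]


def downloaded_fraction(subnets: int, custody: int) -> Fraction:
    """Share of the original (unextended) data each node downloads."""
    return Fraction(2 * custody, subnets)


def relative_bandwidth(target_blobs: int, subnets: int, custody: int) -> Fraction:
    """Distribution bandwidth relative to 4844 at its 3-blob target."""
    return Fraction(target_blobs, BASELINE_TARGET_BLOBS) * downloaded_fraction(subnets, custody)


for (target, max_blobs), subnets, custody in STAGES:
    print(f"{target}/{max_blobs} blobs, custody {custody}/{subnets}: "
          f"downloads {downloaded_fraction(subnets, custody)} of the data, "
          f"bandwidth x{relative_bandwidth(target, subnets, custody)}")
```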

How many SAMPLES_PER_SLOT to hit the security level we want?

I don't see why we would want more than 16, or even 16 - CUSTODY_REQUIREMENT.
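To relate SAMPLES_PER_SLOT to a security level, one can compute the probability that all samples succeed even though the data cannot be reconstructed. A minimal sketch, assuming 128 columns after the 2x extension (so any 64 suffice to reconstruct) and an adversary that publishes 63 of them. Samples are distinct columns chosen uniformly at random, and no failures are tolerated.

```python
from math import comb

# Assumptions for illustration: 128 columns, any 64 of which reconstruct the data.
# The worst case for a sampler is an adversary publishing 63: unrecoverable, yet
# as available as possible.
NUMBER_OF_COLUMNS = 128
MAX_UNRECOVERABLE_AVAILABLE = NUMBER_OF_COLUMNS // 2 - 1


def false_accept_probability(samples: int,
                             available: int = MAX_UNRECOVERABLE_AVAILABLE,
                             total: int = NUMBER_OF_COLUMNS) -> float:
    """Probability that `samples` distinct random columns all succeed."""
    return comb(available, samples) / comb(total, samples)


for k in (8, 12, 16):
    print(f"SAMPLES_PER_SLOT={k}: {false_accept_probability(k):.2e}")
```

Each extra sample multiplies the false-accept probability by less than 1/2, so 16 samples already give less than 2^-16 against this adversary.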

@jimmygchen
Contributor

Is it worth also increasing the TARGET_NUMBER_OF_PEERS (currently 70), in addition to increasing the CUSTODY_REQUIREMENT?

With a target peer count of 70, and each peer subscribing to one subnet (out of 32), the average number of peers per subnet would be only ~2 (70/32). This could impact the proposer's ability to disseminate data columns to all 32 subnets successfully and could lead to data loss, assuming the proposer isn't custodying all columns. We could make an exception for the proposer to custody all columns, but it feels cleaner to just make sure we disseminate the samples reliably.

Although if we increase CUSTODY_REQUIREMENT to 4 this would already significantly reduce the likelihood of having insufficient peers in a subnet.
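A back-of-the-envelope check of the peer-count concern. It assumes each peer custodies CUSTODY_REQUIREMENT distinct subnets chosen uniformly at random and independently of other peers, which is an idealization of the real assignment.

```python
def expected_peers_per_subnet(target_peers: int, custody: int, subnets: int) -> float:
    """Average number of our peers that custody any given subnet."""
    return target_peers * custody / subnets


def prob_subnet_without_peers(target_peers: int, custody: int, subnets: int) -> float:
    """Chance that none of our peers custodies a given subnet.

    Each peer covers a given subnet with probability custody / subnets,
    independently of the other peers.
    """
    return (1 - custody / subnets) ** target_peers


for custody in (1, 4):
    print(f"custody={custody}: "
          f"{expected_peers_per_subnet(70, custody, 32):.2f} peers/subnet on average, "
          f"P(empty subnet) = {prob_subnet_without_peers(70, custody, 32):.1e}")
```

With 70 peers and a custody requirement of 1, about 2.2 peers cover each subnet on average, and roughly one subnet in ten has no peer at all. With a requirement of 4, the average rises to 8.75 and an empty subnet becomes rare (probability under 1e-4).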

@fradamt
Contributor

fradamt commented May 3, 2024


We really shouldn't keep the CUSTODY_REQUIREMENT as is (even 4 is low) unless we go with a non-trailing fork-choice, so this shouldn't be as much of a problem in the short term. That said, if all clients agree that it's ok to do so, I think increasing the TARGET_NUMBER_OF_PEERS would be great, because even in the best case we'd have an average of ~7 peers per subnet (e.g. with CUSTODY_REQUIREMENT = 6 and 64 subnets). It also gives us more room to relax the custody ratio later.

@fradamt
Contributor

fradamt commented May 3, 2024

Something that I think should be added to the open questions is validator custody: should validators have their own custody assignment, at the very least when they're voting, if not even in every slot? This has two benefits:

  • If an unavailable block is finalized, validators can be asked (out of protocol) to provide the data they were supposed to custody, and socially slashed if they fail to do so after some deadline
  • There are two reasons to increase the CUSTODY_REQUIREMENT. One is to ensure that the average number of peers per subnet is sufficiently high, and another is to ensure that most validators won't vote for an unavailable block (the pre-sampling guarantees discussed here). Depending on TARGET_NUMBER_OF_PEERS, the former might require less custody than the latter, so the extra load can just be on validators, which need the extra custody for voting securely, and not on simple full nodes, for which it is unnecessary extra work.

Just as an example, we could set CUSTODY_REQUIREMENT to 4 and VALIDATOR_CUSTODY_REQUIREMENT to 2.

cc @adietrichs

@cskiraly
Contributor

How many SAMPLES_PER_SLOT to hit the security level we want?

I have my LossyDAS for PeerDAS notebook here:
https://colab.research.google.com/drive/18uUgT2i-m3CbzQ5TyP9XFKqTn1DImUJD

Of course, it also covers the case where zero losses are allowed.
The main question here, I think, is what security level we want to achieve. Any thoughts on that?
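A minimal sketch of the kind of computation behind such an analysis (not the notebook's code). It assumes 128 columns after the 2x extension and an adversary publishing 63 of them, just short of the 64 needed to reconstruct. When up to `allowed_losses` failed samples are tolerated, the false-accept probability is a hypergeometric tail, and the number of samples needed follows from a chosen target. The target of 1e-9 below is a placeholder for whatever security level is agreed on.

```python
from math import comb

NUMBER_OF_COLUMNS = 128                  # assumed
AVAILABLE = NUMBER_OF_COLUMNS // 2 - 1   # worst case: 63 columns, unrecoverable
UNAVAILABLE = NUMBER_OF_COLUMNS - AVAILABLE


def accept_probability(samples: int, allowed_losses: int) -> float:
    """P(at most `allowed_losses` of `samples` distinct random columns fail)."""
    total = comb(NUMBER_OF_COLUMNS, samples)
    # Hypergeometric terms: `lost` draws from unavailable columns, the rest from available.
    return sum(
        comb(UNAVAILABLE, lost) * comb(AVAILABLE, samples - lost)
        for lost in range(0, min(allowed_losses, samples) + 1)
    ) / total


def min_samples(target: float, allowed_losses: int) -> int:
    """Smallest sample count whose false-accept probability is at most `target`."""
    for k in range(allowed_losses + 1, NUMBER_OF_COLUMNS + 1):
        if accept_probability(k, allowed_losses) <= target:
            return k
    raise ValueError("target not reachable")


for allowed in (0, 1, 2):
    print(f"allowed losses={allowed}: {min_samples(1e-9, allowed)} samples")
```

Tolerating a few losses costs a handful of extra samples but keeps honest nodes from rejecting available data because of a single unresponsive peer.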

@cskiraly
Contributor

I see the following in the spec:
TARGET_NUMBER_OF_PEERS should be tuned upward in the event of failed sampling.

What are we trying to address with this? If it remains in the spec, I think there should also be a mechanism (or recommendations) to return back to original values.
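One possible recovery mechanism, sketched with hypothetical parameters (ceiling, step size, cooldown) that are not in the spec. The controller raises the target after a failed sampling round and steps back toward the baseline after a run of successful slots.

```python
from dataclasses import dataclass


@dataclass
class PeerTargetController:
    """Hypothetical controller: raise the peer target on failed sampling, decay back on success."""
    baseline: int = 70        # current TARGET_NUMBER_OF_PEERS
    ceiling: int = 140        # hypothetical upper bound
    step_up: int = 10         # hypothetical change per adjustment
    cooldown_slots: int = 32  # hypothetical: successful slots required before stepping down
    target: int = 0
    _successes: int = 0

    def __post_init__(self) -> None:
        self.target = self.baseline

    def on_sampling_result(self, success: bool) -> int:
        if not success:
            # Failed sampling: ask for more peers, capped at the ceiling.
            self.target = min(self.ceiling, self.target + self.step_up)
            self._successes = 0
        else:
            self._successes += 1
            # After a full cooldown of successes, move one step back toward the baseline.
            if self._successes >= self.cooldown_slots and self.target > self.baseline:
                self.target = max(self.baseline, self.target - self.step_up)
                self._successes = 0
        return self.target
```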

@cskiraly
Contributor

Regarding TARGET_NUMBER_OF_PEERS:
We need peers for two different things:

  • building the overlays, which is at the subnet level
  • sampling, which is at the column level

For sampling, peer count is important, because the mechanism to sample quickly from nodes that are not peers is not there yet, so I see this driving the TARGET_NUMBER_OF_PEERS requirements.
For the subnets, instead, my assumption would be that you can change your peer set based on the assigned subnets. If rotation is not too fast (or if there is no rotation), this should be doable. In that case, what you need is to reach the target degree (plus some margin) in each of your custody_size subnets.

I think TARGET_NUMBER_OF_PEERS should be tuned based on these two requirements, with sufficient safety margins.
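A sketch of how the two requirements could be combined into a lower bound, using hypothetical parameter names. Sampling needs enough peers that each column is held by several of them. Overlays need the target mesh degree (gossipsub D, set to 8 in the consensus p2p spec) in each of our own custody subnets, counted pessimistically as if no peer sat in two of them. The margin factor is also an assumption.

```python
from math import ceil


def min_target_peers(subnets: int, peer_custody: int, sampling_peers_per_column: int,
                     own_custody_subnets: int, mesh_degree: int, margin: float = 1.5) -> int:
    """Hypothetical lower bound on TARGET_NUMBER_OF_PEERS from the two requirements."""
    # Sampling: a peer custodying `peer_custody` of `subnets` subnets holds a given
    # column with probability peer_custody / subnets, so this many peers gives the
    # desired expected number of peers per column.
    for_sampling = sampling_peers_per_column * subnets / peer_custody
    # Overlays: reach the mesh degree in each of our own custody subnets, counting
    # mesh peers as disjoint across subnets (an upper bound on what is needed).
    for_overlays = own_custody_subnets * mesh_degree
    return ceil(margin * max(for_sampling, for_overlays))


print(min_target_peers(subnets=32, peer_custody=4, sampling_peers_per_column=4,
                       own_custody_subnets=4, mesh_degree=8))  # 48
```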

6 participants