Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reprovide Sweep #824

Open
guillaumemichel opened this issue Mar 7, 2023 · 5 comments
Open

Reprovide Sweep #824

guillaumemichel opened this issue Mar 7, 2023 · 5 comments
Assignees

Comments

@guillaumemichel
Copy link
Contributor

guillaumemichel commented Mar 7, 2023

Context

Currently the Reprovide Operation is triggered by Kubo for each Provider Record. Kubo periodically (every 22h) republished Provider Records using go-libp2p-kad-dht Provide method.

func (dht *IpfsDHT) Provide(ctx context.Context, key cid.Cid, brdcst bool) (err error) {

The DHT Provide method consists in performing a lookup request to find the 20 closest peers to the CID, open a connection to these peers and allocate them the Provider Record.

The problem

This means that for every Provider Record that a node is advertising, 1 lookup request needs to be performed every 22h, and 20 connections need to be opened. This may seem fine for small providers, however this is terrible for large Content Providers. The Reprovide operation is certainly the reason most large Content Provider don't use the DHT, and IPFS is forced to keep the infamous Bitswap broadcast. Improving the Reprovide operation would allow large Content Providers to advertise their content to the DHT. Once most of the content is published on the DHT, Bitswap broadcast can be significantly reduced. This is expected to significantly cut off the price of hosting content on IPFS, because all peers in the network won't get spammed with requests for CIDs they don't host.

Solution overview

By the pigeonhole principle, if a Content Provider is providing content for x CIDs, with $x \geq \frac{\#DhtServers}{repl}$, then multiple Provider Records are allocated on the same DHT Servers. The optimization consists in reproviding all Provider Records allocated to the same DHT Servers at once. It spares expensive DHT lookups and connections opening.

Without entering too much into details, all Provider Records are grouped by XOR proximity in the keyspace. All Provider Records in a group are allocated to the same set of DHT Servers. Perdiodically, the Content Provider sweeps the keyspace from left to right and reprovides the Provider Records corresponding to the visited keyspace region.

For a Content Provider providing 100K CIDs, and 25K DHT Servers the expected improvement is ~80x.

More details can be found on the WIP Notion document.

How to implement it

The Reprovide operation responsibility should be transferred from Kubo to the DHT implementation. This is generally desired because different Content Routers may have different reprovide logic, that kubo is unaware of, or cannot optimize for.

go-libp2p-kad-dht (and other Content Routers) should expose StartProviding(CID), StopProviding(CID) methods instead of the Provide(CID) method. Kubo then only needs to pass to the DHT which CIDs should be provided or not.

A lot of refactoring needs to happen around go-libp2p-routing-helpers.

References

@aschmahmann
Copy link
Contributor

+1 to reproviding in a sweep. A number of the comments below are somewhat kubo centric, although that's mostly to match comments raise in the original post.


Bitswap broadcast can be turned off for good

I thing you are misunderstanding what Bitswap broadcast is for and how this could impact it. The main things it provides are:

  1. Ability to query peers on a LAN/offline (note: it seems unlikely that a LAN DHT is the answer here. unless your LAN has hundreds+ of nodes it'll almost certainly be cheaper to broadcast than to do all the puts and gets of a structured lookup, not to mention the churn rate could be really high and make matters even worse)
  2. Ability to heal broken sessions. Bitswap is a block-based protocol, doing a DHT lookup for every block would be insanely expensive and slow so implementations that are even trying to be efficient will have some concept of sessions where they make educated guesses about which peers will have blocks based on previous responses.
    • Sometimes sessions are improperly tracked/handled and broadcast effectively repairs this -> the solution here is to fix the bugs
  3. Ability to discover sessions that are not necessarily known to the application
    • For example, maybe you have an IPFS HTTP Gateway that is tracking sessions on a per-HTTP request basis, but not noticing that there are many requests for different parts of the same graph -> these could be dropped in exchange for more DHT lookups but you could also just drastically reduce the broadcast amount and probably get most of the benefit [ipfs/go-bitswap] Reduce Broadcasting ipfs/boxo#81
  4. If you happen to bumble into a peer who has the data you are looking for for some reason (e.g. they are a DHT server and you're doing a lookup) you'll get the data from them even if they didn't advertise in the DHT
    • People relying on this behavior without really knowing what they're doing are making a mistake, the latencies tend to be really high for this kind of search anyway. Even if this made sense, reducing broadcast probably helps these people out sufficiently

This proposal only reduces 1 of the 4 ways Bitswap broadcast is used. It's still a good proposal to do, but landing it alone will not make it feasible to remove Bitswap broadcast in kubo.


Some notes from work on the FullRT client which already implements this sweeping behavior (albeit by taking shortcuts like doing providing externally and doing the reprovides all together mostly because this proposal, while more correct, was more time than was available then).

  1. Yes, the sweeping will help by reducing the number of connections you're using at once. However, if you're copying the approach from the standard client you may need to be better at handling timeouts and failures too. If you don't then your throughput might not be great
  2. This should account for if someone has turned off their node for longer than a reprovide period. At that point they probably want to speedily reprovide everything (although maybe not).
  3. If you're going through the effort to make a new database here to track data to reprovide it would be helpful to expose a function allowing the user to know what percentage of their data it think has been reprovided successfully and should be discoverable.
  4. Don't overly couple these components unless there's really something to be gained. The DHT server, client and reprovider code can all have different (protocol compliant) strategies so smushing them all into one monolith is going to make experimentation harder.
  5. Speaking of this coupling ideally figure out how to implement this database in a way that's easy to clean/version it. For example, IPNI groups its provider records by some arbitrary identifier. DHT consumers may want to do this as well to better enable bulk removal.
  6. Consider how you'll deal with database corruption here (e.g. Would you require kubo's repo GC to call StopProvider on every block it removes? Is the plan to keep providing even during block/disk corruption/deletion? What about if the process dies in between the provider write and the block write? ...)

@guillaumemichel
Copy link
Contributor Author

Thank you @aschmahmann for the valuable feedback!


I agree concerning the Bitswap broadcast. I think once better Content Routers (DHT, IPNI) are widely used, the broadcast can be drastically reduced, even if it is still lightly used for reasons 1-3 you mentioned.


Concerning 2., I plan on keeping track of the republish time of groups of Provider Records. If the node turns on after a period off and that the time to reprovide some groups of Provider Records has already passed, the node will republish all Provider Records to catch up with the sweep.

I agree having a (re)provider distinct from the client and server makes it more modular and is generally a good practice. Its interface could contain at least the following functions StartProviding(CID), StopProviding(CID), Provide(CID)to provide a CID just once,GetProvidedSet()` returning the CIDs that are being provided and when they have been published last.

The Kademlia Identifiers associated with the Provider Records (essentially Hash(mh)) will be stored in a binary trie, to facilitate the reprovide sweep, and quick access to the database (the binary trie is essentially a key-value store).

I am not very familiar with block/disk corruption/deletion, so suggestions are welcome. To address corruption and keep consistency between the blocks kubo is providing and the Provider Records being advertised, kubo could periodically call GetProvidedSet() and verify that it matches its set of blocks. It can then adapt with StartProviding(CID) and StopProviding(CID).

@aschmahmann
Copy link
Contributor

Yeah, there are any number of ways to deal with corruption. For example, you could require a database with transactions you can commit/rollback, you can add that kind of functionality on top of an arbitrary kv store (https://github.com/ipfs/go-ipfs-pinner does this kind of thing), you can do corruption checks, etc. Mostly just wanted to make sure it was on your radar so you don't end up with hard to diagnose problems down the road.

To some extent relying on a database to do the work for you is the easiest, but it requires all your downstream consumers to be willing to provide a database that does that which they may/may not be willing to do. Could be worth pushing on if the other approaches feel too computationally or development time expensive.

I'll leave my comments on changing the Routing interface on the other issue.

@markg85
Copy link

markg85 commented Dec 8, 2024

I can't read the linked notion document. Could it be put somewhere out in the open perhaps?

@guillaumemichel
Copy link
Contributor Author

I updated the link, thanks for the heads up 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants