Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Refactor registry system: no direct dependencies; expose standard has…
…h.Hash; be a data carrier. --- Dependencies: Previously, depending on the go-mulithash package would create direct dependencies to several other modules for their various hash function implementations. This meant that instead of go-multihash being a lightweight, easy-to-accept dependency itself, it became something which would noticably increase the size of your go.mod file, your package graph, your download sizes during development, and most concerningly, your compile output size in final products. Now, there is a registry system (see the Register function), and the main go-multihash package *only* populates the registry with hashes that are available from the golang standard library by default. This means you gain no transitive dependencies on other libraries by importing the go-multihash package, and your binaries will not be bloated by hashers you don't use. (Your go.mod file may still show more repos; but they don't end up in your builds unless you actually refer to them). There are now several new packages in `go-multihash/register/*`. These can be imported to register the hashes in those packages. If you want all the hashes that were previously available, just make sure to import "go-mulithash/register/all" somewhere in your program. (You can register hashes, too, without making PRs to this library; these packages are just here for convenience and easy use.) **This is a breaking change** if you used hashes not found in the golang stdlib, such as blake2 and sha3. However, to update, all you need to do is ensure the relevant `go-multihash/register/*` package is imported anywhere in your program -- an easy change. --- Standard hash.Hash: Previously, go-multihash had its own definition of a `HashFunc` interface, and only exposed hashing through the `multihash.Sum` method. The problem with this was these definitions did not support streaming: one had to have an entire chunk of memory loaded at once, in a single contiguous byte slice, in order to hash it. (A second, admittedly much more minor, problem with this was that one often had to write glue code to turn a `hash.Hash` into a `multihash.HashFunc`, and since most of golang uses the standard lib `hash.Hash` definition already, this was generally avoidable friction.) Now, the Register function operates in terms of standard `hash.Hash`, and there is a `GetHasher` function which can get you a `hash.Hash`. (Okay, to be more precise, these functions take and return a factory function for a `hash.Hash`. You get the idea.) Since the standard `hash.Hash` interface can operate streamingly, now it's easy to use go-multihash in a streaming way. The `multihash.Sum` method works the same as always. --- Be a data carrier: Previously, go-multihash contained checks that any multihash indicator codes being handled were required to have a hash function registered for them. This made it very difficult to use go-multihash in a "forward compatible" way (and it also made a lot of practical bumps for this dependency-extraction refactor). Now, go-multihash is willing to carry data, even if it doesn't know what kind of hash function would be associated with an indicator code. (Methods that you'd expect to parse things do still parse the varints, making sure their sanely formatted. They just don't inspect and whitelist the actual integers anymore.) I removed the `ValidCode` predicate entirely. It doesn't seem to serve any good purpose anymore. --- Other: I have not touched the `Codes` and `Names` maps in this diff. I think we should probably review (and probably remove) these, and instead direct people to use the go-multicodec package instead, which has the two advantages of decoupling registration of an implementation versus simply having a description, and also being automatically generated from the multiformats table. However, I wanted to check on feelings about this before doing the work (especially because they're somewhat entangled with a bunch of the tests in this package, making their removal somewhat nontrivial). Most of the test files are now `package multihash_test`. This makes for some colorful diffs, but is not otherwise interesting. The reason for this is because the dependency separation process now requires the tests to import those `register/*` packages, and to avoid a cycle, that means, well, `package mulithash_test`. I think there's probably more work to be done in making this library really shine. For example, in reviewing the `Encode` function, I see some allocations that look very likely to be avoidable... if the function was redesigned to be more aware of how it's likely to be used. However, I took no action on this, in part because this diff is big enough already, and in part because I think it might be reasonable to re-examine the relationship of this code to go-cid at the same time. I dropped `TestSmallerLengthHashID`. It appeared to be testing an API that wasn't actually exported... and the nearest API that *is* exported (Sum) has a general contract of truncating a hash upon short length, so it was overall unclear what this test should be checking. Review might be needed on this. The situation for murmur3 is still in need of resolution. It's commented out entirely for now. Questions are noted in the diff. There's a 'register/miniosha256' package which sets the sha2-256 implementation to a non-stdlib one. If you don't import this package, you still get a sha2-256; it's just the stdlib one. I did not include this in the 'register/all' group. (Maybe it's faster; maybe it's not; but it's definitely not required, and I'm getting some reports it also shows weird on profiles, so I tend to think maybe one should really have to explicitly ask for this one.)
- Loading branch information