feat: incremental-hasher #261
base: master
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -1,4 +1,5 @@ | ||||||||||
// # Multihash | ||||||||||
import type { MulticodecCode } from '../block/interface.js' | ||||||||||
|
||||||||||
/** | ||||||||||
* Represents a multihash digest which carries information about the | ||||||||||
|
@@ -9,7 +10,7 @@ | |||||||||
// a bunch of places that parse it to extract (code, digest, size). By creating | ||||||||||
// this first class representation we avoid reparsing and things generally fit | ||||||||||
// really nicely. | ||||||||||
export interface MultihashDigest<Code extends number = number> { | ||||||||||
export interface MultihashDigest<Code extends MulticodecCode = MulticodecCode, Size extends number = number> { | ||||||||||
/** | ||||||||||
* Code of the multihash | ||||||||||
*/ | ||||||||||
|
@@ -23,7 +24,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||
/** | ||||||||||
* byte length of the `this.digest` | ||||||||||
*/ | ||||||||||
size: number | ||||||||||
size: Size | ||||||||||
|
||||||||||
/** | ||||||||||
* Binary representation of this multihash digest. | ||||||||||
|
@@ -35,7 +36,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||
* Hasher represents a hashing algorithm implementation that produces a | ||||||||||
* `MultihashDigest`. | ||||||||||
*/ | ||||||||||
export interface MultihashHasher<Code extends number = number> { | ||||||||||
export interface MultihashHasher<Code extends MulticodecCode = MulticodecCode> { | ||||||||||
/** | ||||||||||
* Takes binary `input` and returns its (multi)hash digest. Return value is | ||||||||||
* either promise of a digest or a digest. This way general use can `await` | ||||||||||
|
@@ -67,6 +68,76 @@ export interface MultihashHasher<Code extends number = number> { | |||||||||
* `SyncMultihashHasher` is useful in certain APIs where async hashing would be | ||||||||||
* impractical e.g. implementation of Hash Array Mapped Trie (HAMT). | ||||||||||
*/ | ||||||||||
export interface SyncMultihashHasher<Code extends number = number> extends MultihashHasher<Code> { | ||||||||||
export interface SyncMultihashHasher<Code extends MulticodecCode = MulticodecCode> extends MultihashHasher<Code> { | ||||||||||
digest: (input: Uint8Array) => MultihashDigest<Code> | ||||||||||
} | ||||||||||
|
||||||||||
/** | ||||||||||
* Incremental variant of the `MultihashHasher` that can be used to compute | ||||||||||
* the digest of payloads that would be impractical or impossible to load | ||||||||||
* entirely into memory. | ||||||||||
*/ | ||||||||||
export interface IncrementalMultihashHasher< | ||||||||||
Code extends MulticodecCode, | ||||||||||
Size extends number, | ||||||||||
Digest = MultihashDigest<Code, Size> | ||||||||||
> { | ||||||||||
/** | ||||||||||
* Size of the digest this hasher produces. | ||||||||||
*/ | ||||||||||
size: Size | ||||||||||
|
||||||||||
/** | ||||||||||
* Code of the multihash | ||||||||||
*/ | ||||||||||
code: Code | ||||||||||
|
||||||||||
/** | ||||||||||
* Name of the multihash | ||||||||||
*/ | ||||||||||
name: string | ||||||||||
|
||||||||||
/** | ||||||||||
* Number of bytes that were consumed. | ||||||||||
*/ | ||||||||||
count(): bigint | ||||||||||
Comment on lines +100 to +103:
Suggested change
Let's just drop this method, we can revisit if we find it really necessary.
||||||||||
|
||||||||||
/** | ||||||||||
* Returns multihash digest of the bytes written so far. Should not have | ||||||||||
* side-effects, meaning you should be able to write some more bytes and | ||||||||||
* call `digest` again to get the digest for all the bytes written from | ||||||||||
* creation (or from reset) | ||||||||||
*/ | ||||||||||
digest(): Digest | ||||||||||
|
||||||||||
/** | ||||||||||
* Encodes multihash of the bytes written so far (since creation or | ||||||||||
* reset) into provided `target` at given `offset`. If `offset` is not | ||||||||||
* provided it defaults to `0`. | ||||||||||
* | ||||||||||
* @param [offset=0] - Byte offset in the `target`. | ||||||||||
*/ | ||||||||||
readDigest(target: Uint8Array, offset?: number): this | ||||||||||
can you describe the use-case for this? it seems like this makes it an onerous API to have to implement

Sorry, my mistake, this is the output function! I think maybe the naming could be better here. We have ample precedent of

Oh, I also see I'm discussing history here -

I mean, if you think of it as a transform stream, it makes sense to have write and read ops. I don't mind renaming it to something else, but please don't make me come up with a name that everyone will like.
I'm not completely opposed to returning the target; however, I would caution against it, as it mixes two very different modes into one and can also lead to mistakes (e.g. you may have passed an undefined reference, which will not throw but will happily give you back a Uint8Array). The idea was that if you want to compute the digest you just call
||||||||||
|
||||||||||
/** | ||||||||||
* Encodes raw digest (without multihash header) of the bytes written | ||||||||||
* so far (since creation or reset) into provided `target` at given | ||||||||||
* `offset`. If `offset` is not provided it defaults to `0`. | ||||||||||
* | ||||||||||
* @param [offset=0] - Byte offset in the `target`. | ||||||||||
*/ | ||||||||||
read(target: Uint8Array, offset?: number): this | ||||||||||
|
||||||||||
/** | ||||||||||
* Writes bytes to be digested. | ||||||||||
*/ | ||||||||||
write(bytes: Uint8Array): this | ||||||||||
Typically in streaming hashers this is called

I'm fine with calling it update although I do find that name confusing personally as I think of

Right, but streaming hashers aren't appending to a buffer, they are updating their internal state with the new data you pass.
||||||||||
|
||||||||||
/** | ||||||||||
* Resets this hasher to its initial state. Can be used to recycle this | ||||||||||
* instance. It resets `count` and discards all the bytes that were | ||||||||||
* written prior. | ||||||||||
*/ | ||||||||||
reset(): this | ||||||||||
} | ||||||||||
|
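To make the contract in the interface above concrete, here is a minimal sketch of one possible implementation. It is not code from this PR. It assumes a toy, non-cryptographic 32-bit FNV-1a checksum as the hash function, a placeholder code `0x300001` that is not a registered multicodec entry, and a hypothetical class name `ToyFnv1aHasher`. The types are local stand-ins for the ones in the diff, and `count()` is left out, following the review suggestion to drop it.

```typescript
// Local stand-ins for the types in the diff, so the sketch is self-contained.
type MulticodecCode = number

interface MultihashDigest<Code extends MulticodecCode = MulticodecCode, Size extends number = number> {
  code: Code
  size: Size
  digest: Uint8Array
  bytes: Uint8Array
}

interface IncrementalMultihashHasher<Code extends MulticodecCode, Size extends number, Digest = MultihashDigest<Code, Size>> {
  size: Size
  code: Code
  name: string
  digest(): Digest
  readDigest(target: Uint8Array, offset?: number): this
  read(target: Uint8Array, offset?: number): this
  write(bytes: Uint8Array): this
  reset(): this
}

// Unsigned varint (LEB128) encoding, as used for the multihash header fields.
const varint = (value: number): number[] => {
  const out: number[] = []
  let n = value
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80)
    n >>>= 7
  }
  out.push(n)
  return out
}

const CODE = 0x300001 // ASSUMPTION: placeholder code, not a registered multicodec entry
type Code = typeof CODE
type Size = 4
const FNV_OFFSET_BASIS = 0x811c9dc5
const FNV_PRIME = 0x01000193
// Multihash header: varint(code) followed by varint(digest size).
const HEADER = new Uint8Array(varint(CODE).concat(varint(4)))

class ToyFnv1aHasher implements IncrementalMultihashHasher<Code, Size> {
  readonly code: Code = CODE
  readonly size: Size = 4
  readonly name = 'toy-fnv1a-32'
  private state = FNV_OFFSET_BASIS

  // Folds the new bytes into the running 32-bit state; nothing is buffered.
  write(bytes: Uint8Array): this {
    let h = this.state
    bytes.forEach((byte) => {
      h = Math.imul(h ^ byte, FNV_PRIME) >>> 0
    })
    this.state = h
    return this
  }

  // Writes the raw 4-byte digest (big-endian) without touching the state.
  read(target: Uint8Array, offset = 0): this {
    new DataView(target.buffer, target.byteOffset, target.byteLength).setUint32(offset, this.state)
    return this
  }

  // Writes header + raw digest, i.e. the full multihash bytes.
  readDigest(target: Uint8Array, offset = 0): this {
    target.set(HEADER, offset)
    return this.read(target, offset + HEADER.length)
  }

  // Allocates a fresh buffer, so calling it repeatedly has no side effects.
  digest(): MultihashDigest<Code, Size> {
    const bytes = new Uint8Array(HEADER.length + this.size)
    this.readDigest(bytes)
    return { code: this.code, size: this.size, digest: bytes.subarray(HEADER.length), bytes }
  }

  reset(): this {
    this.state = FNV_OFFSET_BASIS
    return this
  }
}

// Usage: `second` covers all four bytes written since creation.
const hasher = new ToyFnv1aHasher()
const first = hasher.write(new Uint8Array([1, 2])).digest()
const second = hasher.write(new Uint8Array([3, 4])).digest()
```

Because `read` and `readDigest` return `this` rather than the target, the caller always owns the output buffer. That is the design point argued in the thread above: allocation happens only in `digest()`.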
I mean if someone is hashing >9PiB of data in JS then 👏👏👏.
yeah ... is this overkill?
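For scale, and assuming these two comments refer to `count(): bigint`, the arithmetic can be sketched as follows. A plain `number` counts bytes exactly up to `Number.MAX_SAFE_INTEGER` (2^53 − 1), which is just under 8 PiB, or roughly 9 PB in decimal units. The constant names below are illustrative only.

```typescript
const PiB = 2 ** 50 // binary pebibyte
const PB = 1e15 // decimal petabyte

const maxSafeBytes = Number.MAX_SAFE_INTEGER // 2^53 - 1
const limitInPiB = maxSafeBytes / PiB // just under 8
const limitInPB = maxSafeBytes / PB // a little over 9

// Past the safe range, distinct byte counts collapse to the same number...
const numberLosesPrecision = maxSafeBytes + 1 === maxSafeBytes + 2
// ...whereas bigint arithmetic keeps them apart.
const bigintStaysExact = BigInt(maxSafeBytes) + BigInt(1) !== BigInt(maxSafeBytes) + BigInt(2)
```

So a `number` counter would only become inexact after roughly 8 PiB of input, which is the trade-off the comments above are weighing.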