Decode Chunked Transfer encoded payload prior to pushing to IPFS instead of decoding at replay #126

Open
machawk1 opened this issue Mar 2, 2017 · 2 comments


machawk1 commented Mar 2, 2017

In #125, @ibnesayeed mentioned that a page is replayed more often than it is archived. The same content at different URI-Rs should yield the same IPFS hash when pushed. If different servers chunk the response differently, this will not be the case when the chunk lengths are treated as part of the payload and used as part of the basis for the hash.

The logic to decode chunked responses has been implemented in replay.py. Move and adapt this implementation for chunked payloads prior to pushing to IPFS. Use the dechunked payload as the basis for the IPFS hash when writing the CDXJ.
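To illustrate the point, here is a minimal sketch of dechunking before hashing. It is not the code in replay.py; `dechunk` and `content_digest` are hypothetical names, and SHA-256 from `hashlib` stands in for the IPFS hash only to show that identical dechunked payloads produce identical digests. Trailer headers after the last chunk are ignored.

```python
import hashlib


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoded body (hypothetical helper)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing CRLF after chunk size")
        # The size line may carry chunk extensions after ';', which are dropped.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer section is ignored in this sketch
        chunk = body[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        decoded += chunk
        pos += size
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        pos += 2
    return bytes(decoded)


def content_digest(payload: bytes) -> str:
    """Stand-in for the IPFS hash (assumption: SHA-256 hex digest for illustration)."""
    return hashlib.sha256(payload).hexdigest()


# The same content, chunked differently by two hypothetical servers.
server_a = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
server_b = b"9\r\nWikipedia\r\n0\r\n\r\n"

raw_digests_match = content_digest(server_a) == content_digest(server_b)
dechunked_digests_match = (
    content_digest(dechunk(server_a)) == content_digest(dechunk(server_b))
)
```

Hashing the raw bodies gives two different digests, while hashing the dechunked payloads gives one, which is the property the CDXJ entry would rely on.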

@machawk1 machawk1 changed the title Decode Chunked Transfer encoded payload prior to pushing to IPFS instead of decoding at recplay Decode Chunked Transfer encoded payload prior to pushing to IPFS instead of decoding at replay Mar 2, 2017
@ibnesayeed

@machawk1 you misinterpreted some of the arguments I made. Even so, I think dechunking at storage time would be more beneficial.


machawk1 commented Mar 2, 2017

@ibnesayeed I referred to #125 so your rationale will persist. Part of the first paragraph was what I considered when we first spoke of implementing it in replay.
