Using a different attribute name than "locator" in CDXJ #41

Open
machawk1 opened this issue Oct 20, 2016 · 10 comments

Comments

@machawk1
Member

The value for this field is a URN, not a "locator" per se. @ibnesayeed Do you have a suggestion for a better name? @phonedude noted this at one point.

@ibnesayeed
Member

It's a difficult situation. It could be a name, an identifier, or a location. The role of this field in this context is something that might change in a broader perspective or even when this system evolves to other models. Finding a term that is generic enough while being accurate is challenging. I will give it more thought.

@machawk1
Member Author

Any further thoughts since October about a better name, @ibnesayeed ?

@ibnesayeed
Member

We can perhaps call it urn or uri.

@machawk1
Member Author

@ibnesayeed Those seem fitting albeit not nearly as "user-friendly" as "locator", which might be a moot point if the intention of ipwb CDXJs is to be machine readable. Any other recommendations beyond urn or uri before we switch over, @phonedude ?

@phonedude
Member

I've lost the thread -- where is "locator" used as an attribute?

@ibnesayeed
Member

@phonedude in the CDXJ (index) files we store references to the hashes of the headers and payload blocks of responses in the following manner.

- - {"..": "..", "locator": "urn:ipfs/{header_digest}/{payload_digest}", "..": ".."}

@weiglemc questioned whether the term locator really describes the location of the resources. That's why we were looking for better alternatives.
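For illustration, here is a minimal Python sketch of how a replay tool might read such a line and split the value into its two digests. It assumes the line consists of two space-separated leading fields (shown as `-` above) followed by a JSON object, and that each digest is a single path segment. The function names and the sample digests are hypothetical, not taken from ipwb.

```python
import json


def parse_cdxj_line(line):
    """Split a CDXJ line into its two leading fields and its JSON attributes."""
    # Assumption: two leading fields without spaces, then one JSON object.
    first, second, raw_json = line.strip().split(" ", 2)
    return first, second, json.loads(raw_json)


def parse_ipfs_locator(value):
    """Return (header_digest, payload_digest) from 'urn:ipfs/{h}/{p}'."""
    prefix = "urn:ipfs/"
    if not value.startswith(prefix):
        raise ValueError("not a urn:ipfs value: %r" % value)
    parts = value[len(prefix):].split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly two digests in %r" % value)
    return parts[0], parts[1]


# Hypothetical digests, for illustration only.
sample = '- - {"locator": "urn:ipfs/QmHeaderDigest/QmPayloadDigest"}'
_, _, attrs = parse_cdxj_line(sample)
header_digest, payload_digest = parse_ipfs_locator(attrs["locator"])
```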

@phonedude
Member

Definitely should not be called a "locator", since that would suggest URL, which it clearly is not. URI or URN would be more accurate, but repetitive and not nearly as descriptive as something like "header-payload-digests".

@ibnesayeed
Member

I would stay away from something like header-payload-digests because we are thinking about it in more general terms, so that the same field can be used in other replay systems such as OWB or PyWB, where the field would hold a reference to the corresponding WARC file with offset and length, like urn:warcs/{offset}/{length}/{warc_file_name_or_path}/. In fact, the upcoming model of IPWB is planned to have not separate references to the header and payload, but a single standard ipfs: URI reference to a memento node that will internally point to all the related pieces using IPLD.

@machawk1
Member Author

Any further thoughts on this naming, @ibnesayeed? Could the field value ever be a uri but not a urn?

Once we change this name, should we have some adaptation considerations for older versions of ipwb that used locator?

@ibnesayeed
Member

> Any further thoughts on this naming

I don't have a good name right now.

> Could the field value ever be a uri but not a urn?

Yes! We used this style in the first place, rather than keeping header and payload hashes under separate attributes, so that we could generalize it. If a record is stored at an HTTP URL, we can use that URL directly, or if the content is to be fetched from a WARC file, we can have something like urn:warcs/{offset}/{length}/{warc_file_name_or_path}/. So, it was a generalization effort.
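To make the generalization concrete, a replay tool could dispatch on the form of the value. The following Python is only a sketch under assumptions: the function name `classify_reference` and the returned dictionary keys are hypothetical, the `urn:warcs/` layout is the one proposed in this thread (not an existing implementation), and the WARC path is assumed to be everything after the length, minus the trailing slash.

```python
def classify_reference(value):
    """Return (kind, details) for a generalized reference value."""
    if value.startswith(("http://", "https://")):
        # A record stored at an HTTP URL can be used directly.
        return "http", {"url": value}

    if value.startswith("urn:ipfs/"):
        # Current ipwb style: header digest, then payload digest.
        header_digest, payload_digest = value[len("urn:ipfs/"):].split("/")
        return "ipfs", {"header": header_digest, "payload": payload_digest}

    if value.startswith("urn:warcs/"):
        # Proposed style: offset, length, then the WARC file name or path.
        rest = value[len("urn:warcs/"):].rstrip("/")
        offset, length, warc_path = rest.split("/", 2)
        return "warc", {
            "offset": int(offset),
            "length": int(length),
            "path": warc_path,
        }

    raise ValueError("unrecognized reference: %r" % value)


# Hypothetical values, for illustration only.
kind, details = classify_reference("urn:warcs/1024/2048/collections/a.warc.gz/")
```

Because the split is limited to two separators, a WARC path that itself contains slashes stays intact.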

> Once we change this name, should we have some adaptation considerations for older versions of ipwb that used locator?

Changing this name is about standardizing the terminology used in CDXJ files for archival indexing purposes, irrespective of the tool they are used in. Once such a change is made, we will have a few choices: 1) have a fallback keyword in the replay that looks for the old name for a while, 2) provide a migration script/command that converts old CDXJ files to the new style, or 3) if the user base of the tool is small, we can just introduce this breaking change and announce it in the release notes and the README file.
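Options 1 and 2 could look roughly like the Python sketch below. The new attribute name `uri` is only a placeholder, since no name has been agreed on in this thread, and both function names are hypothetical. The migration helper assumes the JSON object starts at the first `{` on the line, that is, the leading fields contain no `{`.

```python
import json

OLD_KEY = "locator"
NEW_KEY = "uri"  # placeholder: the replacement name is still undecided


def get_reference(attrs):
    """Option 1: prefer the new attribute, fall back to the old one."""
    if NEW_KEY in attrs:
        return attrs[NEW_KEY]
    return attrs[OLD_KEY]


def migrate_line(line):
    """Option 2: rewrite one CDXJ line to use the new attribute name."""
    line = line.rstrip("\n")
    start = line.find("{")
    if start == -1:
        return line  # no JSON block on this line; leave it untouched
    attrs = json.loads(line[start:])
    if OLD_KEY in attrs and NEW_KEY not in attrs:
        # Rename the key in place so the attribute order is preserved.
        attrs = {NEW_KEY if k == OLD_KEY else k: v for k, v in attrs.items()}
    return line[:start] + json.dumps(attrs)


old_line = '- - {"locator": "urn:ipfs/QmHeaderDigest/QmPayloadDigest"}'
new_line = migrate_line(old_line)
```

Running `migrate_line` on an already migrated line leaves the attribute names unchanged, so the script would be safe to apply more than once.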
