Replies: 1 comment
-
Implemented in 5664535
-
URLItem messages are used by this endpoint, which returns the URL. At the moment, this makes it impossible to distinguish between two similar URLs that belong to different crawlIDs: which one should be considered successfully processed?
Returning just the URL as an indication of success is also a bit vague.
Instead, we could add a messageID to URLItems, the value of which is an arbitrary String set by the client (e.g. a long hash of the URL + crawlID + metadata?). PutURLs would return not just a String but a structured object containing the original messageID as well as a code indicating whether the URLItem was successfully added, was not added (and if so why, e.g. URL too long), or triggered an error. In the last case, the client could decide to resend the URLItem if necessary.
This would make it possible to tell apart the processing of two similar URLs, whether they belong to different crawlIDs or not.
This is a noticeable change to the API. I haven't found a generic messageID as part of the gRPC messages, so we'd have to add one explicitly.
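To make the idea more concrete, here is a rough sketch in plain Python rather than the actual gRPC definitions. All names in it (`Ack`, `Status`, `make_message_id`, `put_urls`, `MAX_URL_LENGTH`) are made up for illustration and are not part of the existing API; the set-based `store` just stands in for whatever the service does with an accepted URLItem.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "OK"            # URLItem was added
    SKIPPED = "SKIPPED"  # URLItem was not added, see `reason`
    FAIL = "FAIL"        # processing error, the client may resend


@dataclass(frozen=True)
class URLItem:
    url: str
    crawl_id: str
    message_id: str  # arbitrary string chosen by the client


@dataclass(frozen=True)
class Ack:
    message_id: str  # echoed back unchanged
    status: Status
    reason: str = ""


MAX_URL_LENGTH = 2048  # hypothetical limit, for illustration only


def make_message_id(url: str, crawl_id: str) -> str:
    """One possible client-side ID: a hash of the crawlID and the URL."""
    return hashlib.sha256(f"{crawl_id}\n{url}".encode("utf-8")).hexdigest()


def put_urls(items, store):
    """Return one Ack per URLItem instead of a bare URL string."""
    acks = []
    for item in items:
        try:
            if len(item.url) > MAX_URL_LENGTH:
                acks.append(Ack(item.message_id, Status.SKIPPED, "URL too long"))
                continue
            store.add((item.crawl_id, item.url))
            acks.append(Ack(item.message_id, Status.OK))
        except Exception as exc:
            # Anything unexpected is reported back so the client can resend.
            acks.append(Ack(item.message_id, Status.FAIL, str(exc)))
    return acks
```

With something like this, a client sending the same URL for two crawlIDs gets back two acknowledgements with different messageIDs and can match each one to the URLItem it sent.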
Any thoughts on this?