Retry search at current cursor on total:None #548
base: master
I'm not a huge fan of this band-aid fix, especially when considering the possibility of the infinite None loop.
Thanks for the PR @hornc, and thanks for taking a look, maxz! I think we should retry a few times when we run into total: None, and then throw a ReadTimeout exception like we do here if all retries fail or return None for total. I think we should back off for 1 second on each retry, based on feedback from our search team. We should probably also expose these values (retry count, backoff delay) via optional parameters. @hornc I'm happy to make these changes if you'd like. I have a few other projects that need tending to, but hopefully I can get to it soon! Thanks again.
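For illustration, the bounded retry proposed above could look roughly like the sketch below. It is not the library's code: `fetch_with_retries`, `fetch_page`, `max_retries` and `backoff` are hypothetical names, the response is assumed to be a dict with a `total` key, and `SearchTimeout` is a local stand-in for the `ReadTimeout` the client would actually raise.

```python
import time


class SearchTimeout(Exception):
    """Stand-in for the ReadTimeout the real client would raise (assumption)."""


def fetch_with_retries(fetch_page, cursor=None, max_retries=3, backoff=1.0,
                       sleep=time.sleep):
    """Call fetch_page(cursor) until the response has a non-None total.

    Makes one initial attempt plus up to max_retries retries, pausing
    `backoff` seconds before each retry. All names here are hypothetical.
    """
    for attempt in range(max_retries + 1):
        page = fetch_page(cursor)
        if page.get("total") is not None:
            return page
        if attempt < max_retries:
            sleep(backoff)  # fixed pause, then retry the same cursor
    raise SearchTimeout(
        f"total was None after {max_retries} retries at cursor {cursor!r}"
    )
```

Passing `sleep` as a parameter is only there so the pause can be replaced in tests; a caller would normally leave it at the default.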
@mekarpeles This was the behavior we were seeing on Friday.
First off, I want to +1 @jjjake's proposal of having some sort of retry w/ exponential backoff. However, today the existing solution is worse than the PR that @hornc offers in that it throws exceptions upstream. I would support merging @hornc's changes to unbreak code that relies on this functionality and then also support an effort which adds retries + backoff and warnings.
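A rough sketch of what "retries + backoff and warnings" could mean in practice, assuming a doubling delay with a cap. Every name and default here (`backoff_delays`, `retry_with_backoff`, `base`, `factor`, `cap`) is hypothetical and chosen for illustration only; the response is again assumed to be a dict with a `total` key.

```python
import time
import warnings


def backoff_delays(base=1.0, factor=2.0, max_retries=4, cap=30.0):
    """Yield delays base, base*factor, base*factor**2, ... capped at `cap`."""
    for attempt in range(max_retries):
        yield min(base * factor ** attempt, cap)


def retry_with_backoff(fetch_page, cursor=None, sleep=time.sleep, **kwargs):
    """Re-request the same cursor with growing pauses while total is None.

    Emits a warning before each retry. Returns the last response, which may
    still have total None; the caller decides how to fail in that case.
    """
    page = fetch_page(cursor)
    for delay in backoff_delays(**kwargs):
        if page.get("total") is not None:
            return page
        warnings.warn(
            f"total was None at cursor {cursor!r}; retrying in {delay:g}s"
        )
        sleep(delay)
        page = fetch_page(cursor)
    return page
```

Warning instead of raising keeps callers that depend on search working, while still leaving a trace that the unexpected response occurred.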
Would prefer exponential backoff as well, though this would solve problems purl is seeing (we are currently swallowing the exceptions on our side).
This problem can still occur on v3.2.0 -- I had been using an older patched version with the fix in this PR daily (without exp. backoff, and without problems), but recently upgraded. It's a sporadic issue, but I've seen it under v3.2.0. I've now upgraded to v3.5.0. If it occurs again I'll rebase this PR and see about adding a backoff.
This is more of a workaround for an issue I've run into on search paging results. I'm not sure why the client is sporadically receiving an unexpected response -- it seems like a problem with the search API which might need investigation.
Getting the following error occasionally on `ia search` with largish result sets (e.g. 92669) part way through:
Example API JSON response (from adding some debug code to capture the unexpected response):
This was occurring sporadically for me over the weekend at 10K multiples on various queries from 30K to 90K `num_results`.
The query I was using is admittedly a bit complex:
ia search "mediatype:texts AND boxid:IA* AND scribe3_search_catalog:isbn AND identifier:t*" -f "scribe3_search_id" -p scope:all
But I have in the past run 'worse' ones with more clauses, fields, and expected results without problems :)
I haven't tested yet whether the `scope:all` contributes to the buggy behaviour. I started filtering on identifier first letter to reduce the size of the queries.
The code change here simply checks `total` before assuming it is an `int`, and retries the 10K limit request at the current cursor if `total` is `None` -- which seems an unexpected result. Current code looks to be expecting either `0` or a positive integer.
This was enough to let me complete a number of large search tasks.
Possible issues:
total: None
`total:None` looks like an API bug which would be masked by this workaround without understanding of the root cause.