Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENG-6654] fixfix: correct misunderstanding, handle conflicts #836

Merged

Conversation

aaxelb
Copy link
Contributor

@aaxelb aaxelb commented Dec 3, 2024

  • partly revert [ENG-6654] indexer recover from connection errors #835 (remove ensure_ack, ack_callback) -- messages can only be ack'd thru the channel they were received by, so trying to recover a channel to ack thru does nothing
  • handle easily-detectable delete_by_query conflicts in a followup celery task (so the indexer doesn't fall over on it)
  • local setup: allow the worker's daemon user to access elastic8
  • update consumer prefetch limit to match the indexing chunk size (seems to avoid connection errors, maybe this is why)
  • try avoiding connection errors when calling message.ack() -- do a bit more book-keeping to know when a bulk elastic action is the last for its index-card, so it can ack immediately then instead of waiting 'til the whole chunk has completed
  • test updates to handle changes...

@aaxelb aaxelb force-pushed the fix/indexer-fallback-failure branch 2 times, most recently from 8f54393 to 99f3eb2 Compare December 3, 2024 21:10
@coveralls
Copy link

coveralls commented Dec 3, 2024

Coverage Status

coverage: 91.219% (+0.04%) from 91.179%
when pulling 0880650 on aaxelb:fix/indexer-fallback-failure
into c85fb0d on CenterForOpenScience:develop.

- messages can _only_ be ack'd thru the channel they were received by --
  trying to recover a channel to ack thru does nothing
- handle easily-detectable delete_by_query conflicts in a followup
  celery task (so the indexer doesn't fall over on it)
@aaxelb aaxelb force-pushed the fix/indexer-fallback-failure branch from 99f3eb2 to adc9b83 Compare December 4, 2024 16:06
@aaxelb aaxelb changed the title fixfix: correct misunderstanding, handle conflicts [ENG-6654] fixfix: correct misunderstanding, handle conflicts Dec 4, 2024
try avoiding connection errors when calling `message.ack()` --
do a bit more book-keeping to know when a bulk elastic action is the
last for its index-card, so it can ack immediately then instead of
waiting 'til the whole chunk has completed
@aaxelb aaxelb marked this pull request as ready for review December 4, 2024 22:07
@aaxelb aaxelb requested a review from mfraezz December 4, 2024 22:08
Copy link
Member

@mfraezz mfraezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One Q, one minor thing to mitigate TODOs, but generally looks good.

Pass complete :octocat:

share/search/daemon.py Show resolved Hide resolved
'card_pks': messages_chunk.target_ids_chunk,
'timestamp': messages_chunk.timestamp,
},
countdown=3, # TODO: config?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should probably be configurable, yes.
Something like
settings.ELASTICSEARCH['COUNTDOWN_DELAY'] = int(os.environ.get('ELASTICSEARCH_COUNTDOWN_DELAY', 3))
would probably be reasonable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went with ELASTICSEARCH_POST_INDEX_DELAY

@aaxelb aaxelb merged commit 317b7dc into CenterForOpenScience:develop Dec 5, 2024
3 checks passed
@aaxelb aaxelb deleted the fix/indexer-fallback-failure branch December 5, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants