Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception during ADO ingest queue item processing causes the queue to be stuck. #221

Closed
patdunlavey opened this issue Oct 29, 2024 · 4 comments

Comments

@patdunlavey
Copy link
Contributor

If an exception is encountered during ami_ingest_ado queue processing (specifically, inside the "try-catch" section) in \Drupal\ami\AmiBatchQueue::takeOne, the problematic queue item is not removed from the queue and is picked up again to be processed again. This causes the queue to become permanently stuck on that item if the exception repeats every time the queue processor attempts to process it. The only solution is to manually go in (assuming you're using drupal database for queue management and not redis) and manually delete the offending queue item.

We've seen an AMI ingest queue get stuck for several days before anybody notices (e.g. over a holiday weekend)! We've seen AMI update queues repeatedly generate new revisions for objects over and over, hundreds of times. (It would seem that the queue item is hitting an exception after the ADO has been updated.)

If the exception type is "RequeueException" or "SuspendQueueException", the lease is released, making the item become immediately first in line to be processed again. Other exceptions simply keep the item leased until the lease expires, and then they're ready to process it again.

Perhaps this makes sense for "RequeueException" - assuming that whichever place threw the exception has a valid reason for determining that the item should be re-enqueued (so hard to follow in the code!). I can't see the logic for keeping an item in the queue for any other kind of exception.

Thoughts:

  • For any kind of exception other than "RequeueException", I think we could just delete the queue item and log an error.
  • For "RequeueException", I'm not sure. I thought of deleting the queue item and re-enqueueing it at the end of the queue, but...
  • Both of these ideas could cause problems if later queue items depend on the failed item - e.g. a parent item fails and then is followed by a child. Can we depend on the child creation item to fail if the parent item does?
  • Exceptions can be transient - not usually, but they can be. But I don't think it makes sense to have a "re-try" limit (of, say, three tries). Best, I think, to fail and do a really good job of logging the exception.
@DiegoPino
Copy link
Member

@patdunlavey could you share what generated the exception?

@patdunlavey
Copy link
Contributor Author

I really am not sure! Initial thought was bad character encoding, but I actually think it was strings in the csv that were not intended as json encoded, but were decoded as valid json.

@DiegoPino
Copy link
Member

DiegoPino commented Oct 29, 2024

@patdunlavey the link to the try/catch you shared is the Live Queue (Ajax/Realtime). If a template fails to render an array {{ myarray }} in Drupal 10.2.8+, because of https://www.drupal.org/project/drupal/issues/3427177#comment-15754054 will inside the background queue worker here throw an unhandled exception at

if ($method == 'template') {
$processed_metadata = $this->AmiUtilityService->processMetadataDisplay($data);
if (!$processed_metadata) {
$message = $this->t('Sorry, we can not cast ADO with @uuid into proper Metadata. Check the Metadata Display Template used, your permissions and/or your data ROW in your CSV for set @setid.',[
'@uuid' => $data->info['row']['uuid'],
'@setid' => $data->info['set_id']
]);
$this->loggerFactory->get('ami_file')->error($message ,[
'setid' => $data->info['set_id'] ?? NULL,
'time_submitted' => $data->info['time_submitted'] ?? '',
]);
$this->setStatus(amiSetEntity::STATUS_PROCESSING_WITH_ERRORS, $data);
return;
}
}
because of this

ami/src/AmiUtilityService.php

Lines 2473 to 2478 in d389b63

$cacheabledata = \Drupal::service('renderer')->executeInRenderContext(
new RenderContext(),
function () use ($context, $metadatadisplay_entity) {
return $metadatadisplay_entity->renderNative($context);
}
);

And then trigger an automatic re-enqueueing the Drupal's internal queue magic (on any exception) and the type will depend on the actual type of queue (DB one will not push immediately in the same place... I think) but others might ...

I can make a work around there an try/catch the actual template rendering but ideally we should also tackle the issue that with the new Drupal Core change (good for the tests) render array validation will cancel a complete template rendering and that can be only resolved by working with Drupal core so they add a catch/try on their invocation code on the overridden default Twig Filter Drupal provides, OR, we override that filter (code duplication) and to it ourselves until someone else notices this is an issue.

@DiegoPino
Copy link
Member

Solved via f8e42ac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants