
Do better retries when Mastodon fails (and maybe Twitter too?) #366

Open
sentry-io bot opened this issue Aug 23, 2023 · 5 comments

Comments

@sentry-io

sentry-io bot commented Aug 23, 2023

Sentry Issue: BIGCASES2-28

JobTimeoutException: Task exceeded maximum timeout value (180 seconds)
(7 additional frame(s) were not displayed)
...
  File "bc/subscription/tasks.py", line 379, in make_post_for_webhook_event
    api_post_id = api.add_status(message, image, files)
  File "bc/channel/utils/connectors/masto.py", line 94, in add_status
    media_id = self.upload_media(
  File "bc/channel/utils/connectors/masto.py", line 57, in upload_media
    media_dict = self.api.media_post(
@mlissner
Member

38 errors so far since this came up three days ago. I wonder:

  • Does this break Twitter too?
  • Which masto instance/account is affected?
  • Any ideas to fix it?

@ERosendo
Contributor

ERosendo commented Aug 23, 2023

Does this break Twitter too?

No. The bot schedules one independent task for each channel linked to a case, so if one of the tasks fails, it won't affect the other channels.

Which masto instance/account is affected?

I checked some events in Sentry, and it seems the only channel affected is [email protected] (the API is rate-limiting the bot).

Any ideas to fix it?

I think we're getting this exception because the wrapper we're using implements a sleep call inside a while loop to handle status code 429 (you can find the implementation of the API request method here). That sleep adds a delay that causes the job to hit the default timeout for queues. We could tweak one of the arguments of the Mastodon class so it throws an exception when the bot gets rate-limited, and then use retries instead of the sleep-and-while-loop to handle the status code.
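If I understand the suggestion, the argument in question is Mastodon.py's `ratelimit_method` constructor parameter (its default, `"wait"`, is what sleeps inside the request loop). A minimal sketch, assuming that parameter name and its `"throw"` value; the helper just packages the kwargs:

```python
def build_client_kwargs(api_base_url, access_token):
    """Constructor kwargs for mastodon.Mastodon so a 429 raises
    MastodonRatelimitError immediately instead of sleeping until
    the limit resets (which is what eats the 180 s job timeout)."""
    return {
        "api_base_url": api_base_url,
        "access_token": access_token,
        "ratelimit_method": "throw",  # raise instead of sleep-and-retry
    }

# Usage (requires Mastodon.py):
#     from mastodon import Mastodon
#     api = Mastodon(**build_client_kwargs(instance_url, token))
```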

@mlissner
Member

OK, great. So we killed the bankr cases bot on Twitter, perhaps it needs to die on Masto too. I'll go bug the [email protected] folks a second time...

@mlissner mlissner moved this to In Discussion / Later in @erosendo's backlog Sep 12, 2023
@mlissner mlissner changed the title Unable to post to Mastodon due to Timeout? Do better retries when Mastodon fails (and maybe Twitter too?) Sep 12, 2023
@mlissner mlissner moved this from In Discussion / Later to Bots Backlog in @erosendo's backlog Sep 12, 2023
@TheCleric
Contributor

So I have a few ideas on this one and would like some input before I just start applying my own assumptions. Looking at the code @ERosendo referenced, the Mastodon API setting to allow us to do our own rate limit handling isn't great for our purposes. This is the exception that you get when you tell it you want to handle rate limit errors yourself:

raise MastodonRatelimitError('Hit rate limit.')

In the background it gathers stuff from the response headers telling it how long it's rate limited and when it should try again. But then that's the very helpful message it gives us. Thanks Mastodon.py.

So this leaves us in a position where we can certainly detect the FACT that we've been rate limited, but we'd have no idea for how long.
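For what it's worth, the reset time is still recoverable from the raw response headers even though the exception message hides it. A sketch, assuming Mastodon's documented `X-RateLimit-Reset` header (an ISO 8601 timestamp):

```python
from datetime import datetime, timezone

def seconds_until_reset(x_ratelimit_reset, now=None):
    """Given an X-RateLimit-Reset header value (ISO 8601 timestamp),
    return how many seconds remain before the limit resets, clamped
    to zero so a stale header never yields a negative delay."""
    reset = datetime.fromisoformat(x_ratelimit_reset.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```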

So here's a few options:

Option 1

  • Have a special handler for adding mastodon messages to the queue that can detect rate limit errors
  • When it detects a rate limit error, use the queue's enqueue_at or enqueue_in function to resubmit the status in X time (how long? We don't know, so this would be a guess.)
  • When we enqueue it again, do we apply the rate limit protection in case we retried too early? If so, we would need our own retry counter in the function and would have to decrement it ourselves (not ideal)
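A sketch of that handler's bookkeeping, with a hypothetical guessed backoff schedule and our own retry counter (the `GUESSED_DELAYS` values and `next_retry` helper are made up for illustration; the real reset window is unknown):

```python
from datetime import timedelta

# Hypothetical guessed schedule; tune if we learn the real limits.
GUESSED_DELAYS = [60, 300, 900]  # seconds

def next_retry(retries_left):
    """Return (delay, remaining) for a manual re-enqueue, or None when
    the retry budget is spent. retries_left is our own counter, carried
    as a task argument and decremented on each rate-limit error."""
    if retries_left <= 0:
        return None
    attempt = len(GUESSED_DELAYS) - retries_left  # 0-based attempt index
    delay = GUESSED_DELAYS[min(attempt, len(GUESSED_DELAYS) - 1)]
    return timedelta(seconds=delay), retries_left - 1

# In the handler (requires rq):
#     result = next_retry(retries_left)
#     if result:
#         delay, remaining = result
#         queue.enqueue_in(delay, make_post_for_webhook_event, ..., remaining)
```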

Option 2

  • Space out our retries on either just Mastodon add_status calls, or on all add_status calls. We currently tell rq the number of retries and the interval between them (by default I think it retries again in 20 seconds), but instead of a single interval, we can give it a series of intervals. For example, we could tell it to retry three times with intervals of 20, 60, and 300 seconds: the first retry would be after 20 seconds, the second 60 seconds after that, and the last 300 seconds after that one.
  • The upside is this would be a relatively small change to the code (comparatively) and could do a lot of the same things as Option 1 via rq's own builtin functionality.
  • The downside is: these Sentry errors would go away, but we'd start seeing a bunch of MastodonRatelimitErrors in their place.
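Sketched with rq's `Retry(max=..., interval=[...])`; the helper below just shows how the per-retry intervals stack into absolute delays from the first failure:

```python
# Option 2 as rq would configure it (requires rq):
#     from rq import Retry
#     queue.enqueue(make_post_for_webhook_event, ...,
#                   retry=Retry(max=3, interval=[20, 60, 300]))

def cumulative_delays(intervals):
    """Seconds after the initial failure at which each retry fires,
    since each interval is measured from the previous attempt."""
    out, total = [], 0
    for step in intervals:
        total += step
        out.append(total)
    return out

# cumulative_delays([20, 60, 300]) -> retries at 20 s, 80 s, and 380 s
```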

Option 3

  • A combination of the first 2 options
  • We'll queue the initial message with rate limit protection as in Option 1, but subsequent retries would be queued without it and use the staggered retry intervals of Option 2
  • This would essentially give us 1 try to do it without throwing a sentry error, but any retries would log a sentry error

All of the options share the same weakness: guessing at what the rate limit actually is. This leaves us in a position where if we guess too low then we'll just error out until all of our retries are gone, but if we guess too high, it would severely delay the sending of Mastodon messages.

Technically there is an option to replace the Mastodon.py library with another one that supports better rate limit handling (or to write our own), but that's an even bigger unknown that I'd have to research. We could also try to convince the maintainers of Mastodon.py (with an issue and PR) to include the rate limit data in the exception, but I'm not sure what their appetite for that would be.

@mlissner
Member

mlissner commented May 7, 2024

Hm, my gut is that the simplest thing is the right answer, or at least the best place to begin to answer this. It's not a problem we get all the time, so maybe we can get away with the really simple thing and call it good enough.

I'm also, if I'm being honest, not super concerned if we miss a mastodon post due to this, because there just aren't many people there, and it's more or less fallen in popularity. I don't want to contribute to that, but also I don't want to bend over backwards if nobody is there.

My other thought is that it probably wouldn't be hard to tweak the mastodon exception to have a useful attribute, so maybe if just retrying using rq isn't enough, that could be the next step (it'd be nice to contribute to the masto community).
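A hypothetical shape for that upstream tweak: let the rate-limit exception carry the reset data the library already parses internally (the class name and attribute here are made up; Mastodon.py's actual class is `MastodonRatelimitError`):

```python
class RatelimitError(Exception):
    """Sketch of a rate-limit exception that exposes the reset window
    instead of only the message 'Hit rate limit.'."""

    def __init__(self, message, reset_in_seconds=None):
        super().__init__(message)
        self.reset_in_seconds = reset_in_seconds

# A caller could then retry precisely instead of guessing:
#     except RatelimitError as e:
#         queue.enqueue_in(timedelta(seconds=e.reset_in_seconds or 60), ...)
```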
