Adaptive retries #257

Kamil-Lontkowski · 2024-12-19T13:25:22Z

Closes #252

Implementation of adaptive retries based on implementation from aws-sdk-java-v2. It is backed by simple token bucket, where for every retry we take tokens, and replenish them for every successful operation. This way if we are making for example request for very resource-constrained set of resources, after system failure we don't generate more load by retrying every operation that failed.

Kamil-Lontkowski · 2024-12-19T13:39:37Z

I have a few questions regarding this draft.

AdaptiveRetryConfig added few parameters, I am wondering if we want to go in similar direction as before, providing factory methods for simple cases, or make them with all parameters with sensible defaults.
Currently when there are no tokens for retry we just fail as if we used all retries. I wonder if we can introduce "blocking mode" so client waits for enough tokens in bucket.
There is already implementation of leakyBucket in RateLimiter. I wonder if maybe it would be worth sharing some logic between them, but at the same time it is very simple logic.

lukaszlenart · 2024-12-19T14:06:58Z

What about adding token on failure/giving up?

adamw · 2024-12-19T16:47:19Z

Isn't the "adaptiveness" or "token bucket" an orthogonal concern to the other pieces of configuration that we have currently: schedule, resultPolicy and onRetry? That is - for any values of the schedule etc., we might add (or not) a token bucket, which needs to be consulted before retries. In that scenario, we wouldn't need separate Config implementations, just extending the current one?

As for your questions:

if we have a single config class, then we should also have factory methods with sensible defaults
I think that would be making the retries kind of a rate limiter - and it's a different mechanism. So let's keep those separate.
if you could extract a Bucket abstraction, great, but not unless straighforward

Kamil-Lontkowski · 2024-12-20T10:47:53Z

What about adding token on failure/giving up?

I think that it would contradict what TokenBucketRetryCondition and AdaptiveRetryStrategy do. If we add token on failure we would effectively take token for retry and add it back after it fails again. I don't know if I understood that question correctly.

adamw · 2024-12-20T11:33:21Z

Thanks for the changes, but I've got yet another variant to consider. Up until now, retries where stateless - it was just a config saying how many time to retry, regardless of other invocations. With the adaptive variant, retries become stateful - the token bucket must be shared. State is usually hard and we have to be cautious not to introduce "surprising" APIs.

That's why maybe it would be better to make the state possible explicit. That is - keep retry stateless as it currently is. But, introduce an AdaptiveRetry(TokenBucket) class, which would have an apply method, mirroring the current retry. That way, the stateful instance would be made explicit, and that's what users would have to share. Usage would then be:

// this will be defined at the top-level, shared among request handlers etc.
val adaptiveRetry = AdaptiveRetry(TokenBucket(...))

// usage:
adaptiveRetry(RetryConfig.backoff(maxRetries, initialDelay, maxDelay))(...)

As for the implementation, we might use the callbacks in the retry policy (respecting the callbacks that already are there). Also, shouldn't we specify both failureCost and successReward for the token bucket?

What do you think? :)

Kamil-Lontkowski · 2024-12-20T11:38:14Z

This WIP is mostly based on mechanism implemented in aws-sdk. There is only exceptionCost. But we can add also successReward. Maybe with new retry AdaptiveRetry those can be functions on errors and result allowing even more flexibility(in aws-sdk throttling exception were treated differently).

adamw · 2024-12-20T11:55:45Z

Hm, I think that's a good feature of the AWS SDK - if the call didn't even make it over the network, then we shouldn't count this as an "adaptive failure". So just as we have "isSuccess" etc in the policy, here we could also have one piece of config for adaptive retries - whether the exception should incur the retry penalty

Kamil-Lontkowski · 2024-12-20T15:05:41Z

I am trying two approaches for writing AdaptiveRetry class. One defines methods like failureCost as class params. This approach makes the apply methods take less parameters but it forces us to define E and T params when defining AdaptiveRetry. So this seems like a bad idea if we want to reuse policy for different actions. It also does't allow for retry alternative

def apply(config: RetryConfig[E, T])(operation: => T): T

because we can't narrow T to Throwable.

Second approach pushes definitions of this methods to apply. This also have some drawbacks. We can't have multiple apply methods with the same defaults, so we need to force user to define these, or hardcode defaults. Two argument lists also makes it hard for compiler to pick proper overloaded method. In the version I pushed I needed to specify with config E and T explicitly because without it compiler would pick method that returns T even though operation returns Either. Also the infered type of result was Any(but I don't know if it is from IntelliJ or compiler).

That means that we would probably need to get rid of overloaded apply methods, because if compiler picks wrong one we only find out if it blows up in runtime. These are my thoughts after trying this variant, or maybe there is more ways to define this class.

adamw · 2024-12-23T09:15:52Z

But isn't failure / success cost constant? Having them variable seems like a very advanced scenario

adamw · 2024-12-23T09:16:18Z

I'd imagine them simply as failureCost: Int, successReward: Int

Kamil-Lontkowski · 2024-12-23T10:31:30Z

These can be easily changed. I was wondering more if we are okey with signature like this:

def apply[E, T, F[_]](
      config: RetryConfig[E, T],
      failureCost: Int,
      successReward: Int,
      isFailure: E => Boolean,
      errorMode: ErrorMode[E, F]
  )(operation: => F[T]): F[T]

Because in this scenario overloading apply methods is not good idea because things like I mentioned in the comment can sneak by and blow out in runtime. But writing few methods, like retryEither should be enough. If so I will write tests and complete docs.

adamw · 2024-12-23T10:40:53Z

I think I'd push all the "adaptive" configuration to the AdaptiveRetry constructor, so the: token bucket, success & failure token counts. This is all independent of specific T/E parameters. Not sure if that the problem, though :)

Kamil-Lontkowski · 2024-12-23T14:03:33Z

I think that would be nice middle ground for how much configuration should be provided at every invocation. Maybe running operations on AdaptiveRetry instance should have the same api that retry.scala provides to avoid confusion with this apply method. But that is just matter of method name.

I didn't add more tests in BackoffRetryTest as they felt redundant. I think all "adaptive" cases where tested with different configs.

core/src/main/scala/ox/resilience/AdaptiveRetry.scala

adamw · 2024-12-23T14:47:53Z

core/src/main/scala/ox/resilience/AdaptiveRetry.scala

+    */
+  def apply[E, T, F[_]](
+      config: RetryConfig[E, T],
+      isFailure: E => Boolean = (_: E) => true,


I think if we get an error value (E), that's always a failure. The additional flexibility that we provide in retries, is that success values T might also cause a retry - so there's no exception thrown, for example (if E == Throwable), but we get back a value which signals a retry anyway. But if we get an E, that's a failure for sure

Changed signature to isFailure: T => Boolean

I pushed changes according to this comment. shouldPayPenaltyCost: T => Boolean = (_: T) => true decides if penalty will be paid for result T that was decided to be retried.

But I am not sure about the use case. What if we use library that has client side throttling and operation ends with error. It was throttling so we didn't make the call to the service so we should not pay for this retry but now we can't handle this case. Maybe it should be Either[E, T] => Boolean. Just a thought, I don't know which case might be more useful.

Yeah you're right, Either[E, T] => Boolean would be much more useful. Also let's rename this to shouldPayFailureCost to keep the vocabulary consistent

core/src/main/scala/ox/resilience/AdaptiveRetry.scala

adamw · 2024-12-31T09:44:53Z

The code examples in the docs didn't have mdoc:compile-only and contained multiple errors (I fixed those)

AdaptiveRetryConfig

726dc1a

Kamil-Lontkowski force-pushed the adaptive-retries branch from 25c41cf to 726dc1a Compare December 19, 2024 13:38

adaptive as extension of ResultPolicy

62059c0

AdaptiveRetry class

0032a96

Kamil-Lontkowski changed the title ~~WIP: Adaptive retries~~ Adaptive retries Dec 23, 2024

Kamil-Lontkowski marked this pull request as ready for review December 23, 2024 13:58

AdaptiveRetry refinement, docs

e2616e6

Kamil-Lontkowski force-pushed the adaptive-retries branch from 17a3462 to e2616e6 Compare December 23, 2024 14:04

adamw reviewed Dec 23, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 23, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 23, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 23, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 23, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 23, 2024

View reviewed changes

changes after review

04980f2

Kamil-Lontkowski force-pushed the adaptive-retries branch from 367fc4c to 04980f2 Compare December 27, 2024 11:30

changes after review

bd97a64

changes after review

de125ee

Kamil-Lontkowski requested a review from adamw December 27, 2024 16:05

adamw reviewed Dec 30, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Outdated Show resolved Hide resolved

adamw reviewed Dec 30, 2024

View reviewed changes

core/src/main/scala/ox/resilience/AdaptiveRetry.scala Show resolved Hide resolved

Kamil-Lontkowski and others added 4 commits December 30, 2024 15:56

change callbacks in ScheduledConfig

08310b0

update docs

c901e3c

RepeatConfig - stop on error

6b9b99e

Docs

1f48726

adamw and others added 4 commits December 31, 2024 10:50

Simplify ScheduleContinue

05136b1

Change ScheduleContinue to ScheduleStop

70e44ec

change shouldPayFailureCost

1d88f1e

Fix docs

45000a7

adamw merged commit daefc1e into master Dec 31, 2024
5 checks passed

adamw deleted the adaptive-retries branch December 31, 2024 11:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adaptive retries #257

Adaptive retries #257

Kamil-Lontkowski commented Dec 19, 2024

Kamil-Lontkowski commented Dec 19, 2024

lukaszlenart commented Dec 19, 2024

adamw commented Dec 19, 2024

Kamil-Lontkowski commented Dec 20, 2024 •

edited

Loading

adamw commented Dec 20, 2024

Kamil-Lontkowski commented Dec 20, 2024

adamw commented Dec 20, 2024

Kamil-Lontkowski commented Dec 20, 2024 •

edited

Loading

adamw commented Dec 23, 2024

adamw commented Dec 23, 2024

Kamil-Lontkowski commented Dec 23, 2024

adamw commented Dec 23, 2024

Kamil-Lontkowski commented Dec 23, 2024 •

edited

Loading

adamw Dec 23, 2024

Kamil-Lontkowski Dec 27, 2024

Kamil-Lontkowski Dec 27, 2024

adamw Dec 31, 2024

adamw commented Dec 31, 2024

Adaptive retries #257

Adaptive retries #257

Conversation

Kamil-Lontkowski commented Dec 19, 2024

Kamil-Lontkowski commented Dec 19, 2024

lukaszlenart commented Dec 19, 2024

adamw commented Dec 19, 2024

Kamil-Lontkowski commented Dec 20, 2024 • edited Loading

adamw commented Dec 20, 2024

Kamil-Lontkowski commented Dec 20, 2024

adamw commented Dec 20, 2024

Kamil-Lontkowski commented Dec 20, 2024 • edited Loading

adamw commented Dec 23, 2024

adamw commented Dec 23, 2024

Kamil-Lontkowski commented Dec 23, 2024

adamw commented Dec 23, 2024

Kamil-Lontkowski commented Dec 23, 2024 • edited Loading

adamw Dec 23, 2024

Choose a reason for hiding this comment

Kamil-Lontkowski Dec 27, 2024

Choose a reason for hiding this comment

Kamil-Lontkowski Dec 27, 2024

Choose a reason for hiding this comment

adamw Dec 31, 2024

Choose a reason for hiding this comment

adamw commented Dec 31, 2024

Kamil-Lontkowski commented Dec 20, 2024 •

edited

Loading

Kamil-Lontkowski commented Dec 20, 2024 •

edited

Loading

Kamil-Lontkowski commented Dec 23, 2024 •

edited

Loading