fix(vault): fix vault config neg_ttl behavior #14157
base: master
I think the reason was that we don't have a clear picture of whether something is a miss or something else, like a network error. Thus we decided to fetch misses every rotation cycle. We have also talked about counting n failures, or growing the time gradually on continuous failures, and ultimately removing the secret from the rotation. I do not have a strong feeling about any direction on this, though.
Yes, you are right. If there's a network error when fetching from the vault, the result should not be cached for a long time, but retried every minute (or using exponential backoff).
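To make the backoff idea from the two comments above concrete, here is a minimal Python sketch. It is not code from this PR or from Kong: the names next_retry_delay and should_drop, the 60-second base (taken from the "every minute" retry mentioned above), the one-hour cap, and the failure limit of 10 are all assumptions chosen for illustration.

```python
def next_retry_delay(failures: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait before the next fetch after `failures` consecutive errors.

    Hypothetical helper: the delay doubles with every consecutive failure,
    starting at `base` and never exceeding `cap`.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Bound the exponent so a very long failure streak cannot overflow a float.
    exponent = min(failures - 1, 32)
    return min(cap, base * (2 ** exponent))


def should_drop(failures: int, max_failures: int = 10) -> bool:
    """Hypothetical policy: remove a secret from rotation after too many failures."""
    return failures >= max_failures
```

With these assumed defaults, the first three failures give delays of 60, 120, and 240 seconds, and the delay stops growing once it reaches the cap. Whether a failure counter like this is workable depends on being able to tell an error from a genuine miss, which is the open question in this thread.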
d643f09 to f60376d
Fixed test and rebased onto master in the force push.
As per the documentation, neg_ttl specifies the time to cache a vault miss. However, in the current implementation the secret miss is not cached for this duration and is fetched from the vault backend every minute. This PR first fixes the check in the secret rotation timer so that negatively cached values are no longer fetched unconditionally, but only after neg_ttl. Then it changes the shdict TTL for the negative cache from neg_ttl to neg_ttl + SECRETS_CACHE_MIN_TTL; otherwise the negative cache entry would expire from the shdict and there would be no chance to update it after neg_ttl.
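The two changes described above can be modeled with a small Python sketch. This is an illustration under stated assumptions, not the Lua code changed by this PR: the SecretCache class, the Entry fields, the injected fetch callback, and the 120-second value for SECRETS_CACHE_MIN_TTL are all hypothetical, and positive entries are given the same eviction margin only to keep the model short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

SECRETS_CACHE_MIN_TTL = 120.0  # seconds; assumed value for illustration only


@dataclass
class Entry:
    value: Optional[str]  # None marks a negatively cached miss
    refresh_at: float     # earliest time the rotation timer may refetch
    expire_at: float      # time the store evicts the entry


class SecretCache:
    """Toy stand-in for the shared dictionary plus the rotation timer."""

    def __init__(self, fetch: Callable[[str], Optional[str]], ttl: float, neg_ttl: float) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.neg_ttl = neg_ttl
        self.store: Dict[str, Entry] = {}

    def refresh(self, key: str, now: float) -> Optional[str]:
        value = self.fetch(key)
        lifetime = self.neg_ttl if value is None else self.ttl
        # Keep the entry past its refresh time so the timer still finds it
        # when the refresh becomes due.
        self.store[key] = Entry(value, now + lifetime, now + lifetime + SECRETS_CACHE_MIN_TTL)
        return value

    def rotate(self, now: float) -> None:
        """One rotation cycle: refetch only the entries whose refresh is due."""
        for key, entry in list(self.store.items()):
            if now >= entry.expire_at:
                del self.store[key]
            elif now >= entry.refresh_at:
                self.refresh(key, now)
```

In this model, with neg_ttl set to 300 and a fetch callback that always reports a miss, running rotate every 60 seconds leaves the backend untouched until the 300-second mark, and the entry is still present at that point because it is only evicted SECRETS_CACHE_MIN_TTL seconds later.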
f60376d to 165f34c
Yes, but how do we know whether it was an error or a missing vault key (we may need to consult each vault implementation about it, if that is even possible)? E.g. does a 404 come from an ill-configured proxy or from the vault (we may need to check the payload; there are no standards, so each vault may be different)? But sure, if you want to explore this option, I have nothing against it.
Approve with nitpicks
I think the simpler way would be to just cache the negative value for the neg_ttl time regardless of what kind of error it encounters. IMO this seems to be more aligned with what the name of this config field, "neg_ttl", describes.
Summary
Checklist
changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
Issue reference
Fix FTI-6240