Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit shared cache durations in cache hit, miss and error conditions #1162

Merged
merged 6 commits into from
Dec 20, 2024

Conversation

muddyfish
Copy link
Contributor

@muddyfish muddyfish commented Nov 22, 2024

Adds additional duration metrics to the shared cache for cache hits, misses, and errors.

Example metrics:

2024-12-06T14:11:43.012775Z  INFO mountpoint_s3::metrics: express_data_cache.block_err[reason=invalid_block_offset,type=read]: 189 (n=189)
2024-12-06T14:11:43.012802Z  INFO mountpoint_s3::metrics: express_data_cache.block_hit: 0 (n=189)
2024-12-06T14:11:43.012817Z  INFO mountpoint_s3::metrics: express_data_cache.read_duration_us[type=error]: n=189: min=3 p10=3 p50=4 avg=3.87 p90=5 p99=5 p99.9=6 max=6
2024-12-06T14:11:43.012831Z  INFO mountpoint_s3::metrics: express_data_cache.total_bytes[type=write]: 380 (n=190)
2024-12-06T14:11:43.012844Z  INFO mountpoint_s3::metrics: express_data_cache.write_duration_us[type=ok]: n=190: min=8256 p10=8511 p50=8895 avg=8882.19 p90=9343 p99=9535 p99.9=9663 max=9663

And

2024-12-06T16:06:14.462602Z  INFO mountpoint_s3::metrics: express_data_cache.block_hit: 98 (n=100)
2024-12-06T16:06:14.462628Z  INFO mountpoint_s3::metrics: express_data_cache.read_duration_us[type=miss]: n=2: min=21120 p10=21247 p50=21247 avg=21824.00 p90=22527 p99=22527 p99.9=22527 max=22527
2024-12-06T16:06:14.462641Z  INFO mountpoint_s3::metrics: express_data_cache.read_duration_us[type=ok]: n=98: min=5888 p10=6015 p50=6271 avg=6378.94 p90=6559 p99=14079 p99.9=14079 max=14079
2024-12-06T16:06:14.462652Z  INFO mountpoint_s3::metrics: express_data_cache.total_bytes[type=read]: 196 (n=98)
2024-12-06T16:06:14.462663Z  INFO mountpoint_s3::metrics: express_data_cache.total_bytes[type=write]: 4 (n=2)
2024-12-06T16:06:14.462673Z  INFO mountpoint_s3::metrics: express_data_cache.write_duration_us[type=ok]: n=2: min=9408 p10=9471 p50=9471 avg=19280.00 p90=29183 p99=29183 p99.9=29183 max=29183

Additionally refactors the cache in response to comments in #1146

Does this change impact existing behavior?

Yes, the express_data_cache.read_duration_us metric now has a type associated with if it was a cache hit or not.

Does this change need a changelog entry?

No, changes to metrics don't need changelog entries.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish requested a review from passaro November 22, 2024 13:44
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests November 22, 2024 13:44 — with GitHub Actions Inactive
Copy link
Contributor

@passaro passaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concern is the return type with the additional reason. I see two possible alternatives:

  • Add more specific cases to DataCacheResult, so you can get the "reason" from the error itself,
  • or emit the express_data_cache.block_err metric in read_block/write_block before returning the error.

mountpoint-s3/src/data_cache/express_data_cache.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/data_cache/express_data_cache.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/data_cache/express_data_cache.rs Outdated Show resolved Hide resolved
@muddyfish muddyfish force-pushed the shared-cache-additional-metrics branch from d1bee2f to 3705db8 Compare December 6, 2024 12:47
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 12:47 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish requested a deployment to PR integration tests December 6, 2024 15:42 — with GitHub Actions Waiting
@muddyfish muddyfish force-pushed the shared-cache-additional-metrics branch from 27d29a4 to 9766e08 Compare December 6, 2024 15:46
@muddyfish muddyfish temporarily deployed to PR integration tests December 6, 2024 15:46 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 6, 2024 15:46 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 6, 2024 15:46 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 6, 2024 15:46 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 6, 2024 15:46 — with GitHub Actions Inactive
Signed-off-by: Simon Beal <[email protected]>
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish temporarily deployed to PR integration tests December 20, 2024 15:11 — with GitHub Actions Inactive
@muddyfish muddyfish requested a review from passaro December 20, 2024 15:12
@muddyfish muddyfish enabled auto-merge December 20, 2024 15:12
@muddyfish muddyfish added this pull request to the merge queue Dec 20, 2024
Merged via the queue into awslabs:main with commit 641f613 Dec 20, 2024
23 checks passed
@muddyfish muddyfish deleted the shared-cache-additional-metrics branch December 20, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants