This repository has been archived by the owner on Jan 29, 2025. It is now read-only.
Graphite could be smarter when caching requests. Currently only the full HTTP response of a render query can be cached. The results of find queries are also cached, but only when using remote storage. This can be greatly improved.
webapp/graphite/metrics/views.py
Add cache support using the existing configuration (FIND_CACHE_DURATION).
Remove the manually added cache headers, use Django's add_never_cache_headers() instead.
Benefits: even if this isn't the main API, these operations are often expensive. This way we also don't need to implement caching in every finder.
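A minimal sketch of what caching a find query could look like. The dict-based cache below stands in for Django's cache backend (`cache.get`/`cache.set`), and `FIND_CACHE_DURATION` mirrors the existing Graphite setting; `cached_find` and `do_find` are hypothetical names for illustration.

```python
import hashlib
import time

FIND_CACHE_DURATION = 300  # seconds; mirrors the existing Graphite setting

# In-memory stand-in for Django's cache backend (cache.get/cache.set).
_cache = {}

def _cache_key(query):
    # Hash the find pattern so arbitrary queries make safe cache keys.
    return 'find:%s' % hashlib.md5(query.encode('utf-8')).hexdigest()

def cached_find(query, do_find):
    """Return find() results, caching them for FIND_CACHE_DURATION."""
    key = _cache_key(query)
    entry = _cache.get(key)
    if entry is not None:
        expires, results = entry
        if expires > time.time():
            return results
    results = do_find(query)
    _cache[key] = (time.time() + FIND_CACHE_DURATION, results)
    return results
```

With this in the view layer, individual finders never need to know about caching at all.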
webapp/graphite/readers.py
Most readers already know how to add buffered (not cached) points gathered from Carbon. On top of that, past points could be cached in slices, with a much higher TTL for slices that should no longer be modifiable (configurable, set to something like 30m by default).
The splitting would basically work like what is done in BigGraphite. TimeSeries are split into chunks of 100k points or so (to avoid hitting the 1MB limit for memcached values). The key contains the start timestamp and the associated expression (which is the metric name here). The value contains the points plus some metadata (when this was generated, start and stop times, associated expression, period).
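The key and value scheme described above could be sketched as follows. `slice_key` and `make_slice` are hypothetical helper names; the chunk size and key layout are assumptions, not BigGraphite's actual implementation.

```python
import time

CHUNK_POINTS = 100000  # ~100k points per slice, staying under memcached's 1MB value limit

def slice_key(expression, start, step):
    # Align the start timestamp on a chunk boundary so the same window
    # always maps to the same cache key.
    chunk_span = CHUNK_POINTS * step
    aligned = start - (start % chunk_span)
    return 'render:%s:%d:%d' % (expression, aligned, step)

def make_slice(expression, start, step, points):
    """Cache value: the points plus the metadata listed above."""
    return {
        'expression': expression,
        'start': start,
        'stop': start + len(points) * step,
        'step': step,
        'period': step,
        'generated': time.time(),  # when this slice was built
        'points': points,
    }
```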
cache.get_multi() and cache.set_multi() can be used to be more efficient. The cache needs to be queried before the actual fetch happens. Once cached values have been retrieved, the start and end times can be adjusted to fetch only the remaining points. The cache is updated when new values are fetched.
Since the cache is sliced, we might have to tweak that a little to avoid re-writing cache entries that are already good.
Benefits: if we know that points in the past are not going to change, we can store them with a higher TTL in a pre-serialized way and bypass most of the code.
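The read path described above (query the cache first, then narrow the actual fetch to the uncovered range) could look roughly like this. `fetch_with_cache` is a hypothetical name, and a plain dict keyed by slice start timestamp stands in for the sliced memcached layout; it assumes cached slices are contiguous from the start of the window.

```python
def fetch_with_cache(cache, fetch, start, stop):
    """Sketch of the read path: serve the cached prefix, fetch the rest.

    `cache` maps slice start timestamps to (slice_stop, points) tuples;
    `fetch(start, stop)` is the underlying reader call.
    """
    points = []
    cursor = start
    # Consume contiguous cached slices at the front of the window.
    while cursor in cache and cursor < stop:
        slice_stop, slice_points = cache[cursor]
        points.extend(slice_points)
        cursor = slice_stop
    if cursor < stop:
        # Narrow the actual fetch to the uncovered tail, then cache it
        # so the next request can skip it too.
        fetched = fetch(cursor, stop)
        cache[cursor] = (stop, fetched)
        points.extend(fetched)
    return points
```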
webapp/graphite/{views.py,evaluator.py}
Cache expressions (sum(foo.*)). If you know that points in the past are not changing, you can cache the whole expression, re-compute only the last x minutes, and merge the results together. This may not work with a few functions like now() and the like, which could be blacklisted; it should generally be fine otherwise.
The easiest way is probably to fetch data from the cache first (which includes time_start and time_stop) and to only execute the rest of the query. For the initial implementation it might be easier to re-execute the whole query if the cache doesn't cover all the "old" data; this will also allow opportunistic repairs to happen. If this proves efficient we could split the query into multiple sub-queries.
Benefits: we also cache computed values, which is nice.
biggraphite/plugin/graphite.py
The finder could probably use the cache (until the above things are implemented).
Django cache
Use a Cassandra based cache backend instead of memcached because we might want a relatively high ttl for expressions (since the past doesn't change much).
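Django picks its cache implementation through the CACHES setting, so swapping memcached for Cassandra would mostly be a configuration change. Django ships no Cassandra backend, so the dotted backend path below is purely hypothetical, as are the host and TTL values.

```python
# settings.py sketch: a Django CACHES entry pointing at a hypothetical
# Cassandra-backed cache class. The BACKEND path is illustrative only;
# such a backend would have to be written or pulled in as a dependency.
CACHES = {
    'default': {
        'BACKEND': 'graphite.cache.cassandra.CassandraCache',  # hypothetical
        'LOCATION': 'cassandra-host:9042',
        'TIMEOUT': 3600,  # a high TTL is fine: the past doesn't change much
    },
}
```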