This repository has been archived by the owner on Jan 29, 2025. It is now read-only.
Graphite could be smarter when caching requests. Currently only the full HTTP response of a render query can be cached. The results of find queries are also cached, but only when using remote storage. This can be greatly improved.
webapp/graphite/metrics/views.py
Add cache support using the existing configuration (FIND_CACHE_DURATION).
Remove the manually added cache headers, use Django's add_never_cache_headers() instead.
Benefits: even if this isn't the main API, these operations are often expensive. This way we also don't need to implement caching in every finder.
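A minimal sketch of what caching a find query could look like. The dict-based cache below stands in for Django's cache backend (`cache.get`/`cache.set`), and `FIND_CACHE_DURATION` mirrors the existing Graphite setting; `cached_find` and `do_find` are hypothetical names for illustration.

```python
import hashlib
import time

FIND_CACHE_DURATION = 300  # seconds; mirrors the existing Graphite setting

# In-memory stand-in for Django's cache backend (cache.get/cache.set).
_cache = {}

def _cache_key(query):
    # Hash the find pattern so arbitrary queries make safe cache keys.
    return 'find:%s' % hashlib.md5(query.encode('utf-8')).hexdigest()

def cached_find(query, do_find):
    """Return find() results, caching them for FIND_CACHE_DURATION."""
    key = _cache_key(query)
    entry = _cache.get(key)
    if entry is not None:
        expires, results = entry
        if expires > time.time():
            return results
    results = do_find(query)
    _cache[key] = (time.time() + FIND_CACHE_DURATION, results)
    return results
```

With this in the view layer, individual finders never need to know about caching at all.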
webapp/graphite/readers.py
Most readers already know how to add buffered (not cached) points gathered from Carbon. On top of that, past points could be cached in slices, with a much higher TTL for slices that should no longer be modifiable (configurable, set to something like 30m by default).
The splitting would basically work like what is done in BigGraphite. TimeSeries are split into chunks of 100k points or so (to avoid hitting the 1MB limit for memcached values). The key contains the start timestamp and the associated expression (which is the metric name here). The value contains the points plus some metadata (when this was generated, start and stop times, associated expression, period).
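The key and value scheme described above could be sketched as follows. `slice_key` and `make_slice` are hypothetical helper names; the chunk size and key layout are assumptions, not BigGraphite's actual implementation.

```python
import time

CHUNK_POINTS = 100000  # ~100k points per slice, staying under memcached's 1MB value limit

def slice_key(expression, start, step):
    # Align the start timestamp on a chunk boundary so the same window
    # always maps to the same cache key.
    chunk_span = CHUNK_POINTS * step
    aligned = start - (start % chunk_span)
    return 'render:%s:%d:%d' % (expression, aligned, step)

def make_slice(expression, start, step, points):
    """Cache value: the points plus the metadata listed above."""
    return {
        'expression': expression,
        'start': start,
        'stop': start + len(points) * step,
        'step': step,
        'period': step,
        'generated': time.time(),  # when this slice was built
        'points': points,
    }
```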
cache.get_multi() and cache.set_multi() can be used to be more efficient. The cache needs to be queried before the actual fetch happens. Once cached values have been retrieved, the start and end times can be adjusted to fetch only the remaining points. The cache is updated when new values are fetched.
Since the cache is sliced, we might have to tweak that a little to avoid re-writing cache entries that are already good.
Benefits: if we know that points in the past are not going to change, we can store them with a higher TTL in a pre-serialized way and bypass most of the code.
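The read path described above (query the cache first, then narrow the actual fetch to the uncovered range) could look roughly like this. `fetch_with_cache` is a hypothetical name, and a plain dict keyed by slice start timestamp stands in for the sliced memcached layout; it assumes cached slices are contiguous from the start of the window.

```python
def fetch_with_cache(cache, fetch, start, stop):
    """Sketch of the read path: serve the cached prefix, fetch the rest.

    `cache` maps slice start timestamps to (slice_stop, points) tuples;
    `fetch(start, stop)` is the underlying reader call.
    """
    points = []
    cursor = start
    # Consume contiguous cached slices at the front of the window.
    while cursor in cache and cursor < stop:
        slice_stop, slice_points = cache[cursor]
        points.extend(slice_points)
        cursor = slice_stop
    if cursor < stop:
        # Narrow the actual fetch to the uncovered tail, then cache it
        # so the next request can skip it too.
        fetched = fetch(cursor, stop)
        cache[cursor] = (stop, fetched)
        points.extend(fetched)
    return points
```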
webapp/graphite/{views.py,evaluator.py}
Cache expressions (sum(foo.*)). If you know that points in the past are not changing, you can cache the whole expression, re-compute only the last x minutes, and merge the results together. This may not work with a few functions like now() and the like, which could be blacklisted; it should generally be fine otherwise.
The easiest way is probably to fetch data from the cache first (which includes time_start and time_stop) and to only execute the rest of the query. For the initial implementation it might be easier to re-execute the whole query if the cache doesn't cover all the "old" data; this will also allow opportunistic repairs to happen. If this proves efficient we could split the query into multiple sub-queries.
Benefits: we also cache computed values, which is nice.
biggraphite/plugin/graphite.py
The finder could probably use the cache (until the above things are implemented).
Django cache
Use a Cassandra based cache backend instead of memcached because we might want a relatively high ttl for expressions (since the past doesn't change much).
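Django picks its cache implementation through the CACHES setting, so swapping memcached for Cassandra would mostly be a configuration change. Django ships no Cassandra backend, so the dotted backend path below is purely hypothetical, as are the host and TTL values.

```python
# settings.py sketch: a Django CACHES entry pointing at a hypothetical
# Cassandra-backed cache class. The BACKEND path is illustrative only;
# such a backend would have to be written or pulled in as a dependency.
CACHES = {
    'default': {
        'BACKEND': 'graphite.cache.cassandra.CassandraCache',  # hypothetical
        'LOCATION': 'cassandra-host:9042',
        'TIMEOUT': 3600,  # a high TTL is fine: the past doesn't change much
    },
}
```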