PAG doesn't keep track of all metrics in memory, as the normal Prometheus gateway does, but only the latest version of each merged metric. That reduces memory usage and makes it possible to handle heavy metric input loads, like the ones generated by browser-side apps.
Even with that "merge and keep the last value" optimization, the metrics are never cleaned up, so given enough time PAG will eventually exhaust its memory and CPU and crash, as has happened a couple of times at my company. Since we have Cortex keeping track of metrics, PAG being restarted every now and then is not a huge problem, but before it crashes we see an increase in the number of "bad requests", which makes us lose some good metrics until it restarts.
There's no need to keep the metrics in PAG forever, as they are constantly scraped and stored in Prometheus or Cortex long-term storage, so we should have a way to detect and remove old metrics from memory.
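
As a rough sketch of the idea, not PAG's actual code: each merged series could carry the time of its last push, and a periodic sweep could drop anything that has not been pushed within a TTL. The class name `ExpiringMetricStore`, the string series key, the `ttl_seconds` parameter, and merging by plain summation are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringMetricStore:
    """Hypothetical store: one merged value per series, plus its last push time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the sweep can be tested without sleeping
        self._series: Dict[str, Tuple[float, float]] = {}  # key -> (value, last_push)

    def push(self, key: str, value: float) -> None:
        """Merge a pushed sample into the stored value (assumed here: a sum)."""
        current, _ = self._series.get(key, (0.0, 0.0))
        self._series[key] = (current + value, self._clock())

    def sweep(self) -> int:
        """Delete series not pushed within the TTL; return how many were removed."""
        cutoff = self._clock() - self.ttl_seconds
        stale = [k for k, (_, last_push) in self._series.items() if last_push < cutoff]
        for k in stale:
            del self._series[k]
        return len(stale)

    def snapshot(self) -> Dict[str, float]:
        """Values that would be exposed to the scraper."""
        return {k: v for k, (v, _) in self._series.items()}
```

A sweep like this could run on a timer or just before each scrape. One trade-off to keep in mind: a removed counter starts again from zero if the same series is pushed later, which shows up downstream as a counter reset.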