Hey guys,

Wondering if someone could assist with an issue I'm having with BigGraphite (BG). It currently receives a large number of metrics, but appears to drop a noticeable proportion at random; this was highlighted when looking at metrics from Apache Spark, which show frequent gaps (of one minute each) every hour.

Infrastructure Setup:

1 root 0:00 {entrypoint} /bin/sh /entrypoint
49 root 0:00 runsvdir -P /etc/service
51 root 0:00 runsv bg-carbon
52 root 0:03 runsv brubeck
53 root 0:00 runsv carbon
54 root 0:00 runsv carbon-aggregator
55 root 0:03 runsv carbon-relay
56 root 0:03 runsv collectd
57 root 0:00 runsv cron
58 root 0:00 runsv go-carbon
59 root 0:00 runsv graphite
60 root 0:00 runsv nginx
61 root 0:03 runsv redis
62 root 0:00 runsv statsd
63 root 0:00 tee -a /var/log/carbon.log
65 root 0:00 tee -a /var/log/carbon-relay.log
68 root 0:00 tee -a /var/log/statsd.log
69 root 0:01 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
70 root 0:09 {node} statsd /opt/statsd/config/tcp.js
71 root 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
76 root 0:00 /usr/sbin/crond -f
79 nginx 0:00 nginx: worker process
80 nginx 0:00 nginx: worker process
81 nginx 0:00 nginx: worker process
82 nginx 0:00 nginx: worker process
85 root 0:35 tee -a /var/log/bg-carbon.log
86 root 45:27 /opt/graphite/bin/python3 /opt/graphite/bin/bg-carbon-cache start --nodaemon --debug
88 root 0:00 tee -a /var/log/carbon-aggregator.log
156 root 0:41 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
157 root 0:49 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
158 root 0:46 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
159 root 0:47 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0

I can see traffic coming into the interface (tcpdump/tcpflow), and I can see 'cache query' entries in bg-carbon.log, but almost no datapoint logs for the Spark metrics.

Any assistance in troubleshooting would be greatly appreciated!
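For anyone comparing notes: a minimal way to check whether individual datapoints are actually reaching carbon's line receiver, rather than just seeing TCP traffic on the interface, is to watch the listener port and inject a single hand-written datapoint. This is only a sketch: it assumes the default plaintext listener on port 2003 and uses a made-up metric name (bg.debug.test).

# watch the plaintext line-receiver port for a Spark prefix
# (2003 is carbon's default LINE_RECEIVER_PORT; adjust if this setup differs)
tcpdump -i any -A 'tcp port 2003' | grep --line-buffered 'spark'

# inject one test datapoint by hand, then look for it in bg-carbon.log and on
# the read side (netcat flag syntax varies between variants)
echo "bg.debug.test 42 $(date +%s)" | nc -w 1 localhost 2003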
I've rebuilt the cache container to run only carbon-cache. Previously it was running statsd + carbon + other services, all under supervisord (or similar); the container now runs carbon exclusively.
At first, under low load, there were no metric drop-outs at all: we were shipping all of the Spark metrics and it was bulletproof. As soon as we started shipping more metrics from other services, we began to see drop-outs of 1-2 minutes across multiple metrics. Another interesting observation is that metrics appear to disappear at times; I'm not sure whether they are being overwritten by null values. What I can tell you is that metrics are now being fed into a dedicated carbon ingress and inspected from another Graphite endpoint, so Whisper files are not involved.
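One way to narrow down where the points disappear: carbon-cache publishes its own counters under carbon.agents.*, so comparing metricsReceived against committedPoints (and watching cache.size) across one of the drop-out windows shows whether the loss happens on the ingest side or on the write side. A sketch, with a placeholder hostname for whichever graphite-web endpoint is used for reading:

# carbon-cache self-instrumentation: points received vs points committed,
# plus internal cache size, over the last few hours (hostname is a placeholder)
curl -s 'http://graphite.example.com/render?from=-3h&format=json&target=carbon.agents.*.metricsReceived&target=carbon.agents.*.committedPoints&target=carbon.agents.*.cache.size' | python3 -m json.tool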
I've made multiple tweaks to the configs, but I'm at a bit of a loss as to how to eradicate the intermittent data loss.
Any help would be GREATLY appreciated!
TIA!
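As a general pointer rather than anything specific to this setup: the carbon.conf options most often involved in silent datapoint drops under load are the cache-size / flow-control and rate-limit settings, and the LOG_* switches make per-datapoint handling visible in bg-carbon.log. A sketch for reviewing them, assuming the stock /opt/graphite/conf layout:

# settings that commonly cause carbon-cache to shed datapoints under load;
# the path follows the stock Graphite layout and may differ in this container
grep -E '^(MAX_CACHE_SIZE|USE_FLOW_CONTROL|MAX_UPDATES_PER_SECOND|MAX_CREATES_PER_MINUTE|LOG_UPDATES|LOG_CACHE_HITS)' /opt/graphite/conf/carbon.conf

In particular, with a finite MAX_CACHE_SIZE and USE_FLOW_CONTROL = False, carbon-cache drops incoming datapoints once its cache fills, which looks exactly like random 1-2 minute gaps that only appear under load; enabling LOG_UPDATES and LOG_CACHE_HITS makes it possible to confirm from the logs whether a given datapoint was ever handled.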