.\" generated with Ronn/v0.7.3
.\" http://github.com/rtomayko/ronn/tree/0.7.3
.
.TH "CARBON\-C\-RELAY" "1" "September 2022" "Graphite" "Graphite data collection and visualisation"
.
.SH "NAME"
\fBcarbon\-c\-relay\fR \- graphite relay, aggregator and rewriter
.
.SH "SYNOPSIS"
\fBcarbon\-c\-relay\fR \fB\-f\fR \fIconfig\-file\fR \fB[ options\fR \.\.\. \fB]\fR
.
.SH "DESCRIPTION"
\fBcarbon\-c\-relay\fR accepts, cleanses, matches, rewrites, forwards and aggregates graphite metrics by listening for incoming connections and relaying the messages to other servers defined in its configuration\. The core functionality is to route messages via flexible rules to the desired destinations\.
.
.P
\fBcarbon\-c\-relay\fR is a simple program that reads its routing information from a file\. The command line arguments set the location of this file, as well as the number of dispatchers (worker threads) to use for reading the data from incoming connections and passing them on to the right destination(s)\. The route file supports two main constructs: clusters and matches\. The former define groups of hosts that metrics can be sent to; the latter define which metrics should be sent to which cluster\. Aggregation rules are seen as matches\. Rewrites are actions that directly affect the metric at the point at which they appear in the configuration\.
.
.P
For every metric received by the relay, cleansing is performed\. The following changes are performed before any match, aggregate or rewrite rule sees the metric:
.
.IP "\(bu" 4
double dot elimination (necessary for correctly functioning consistent hash routing)
.
.IP "\(bu" 4
trailing/leading dot elimination
.
.IP "\(bu" 4
whitespace normalisation (this mostly affects output of the relay to other targets: metric, value and timestamp will be separated by a single space only, ever)
.
.IP "\(bu" 4
irregular char replacement with underscores (_), currently irregular is defined as not being in \fB[0\-9a\-zA\-Z\-_:#]\fR, but can be overridden on the command line\. Note that tags (when present and allowed) are not processed this way\.
.
.IP "" 0
.
.SH "OPTIONS"
These options control the behaviour of \fBcarbon\-c\-relay\fR\.
.
.IP "\(bu" 4
\fB\-v\fR: Print version string and exit\.
.
.IP "\(bu" 4
\fB\-d\fR: Enable debug mode\. This prints statistics to stdout and prints extra messages about some situations encountered by the relay that normally would be too verbose to be enabled\. When combined with \fB\-t\fR (test mode) this also prints stub routes and consistent\-hash ring contents\.
.
.IP "\(bu" 4
\fB\-s\fR: Enable submission mode\. In this mode, internal statistics are not generated\. Instead, queue pressure and metric drops are reported on stdout\. This mode is useful when the relay is used as a submission relay whose job is just to forward to (a set of) main relays\. Statistics about the submission relays are not needed in this case, and could easily cause an undesired flood of metrics, e\.g\. when used on each and every host locally\.
.
.IP "\(bu" 4
\fB\-S\fR: Enable iostat\-like mode where every second the current state of the statistics is reported\. This implies submission mode \fB\-s\fR\.
.
.IP "\(bu" 4
\fB\-t\fR: Test mode\. This mode doesn\'t do any routing at all, but instead reads input from stdin and prints what actions would be taken given the loaded configuration\. This mode is very useful for testing relay routes for regular expression syntax etc\. It also gives insight into how routing is applied in complex configurations, for it shows rewrites and aggregates taking place as well\. When \fB\-t\fR is repeated, the relay will only test the configuration for validity and exit immediately afterwards\. Any standard output is suppressed in this mode, making it ideal for start\-scripts to test a (new) configuration\.
.
.IP "\(bu" 4
\fB\-f\fR \fIconfig\-file\fR: Read configuration from \fIconfig\-file\fR\. A configuration consists of clusters and routes\. See \fICONFIGURATION SYNTAX\fR for more information on the options and syntax of this file\.
.
.IP "\(bu" 4
\fB\-l\fR \fIlog\-file\fR: Use \fIlog\-file\fR for writing messages\. Without this option, the relay writes both to \fIstdout\fR and \fIstderr\fR\. When logging to file, all messages are prefixed with \fBMSG\fR when they were sent to \fIstdout\fR, and \fBERR\fR when they were sent to \fIstderr\fR\.
.
.IP "\(bu" 4
\fB\-p\fR \fIport\fR: Listen for connections on port \fIport\fR\. The port number is used for \fBTCP\fR, \fBUDP\fR and \fBUNIX sockets\fR alike\. In the latter case, the socket file contains the port number\. The port defaults to \fI2003\fR, which is also used by the original \fBcarbon\-cache\.py\fR\. Note that this only applies to the defaults; when \fBlisten\fR directives are in the config, this setting is ignored\.
.
.IP "\(bu" 4
\fB\-w\fR \fIworkers\fR: Use \fIworkers\fR number of threads\. The default number of workers is equal to the number of detected CPU cores\. It makes sense to reduce this number on many\-core machines, or when the traffic is low\.
.
.IP "\(bu" 4
\fB\-b\fR \fIbatchsize\fR: Set the number of metrics that are sent to remote servers at once to \fIbatchsize\fR\. When the relay sends metrics to servers, it will retrieve \fBbatchsize\fR metrics from the pending queue of metrics waiting for that server and send those one by one\. The size of the batch will have minimal impact on sending performance, but it controls the amount of lock\-contention on the queue\. The default is \fI2500\fR\.
.
.IP "\(bu" 4
\fB\-q\fR \fIqueuesize\fR: Each server in the configuration that the relay sends metrics to has a queue associated with it\. This queue allows for disruptions and bursts to be handled\. The size of this queue will be set to \fIqueuesize\fR, which allows for that number of metrics to be stored in the queue before it overflows and the relay starts dropping metrics\. The larger the queue, the more metrics can be absorbed, but also the more memory will be used by the relay\. The default queue size is \fI25000\fR\.
.
.IP "\(bu" 4
\fB\-L\fR \fIstalls\fR: Sets the maximum number of stalls to \fIstalls\fR before the relay starts dropping metrics for a server\. When a queue fills up, the relay uses a mechanism called stalling to signal the client (writing to the relay) of this event\. In particular when the client sends a large amount of metrics in a very short time (burst), stalling can help to avoid dropping metrics, since the client just needs to slow down for a bit, which in many cases is possible (e\.g\. when catting a file with \fBnc\fR(1))\. However, this behaviour can also obstruct, artificially stalling writers which cannot stop that easily\. For this the stalls can be set from \fI0\fR to \fI15\fR, where each stall can take around 1 second on the client\. The default value is set to \fI4\fR, which is aimed at the occasional disruption scenario and max effort to not lose metrics with moderate slowing down of clients\.
.
.IP "\(bu" 4
\fB\-C\fR \fICAcertpath\fR: Read CA certs (for use with TLS/SSL connections) from given path or file\. When not given, the default locations are used\. Strict verification of the peer is performed, so when using self\-signed certificates, be sure to include the CA cert in the default location, or provide the path to the cert using this option\.
.
.IP "\(bu" 4
\fB\-T\fR \fItimeout\fR: Specifies the IO timeout in milliseconds used for server connections\. The default is \fI600\fR milliseconds, but may need increasing when WAN links are used for target servers\. A relatively low value for connection timeout allows the relay to quickly establish that a server is unreachable, and as such allows failover strategies to kick in before the queue runs high\.
.
.IP "\(bu" 4
\fB\-c\fR \fIchars\fR: Sets the characters that are allowed in metrics, in addition to \fB[A\-Za\-z0\-9]\fR, to \fIchars\fR\. Any character not in this list is replaced by the relay with \fB_\fR (underscore)\. The default list of allowed characters is \fI\-_:#\fR\.
.
.IP "\(bu" 4
\fB\-m\fR \fIlength\fR: Limits metric names to at most \fIlength\fR bytes\. Any lines containing metric names longer than this will be discarded\.
.
.IP "\(bu" 4
\fB\-M\fR \fIlength\fR: Limits the input to lines of at most \fIlength\fR bytes\. Any longer lines will be discarded\. Note that \fB\-m\fR needs to be smaller than this value\.
.
.IP "\(bu" 4
\fB\-H\fR \fIhostname\fR: Override hostname determined by a call to \fBgethostname\fR(3) with \fIhostname\fR\. The hostname is used mainly in the statistics metrics \fBcarbon\.relays\.<hostname>\.<\.\.\.>\fR sent by the relay\.
.
.IP "\(bu" 4
\fB\-B\fR \fIbacklog\fR: Sets TCP connection listen backlog to \fIbacklog\fR connections\. The default value is \fI32\fR but on servers which receive many concurrent connections, this setting likely needs to be increased to avoid connection refused errors on the clients\.
.
.IP "\(bu" 4
\fB\-U\fR \fIbufsize\fR: Sets the socket send/receive buffer sizes in bytes, for both TCP and UDP scenarios\. When unset, the OS default is used\. The maximum is also determined by the OS\. The sizes are set using setsockopt with the flags SO_RCVBUF and SO_SNDBUF\. Setting this size may be necessary for large volume scenarios, for which also \fB\-B\fR might apply\. Checking the \fIRecv\-Q\fR and the \fIreceive errors\fR values from \fInetstat\fR gives a good hint about buffer usage\.
.
.IP "\(bu" 4
\fB\-E\fR: Disable disconnecting idle incoming connections\. By default the relay disconnects idle client connections after 10 minutes\. It does this to prevent resources clogging up when a faulty or malicious client keeps on opening connections without closing them\. It typically prevents running out of file descriptors\. For some scenarios, however, it is not desirable for idle connections to be disconnected, hence passing this flag will disable this behaviour\.
.
.IP "\(bu" 4
\fB\-D\fR: Daemonise into the background after startup\. This option requires \fB\-l\fR and \fB\-P\fR flags to be set as well\.
.
.IP "\(bu" 4
\fB\-P\fR \fIpidfile\fR: Write the pid of the relay process to a file called \fIpidfile\fR\. This is in particular useful when daemonised in combination with init managers\.
.
.IP "\(bu" 4
\fB\-O\fR \fIthreshold\fR: The minimum number of rules to find before trying to optimise the ruleset\. The default is \fB50\fR\. To disable the optimiser, use \fB\-1\fR; to always run the optimiser, use \fB0\fR\. The optimiser tries to group rules to avoid spending excessive time on matching expressions\.
.
.IP "" 0
.
.SH "CONFIGURATION SYNTAX"
The config file supports the following syntax, where comments start with a \fB#\fR character and can appear at any position on a line and suppress input until the end of that line:
.
.IP "" 4
.
.nf
cluster <name>
    < <forward | any_of | failover> [useall] |
      <carbon_ch | fnv1a_ch | jump_fnv1a_ch> [replication <count>] [dynamic] >
        <host[:port][=instance] [proto <udp | tcp>]
                                [type linemode]
                                [transport <plain | gzip | lz4 | snappy>
                                           [ssl | mtls <pemcert> <pemkey>]]> \.\.\.
    ;
cluster <name>
    file [ip]
        </path/to/file> \.\.\.
    ;
match
        <* | expression \.\.\.>
    [validate <expression> else <log | drop>]
    send to <cluster \.\.\. | blackhole>
    [stop]
    ;
rewrite <expression>
    into <replacement>
    ;
aggregate
        <expression> \.\.\.
    every <interval> seconds
    expire after <expiration> seconds
    [timestamp at <start | middle | end> of bucket]
    compute <sum | count | max | min | average |
             median | percentile<%> | variance | stddev> write to
        <metric>
    [compute \.\.\.]
    [send to <cluster \.\.\.>]
    [stop]
    ;
send statistics to <cluster \.\.\.>
    [stop]
    ;
statistics
    [submit every <interval> seconds]
    [reset counters after interval]
    [prefix with <prefix>]
    [send to <cluster \.\.\.>]
    [stop]
    ;
listen
    type linemode [transport <plain | gzip | lz4 | snappy>
                             [<ssl | mtls> <pemcert>
                              [protomin <tlsproto>] [protomax <tlsproto>]
                              [ciphers <ssl\-ciphers>] [ciphersuites <tls\-suite>]
                             ]
                  ]
        <<interface[:port] | port> proto <udp | tcp>> \.\.\.
        </path/to/file proto unix> \.\.\.
    ;
# tlsproto: <ssl3 | tls1\.0 | tls1\.1 | tls1\.2 | tls1\.3>
# ssl\-ciphers: see ciphers(1)
# tls\-suite: see SSL_CTX_set_ciphersuites(3)
include </path/to/file/or/glob>
    ;
.
.fi
.
.IP "" 0
.
.SS "CLUSTERS"
Multiple clusters can be defined, and need not be referenced by a match rule\. All clusters point to one or more hosts, except the \fBfile\fR cluster which writes to files in the local filesystem\. \fBhost\fR may be an IPv4 or IPv6 address, or a hostname\. Since host is followed by an optional \fB:\fR and port, for IPv6 addresses not to be interpreted wrongly, either a port must be given, or the IPv6 address surrounded by brackets, e\.g\. \fB[::1]\fR\. Optional \fBtransport\fR and \fBproto\fR clauses can be used to wrap the connection in a compression or encryption layer or specify the use of UDP or TCP to connect to the remote server\. When omitted the connection defaults to a plain TCP connection\. \fBtype\fR can only be \fBlinemode\fR at the moment, e\.g\. Python\'s pickle mode is not supported\.
.
.P
DNS hostnames are resolved to a single address, according to the preference rules in RFC 3484 \fIhttps://www\.ietf\.org/rfc/rfc3484\.txt\fR\. The \fBany_of\fR, \fBfailover\fR and \fBforward\fR clusters have an explicit \fBuseall\fR flag that enables expansion for hostnames resolving to multiple addresses\. Using this option, each address of any type becomes a cluster destination\. This means for instance that both IPv4 and IPv6 addresses are added\.
.
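.P
As a rough sketch of the address forms described above, the clusters below use a bracketed IPv6 address, a member reached over UDP, and the \fBuseall\fR flag\. The cluster names, addresses and hostname are made up for illustration\.
.
.IP "" 4
.
.nf
# bracketed IPv6 address, and a member reached over UDP
cluster local\-targets
    forward
        [::1]:2003
        10\.0\.0\.1:2003 proto udp
    ;
# every address the hostname resolves to becomes a member
cluster resolved
    any_of useall
        graphite\.example\.com:2003
    ;
.
.fi
.
.IP "" 0
.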
.P
There are two groups of cluster types, simple forwarding clusters and consistent hashing clusters\.
.
.IP "\(bu" 4
\fBforward\fR and \fBfile\fR clusters
.
.IP
The \fBforward\fR and \fBfile\fR clusters simply send everything they receive to the defined members (host addresses or files)\. When a cluster has multiple members, all incoming metrics are sent to \fIall\fR members, basically duplicating the input metric stream over all members\.
.
.IP "\(bu" 4
\fBany_of\fR cluster
.
.IP
The \fBany_of\fR cluster is a small variant of the \fBforward\fR cluster, but instead of sending the input metrics to all defined members, it sends each incoming metric to only one of the defined members\. The purpose of this is a load\-balanced scenario where any of the members can receive any metric\. As \fBany_of\fR suggests, when any of the members becomes unreachable, the remaining available members will immediately receive the full input stream of metrics\. This specifically means that when 4 members are used, each will receive approximately 25% of the input metrics\. When one member becomes unavailable (e\.g\. network interruption, or a restart of the service), the remaining 3 members will each receive about 25% / 3 = ~8% more traffic (33%)\. Put differently, each will receive 1/3rd of the total input\. When designing cluster capacity, one should take into account that in the most extreme case, the final remaining member will receive all input traffic\.
.
.IP
An \fBany_of\fR cluster can in particular be useful when the cluster points to other relays or caches\. When used with other relays, it effectively load\-balances, and adapts immediately to unavailability of targets\. When used with caches, there is a small detail to how \fBany_of\fR works that makes it very suitable\. The implementation of this router is not to round\-robin over any available members, but instead it uses a consistent hashing strategy to deliver the same metrics to the same destination all the time\. This helps caches, and makes it easier to retrieve uncommitted datapoints (from a single cache), but still allows for a rolling restart of the caches\. When a member becomes unavailable, the hash destinations are not changed, but instead traffic destined for the unavailable node is spread evenly over available nodes\.
.
.IP "\(bu" 4
\fBfailover\fR cluster
.
.IP
The \fBfailover\fR cluster is like the \fBany_of\fR cluster, but sticks to the order in which servers are defined\. This is to implement a pure failover scenario between servers\. All metrics are sent to at most 1 member, so no hashing or balancing is taking place\. For example, a \fBfailover\fR cluster with two members will only send metrics to the second member when the first member becomes unavailable\. As soon as the first member returns, all metrics are sent to the first node again\.
.
.IP "\(bu" 4
\fBcarbon_ch\fR cluster
.
.IP
The \fBcarbon_ch\fR cluster sends the metrics to the member that is responsible according to the consistent hash algorithm as used in the original carbon python relay\. Multiple members are possible if replication is set to a value higher than 1\. When \fBdynamic\fR is set, failure of any of the servers does not result in metrics being dropped for that server, but instead the undeliverable metrics are sent to any other server in the cluster in order for the metrics not to get lost\. This is most useful when replication is 1\.
.
.IP
The calculation of the hashring, that defines the way in which metrics are distributed, is based on the server host (or IP address) and the optional \fBinstance\fR of the member\. This means that using \fBcarbon_ch\fR two targets on different ports but on the same host will map to the same hashkey, which means no distribution of metrics takes place\. The instance is used to remedy that situation\. An instance is appended to the member after the port, and separated by an equals sign, e\.g\. \fB127\.0\.0\.1:2006=a\fR for instance \fBa\fR\. Instances are a concept introduced by original python carbon\-cache, and need to be used in accordance with the configuration of those\.
.
.IP
Consistent hashes are consistent in the sense that removal of a member from the cluster should not result in a complete re\-mapping of all metrics to members, but instead only add the metrics from the removed member to the remaining members, such that each gets approximately its fair share\. The other way around, when a member is added, each member should see a subset of its metrics now being addressed to the new member\. This is an important advantage over a normal hash, where each removal or addition of members (also via e\.g\. a change in their IP address or hostname) would cause a full re\-mapping of all metrics over all available members\.
.
.IP "\(bu" 4
\fBfnv1a_ch\fR cluster
.
.IP
The \fBfnv1a_ch\fR cluster is an incompatible improvement to \fBcarbon_ch\fR introduced by carbon\-c\-relay\. It uses a different hash technique (FNV1a) which is faster than the MD5\-hashing used by \fBcarbon_ch\fR\. More importantly, \fBfnv1a_ch\fR clusters use both host and port to distinguish the members\. This is useful when multiple targets live on the same host just separated by port\.
.
.IP
Since the \fBinstance\fR property is no longer necessary with \fBfnv1a_ch\fR this way, it is used to completely override the host:port string that the hashkey would be calculated from\. This is an important aspect, since the hashkey defines which metrics a member receives\. Such an override allows for many things, including masquerading old IP addresses, e\.g\. when a machine was migrated to newer hardware\. An example of this would be \fB10\.0\.0\.5:2003=10\.0\.0\.2:2003\fR where a machine at address 5 now receives the metrics for a machine that was at address 2\.
.
.IP
While using instances this way can be very useful to perform migrations in existing clusters, for newly set up clusters, instances allow this work to be avoided by using an instance from day one to detach the machine location from the metrics it receives\. Consider for instance \fB10\.0\.0\.1:2003=4d79d13554fa1301476c1f9fe968b0ac\fR where a random hash as instance is used\. This would allow the port and/or IP address of the server that receives data to be changed as many times as needed without having to deal with any legacy being visible, assuming the random hash is retained\. Note that since the instance name is used as full hash input, instances such as \fBa\fR, \fBb\fR, etc\. will likely result in poor hash distribution, as their hashes have very little input\. For that reason, consider using longer and mostly differing instance names, such as the random hash used in the above example, for better hash distribution behaviour\.
.
.IP "\(bu" 4
\fBjump_fnv1a_ch\fR cluster
.
.IP
The \fBjump_fnv1a_ch\fR cluster is also a consistent hash cluster like the previous two, but it does not take the member host, port or instance into account at all\. This means this cluster type looks at the order in which members are defined, see also below for more on this order\. Whether this is useful to you depends on your scenario\. In contrast to the previous two consistent hash cluster types, the jump hash has an almost perfect balancing over the members defined in the cluster\. However, this comes at the expense of not being able to remove any member but the last from the cluster without causing a complete re\-mapping of all metrics over all members\. What this basically means is that this hash is fine to use with constant or ever growing clusters where older nodes are never removed, but replaced instead\.
.
.IP
If you have a cluster where removal of old nodes takes place, the jump hash is not suitable for you\. Jump hash works with servers in an ordered list\. Since this order is important, it can be made explicit using the instance as used in previous cluster types\. When an instance is given with the members, it will be used as sorting key\. Without this instance, the order will be as given in the configuration file, which may be prone to changes when e\.g\. generated by some config management software\. As such, it is probably a good practice to fix the order of the servers with instances such that it is explicit what the right nodes for the jump hash are\. One can just use numbers for these, but be aware that sorting of 1, 2 and 10 results in 1, 10, 2, so better to use something like P0001, P0002, P0010 instead\.
.
.IP "" 0
.
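.P
The cluster types above could be written down along the following lines\. This is only a sketch: all cluster names, addresses and instance names are invented for illustration, and a real setup would normally use only the types it needs\.
.
.IP "" 4
.
.nf
# every metric is sent to both members
cluster mirror
    forward
        10\.0\.0\.1:2003
        10\.0\.0\.2:2003
    ;
# each metric is sent to one of the available members
cluster balanced
    any_of
        10\.0\.0\.3:2003
        10\.0\.0\.4:2003
    ;
# the second member is only used while the first is unavailable
cluster standby
    failover
        10\.0\.0\.5:2003
        10\.0\.0\.6:2003
    ;
# two targets on one host, told apart by their instance
cluster caches
    carbon_ch replication 2
        127\.0\.0\.1:2006=a
        127\.0\.0\.1:2007=b
    ;
# the first member takes over the hash position of 10\.0\.0\.2:2003
cluster hashed
    fnv1a_ch
        10\.0\.0\.5:2003=10\.0\.0\.2:2003
        10\.0\.0\.3:2003
    ;
# instances make the member order explicit for the jump hash
cluster ordered
    jump_fnv1a_ch
        10\.0\.0\.7:2003=P0001
        10\.0\.0\.8:2003=P0002
    ;
.
.fi
.
.IP "" 0
.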
.SS "MATCHES"
Match rules are the way to direct incoming metrics to one or more clusters\. Match rules are processed top to bottom as they are defined in the file\. It is possible to define multiple matches in the same rule\. Each match rule can send data to one or more clusters\. Since match rules "fall through" unless the \fBstop\fR keyword is added, carefully crafted match expressions can be used to target multiple clusters or aggregations\. This ability allows metrics to be replicated, as well as certain metrics to be sent to alternative clusters with careful ordering and usage of the \fBstop\fR keyword\.
.
.P
The special cluster \fBblackhole\fR discards any metrics sent to it\. This can be useful for weeding out unwanted metrics in certain cases\. Because throwing metrics away is pointless if other matches would accept the same data, a match with the blackhole cluster as destination has an implicit \fBstop\fR\.
.
.P
The \fBvalidate\fR clause adds a check to the data (what comes after the metric) in the form of a regular expression\. When this expression matches, the match rule will execute as if no validate clause was present\. However, if it fails, the match rule is aborted, and no metrics will be sent to destinations; this is the \fBdrop\fR behaviour\. When \fBlog\fR is used, the metric is logged to stderr\. Care should be taken with the latter to avoid log flooding\. When a validate clause is present, destinations need not be present; this allows for applying a global validation rule\. Note that the cleansing rules are applied before validation is done, thus the data will not have duplicate spaces\.
.
.P
The \fBroute using\fR clause is used to perform a temporary modification to the key used for input to the consistent hashing routines\. The primary purpose is to route traffic so that appropriate data is sent to the needed aggregation instances\.
.
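.P
The fall\-through, \fBstop\fR, \fBblackhole\fR and \fBvalidate\fR behaviour described above might be combined as in the following sketch\. The clusters \fBcluster\-a\fR and \fBcluster\-b\fR are hypothetical and would have to be defined elsewhere in the configuration; the metric prefixes are made up as well\.
.
.IP "" 4
.
.nf
# global validation: drop lines whose data is not "number number"
match *
    validate ^[0\-9\.e+\-]+\e [0\-9\.e+\-]+$ else drop
    ;
# discard unwanted metrics; blackhole has an implicit stop
match ^debug\e\.
    send to blackhole
    ;
# metrics starting with sys\. go to cluster\-a only
match ^sys\e\.
    send to cluster\-a
    stop
    ;
# everything that fell through is sent to both clusters
match *
    send to cluster\-a cluster\-b
    ;
.
.fi
.
.IP "" 0
.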
.SS "REWRITES"
Rewrite rules take a regular expression as input to match incoming metrics, and transform them into the desired new metric name\. In the replacement, backreferences are allowed to match capture groups defined in the input regular expression\. A match of \fBserver\e\.(x|y|z)\e\.\fR allows the use of e\.g\. \fBrole\.\e1\.\fR in the substitution\.
.
.P
A few caveats apply to the current implementation of rewrite rules\. First, their location in the config file determines when the rewrite is performed\. The rewrite is done in\-place; as such, a match rule before the rewrite would match the original name, while a match rule after the rewrite no longer matches the original name\. Care should be taken with the ordering, as multiple rewrite rules in succession can take place, e\.g\. \fBa\fR gets replaced by \fBb\fR and \fBb\fR gets replaced by \fBc\fR in a succeeding rewrite rule\.
.
.P
The second caveat with the current implementation is that the rewritten metric names are not cleansed like newly incoming metrics are\. Thus, double dots and potentially dangerous characters can appear if the replacement string is crafted to produce them\. It is the responsibility of the writer to make sure the metrics are clean\. If this is an issue for routing, one can consider having a rewrite\-only instance that forwards all metrics to another instance that will do the routing\. Obviously the second instance will cleanse the metrics as they come in\.
.
.P
The backreference notation allows the replacement string to be lowercased or uppercased with the use of the underscore (\fB_\fR) and caret (\fB^\fR) symbols following directly after the backslash\. For example, \fBrole\.\e_1\.\fR as substitution will lowercase the contents of \fB\e1\fR\. The dot (\fB\.\fR) can be used in a similar fashion, or follow the underscore or caret, to replace dots with underscores in the substitution\. This can be handy for some situations where metrics are sent to graphite\.
.
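.P
A minimal sketch of a rewrite rule using a lowercasing backreference is given below\. The metric layout is an assumption made for this example only\.
.
.IP "" 4
.
.nf
# server\.DC1\.load would be expected to become server\.dc1\.load
rewrite ^server\e\.([^\.]+)\e\.(\.*)$
    into server\.\e_1\.\e2
    ;
.
.fi
.
.IP "" 0
.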
.SS "AGGREGATIONS"
The aggregations defined take one or more input metrics expressed by one or more regular expressions, similar to the match rules\. Incoming metrics are aggregated over a period of time defined by the interval in seconds\. Since events may arrive a bit later in time, the expiration time in seconds defines when the aggregations should be considered final, as no new entries are allowed to be added any more\. On top of an aggregation multiple aggregations can be computed\. They can be of the same or different aggregation types, but should write to a unique new metric\. The metric names can include back references like in rewrite expressions, allowing for powerful single aggregation rules that yield many aggregations\.
.
.P
When no \fBsend to\fR clause is given, produced metrics are sent to the relay as if they were submitted from the outside, hence match and aggregation rules apply to those\. Care should be taken that loops are avoided this way\. For this reason, the use of the \fBsend to\fR clause is encouraged, to direct the output traffic where possible\. Like for match rules, it is possible to define multiple cluster targets\. Also, like match rules, the \fBstop\fR keyword applies to control the flow of metrics in the matching process\.
.
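.P
An aggregation following the description above could look like the sketch below\. The input expression, the output metric names and the cluster \fBgraphite\fR are assumptions for illustration; the cluster would have to be defined elsewhere\. The explicit \fBsend to\fR keeps the produced metrics from re\-entering the relay\.
.
.IP "" 4
.
.nf
aggregate
        ^sys\e\.[^\.]+\e\.cpu\e\.user$
    every 60 seconds
    expire after 90 seconds
    timestamp at end of bucket
    compute sum write to
        sys\.all\.cpu\.user_sum
    compute max write to
        sys\.all\.cpu\.user_max
    send to graphite
    stop
    ;
.
.fi
.
.IP "" 0
.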
.SS "STATISTICS"
The \fBsend statistics to\fR construct is deprecated and will be removed in the next release\. Use the special \fBstatistics\fR construct instead\.
.
.P
The \fBstatistics\fR construct can control a couple of things about the (internal) statistics produced by the relay\. The \fBsend to\fR target can be used to avoid router loops by sending the statistics to certain destination cluster(s)\. By default the metrics are prefixed with \fBcarbon\.relays\.<hostname>\fR, where hostname is determined on startup and can be overridden using the \fB\-H\fR argument\. This prefix can be set using the \fBprefix with\fR clause similar to a rewrite rule target\. The input match in this case is the pre\-set regular expression \fB^(([^\.]+)(\e\.\.*)?)$\fR on the hostname\. As such, one can see that the default prefix is set by \fBcarbon\.relays\.\e\.1\fR\. Note that this uses the replace\-dot\-with\-underscore replacement feature from rewrite rules\. Given the input expression, the following match groups are available: \fB\e1\fR the entire hostname, \fB\e2\fR the short hostname and \fB\e3\fR the domainname (with leading dot)\. It may make sense to replace the default by something like \fBcarbon\.relays\.\e_2\fR for certain scenarios, to always use the lowercased short hostname, which following the expression doesn\'t contain a dot\. By default, the metrics are submitted every 60 seconds; this can be changed using the \fBsubmit every <interval> seconds\fR clause\.
.
.br
To obtain a more compatible set of values to carbon\-cache\.py, use the \fBreset counters after interval\fR clause to make values non\-cumulative, that is, they will report the change compared to the previous value\.
.
.SS "LISTENERS"
The ports and protocols the relay should listen on for incoming connections can be specified using the \fBlisten\fR directive\. Currently, all listeners need to be of \fBlinemode\fR type\. An optional compression or encryption wrapping can be specified for the port and optional interface given by IP address, or unix socket by file\. When no interface is specified, the any interface on all available IP protocols is assumed\. If no \fBlisten\fR directive is present, the relay will use the default listeners for port 2003 on tcp and udp, plus the unix socket \fB/tmp/\.s\.carbon\-c\-relay\.2003\fR\. This typically expands to 5 listeners on an IPv6 enabled system\. The default matches the behaviour of versions prior to v3\.2\.
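.
.P
For illustration, and assuming the \fBlisten\fR syntax given earlier in this manual, the default listeners described above would roughly correspond to an explicit directive along these lines:
.
.IP "" 4
.
.nf
listen
    type linemode
        2003 proto tcp
        2003 proto udp
        /tmp/\.s\.carbon\-c\-relay\.2003 proto unix
    ;
.
.fi
.
.IP "" 0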
.
.SS "INCLUDES"
In case configuration becomes very long, or is managed better in separate files, the \fBinclude\fR directive can be used to read another file\. The given file will be read in place and added to the router configuration at the time of inclusion\. The end result is one big route configuration\. Multiple \fBinclude\fR statements can be used throughout the configuration file\. The positioning will influence the order of rules as normal\. Beware that recursive inclusion (\fBinclude\fR from an included file) is supported, and currently no safeguards exist for an inclusion loop\. For what it is worth, this feature is likely best used with simple configuration files (e\.g\. not having \fBinclude\fR in them)\.
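.
.P
A small sketch of how the position of an \fBinclude\fR affects rule order is given below\. The file name and the cluster name are hypothetical\. Rules read from the included file are evaluated before the catch\-all match that follows it:
.
.IP "" 4
.
.nf
cluster default \.\.\. ;
include /path/to/extra\-rules\.conf ;
match * send to default stop;
.
.fi
.
.IP "" 0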
.
.SH "EXAMPLES"
\fBcarbon\-c\-relay\fR evolved over time, growing features on demand as the tool proved to be stable and fitting the job well\. Below follow some annotated examples of constructs that can be used with the relay\.
.
.P
As many clusters as necessary can be defined\. They receive data from match rules, and their type defines which members of the cluster finally get the metric data\. The simplest cluster form is a \fBforward\fR cluster:
.
.IP "" 4
.
.nf
cluster send\-through
forward
10\.1\.0\.1
;
.
.fi
.
.IP "" 0
.
.P
Any metric sent to the \fBsend\-through\fR cluster would simply be forwarded to the server at IPv4 address \fB10\.1\.0\.1\fR\. If we define multiple servers, all of those servers would get the same metric, thus:
.
.IP "" 4
.
.nf
cluster send\-through
forward
10\.1\.0\.1
10\.2\.0\.1
;
.
.fi
.
.IP "" 0
.
.P
The above results in a duplication of metrics sent to both machines\. This can be useful, but most of the time it is not\. The \fBany_of\fR cluster type is like \fBforward\fR, but it sends each incoming metric to any of the members\. The same example with such a cluster would be:
.
.IP "" 4
.
.nf
cluster send\-to\-any\-one
any_of 10\.1\.0\.1:2010 10\.1\.0\.1:2011;
.
.fi
.
.IP "" 0
.
.P
This would implement a multipath scenario, where two servers are used, the load between them is spread, but should any of them fail, all metrics are sent to the remaining one\. This typically works well for upstream relays, or for balancing carbon\-cache processes running on the same machine\. Should any member become unavailable, for instance due to a rolling restart, the other members receive the traffic\. If it is necessary to have true fail\-over, where the secondary server is only used if the first is down, the following would implement that:
.
.IP "" 4
.
.nf
cluster try\-first\-then\-second
failover 10\.1\.0\.1:2010 10\.1\.0\.1:2011;
.
.fi
.
.IP "" 0
.
.P
These types are different from the two consistent hash cluster types:
.
.IP "" 4
.
.nf
cluster graphite
carbon_ch
127\.0\.0\.1:2006=a
127\.0\.0\.1:2007=b
127\.0\.0\.1:2008=c
;
.
.fi
.
.IP "" 0
.
.P
If a member in this example fails, all metrics that would go to that member are kept in the queue, waiting for the member to return\. This is useful for clusters of carbon\-cache machines where it is desirable that the same metric ends up on the same server always\. The \fBcarbon_ch\fR cluster type is compatible with carbon\-relay consistent hash, and can be used for existing clusters populated by carbon\-relay\. For new clusters, however, it is better to use the \fBfnv1a_ch\fR cluster type, for it is faster, and allows balancing over the same address but different ports without an instance number, in contrast to \fBcarbon_ch\fR\.
.
.P
Because we can use multiple clusters, we can also replicate without the use of the \fBforward\fR cluster type, in a more intelligent way:
.
.IP "" 4
.
.nf
cluster dc\-old
carbon_ch replication 2
10\.1\.0\.1
10\.1\.0\.2
10\.1\.0\.3
;
cluster dc\-new1
fnv1a_ch replication 2
10\.2\.0\.1
10\.2\.0\.2
10\.2\.0\.3
;
cluster dc\-new2
fnv1a_ch replication 2
10\.3\.0\.1
10\.3\.0\.2
10\.3\.0\.3
;
match *
send to dc\-old
;
match *
send to
dc\-new1
dc\-new2
stop
;
.
.fi
.
.IP "" 0
.
.P
In this example all incoming metrics are first sent to \fBdc\-old\fR, then \fBdc\-new1\fR and finally to \fBdc\-new2\fR\. Note that the cluster type of \fBdc\-old\fR is different\. Each incoming metric will be sent to 2 members of all three clusters, thus replicating to in total 6 destinations\. For each cluster the destination members are computed independently\. Failure of clusters or members does not affect the others, since all have individual queues\. The above example could also be written using three match rules, one for each dc, or one match rule for all three dcs\. The difference is mainly in performance, the number of times the incoming metric has to be matched against an expression\. The \fBstop\fR keyword in the second match rule (the one sending to \fBdc\-new1\fR and \fBdc\-new2\fR) is not strictly necessary in this example, because there are no more following match rules\. However, if the match would target a specific subset, e\.g\. \fB^sys\e\.\fR, and more clusters would be defined, this could be necessary, as for instance in the following abbreviated example:
.
.IP "" 4
.
.nf
cluster dc1\-sys \.\.\. ;
cluster dc2\-sys \.\.\. ;
cluster dc1\-misc \.\.\. ;
cluster dc2\-misc \.\.\. ;
match ^sys\e\. send to dc1\-sys;
match ^sys\e\. send to dc2\-sys stop;
match * send to dc1\-misc;
match * send to dc2\-misc stop;
.
.fi
.
.IP "" 0
.
.P
As can be seen, without the \fBstop\fR in dc2\-sys\' match rule, all metrics starting with \fBsys\.\fR would also be sent to dc1\-misc and dc2\-misc\. It can be that this is desired, of course, but in this example there is a dedicated cluster for the \fBsys\fR metrics\.
.
.P
Suppose some unwanted metric is unfortunately being generated, let\'s assume by some bad/old software\. We don\'t want to store this metric\. The \fBblackhole\fR cluster is suitable for that, when it is harder to actually whitelist all wanted metrics\. Consider the following:
.
.IP "" 4
.
.nf
match
some_legacy1$
some_legacy2$
send to blackhole
stop;
.
.fi
.
.IP "" 0
.
.P
This would throw away all metrics that end with \fBsome_legacy1\fR or \fBsome_legacy2\fR, which would otherwise be hard to filter out\. Since the order matters, it can be used in a construct like this:
.
.IP "" 4
.
.nf
cluster old \.\.\. ;
cluster new \.\.\. ;
match * send to old;
match unwanted send to blackhole stop;
match * send to new;
.
.fi
.
.IP "" 0
.
.P
In this example the old cluster would receive the metric that\'s unwanted for the new cluster\. So, the order in which the rules occur does matter for the execution\.
.
.P
Validation can be used to ensure the data for metrics is as expected\. A global validation for just integer (no floating point) values could be:
.
.IP "" 4
.
.nf
match *
validate ^[0\-9]+\e [0\-9]+$ else drop
;
.
.fi
.
.IP "" 0
.
.P
(Note the escape with backslash \fB\e\fR of the space\. You might be able to use \fB\es\fR or \fB[[:space:]]\fR instead, depending on your configured regex implementation\.)
.
.P
The validation clause can exist on every match rule, so in principle, the following is valid:
.
.IP "" 4
.
.nf
match ^foo
validate ^[0\-9]+\e [0\-9]+$ else drop
send to integer\-cluster
;
match ^foo
validate ^[0\-9\.e+\-]+\e [0\-9\.e+\-]+$ else drop
send to float\-cluster
stop;
.
.fi
.
.IP "" 0
.
.P
Note that the behaviour is different in the previous two examples\. When no \fBsend to\fR clusters are specified, a validation error makes the match behave like the \fBstop\fR keyword is present\. Likewise, when validation passes, processing continues with the next rule\. When destination clusters are present, the \fBmatch\fR respects the \fBstop\fR keyword as normal\. When \fBstop\fR is specified, processing always stops at that rule\. However, if validation fails, the rule does not send anything to the destination clusters; the metric will be dropped or logged, but never sent\.
.
.P
The relay is capable of rewriting incoming metrics on the fly\. This process is done based on regular expressions with capture groups that allow substituting parts in a replacement string\. Rewrite rules make it possible to clean up metrics from applications, or provide a migration path\. In its simplest form a rewrite rule looks like this:
.
.IP "" 4
.
.nf
rewrite ^server\e\.(\.+)\e\.(\.+)\e\.([a\-zA\-Z]+)([0\-9]+)
into server\.\e_1\.\e2\.\e3\.\e3\e4
;
.
.fi
.
.IP "" 0
.
.P
In this example a metric like \fBserver\.DC\.role\.name123\fR would be transformed into \fBserver\.dc\.role\.name\.name123\fR\. As with match rules, the order of rewrite rules matters\. Hence, to build on top of the old/new cluster example given earlier, the following would store the original metric name in the old cluster, and the new metric name in the new cluster:
.
.IP "" 4
.
.nf
match * send to old;
rewrite \.\.\. ;
match * send to new;
.
.fi
.
.IP "" 0
.
.P
Note that after the rewrite, the original metric name is no longer available, as the rewrite happens in\-place\.
.
.P
Aggregations are probably the most complex part of carbon\-c\-relay\. Two ways of specifying aggregates are supported by carbon\-c\-relay\. The first, static rules, are handled by an optimiser which tries to fold thousands of rules into groups to make the matching more efficient\. The second, dynamic rules, are very powerful compact definitions with possibly thousands of internal instantiations\. A typical static aggregation looks like:
.
.IP "" 4
.
.nf
aggregate
^sys\e\.dc1\e\.somehost\-[0\-9]+\e\.somecluster\e\.mysql\e\.replication_delay
^sys\e\.dc2\e\.somehost\-[0\-9]+\e\.somecluster\e\.mysql\e\.replication_delay
every 10 seconds
expire after 35 seconds
timestamp at end of bucket
compute sum write to
mysql\.somecluster\.total_replication_delay
compute average write to
mysql\.somecluster\.average_replication_delay
compute max write to
mysql\.somecluster\.max_replication_delay
compute count write to
mysql\.somecluster\.replication_delay_metric_count
;
.
.fi
.
.IP "" 0
.
.P
In this example, four aggregations are produced from the incoming matching metrics\. In this example we could have written the two matches as one, but for demonstration purposes we did not\. Obviously they can refer to different metrics, if that makes sense\. The \fBevery 10 seconds\fR clause specifies in what interval the aggregator can expect new metrics to arrive\. This interval is used to produce the aggregations, thus every 10 seconds 4 new metrics are generated from the data received so far\. Because data may be in transit for some reason, or generation stalled, the \fBexpire after\fR clause specifies how long the data should be kept before considering a data bucket (which is aggregated) to be complete\. In the example, 35 was used, which means after 35 seconds the first aggregates are produced\. It also means that metrics can arrive 35 seconds late, and still be taken into account\. The exact time at which the aggregate metrics are produced is random between 0 and interval (10 in this case) seconds after the expiry time\. This is done to prevent thundering herds of metrics for large aggregation sets\. The \fBtimestamp\fR that is used for the aggregations can be specified to be the \fBstart\fR, \fBmiddle\fR or \fBend\fR of the bucket\. Original carbon\-aggregator\.py uses \fBstart\fR, while carbon\-c\-relay\'s default has always been \fBend\fR\. The \fBcompute\fR clauses demonstrate that a single aggregation rule can produce multiple aggregates, as often is the case\. Internally, this comes for free, since all possible aggregates are always calculated, whether or not they are used\. The produced new metrics are resubmitted to the relay, hence matches defined before in the configuration can match output of the aggregator\. It is important to avoid loops that can be generated this way\. In general, splitting aggregations to their own carbon\-c\-relay instance, such that it is easy to forward the produced metrics to another relay instance, is a good practice\.
.
.P
The previous example could also be written as follows to be dynamic:
.
.IP "" 4
.
.nf
aggregate
^sys\e\.dc[0\-9]\e\.(somehost\-[0\-9]+)\e\.([^\.]+)\e\.mysql\e\.replication_delay
every 10 seconds
expire after 35 seconds
compute sum write to
mysql\.host\.\e1\.replication_delay
compute sum write to
mysql\.host\.all\.replication_delay
compute sum write to
mysql\.cluster\.\e2\.replication_delay
compute sum write to
mysql\.cluster\.all\.replication_delay
;
.
.fi
.
.IP "" 0
.
.P
Here a single match results in four aggregations, each of a different scope\. In this example aggregations based on hostname and cluster are being made, as well as the more general \fBall\fR targets, which in this example both have identical values\. Note that with this single aggregation rule, per\-cluster, per\-host and total aggregations are produced\. Obviously, the input metrics define which hosts and clusters are produced\.
.
.P
With use of the \fBsend to\fR clause, aggregations can be made more intuitive and less error\-prone\. Consider the below example:
.
.IP "" 4
.
.nf
cluster graphite fnv1a_ch ip1 ip2 ip3;
aggregate ^sys\e\.somemetric
every 60 seconds
expire after 75 seconds
compute sum write to
sys\.somemetric
send to graphite
stop
;
match * send to graphite;
.
.fi
.
.IP "" 0
.
.P
It sends all incoming metrics to the graphite cluster, except the sys\.somemetric ones, which it replaces with a sum of all the incoming ones\. Without a \fBstop\fR in the aggregate, this causes a loop, and without the \fBsend to\fR, the metric could not keep its original name, for the output now goes directly to the cluster\.
.
.P
When configuring clusters you might want to check how the metrics will be routed and hashed\. That\'s what the \fB\-t\fR flag is for\. For the following configuration:
.
.IP "" 4
.
.nf
cluster graphite_swarm_odd
fnv1a_ch replication 1
host01\.dom:2003=31F7A65E315586AC198BD798B6629CE4903D089947
host03\.dom:2003=9124E29E0C92EB63B3834C1403BD2632AA7508B740
host05\.dom:2003=B653412CD96B13C797658D2C48D952AEC3EB667313
;
cluster graphite_swarm_even
fnv1a_ch replication 1
host02\.dom:2003=31F7A65E315586AC198BD798B6629CE4903D089947
host04\.dom:2003=9124E29E0C92EB63B3834C1403BD2632AA7508B740
host06\.dom:2003=B653412CD96B13C797658D2C48D952AEC3EB667313
;
match *
send to
graphite_swarm_odd
graphite_swarm_even
stop
;
.
.fi
.
.IP "" 0
.
.P
Running the command: \fBecho "my\.super\.metric" | carbon\-c\-relay \-f config\.conf \-t\fR, will result in:
.
.IP "" 4
.
.nf
[\.\.\.]
match
* \-> my\.super\.metric
fnv1a_ch(graphite_swarm_odd)
host03\.dom:2003
fnv1a_ch(graphite_swarm_even)
host04\.dom:2003
stop
.
.fi
.
.IP "" 0
.
.P
You now know that your metric \fBmy\.super\.metric\fR will be hashed and arrive on the host03 and host04 machines\. Adding the \fB\-d\fR flag will increase the amount of information by showing you the hashring\.
.
.SH "STATISTICS"
When \fBcarbon\-c\-relay\fR is run without \fB\-d\fR or \fB\-s\fR arguments, statistics will be produced\. By default they are sent to the relay itself in the form of \fBcarbon\.relays\.<hostname>\.*\fR\. See the \fBstatistics\fR construct to override this prefix, sending interval and values produced\. While many metrics have a similar name to what carbon\-cache\.py would produce, their values are likely different\. By default, most values are running counters which only increase over time\. The use of the nonNegativeDerivative() function from graphite is useful with these\.
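.
.P
As a sketch of how such a running counter might be graphed, a Graphite render target along the following lines turns it into a per\-interval increase\. The wildcard in place of the hostname is only an illustrative choice:
.
.IP "" 4
.
.nf
nonNegativeDerivative(carbon\.relays\.*\.metricsReceived)
.
.fi
.
.IP "" 0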
.
.P
The following metrics are produced under the \fBcarbon\.relays\.<hostname>\fR namespace:
.
.IP "\(bu" 4
metricsReceived
.
.IP
The number of metrics that were received by the relay\. Received here means that they were seen and processed by any of the dispatchers\.
.
.IP "\(bu" 4
metricsSent
.
.IP
The number of metrics that were sent from the relay\. This is a total count for all servers combined\. When incoming metrics are duplicated by the cluster configuration, this counter will include all those duplications\. In other words, the amount of metrics that were successfully sent to other systems\. Note that metrics that are processed (received) but still in the sending queue (queued) are not included in this counter\.
.
.IP "\(bu" 4
metricsDiscarded
.
.IP
The number of input lines that were not considered to be a valid metric\. Such lines can be empty, only containing whitespace, or hitting the limits given for max input length and/or max metric length (see \fB\-m\fR and \fB\-M\fR options)\.
.
.IP "\(bu" 4
metricsQueued
.
.IP
The total number of metrics that are currently in the queues for all the server targets\. This metric is not cumulative, for it is a sample of the queue size, which can (and should) go up and down\. Therefore you should not use the derivative function for this metric\.
.
.IP "\(bu" 4
metricsDropped
.
.IP
The total number of metrics that had to be dropped due to server queues overflowing\. A queue typically overflows when the server it tries to send its metrics to is not reachable, or too slow in ingesting the amount of metrics queued\. This can be network or resource related, and also greatly depends on the rate of metrics being sent to the particular server\.
.
.IP "\(bu" 4
metricsBlackholed
.
.IP
The number of metrics that did not match any rule, or matched a rule with blackhole as target\. Depending on your configuration, a high value might be an indication of a misconfiguration somewhere\. These metrics were received by the relay, but never sent anywhere, thus they disappeared\.
.
.IP "\(bu" 4
metricStalls
.
.IP
The number of times the relay had to stall a client to indicate that the downstream server cannot handle the stream of metrics\. A stall is only performed when the queue is full and the server is actually receptive of metrics, but just too slow at the moment\. Stalls typically happen during micro\-bursts, where the client typically is unaware that it should stop sending more data, while it is able to\.
.
.IP "\(bu" 4
connections
.
.IP
The number of connect requests handled\. This is an ever increasing number just counting how many connections were accepted\.
.
.IP "\(bu" 4
disconnects
.
.IP
The number of disconnected clients\. A disconnect either happens because the client goes away, or due to an idle timeout in the relay\. The difference between this metric and connections is the amount of connections actively held by the relay\. In normal situations this amount remains within reasonable bounds\. Many connections, but few disconnections typically indicate a possible connection leak in the client\. The idle connections disconnect in the relay here is to guard against resource drain in such scenarios\.
.
.IP "\(bu" 4
dispatch_wallTime_us
.
.IP
The number of microseconds spent by the dispatchers to do their work\. In particular on multi\-core systems, this value can be confusing, however, it indicates how long the dispatchers were doing work handling clients\. It includes everything they do, from reading data from a socket, cleaning up the input metric, to adding the metric to the appropriate queues\. The larger the configuration, and more complex in terms of matches, the more time the dispatchers will spend on the cpu\. But also time they do /not/ spend on the cpu is included in this number\. It is the pure wallclock time the dispatcher was serving a client\.
.
.IP "\(bu" 4
dispatch_sleepTime_us
.
.IP
The number of microseconds spent by the dispatchers sleeping waiting for work\. When this value gets small (or even zero) the dispatcher has so much work that it doesn\'t sleep any more, and likely can\'t process the work in a timely fashion any more\. This value plus the wallTime from above sort of sums up to the total uptime taken by this dispatcher\. Therefore, expressing the wallTime as percentage of this sum gives the busyness percentage draining all the way up to 100% if sleepTime goes to 0\.
.
.IP "\(bu" 4
server_wallTime_us
.
.IP
The number of microseconds spent by the servers to send the metrics from their queues\. This value includes connection creation, reading from the queue, and sending metrics over the network\.
.
.IP "\(bu" 4
dispatcherX
.
.IP
For each individual dispatcher, the metrics received and blackholed plus the wall clock time\. The values are as described above\.
.
.IP "\(bu" 4
destinations\.X
.
.IP
For all known destinations, the number of dropped, queued and sent metrics plus the wall clock time spent\. The values are as described above\.
.
.IP "\(bu" 4
aggregators\.metricsReceived
.
.IP
The number of metrics that matched an aggregator rule and were accepted by the aggregator\. When a metric matches multiple aggregators, this value will reflect that\. A metric is not counted when it is considered syntactically invalid, e\.g\. no value was found\.
.
.IP "\(bu" 4
aggregators\.metricsDropped
.
.IP
The number of metrics that were sent to an aggregator, but did not fit timewise\. This is either because the metric was too far in the past or future\. The expire after clause in aggregate statements controls how long in the past metric values are accepted\.
.
.IP "\(bu" 4
aggregators\.metricsSent
.
.IP
The number of metrics that were sent from the aggregators\. These metrics were produced and are the actual results of aggregations\.
.
.IP "" 0
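.
.P
To illustrate how some of the counters above relate to each other, the relations below follow from their descriptions\. The sample numbers are made up for the example\. Since most values are running counters by default, the increase over the same time window should be used for each term:
.
.IP "" 4
.
.nf
active connections = connections \- disconnects
dispatcher busyness (%) = 100 * wallTime / (wallTime + sleepTime)

e\.g\. wallTime = 250000 us, sleepTime = 750000 us
     busyness = 100 * 250000 / (250000 + 750000) = 25%
.
.fi
.
.IP "" 0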
.
.SH "BUGS"
Please report them at: \fIhttps://github\.com/grobian/carbon\-c\-relay/issues\fR
.
.SH "AUTHOR"
Fabian Groffen <grobian@gentoo\.org>
.
.SH "SEE ALSO"
All other utilities from the graphite stack\.
.
.P
This project aims to be a fast replacement of the original Carbon relay \fIhttp://graphite\.readthedocs\.org/en/1\.0/carbon\-daemons\.html#carbon\-relay\-py\fR\. \fBcarbon\-c\-relay\fR aims to deliver performance and configurability\. Carbon is single threaded, and sending metrics to multiple consistent\-hash clusters requires chaining of relays\. This project provides a multithreaded relay which can address multiple targets and clusters for each and every metric based on pattern matches\.
.
.P
There are a couple more replacement projects out there, which are carbon\-relay\-ng \fIhttps://github\.com/graphite\-ng/carbon\-relay\-ng\fR and graphite\-relay \fIhttps://github\.com/markchadwick/graphite\-relay\fR\.
.
.P
Compared to carbon\-relay\-ng, this project does provide carbon\'s consistent\-hash routing\. graphite\-relay, which does this, however doesn\'t do metric\-based matches to direct the traffic, which this project does as well\. To date, carbon\-c\-relay can do aggregations, failover targets and more\.
.
.SH "ACKNOWLEDGEMENTS"
This program was originally developed for Booking\.com, which approved that the code was published and released as Open Source on GitHub, for which the author would like to express his gratitude\. Development has continued since with the help of many contributors suggesting features, reporting bugs, adding patches and more to make carbon\-c\-relay into what it is today\.