otlpmetrichttp: load-balance between multiple endpoint IPs #5838
Labels: enhancement (New feature or request), response needed (Waiting on user input before progress can be made)
Problem Statement
I want to horizontally scale the OTel collector and have the SDK (somewhat evenly) distribute requests to collector instances.
I have a Headless Service for my collector that returns all instances when queried via DNS:
$ dig otelcol
;; ANSWER SECTION:
otelcol.    600    IN    A    172.22.0.5
otelcol.    600    IN    A    172.22.0.8
However, because the Go HTTP client used by this package keeps the TCP connection alive and reuses it, the SDK sticks to the first address ever returned until that address becomes unreachable.
This also applies to regular k8s Services: once the TCP connection is opened, no further load balancing takes place on the k8s side.
There is golang/go#34511 requesting this for the standard library, but no real progress has been made since 2019.
Proposed Solution
Instead of relying on the HTTP client to pick the endpoint from the DNS results, do the following:
If deemed acceptable, I am happy to contribute this functionality.
Alternatives
Disable Keepalive
By disabling HTTP keep-alive (connection reuse), a new connection is made for every request, which includes a DNS lookup.
I confirmed this works by tinkering with SDK internals, but it is inefficient.
Use custom RoundTripper
In the Go issue, https://github.com/CAFxX/balancer is suggested.
However, this leads to a DNS lookup on every request, which is undesirable.
Have users deploy server-side loadbalancers
Of course, this can be fixed server-side by deploying another layer of load-balancing proxies (nginx, etc.) in front of the OTel collector.
This greatly complicates the pipeline setup, though, as one might end up with three layers (HTTP load balancing, a stateless collector for sticky OTLP load balancing, and a stateful collector for processing).