[Libbeat][Filebeat][outputs] Add Codec and exporter to serialize and export batch of beat events in OTLP format
#32549
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
@andrewkroh, can you help me get this reviewed so that I can open a PR? |
Hi @shivanshu1333, we are thinking about this internally. I'm not sure when we'll have a decision. For context, adding new outputs has a long tail of work in the form of ongoing user deployment support after they are introduced. Writing the code is only part of the work here. We need to be careful about when we introduce new outputs to ensure we can properly support and test them in the various environments and configurations they will be deployed in. |
Hi @cmacknz, thanks for responding. Do we have any decision yet? The changes required for outputs are ready and tested locally; the changes include
Can you please elaborate on
To resolve this
How about letting the changes in? Let's not release it until it's thoroughly tested and mark this as an alpha feature whenever we're ready to release; eventually, we can mark it as a beta feature. This way, we can start progressing in the right direction, the changes will be well tested, and we'll make Filebeat compatible with OpenTelemetry Collector, increasing adoption of Filebeat among users of OpenTelemetry Collector. Kindly let me know your thoughts. |
We are still thinking about when and how best to support OpenTelemetry, I don't think there will be a decision with respect to Beats soon. Besides the Beats' maintainers making the time to support and test an OpenTelemetry output to the extent we would like, we would ideally like to wait for the OpenTelemetry logging protocol to stabilize (currently experimental). There is also an active proposal to adopt the Elastic Common Schema we use in our logs into OpenTelemetry (see open-telemetry/oteps#197) and we would ideally wait for a decision on that proposal as well. |
There are multiple configuration options in Filebeat which are also marked as experimental (see the full config for v8.4). With that in mind, would it not be reasonable to add this feature and clearly state that it is experimental? I think it's safe to assume OpenTelemetry is here to stay, especially as it's been a CNCF project since May 2019. It's starting to gain popularity in the observability realm for many reasons. Having said that, I realize that Elastic does have ECS to address consistency with log schemas, but that's not an option that everyone necessarily wants to implement as it's solution specific. |
@joshdover this is the original tracking issue with the idea to introduce an OTLP codec and exporter. We have since modified our code to do serialization on the fly in the exporter itself, and we have a separate package for metrics. This is a continuation of your discussion with Lalit. |
Regarding the comment in the linked issue which asks whether a special receiver should be built in the opentelemetry-collector or whether Filebeat should implement an OTLP output option: it would be beneficial for Filebeat to have the capability to encode in the OTLP format. I'm wondering if Filebeat could also add a new processor (e.g. encode_otlp) which would allow the collected log events to be transformed to the OTLP log data model defined here. This could also potentially simplify applying a specific schema directly in Filebeat for some of the sub-fields (e.g. standardizing key names in the "Attributes" section). Considering OpenTelemetry has adopted ECS for schema standardization, I feel this approach would make sense. As an example, someone could encode messages in the OTLP model and then publish them to Kafka. They could then consume them from the topic, process them with any tool, and then index them into Elasticsearch. |
This is a similar setup we use and we are considering dropping beats and logstash as we want to standardize on OTLP. |
Hey everyone, I'd like to add a small update from the Elastic side here. We've been discussing how best to add OpenTelemetry support to our ingestion components and while there will be many different paths, we do think that an OTLP output directly in Beats is a good move for the ecosystem. We're open to pull requests for this and we've already been discussing with some contributors about donating their own private implementations (eg @shivanshu1333 and Lalit). For now, we'd like to focus on support for logs as it's a simpler translation than metrics, which require more metadata, such as the metric type (counter, gauge, etc.). We prefer an OTLP output in Beats over building a lumberjack receiver in the OpenTelemetry collector, as there are several deficiencies in the lumberjack protocol (such as no mechanism for backpressure) that we don't believe set up Beats users for success with OTel. We'd rather put effort towards improving the OTel data model (through the ECS/SemConv merger) and protocols directly (such as open-telemetry/opentelemetry-proto#470). We also think enabling the large installation base of Beats to start sending data to OTel systems natively, without requiring OTel Collector in the middle, is a win for interoperability and making OpenTelemetry more widely available.
@hartfordfive What's the use case for breaking this out as a separate processor instead of embedding this logic into an As the ECS+SemConv merger makes progress and stabilizes, we'll likely have several options for translating from one schema to the new merged schema. For now, I lean towards keeping things simple until more of these details are figured out, and keep any translation in the output itself. Definitely open to more discussion on this point. |
Any update on this? Is there some existing code available for testing? |
There is nothing in the Beats repository to try yet but we are actively working on integrating with the OTel ecosystem on a few different fronts. In no specific order:
We are still very early in this process and the initial work is focused on making changes in the upstream OTel repositories to make them easier to use with the Elastic stack. There will be more updates in the coming months. |
@cmacknz are there upstream issues in OpenTelemetry that you can add as a reference? |
There isn't a single tracking issue yet, we are still early in the process and haven't finalized the plan for publishing+accepting OTLP data given the data model is different from what exists in the Elastic stack today. |
Any update on this? |
There is still no official tracking issue yet as we are still in what we'd consider the prototyping phase, though we are getting close to the end of it. Since this issue was created a few things have happened:
For now, the Beat inputs running as receivers (Beats receivers) are only being tested with the elasticsearch exporter and output documents in ECS format that look identical to what you'd get out of Beats. This is so they can be used with existing modules and integration assets without breaking everything, because OTLP is a totally different data schema. The follow-up work, once we have this working with the ECS schema, would be to let the Beats receivers output data in OTLP format, providing what this issue is asking for. However, it will happen in a different way than originally proposed here: the Beat inputs will run inside the Elastic OTel collector distribution instead of putting an OTLP output in Beats. There will be some clearer communication once we have the whole end-to-end story here worked out and all of the prototyping has wrapped up. |
Thanks for this great and detailed answer! |
Every Filebeat input will be runnable as a receiver in the collector. This would include the Filebeat lumberjack input implemented at https://github.com/elastic/beats/tree/main/x-pack/filebeat/input/lumberjack (though it looks like we don't document it). I would think that if you can run Beat inputs directly in the EDOT collector, you probably wouldn't need to use a lumberjack receiver as a way of gluing things together anymore, but it would be possible. We are planning to add a Lumberjack/Logstash exporter to our EDOT collector distribution as part of this overall project, but I can't give a specific timeline for when that is going to happen. |
Describe the enhancement:
To make Filebeat compatible with OpenTelemetry, i.e. exporting beat events in OTLP format so that the OpenTelemetry Collector can ingest Filebeat events, we need a codec to serialize a batch of beat events into the OTLP Log Data Model. To do that, we need to add:
Codec: converts a batch of beat events into the OTLP log data model. The codec should be added under libbeat/outputs/codec, in addition to the existing format and JSON codecs.
Exporter: supports OTLP over HTTP/gRPC. This exporter needs to be added under libbeat/outputs alongside the kafka, logstash, and redis exporters, and can be called otlp.
Implementation details:
The above diagram explains the translation of a batch of beat events into OTLP Log Data Model.
Describe a specific use case for the enhancement or feature:
Filebeat is a great tool, and with the increased adoption of OpenTelemetry it would be valuable to make Filebeat compatible with it.