Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new exporter helper with batching option #8122

Open
3 of 7 tasks
dmitryax opened this issue Jul 22, 2023 · 9 comments
Open
3 of 7 tasks

Introduce new exporter helper with batching option #8122

dmitryax opened this issue Jul 22, 2023 · 9 comments
Assignees
Labels
area:exporter release:required-for-ga Must be resolved before GA release

Comments

@dmitryax
Copy link
Member

dmitryax commented Jul 22, 2023

This is a tracking issue for introducing the new exporter helper and migrating the existing exporters to use it.

The primary reason for introducing the new exporter helper is to move the batching to the exporter side and deprecate the batch processor as part of making the delivery pipeline reliable, as reported in #7460. More details about moving batching to the exporter helper can be found in #4646.

Shifting batching to the exporter side grants us the opportunity to leverage the exporter's data model instead of relying on OTLP. As a result, we can achieve the following benefits:

  • Ability to place failed requests back into the queue without the need for converting them back to OTLP format.
  • Enhanced control in counting queue and batch sizes using basic items (like spans, data points, or log records for OTLP) tailored to different exporters, resolving the concern raised in issue Processor: Splitting summary metrics into timeseries. opentelemetry-collector-contrib#7134.
  • Optional counting of queue and batch sizes in bytes of serialized data.

Adapting to the new exporter helper requires exporter developers to implement at least two functions:

  1. Converter: to translate pdata Metrics/Traces/Logs into a user-defined Request.
  2. Request sender: to send the user-defined Request

Design document: https://docs.google.com/document/d/1uxnn5rMHhCBLP1s8K0Pg_1mAs4gCeny8OWaYvWcuibs

Essential sub-issues to mark this as complete:

Additional sub-issues to get feature parity with the batch processor:

@dmitryax dmitryax self-assigned this Jul 22, 2023
dmitryax added a commit that referenced this issue Aug 21, 2023
Introduce a new exporter helper that operates over client-provided
requests instead of pdata. The helper user now has to provide
`Converter` - an interface with a function implementing translation of
pdata Metrics/Traces/Logs into a user-defined `Request`. `Request` is an
interface with only one required function `Export`.

It opens a door for moving batching to the exporter, where batches will
be built from client data format, instead of pdata. The batches can be
properly sized by custom request size, which can be different from OTLP.
The same custom request sizing will be applied to the sending queue. It
will also improve the performance of the sending queue retries for
non-OTLP exporters, they don't need to translate pdata on every retry.

This is an implementation alternative to
#7874 as
suggested in
#7874 (comment)

Tracking Issue:
#8122

---------

Co-authored-by: Alex Boten <[email protected]>
dmitryax added a commit that referenced this issue Aug 21, 2023
This change adds collector's internal metrics and tracing to the new
request-based exporter helpers. Only those metrics and traces are added
that are already adopted by the existing exporter helpers for backward
compatibility. The new exporter helpers can and should expose more
metrics in the future, e.g. for tracking converter errors.

Tracking Issue:
#8122
@bogdandrutu
Copy link
Member

Some feedback about the interface:

  1. [Traces|Metrics|Logs]Converter can be replaced with a simple func instead of having an interface. If you need in the future to extend capabilities of that, they can be options. Comparing interfaces with nil is problematic sometimes, see https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1, referring to https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/metrics.go#L142
  2. Can you make the old NewMetricsExporter to call into NewMetricsRequestExporter to remove duplicate code and to actually start testing the new path more intensively? Also removes lots of duplicate tests.

@dmitryax
Copy link
Member Author

Thanks for the feedback!

Can you make the old NewMetricsExporter to call into NewMetricsRequestExporter to remove duplicate code and to actually start testing the new path more intensively? Also removes lots of duplicate tests.

That's what I wanted to do after #8248 is merged 👍

dmitryax added a commit to dmitryax/opentelemetry-collector that referenced this issue Oct 26, 2023
dmitryax added a commit to dmitryax/opentelemetry-collector that referenced this issue Oct 26, 2023
dmitryax added a commit that referenced this issue Nov 17, 2023
…on (#8764)

As proposed in
#8122 (comment)

If we need backward conversion, we will use an optional argument to the
helper function instead of an optional interface.
@atoulme atoulme added the release:required-for-ga Must be resolved before GA release label Dec 19, 2023
dmitryax added a commit that referenced this issue Dec 22, 2023
#9164)

Introduce an option to limit the queue size by the number of items
instead of number of requests. This is preliminary step for having the
exporter helper v2 with a batcher sender placed after the queue sender.
Otherwise, it'll be hard for the users to estimate the queue size based
on the number of requests without batch processor in front of it.

This change doesn't effect the existing functionality and the items
based queue limiting cannot be utilized yet.

Updates
#8122

Alternative to
#9147
bogdandrutu pushed a commit that referenced this issue Feb 3, 2024
Introduce a way to enable queue in the new exporter helper with a
developer interface suggested in
#8248 (comment).

The new configuration interface for the end users provides a new
`queue_size_items` option to limit the queue by a number of spans, log
records, or metric data points. The previous way to limit the queue by
number of requests is preserved under the same field, `queue_size,`
which will later be deprecated through a longer transition process.

Tracking issue:
#8122
dmitryax pushed a commit that referenced this issue Nov 13, 2024
…tdown (#11666)

#### Description

This PR changes exporter queue batcher to flush the current batch on
shutdown.

#### Link to tracking issue

#10368
#8122
sfc-gh-sili added a commit to sfc-gh-sili/opentelemetry-collector that referenced this issue Nov 13, 2024
…tdown (open-telemetry#11666)

This PR changes exporter queue batcher to flush the current batch on
shutdown.

open-telemetry#10368
open-telemetry#8122
sfc-gh-sili added a commit to sfc-gh-sili/opentelemetry-collector that referenced this issue Nov 13, 2024
…tdown (open-telemetry#11666)

This PR changes exporter queue batcher to flush the current batch on
shutdown.

open-telemetry#10368
open-telemetry#8122
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR adds a public function `GetNextItem` to queue (both persistent
queue and bounded memory queue)

Why this change?
Instead of blocking until consumption of the item is done, we would like
to separate the API for reading and committing consumption.

Before:
`Consume(consumeFunc)`

After:
`idx, item = Read()`
`OnProcessingFinished(idx)`

<!-- Issue number if applicable -->
#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368

<!--Describe what testing was performed and which tests were added.-->
#### Testing

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
… that operates on batch sender (open-telemetry#11448)

#### Description

As part of the effort to solve
open-telemetry#10368,
we no longer guarantee to initialize a `batchSender` when `batcher` is
enabled. Therefore, we would like to remove the interface to set
`mergeFunc` and `mergeSplitFunc` as a callback that operates on
`batchSender`. Instead, users should use the alternative
`WithBatchFuncs` that is a callback that operates `baseExporter`.

Context:
open-telemetry#11414

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368

---------

Co-authored-by: Bogdan Drutu <[email protected]>
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
…n-telemetry#11532)

#### Description

This PR is a bare minimum implementation of a component called queue
batcher. On completion, this component will replace `consumers` in
`queue_sender`, and thus moving queue-batch from a pulling model instead
of pushing model.

Limitations of the current code
* This implements only the case where batching is disabled, which means
no merge of splitting of requests + no timeout flushing.
* This implementation does not enforce an upper bound on concurrency

All these code paths are marked as panic currently, and they will be
replaced with actual implementation in coming PRs. This PR is split from
open-telemetry#11507 for
easier review.

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing


#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
…d exports (open-telemetry#11546)

#### Description

This PR follows
open-telemetry#11540 and
implements support for item-count based batching for queue batcher.

Limitation: This PR supports merging request but not splitting request.
In other words, it support specifying a minimum request size but not a
maximum request size.

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
…batcher (open-telemetry#11580)

#### Description

This PR follows
open-telemetry#11546 and
add support for splitting (i.e. support setting a maximum request size)

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
…equest.export() (open-telemetry#11636)

#### Description

This PR changes queue batcher to use `exportFunc` instead of
`request.export()`. This makes testing easier and avoid passing
unnecessary detail to the exporter batcher.

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
djaglowski pushed a commit to djaglowski/opentelemetry-collector that referenced this issue Nov 21, 2024
…tdown (open-telemetry#11666)

#### Description

This PR changes exporter queue batcher to flush the current batch on
shutdown.

#### Link to tracking issue

open-telemetry#10368
open-telemetry#8122
bogdandrutu pushed a commit that referenced this issue Nov 22, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR proceeds
#11637. It
* Introduces a noop feature gate that will be used for queue batcher.
* Updates exporter tests to run with both the feature gate on and off.

<!-- Issue number if applicable -->
#### Link to tracking issue
#10368
#8122


<!--Describe what testing was performed and which tests were added.-->
#### Testing

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->
dmitryax added a commit that referenced this issue Dec 2, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR solves
#10368.

Previously we use a pushing model between the queue and the batch,
resulting the batch size to be constrained by the
`sending_queue.num_consumers`, because the batch cannot accumulate more
than `sending_queue.num_consumers` blocked goroutines provide.

This PR changes it to a pulling model. We read from the queue until
threshold is met or timeout, then allocate a worker to asynchronously
send out the request.

<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes
#10368
#8122

---------

Co-authored-by: Dmitrii Anoshin <[email protected]>
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
… to a class function instead of a callback (open-telemetry#11338)

#### Description

Why this change?
Each request from the queue contains multiple items, and those items
could be merge-split into multiple batches when they are sent out (see
open-telemetry#8122
for more about exporter batcher). We would like to book-keep those
cases, and only call `onProcessingFinished` when all such batches has
gone out. In this PR, `onProcessingFinished` is changed from a callback
to a method function because it is easier to book keep index instead of
functions.

#### Link to tracking issue
open-telemetry#8122
open-telemetry#10368

#### Testing
`exporter/internal/queue/persistent_queue_test.go`

#### Documentation

This is an internal change invisible to the users.

---------

Co-authored-by: Dmitrii Anoshin <[email protected]>
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR adds a public function `GetNextItem` to queue (both persistent
queue and bounded memory queue)

Why this change?
Instead of blocking until consumption of the item is done, we would like
to separate the API for reading and committing consumption.

Before:
`Consume(consumeFunc)`

After:
`idx, item = Read()`
`OnProcessingFinished(idx)`

<!-- Issue number if applicable -->
#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368

<!--Describe what testing was performed and which tests were added.-->
#### Testing

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
… that operates on batch sender (open-telemetry#11448)

#### Description

As part of the effort to solve
open-telemetry#10368,
we no longer guarantee to initialize a `batchSender` when `batcher` is
enabled. Therefore, we would like to remove the interface to set
`mergeFunc` and `mergeSplitFunc` as a callback that operates on
`batchSender`. Instead, users should use the alternative
`WithBatchFuncs` that is a callback that operates `baseExporter`.

Context:
open-telemetry#11414

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368

---------

Co-authored-by: Bogdan Drutu <[email protected]>
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
…n-telemetry#11532)

#### Description

This PR is a bare minimum implementation of a component called queue
batcher. On completion, this component will replace `consumers` in
`queue_sender`, and thus moving queue-batch from a pulling model instead
of pushing model.

Limitations of the current code
* This implements only the case where batching is disabled, which means
no merge of splitting of requests + no timeout flushing.
* This implementation does not enforce an upper bound on concurrency

All these code paths are marked as panic currently, and they will be
replaced with actual implementation in coming PRs. This PR is split from
open-telemetry#11507 for
easier review.

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing


#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
…d exports (open-telemetry#11546)

#### Description

This PR follows
open-telemetry#11540 and
implements support for item-count based batching for queue batcher.

Limitation: This PR supports merging request but not splitting request.
In other words, it support specifying a minimum request size but not a
maximum request size.

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
…batcher (open-telemetry#11580)

#### Description

This PR follows
open-telemetry#11546 and
add support for splitting (i.e. support setting a maximum request size)

Design doc:

https://docs.google.com/document/d/1y5jt7bQ6HWt04MntF8CjUwMBBeNiJs2gV4uUZfJjAsE/edit?usp=sharing

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
…equest.export() (open-telemetry#11636)

#### Description

This PR changes queue batcher to use `exportFunc` instead of
`request.export()`. This makes testing easier and avoid passing
unnecessary detail to the exporter batcher.

#### Link to tracking issue

open-telemetry#8122
open-telemetry#10368
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
…tdown (open-telemetry#11666)

#### Description

This PR changes exporter queue batcher to flush the current batch on
shutdown.

#### Link to tracking issue

open-telemetry#10368
open-telemetry#8122
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR proceeds
open-telemetry#11637. It
* Introduces a noop feature gate that will be used for queue batcher.
* Updates exporter tests to run with both the feature gate on and off.

<!-- Issue number if applicable -->
#### Link to tracking issue
open-telemetry#10368
open-telemetry#8122


<!--Describe what testing was performed and which tests were added.-->
#### Testing

<!--Describe the documentation added.-->
#### Documentation

<!--Please delete paragraphs that you did not use before submitting.-->
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this issue Dec 19, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

This PR solves
open-telemetry#10368.

Previously we use a pushing model between the queue and the batch,
resulting the batch size to be constrained by the
`sending_queue.num_consumers`, because the batch cannot accumulate more
than `sending_queue.num_consumers` blocked goroutines provide.

This PR changes it to a pulling model. We read from the queue until
threshold is met or timeout, then allocate a worker to asynchronously
send out the request.

<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes
open-telemetry#10368
open-telemetry#8122

---------

Co-authored-by: Dmitrii Anoshin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:exporter release:required-for-ga Must be resolved before GA release
Projects
Status: In Progress
Development

No branches or pull requests

8 participants