
[exporterhelper] User interface for controlling queue measurement #9462

Open
Tracked by #8122
dmitryax opened this issue Feb 5, 2024 · 7 comments
Labels
area:exporter · discussion-needed (Community discussion needed) · release:required-for-ga (Must be resolved before GA release)

Comments

@dmitryax
Member

dmitryax commented Feb 5, 2024

Following the discussion #8853 (comment)

As part of #8122, the batching capability is moving to the exporter side. Batch sizes are configured in terms of items by default, while the current queue can be configured only in terms of requests. The differing measures can be confusing to end users. At a minimum, we need to provide a way to configure the queue in terms of items so that it is consistent with the batching configuration.

This issue is intended to answer the following questions:

  1. How should we define the user interface for end users to configure different queue measurements?
  2. Do we want to migrate the default measure for the queue from requests to items to be consistent with the batching configuration?

Currently, the sending_queue config group in the exporter configuration has a single field to control the queue capacity, queue_size, which sets the maximum number of requests the queue can hold. The queue prefix can potentially be dropped as redundant during the migration to the new interface.
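For reference, a minimal sketch of the current interface; the enabled and num_consumers fields are shown only for context, and all values are illustrative:

```yaml
exporters:
  otlp:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000  # capacity measured in requests, not items
```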

The new exporter queue can additionally support the following measures. Options 1 and 2 are already implemented, just not exposed to the user until this issue is resolved; option 3 can be added in the future.

  1. Number of basic items (log records, spans, data points) - this should be the default measure going forward, to be consistent with the batching config.
  2. Number of incoming requests - this is the current measure used by the queue_size field. We should keep it as an option.
  3. Size in MiB - would be nice to have in the future. The memory_limiter processor already has similar options: limit_mib and spike_limit_mib (see the sketch below).
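For comparison, a minimal memory_limiter snippet with the MiB-based fields mentioned above (values are illustrative):

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000       # hard memory limit in MiB
    spike_limit_mib: 800  # extra headroom for spikes, also in MiB
```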

We have the following options for how the different queue measurements can be configured (an illustrative config sketch for each follows the list of options):

A. Provide a different field for each measure and validate that only one is set. The following naming options could be added under the sending_queue config group in the exporter configuration:

  • max_size_items, max_size_requests and max_size_mib (vote with 😄 )
  • size_limit_items, size_limit_requests and size_limit_mib (vote with 🎉 )
  • items_limit, requests_limit and mib_limit (vote with 😕 )

B. Keep queue_size (or choose another name like size) and introduce another field for the queue_measure (or measure) enum, which can be items, requests or MiB. (vote with ❤️ )

C. Keep queue_size (or choose another name like size) and support suffixes like 10requests, 10items, 10[B|KB|MB..]. (vote with 🚀 )
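To make the proposals concrete, here is a rough sketch of what each option could look like under sending_queue. None of these fields exist today; the names and values are purely illustrative:

```yaml
# Option A: a dedicated field per measure, with validation that only one is set
sending_queue:
  max_size_items: 10000

# Option B: a size field plus a measure enum
sending_queue:
  size: 10000
  measure: items  # items | requests | mib

# Option C: a single field with a unit suffix
sending_queue:
  size: 10000items
```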

Keep in mind that the batching interface has to be aligned with what we come up with. #8685 currently suggests the following options: min_size_items, max_size_items. So it's aligned with option A at the moment. Potentially, we can provide different measures for batching as well, but initially, the batching will be items-based only.
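For context, the items-based batching fields proposed in #8685 would look roughly like this (the batcher group name and the values are assumptions for illustration only):

```yaml
exporters:
  otlp:
    batcher:
      min_size_items: 8192   # don't flush until at least this many items are accumulated
      max_size_items: 16384  # split requests larger than this
```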

If we decide to migrate from requests to items by default in the sending_queue group, we need a migration plan. This cannot be done under the hood since it'll affect users. Option A requires deprecating queue_size and moving to another config option, so we can align that change with this migration and mention the default measurement change in the field deprecation warning. However, options B and C don't require a config option change if we keep using queue_size, so it's better to change it to something else like size instead to provide the same migration experience.

Emojis for voting:
👍 Yes, migrate the default queue measure to items
👎 Keep the requests as measure by default, even if it's not consistent with the batcher.
👀 Hold off on the migration until we have a byte-count measure, which eventually can be a new default

It's important to mention that any decision we make for this issue won't take effect immediately. It'll be applied only after #8122 is ready and stable, as components move to the new exporter helper.

@open-telemetry/collector-approvers please share your thoughts.

@dmitryax dmitryax added discussion-needed Community discussion needed release:required-for-ga Must be resolved before GA release area:exporter labels Feb 5, 2024
@dmitryax dmitryax changed the title User interface for controlling queue measurement [exporterhelper] User interface for controlling queue measurement Feb 5, 2024
@dmitryax
Member Author

dmitryax commented Feb 7, 2024

I added emojis for voting, but feel free to suggest other options as well.

@OverOrion

OverOrion commented Feb 7, 2024

I think that having the queue size in MiB would be awesome and would make it easier to understand (especially for first-time users).
Also, the operator could utilize it to automatically create Persistent Volumes based on these values.

@jmacd
Contributor

jmacd commented Feb 7, 2024

No emoji option was given for my preference; I don't like either of these options:

> 👍 Yes, migrate the default queue measure to items
> 👎 Keep the requests as measure by default, even if it's not consistent with the batcher.

The third option, not listed (it must be 👀), is to wait for option 3 and migrate the default queue measure to size_limit_mib. This behavior is the most consistent with the batch processor, since both items and requests can vary in size. For the queue to have a fixed-size memory footprint, which is my preference, we should migrate to a byte-count measure.

@dmitryax
Member Author

dmitryax commented Feb 7, 2024

Thanks @jmacd, added

@quentinmit

quentinmit commented Apr 2, 2024

I think a unit smaller than MiB should be used for batch sizes; most common export APIs have limits near the 1 MB mark. For instance, CloudWatch's max batch size is 1 MiB. Google Cloud Logging's max batch size is 10 MiB. Promtail limits requests to 4 MiB. Splunk has a max batch size of 2 MiB. DataDog's limit is 3.2 MiB.

I would suggest using KiB or even just bytes.

This is also something that should probably be controllable by the individual exporter; I would expect the max batch size to be limited to an exporter-controlled max byte size, since there's no point in trying to send 20 MiB batches to CloudWatch.

@mx-psi mx-psi moved this to Todo in Collector: v1 Apr 18, 2024
@jpkrohling
Member

Agree with @jmacd and @quentinmit: size_limit_mib (or rather, size_limit_bytes) would be preferable from a UX perspective. Users usually have no idea how big their batches or spans are, and are more comfortable making decisions when there's a clear correlation with cost: "I need X GiBs for this queue, which means I need a machine that costs me $Y."

@matthewmodestino

matthewmodestino commented Aug 7, 2024

> I think a unit smaller than MiB should be used for batch sizes;

Agree with this 100%. Smaller batches generally scale better as well. For Splunk Cloud, we try to encourage 256 KB as a starting point, well below the 1 MB soft limit we recommend.
