How to determine what concurrency level to set when using KEY ordering #728
-
Hello folks, I hope this message finds you well. I have a few queries regarding the maxConcurrency property. Is it accurate to say that this property represents the size of the thread pool responsible for executing messages in parallel? If so, could you clarify whether it refers to a fixed or a dynamic thread pool? I came across examples where maxConcurrency was set to 1000, and if this corresponds to the thread count of a fixed pool, it raises concerns about potential system crashes.

As we are planning to use parallel-consumer in our production environment, we are keen to determine the optimal level of concurrency. Given that we are using KEY ordering and there is no set limit on the number of keys (they are always newly generated for certain events), we are exploring ways to gauge the required concurrency. While we have the option to run scenarios in a pre-prod environment, I'm curious whether there are additional methods to assess this.

Additionally, I would appreciate insights into important considerations for testing and deployment. We are already verifying scenarios related to Kafka and server failures, but any specific recommendations for production would be valuable. Thank you for providing clarity on these matters. Your assistance is much appreciated.
Replies: 1 comment
-
Hi @amber13574,

There is no real inherent problem with having a 1000-sized thread pool - the default stack size for a thread is 1 MB (depending on OS / Java version), so 1000 threads would have 1 GB of RAM reserved for their stacks. If the processing is purely IO-bound and reasonably slow - let's say 5 seconds per message - then 1000 threads, or even more, might be necessary.

1000 threads with a 5-second IO wait give you roughly 200 messages per second of throughput (roughly, as there is some overhead). So if the desired throughput is, for example, 5,000 messages per second with such slow processing, 25 instances of Parallel Consumer, each with a 1000-thread pool, would be required - and probably even more to allow…

So I would approach it from the other end: how long does the processing / user function take per message, in seconds (timePerMessageSeconds)? That gives you the number of processing threads needed to achieve the required throughput across all instances of Parallel Consumer in the same group: threadsOverall = requiredMsgPerSecond * timePerMessageSeconds. How many partitions are there in total on the topic? That is the maximum number of Parallel Consumer instances you can have (numPartitions), and it gives you the minimum number of processing threads needed per instance, if you run one Parallel Consumer instance per partition, as threadsOverall / numPartitions. For example, let's say message processing takes 200 ms per message and the desired / required throughput is 2,000 messages per second; adding 25% of overhead for spikes, rolling restarts, etc. gives 2,500 messages per second, i.e. 2,500 * 0.2 = 500 processing threads across the whole group.
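To make that arithmetic concrete, here is a minimal, self-contained sketch of the back-of-envelope calculation in plain Java. The partition count of 10 is a hypothetical value (the example above does not state one), and the other inputs are just the illustrative figures from the example:

```java
// Back-of-envelope concurrency sizing using the figures from the example above.
// All inputs are illustrative assumptions, not recommendations.
public class ConcurrencySizing {

    public static void main(String[] args) {
        double timePerMessageSeconds = 0.2;    // ~200 ms of (mostly IO-bound) work per message
        double requiredMsgPerSecond  = 2_000;  // target throughput for the whole consumer group
        double headroom              = 1.25;   // +25% for spikes, rolling restarts, etc.
        int    numPartitions         = 10;     // hypothetical partition count of the topic

        double targetMsgPerSecond = requiredMsgPerSecond * headroom;              // 2,500 msg/s
        double threadsOverall     = targetMsgPerSecond * timePerMessageSeconds;   // 500 threads across the group

        // Minimum maxConcurrency per instance if one Parallel Consumer runs per partition.
        int threadsPerInstance = (int) Math.ceil(threadsOverall / numPartitions); // 50

        System.out.printf("threadsOverall = %.0f, maxConcurrency per instance >= %d%n",
                threadsOverall, threadsPerInstance);
    }
}
```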
One thing to note in terms of memory use: Parallel Consumer uses an internal queue to buffer messages for processing and sharding - that queue is calculated as…
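For reference, here is a minimal wiring sketch in the spirit of the parallel-consumer README showing where KEY ordering and maxConcurrency are configured. The bootstrap servers, group id, topic name and the concurrency value of 50 are illustrative assumptions, and the exact poll callback type (PollContext vs. ConsumerRecord) depends on the library version:

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Properties;

import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;
import static java.util.Collections.singletonList;

public class KeyOrderedConsumerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative
        props.put("group.id", "my-consumer-group");         // illustrative
        props.put("enable.auto.commit", "false");           // Parallel Consumer manages offsets itself
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options =
                ParallelConsumerOptions.<String, String>builder()
                        .ordering(KEY)          // KEY ordering, as discussed in the question
                        .maxConcurrency(50)     // e.g. the per-instance figure from the sizing sketch
                        .consumer(kafkaConsumer)
                        .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);

        processor.subscribe(singletonList("my-topic"));      // illustrative topic name

        // In recent (0.5.x) versions the callback receives a PollContext;
        // older versions pass a ConsumerRecord instead.
        processor.poll(context -> {
            // Replace with the real user function. With ~200 ms of IO per message,
            // 50 concurrent slots give roughly 250 msg/s for this instance.
            System.out.println("Processing " + context);
        });
    }
}
```

Since the keys in your case are essentially always new, key collisions should rarely serialize work, so in practice maxConcurrency (together with the number of buffered messages) is the limit on parallelism rather than key cardinality.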