Thread safety of take!, put! #16
I can reproduce the issue with the following code: https://gist.github.com/mpenet/a8266ac2d6e081701e74112bd20f3a5a

You start both the consumers and the producers, then after a while kill the producers, and you will sometimes end up with a constant stream of these logs. I am not able to deadlock it just yet, but I suspect that once I hit the same issue enough times per run, all consumer threads end up blocked. It might just be a matter of playing with the parameters.

edit: if you start with the following:

(def c (start-consumers! queues 30))
(def p (start-producers! queues 50))

you get that kind of log within a couple of seconds.
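(For reference, the shape of the repro is roughly the following. This is a sketch against durable-queue's documented queues/put!/take!/complete! API, not the gist's actual code; start-consumers! and start-producers! here are approximations.)

```clojure
(require '[durable-queue :as dq])

;; one durable-queue handle shared by every thread
(defonce queues (dq/queues "/tmp/dq-repro" {}))

;; flipped to false later to "kill" the producers
(defonce producing? (atom true))

(defn start-consumers!
  "Starts n threads that take!, deref and immediately complete! tasks."
  [q n]
  (doall
    (repeatedly n
      (fn []
        (future
          (loop []
            (let [task (dq/take! q :foo 1000 ::timeout)]
              (when-not (= ::timeout task)
                @task                  ;; read the payload
                (dq/complete! task)))  ;; mark it done right away
            (recur)))))))

(defn start-producers!
  "Starts n threads that put! small tasks until producing? is flipped off."
  [q n]
  (doall
    (repeatedly n
      (fn []
        (future
          (while @producing?
            (dq/put! q :foo {:n (rand-int 1000)})))))))

(def c (start-consumers! queues 30))
(def p (start-producers! queues 50))

;; after a while: (reset! producing? false), then watch the consumer logs / stats
```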
It seems that if I wrap both dq/complete! and (deref task) in a try/catch (catching both Exception and AssertionError), I get consistent results without locking, avoiding the issue from the previous comment and the weird counters. But there is definitely something fishy going on with the handling of the internal slab buffers. What gets caught is the Exception and/or the AssertionError thrown by complete! / (deref task).
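(In code, the workaround amounts to something like this. A sketch only; the actual handler, and what is done with a task that fails this way, aren't shown in the thread.)

```clojure
(require '[durable-queue :as dq])

(defn safe-consume!
  "take! one task, then guard deref and complete! so a corrupted slab
   entry doesn't kill the consuming thread."
  [q]
  (let [task (dq/take! q :foo 1000 ::timeout)]
    (when-not (= ::timeout task)
      (try
        (let [payload @task]       ;; deref can blow up on a bad entry
          ;; ... process payload here ...
          (dq/complete! task))     ;; complete! can throw an AssertionError
        (catch AssertionError e
          (println "deref/complete! failed:" (.getMessage e)))
        (catch Exception e
          (println "deref/complete! failed:" (.getMessage e)))))))
```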
Edited for clarity.
I can confirm this fixed our issue. It's been 2 days and hundreds of millions of tasks later, and we haven't hit the issue again. Trying to summon @ztellman in case he has some insight on this 📡
Hi,
Should we consider put! and take! thread-safe?
I have one thread doing puts in bursts at regular intervals, and N workers on their own respective threads consuming the queue. Sometimes I get into a situation where I have 1 in-progress task and all the other workers waiting on tasks while the queue keeps growing (essentially deadlocked at this point). In the code I take!, deref and mark tasks completed immediately (I did that to exclude a bug around complete!).
Should I serialize the takes, or is this a potential bug? I'll try to come up with a repro at work tomorrow.
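(For concreteness, "serializing the takes" would mean something like the sketch below, with the rest of the loop matching the take!/deref/complete!-immediately pattern described above. This is only an experiment to rule out concurrent take!s; durable-queue does not document such a lock as required, and start-worker! / take-lock are made-up names.)

```clojure
(require '[durable-queue :as dq])

(def take-lock (Object.))

(defn start-worker!
  "Worker that serializes take! across threads, then derefs the task
   and complete!s it immediately."
  [q]
  (future
    (loop []
      (let [task (locking take-lock
                   (dq/take! q :foo 1000 ::timeout))]
        (when-not (= ::timeout task)
          (let [payload @task]
            (dq/complete! task)
            ;; ... the real work with payload would happen here ...
            )))
      (recur))))
```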
I also get the occasional negative in-progress count while testing:
09:22:14.938 [clojure-agent-send-off-pool-0] INFO sink.testdq - {foo {:num-slabs 1, :num-active-slabs 1, :enqueued 10947, :retried 0, :completed 10947, :in-progress -1}}
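(That map is what durable-queue's stats returns for each queue, presumably logged on a timer; reusing the dq alias and queues handle from the sketches above:)

```clojure
;; per-queue counters: :num-slabs, :num-active-slabs, :enqueued,
;; :retried, :completed and the :in-progress gauge that goes negative here
(dq/stats queues)
```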
Thanks for the lib by the way, it's super useful.