-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Processed request response stuck when workers are not available #247
Comments
Interesting. In the 2 requests vs. 3 requests case, can you please print for each request:
This way we can get a better understanding of how requests are distributed |
thanks, here it is. I will also update the src in the first post to you can eventually see where I put the logs
SERVER:
|
Interesting. It seems like the event loop of the server with the two requests prioritizes work on the new connection to closing the old one. Not sure why that is and if it's possible to change that behavior without changing the event loop implementation. I can have a look in the next couple of days. Is your code a reduced reproducing example from a real world problem or are we looking at a purely artificial test here? I wonder what's the impact of this behavior on larger scale and with non-trivial WSGI applications. |
Actually I noticed it in production and I tried to reduce it to as small example as possible. I was debugging why some requests are slower during heavy load and why there are bigger differences between what I see on the application monitoring vs proxy monitoring. Then I noticed that even though the request was handled, the response on the proxy level was sent at the same time as the response of the following request We don't log the worker id there, but I feel confident that it's the same issue |
Deleted some wrong insights that weren't reproducible. Behavior seems to be flaky. |
OK, it seems like the insight with If your application sets |
OK I think I understood the problem now. If you don't set Follow-up questions:
|
Thank you Jonas for the thorough investigation. I need to check the content-lenght. It's a customer facing api and also it's behind few proxies so it can be even getting lost somewhere. But I would never expect that this might be needed. Regarding the other two questions. I don't think that I am able to answer any of them. This isn't really my area. But the prioritization of outstanding chunks sounds like something that should happen. In that case the merge optimization wouldn't be necessary and even if that was done it's hard to say if it would have a big impact. |
I tried to set write event priority to max while setting read event priority to min, and it still shows the same behavior. |
I was also testing the Content-Length header but it still doesn't work for me I tried these cases requests.get(f"http://localhost:8080/request_id={request_id}", headers={"Content-Length": "0"})
requests.post(f"http://localhost:8080/request_id={request_id}", data="data")
requests.post(f"http://localhost:8080/request_id={request_id}", data="data", headers={"Content-Length": "4"}) the post has content-length automatically, but I even tried setting it manually with no luck I validated on the server side that the content-length is indeed received |
No the content length is missing in the response headers. Your WSGI application has to set it. |
ah 🤦 well that makes much more sense now. I was just confused. anyway it seems like it helps just partly. When it's added to the response it sometimes passes, but at least on my laptop it's around 1 pass vs 5 fails and that's both when I use latest bjoern with manually added content-length to the response or your pr that does it automatically I was checking the response headers and there is just this: {'Content-Length': '35', 'Connection': 'Keep-Alive'} it works reliably for you with the content-length? |
With the example application you've provided, or with a real application? If your real application sends too much data in the response then you'll end up with the response being sent in multiple pieces and we're back to square one. |
No just the example application. I will check the behavior of the real app later today |
so I checked our real api and the content length is usually around 100-1000, depending on the endpoint. But anyway, the content-length header is present in the response. We are using connexion + flaskapp. I will try to replicate the issue with some small app. To see if it behaves the same |
I played around with it again and I changed the app to connexion + flask and also just flask. In both cases the content-length is always in the response but the behavior stayed the same. Sometimes it responds fast but most of the time it's just stuck there for 2 seconds |
Unfortunately I can't reproduce this with the reproducer app. import random
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
import requests
for _ in range(10):
durations = []
def run_io_tasks_in_parallel(tasks):
with ThreadPoolExecutor() as executor:
running_tasks = [executor.submit(task) for task in tasks]
for running_task in running_tasks:
running_task.result()
def task():
now = datetime.now()
request_id = random.randint(1, 100)
print(f"request_id: {request_id} - start: {now}")
requests.get(f"http://localhost:8080/request_id={request_id}")
request_end = datetime.now()
print(f"request_id: {request_id} - stop: {request_end}")
durations.append((request_end - now).total_seconds())
return f"Request duration {(request_end - now)}, ID: {request_id}"
now = datetime.now()
run_io_tasks_in_parallel([
lambda: print(f'IO task 1: {task()}'),
lambda: print(f'IO task 2: {task()}'),
lambda: print(f'IO task 3: {task()}'),
# lambda: print(f'IO task 4: {task()}'),
# lambda: print(f'IO task 5: {task()}'),
# lambda: print(f'IO task 6: {task()}'),
])
print("Elapsed time:", datetime.now() - now)
print("Mean duration", sum(durations)/len(durations)) from time import sleep
import bjoern
import os, signal
from datetime import datetime
HOST = '0.0.0.0'
PORT = 8080
N_WORKERS = 2
worker_pids = []
def app(e, s):
print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - start {datetime.now().strftime('%H:%M:%S.%f')}")
sleep(1)
print(f"worker id: {os.getpid()} - {e['PATH_INFO']} - stop {datetime.now().strftime('%H:%M:%S.%f')}")
res = b'%i: %s' % (
os.getpid(),
str(datetime.now()).encode('utf-8')
)
s('200 OK', [('Content-Length', str(len(res)))])
return [res]
bjoern.listen(app, HOST, PORT)
for _ in range(N_WORKERS):
pid = os.fork()
if pid > 0: # parent
worker_pids.append(pid)
elif pid == 0: # worker
try:
bjoern.run()
except KeyboardInterrupt:
pass
exit()
try:
for _ in range(N_WORKERS):
os.wait()
except KeyboardInterrupt:
for pid in worker_pids:
os.kill(pid, signal.SIGINT)
|
Interesting. That made me thinking that maybe I have some specific local setup, so I tried running the server in the debian docker image, because I am using opensuse. I also tried multiple python versions (3.6, 3.7, 3.8, 3.9) but the results are pretty random I did something like this: docker run --rm -p 8080:8080 -it -v $(pwd):/usr/src/project python:3.9 bash
apt-get update
apt-get install libev-dev
pip install bjoern
cd /usr/src/project
python main.py and then I just run the test against the docker image.
I even tried to run both server and test inside the docker with no difference. btw when I am running the tests in the loop like you did, it seems like it has the correct response time more often 🤷 but it can be just my feeling anyway it seems like it will stay a mystery 😁 |
don't stop do not surrender, let's try to solve the mystery maybe a library or something fedora vs ubuntu etc or way is compiled? try to use another instance etc we need to figure it out why this strange behavior |
I was able to reproduce in a Linux Docker container. So maybe it's a Linux thing. I can spend some time debugging this soon. |
Hello,
I started bjoern server with 2 workers and sent 6 requests in parallel - scripts below
the server just sleeps for 1 second and then responds
What I would expect to happen is that the request would finish twice every second and their duration would be like this
1,1, 2, 2, 3, 3
But the actual duration is 2, 3, 3 ,3, 3, 3
why is this happening? am I setting up the workers incorrectly?
the time before and after the sleep method is exactly what I would expect: the following output is just sorted by worker
If I send just 2 requests then both are returned in 1 second
if I send three then duration of 2 of them is 2 seconds already 👀
but the request doesn't end after the return? It seems like the worker is already busy with the new request before the response is sent
bjoern 3.2.2
The text was updated successfully, but these errors were encountered: