Handling "RuntimeError: can't start a new thread" error in production #2873
Comments
Please include the full stack trace and, if possible, a minimal reproducible example. There's not enough context to help right now.
Thanks for the response. This error occurs only at peak usage. I have provided the error trace above.
I have added the error trace and the code that generates the runtime error. Let me know if you need more information.
Based on the stack trace, the issue isn't related to langgraph threads at all (langgraph threads are just unique identifiers for a conversation and have nothing to do with hardware threads). The Celery code is probably not set up properly for async workloads. I haven't used Celery in an async setting in the past, but have you looked into guidelines for doing so?
@eyurtsev I have been using Celery in an async setting for a year now. I used to use langchain with a simple for-loop for text generation and never had a threading issue back then. Now, however, I am using langgraph's parallel node execution instead of the for-loop.

I am under the impression that langgraph spawns a new thread for each node execution: each node executed in parallel is a separate thread that gets closed when the node finishes. Honestly, I am not sure whether I have set up Celery correctly for async task execution; I am doing more research on this. I believe the combination of high Celery task concurrency (30-60 concurrent tasks) and langgraph threads quickly exhausts my system resources.

Is my assumption that langgraph spawns a new thread per node execution correct? How do other developers deploy langgraph in production and prevent this threading error?
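For what it's worth: under `ainvoke`, langgraph schedules parallel nodes as asyncio tasks rather than OS threads, so the threads in the trace above more likely come from the OpenAI client dispatching blocking helpers through `anyio.to_thread`. One way to bound how much parallel work a single invocation can queue is the standard `max_concurrency` field of `RunnableConfig` — a sketch, where `graph` and `inputs` are placeholders for the compiled graph and its input state:

```python
# Sketch: bounding parallel work per graph invocation.
# `max_concurrency` is a standard RunnableConfig field honored by
# LangChain runnables; `graph` and `inputs` are placeholders.
result = await graph.ainvoke(
    inputs,
    config={"max_concurrency": 5},  # tune to the worker's thread headroom
)
```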
Checked other resources
Example Code
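The original snippet did not survive in this report. A minimal hypothetical reconstruction of the setup described in the comments (a fan-out langgraph invoked from a Celery task; all names, the broker URL, and the node count are illustrative, not the reporter's actual code):

```python
# Hypothetical reconstruction -- the actual code was not included in the issue.
import asyncio
import operator
from typing import Annotated

from celery import Celery
from langgraph.graph import StateGraph, START, END
from typing_extensions import TypedDict

app = Celery("tasks", broker="redis://localhost:6379/0")  # illustrative broker


class State(TypedDict):
    briefs: Annotated[list, operator.add]  # parallel nodes append their outputs


async def write_brief(state: State) -> dict:
    # Placeholder for the model call that fails at peak load in the trace:
    # await model.with_structured_output(Briefs).ainvoke(messages)
    return {"briefs": ["..."]}


def build_graph():
    g = StateGraph(State)
    # Fan out: 10-20 nodes run in parallel from START, as described above.
    for i in range(15):
        name = f"brief_{i}"
        g.add_node(name, write_brief)
        g.add_edge(START, name)
        g.add_edge(name, END)
    return g.compile()


@app.task
def generate_briefs(payload: dict):
    # Each prefork Celery worker process drives its own event loop.
    graph = build_graph()
    return asyncio.run(graph.ainvoke({"briefs": [], **payload}))
```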
Error Message and Stack Trace (if applicable)
[2024-11-28 05:20:12,030: ERROR/ForkPoolWorker-26] can't start new thread
Traceback (most recent call last):
File "/app/core/ai/agent.py", line 196, in r_brief_generator
response = await self.standart_writer_model.with_structured_output(Briefs).ainvoke(messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2921, in ainvoke
input = await asyncio.create_task(part(), context=context) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 5105, in ainvoke
return await self.bound.ainvoke(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 298, in ainvoke
llm_result = await self.agenerate_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 787, in agenerate_prompt
return await self.agenerate(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 747, in agenerate
raise exceptions[0]
File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 923, in _agenerate_with_cache
result = await self._agenerate(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 843, in _agenerate
response = await self.async_client.create(**payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1490, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1838, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1532, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1552, in _request
self._platform = await asyncify(get_platform)()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/openai/_utils/_sync.py", line 69, in wrapper
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2114, in run_sync_in_worker_thread
worker.start()
File "/root/.nix-profile/lib/python3.11/threading.py", line 964, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Description
I have deployed a langgraph agent in production. The agent runs inside a Celery task, and 10-20 of its nodes execute in parallel. At peak usage, I get a "can't start new thread" error.
My agent is stateless; no thread ID is provided when invoking it. I am new to langgraph and multithreading.
Should I upgrade my CPU and memory? How can I improve my code so that more threads won't be a problem in the future?
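One mitigation worth sketching, under the assumption that the cause is OS-level thread exhaustion rather than langgraph itself: the OpenAI SDK runs a few blocking helpers (such as `get_platform` in the trace above) through `anyio.to_thread`, and anyio maintains a per-event-loop pool of worker threads, 40 by default. With 30-60 concurrent Celery tasks per host, those pools multiply quickly. Shrinking the default capacity limiter reduces how many threads each loop may spawn, and lowering the prefork pool size (`celery worker --concurrency=N`) reduces the number of loops. A sketch, with `graph` and `inputs` as placeholders:

```python
# Sketch: shrink anyio's default worker-thread pool before running the agent.
# current_default_thread_limiter() returns the CapacityLimiter that
# anyio.to_thread.run_sync() uses when no limiter is passed explicitly;
# it must be fetched from inside a running event loop.
import anyio.to_thread


async def run_agent(graph, inputs):  # placeholders for the compiled graph/input
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 8  # default is 40; tune to available headroom
    return await graph.ainvoke(inputs)
```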
System Info
CPU: 32 vCPU
Memory: 32 GB