
Handling "RuntimeError: can't start a new thread" error at production. #2873

Open · 3 of 4 tasks
alimardanov opened this issue Dec 25, 2024 · 6 comments

Comments

@alimardanov

alimardanov commented Dec 25, 2024

Checked other resources

  • This is a bug, not a usage question. For questions, please use GitHub Discussions.
  • I added a clear and detailed title that summarizes the issue.
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I included a self-contained, minimal example that demonstrates the issue INCLUDING all the relevant imports. The code runs AS IS to reproduce the issue.

Example Code

```python
import asyncio

@celery_app.task(bind=True)
def content_gen_celery_task(self, task_id: str) -> str:
    # `task_data` is loaded from `task_id` in the full application (elided here)
    loop = asyncio.get_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(generate(task_data=task_data, temperature=0.7))

async def generate(task_data: TaskData, temperature: float = 0.7):
    # `input_data` comes from `task_data` in the full application (elided here)
    content_maker_agent = ContentMakerAgent(
        input_data=input_data,
        llm_model=task_data.llm_model,
        temperature=temperature,
    )
    generated_content = await content_maker_agent.graph.ainvoke(input_data)

class ContentMakerAgent:
    def __init__(self, input_data: ContentState, llm_model: str, temperature: float = 0.7):
        builder = StateGraph(ContentState)
        self.writer_model = ChatOpenAI(model=llm_model, temperature=temperature)
        self.briefs = input_data["briefs"]

        # One node per brief, each wired START -> node -> END, so all
        # briefs execute in parallel within a single superstep
        for i, brief in enumerate(self.briefs):
            node_name = f"node_{i}"
            function_declaration = GenContent(
                state=input_data,
                brief=brief,
                llm_model=llm_model,
                temperature=temperature,
            )
            builder.add_node(node_name, function_declaration)
            builder.add_edge(START, node_name)
            builder.add_edge(node_name, END)

        self.graph = builder.compile()
```
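For reference, a minimal sketch of the alternative loop-handling pattern: driving a coroutine from a synchronous Celery-style task body with `asyncio.run`, which creates and closes a fresh event loop per invocation rather than reusing `get_event_loop()` inside a forked worker. `generate_stub` and `run_task_body` are hypothetical stand-ins, not the reporter's actual code.

```python
import asyncio

async def generate_stub(temperature: float = 0.7) -> str:
    # Stand-in for the reporter's `generate` coroutine.
    await asyncio.sleep(0)
    return f"generated with temperature={temperature}"

def run_task_body(temperature: float = 0.7) -> str:
    # asyncio.run creates a fresh loop, runs the coroutine to completion,
    # and closes the loop, so no loop state leaks between task invocations.
    return asyncio.run(generate_stub(temperature=temperature))

print(run_task_body())  # generated with temperature=0.7
```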

Error Message and Stack Trace (if applicable)

```
[2024-11-28 05:20:12,030: ERROR/ForkPoolWorker-26] can't start new thread

Traceback (most recent call last):
  File "/app/core/ai/agent.py", line 196, in r_brief_generator
    response = await self.standart_writer_model.with_structured_output(Briefs).ainvoke(messages)
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2921, in ainvoke
    input = await asyncio.create_task(part(), context=context)  # type: ignore
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 5105, in ainvoke
    return await self.bound.ainvoke(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 298, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 787, in agenerate_prompt
    return await self.agenerate(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 747, in agenerate
    raise exceptions[0]
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 923, in _agenerate_with_cache
    result = await self._agenerate(
  File "/opt/venv/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 843, in _agenerate
    response = await self.async_client.create(**payload)
  File "/opt/venv/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1490, in create
    return await self._post(
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1838, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1532, in request
    return await self._request(
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1552, in _request
    self._platform = await asyncify(get_platform)()
  File "/opt/venv/lib/python3.11/site-packages/openai/_utils/_sync.py", line 69, in wrapper
    return await anyio.to_thread.run_sync(
  File "/opt/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2114, in run_sync_in_worker_thread
    worker.start()
  File "/root/.nix-profile/lib/python3.11/threading.py", line 964, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```

Description

I have deployed a LangGraph agent in production. The agent runs inside a Celery task, and 10-20 of its nodes execute in parallel. At peak usage, I get a "can't start new thread" error.

My agent is stateless; no thread ID is provided when invoking the agent. I am new to LangGraph and multithreading.
Should I upgrade my CPU and memory? How can I improve my code so that thread exhaustion won't be a problem in the future?
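One general mitigation, independent of this specific stack: bound the fan-out yourself with an `asyncio.Semaphore` so downstream code (HTTP clients, worker thread pools) never sees unbounded concurrent demand. This is a sketch under the assumption that the parallel work can be funneled through your own gather call; `bounded_gather` and `fake_node` are hypothetical names, not LangGraph APIs.

```python
import asyncio

async def bounded_gather(coros, limit: int = 5):
    # Cap how many coroutines run concurrently; the rest wait their turn.
    sem = asyncio.Semaphore(limit)

    async def run_one(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run_one(c) for c in coros))

async def fake_node(i: int) -> int:
    # Stand-in for one graph node's async work.
    await asyncio.sleep(0)
    return i * 2

results = asyncio.run(bounded_gather([fake_node(i) for i in range(10)], limit=3))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```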

System Info

CPU: 32 vCPU
Memory: 32 GB

@eyurtsev
Contributor

Please include full stack trace and if possible a minimal reproducible example. There's not enough context to help now


@alimardanov
Author

alimardanov commented Dec 25, 2024

> Please include full stack trace and if possible a minimal reproducible example. There's not enough context to help now

Thanks for the response. This error occurs only at peak usage. I have provided the error trace above.

@alimardanov
Author

I added the error trace and my code that generates the runtime error. Let me know if you need more information.

@eyurtsev
Contributor

> My agent is stateless, no thread ID is provided while invoking the agent. I am new to langgraph and multi threading.

Based on the stack trace, the issue isn't related to LangGraph threads at all (LangGraph threads are just unique identifiers for a conversation and have nothing to do with OS threads).

The Celery code is probably not set up properly for async workloads. I haven't used Celery in an async setting myself, but have you looked into guidelines for doing so?

@alimardanov
Author

alimardanov commented Dec 25, 2024

@eyurtsev I have been using Celery in an async setting for a year now. I used to use LangChain with a simple for-loop for text generation, and I never had any threading issues back then.

However, I am now using LangGraph's parallel node execution instead of the simple for-loop. I am under the impression that LangGraph spawns a new thread for each node execution: each parallel node runs in a separate thread, which is closed when the node finishes.

Honestly, I am not sure whether I have set up Celery correctly for async task execution. I am doing more research on this.

I believe that high Celery task concurrency (30-60 concurrent tasks) combined with LangGraph threads quickly exhausts my system resources.

Is my assumption that LangGraph spawns a new thread per node execution correct?

How do other developers deploy LangGraph in production and prevent threading errors?
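As a general illustration of capping thread creation: an event loop's default executor can be replaced with a bounded `ThreadPoolExecutor`, so executor-based offloading can never request unbounded OS threads. Note the hedge: in this traceback the thread is started by anyio's internal worker pool (used by the openai client), which is not the loop's default executor, so this sketch shows the principle rather than a fix for that exact code path. All names here (`blocking_work`, `main`) are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_work(x: int) -> int:
    # Stand-in for synchronous work offloaded to a thread.
    return x * 2

async def main() -> list:
    loop = asyncio.get_running_loop()
    # Cap the loop's default executor at 8 worker threads; 20 jobs are
    # queued onto those 8 threads instead of spawning 20 threads.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=8))
    futures = [loop.run_in_executor(None, blocking_work, i) for i in range(20)]
    return await asyncio.gather(*futures)

results = asyncio.run(main())
print(results[:3])  # [0, 2, 4]
```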
