
Handling "RuntimeError: can't start a new thread" error at production. #2873

Open · 3 of 4 tasks
alimardanov opened this issue Dec 25, 2024 · 6 comments

Comments

@alimardanov

alimardanov commented Dec 25, 2024

Checked other resources

  • This is a bug, not a usage question. For questions, please use GitHub Discussions.
  • I added a clear and detailed title that summarizes the issue.
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I included a self-contained, minimal example that demonstrates the issue INCLUDING all the relevant imports. The code runs AS IS to reproduce the issue.

Example Code

```python
import asyncio

@celery_app.task(bind=True)
def content_gen_celery_task(self, task_id: str) -> str:
    # `task_data` is loaded from `task_id` in the full application (elided here)
    loop = asyncio.get_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(generate(task_data=task_data, temperature=0.7))

async def generate(task_data: TaskData, temperature: float = 0.7):
    # `input_data` comes from `task_data` in the full application (elided here)
    content_maker_agent = ContentMakerAgent(
        input_data=input_data,
        llm_model=task_data.llm_model,
        temperature=temperature,
    )
    generated_content = await content_maker_agent.graph.ainvoke(input_data)

class ContentMakerAgent:
    def __init__(self, input_data: ContentState, llm_model: str, temperature: float = 0.7):
        builder = StateGraph(ContentState)
        self.writer_model = ChatOpenAI(model=llm_model, temperature=temperature)
        self.briefs = input_data["briefs"]

        # One node per brief, each wired START -> node -> END, so all
        # briefs execute in parallel within a single superstep
        for i, brief in enumerate(self.briefs):
            node_name = f"node_{i}"
            function_declaration = GenContent(
                state=input_data,
                brief=brief,
                llm_model=llm_model,
                temperature=temperature,
            )
            builder.add_node(node_name, function_declaration)
            builder.add_edge(START, node_name)
            builder.add_edge(node_name, END)

        self.graph = builder.compile()
```
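For reference, a minimal sketch of the alternative loop-handling pattern: driving a coroutine from a synchronous Celery-style task body with `asyncio.run`, which creates and closes a fresh event loop per invocation rather than reusing `get_event_loop()` inside a forked worker. `generate_stub` and `run_task_body` are hypothetical stand-ins, not the reporter's actual code.

```python
import asyncio

async def generate_stub(temperature: float = 0.7) -> str:
    # Stand-in for the reporter's `generate` coroutine.
    await asyncio.sleep(0)
    return f"generated with temperature={temperature}"

def run_task_body(temperature: float = 0.7) -> str:
    # asyncio.run creates a fresh loop, runs the coroutine to completion,
    # and closes the loop, so no loop state leaks between task invocations.
    return asyncio.run(generate_stub(temperature=temperature))

print(run_task_body())  # generated with temperature=0.7
```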

Error Message and Stack Trace (if applicable)

```
[2024-11-28 05:20:12,030: ERROR/ForkPoolWorker-26] can't start new thread

Traceback (most recent call last):
  File "/app/core/ai/agent.py", line 196, in r_brief_generator
    response = await self.standart_writer_model.with_structured_output(Briefs).ainvoke(messages)
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2921, in ainvoke
    input = await asyncio.create_task(part(), context=context)  # type: ignore
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 5105, in ainvoke
    return await self.bound.ainvoke(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 298, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 787, in agenerate_prompt
    return await self.agenerate(
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 747, in agenerate
    raise exceptions[0]
  File "/opt/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 923, in _agenerate_with_cache
    result = await self._agenerate(
  File "/opt/venv/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 843, in _agenerate
    response = await self.async_client.create(**payload)
  File "/opt/venv/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1490, in create
    return await self._post(
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1838, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1532, in request
    return await self._request(
  File "/opt/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1552, in _request
    self._platform = await asyncify(get_platform)()
  File "/opt/venv/lib/python3.11/site-packages/openai/_utils/_sync.py", line 69, in wrapper
    return await anyio.to_thread.run_sync(
  File "/opt/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2114, in run_sync_in_worker_thread
    worker.start()
  File "/root/.nix-profile/lib/python3.11/threading.py", line 964, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```

Description

I have deployed a LangGraph agent in production. The agent runs inside a Celery task, and 10-20 of its nodes execute in parallel. At peak usage, I get a "can't start new thread" error.

My agent is stateless; no thread ID is provided when invoking the agent. I am new to LangGraph and multithreading.
Should I upgrade my CPU and memory? How can I improve my code so that thread exhaustion won't be a problem in the future?
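One general mitigation, independent of this specific stack: bound the fan-out yourself with an `asyncio.Semaphore` so downstream code (HTTP clients, worker thread pools) never sees unbounded concurrent demand. This is a sketch under the assumption that the parallel work can be funneled through your own gather call; `bounded_gather` and `fake_node` are hypothetical names, not LangGraph APIs.

```python
import asyncio

async def bounded_gather(coros, limit: int = 5):
    # Cap how many coroutines run concurrently; the rest wait their turn.
    sem = asyncio.Semaphore(limit)

    async def run_one(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run_one(c) for c in coros))

async def fake_node(i: int) -> int:
    # Stand-in for one graph node's async work.
    await asyncio.sleep(0)
    return i * 2

results = asyncio.run(bounded_gather([fake_node(i) for i in range(10)], limit=3))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```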

System Info

CPU: 32 vCPU
Memory: 32 GB

@eyurtsev
Contributor

Please include full stack trace and if possible a minimal reproducible example. There's not enough context to help now


@alimardanov
Author

alimardanov commented Dec 25, 2024

> Please include full stack trace and if possible a minimal reproducible example. There's not enough context to help now

Thanks for the response. This error occurs only at peak usage. I have provided the error trace above.

@alimardanov
Author

I added the error trace and my code that generates the runtime error. Let me know if you need more information.

@eyurtsev
Contributor

> My agent is stateless, no thread ID is provided while invoking the agent. I am new to langgraph and multi threading.

Based on the stack trace, the issue isn't related to LangGraph threads at all (LangGraph threads are just unique identifiers for a conversation and have nothing to do with OS threads).

The Celery code is probably not set up properly for async workloads. I haven't used Celery in an async setting myself, but have you looked into guidelines for doing so?

@alimardanov
Author

alimardanov commented Dec 25, 2024

@eyurtsev I have been using Celery in an async setting for a year now. I used to use LangChain with a simple for-loop for text generation, and I never had any threading issues back then.

However, I am now using LangGraph's parallel node execution instead of the simple for-loop. I am under the impression that LangGraph spawns a new thread for each node execution: each parallel node runs in a separate thread, which is closed when the node finishes.

Honestly, I am not sure whether I have set up Celery correctly for async task execution. I am doing more research on this.

I believe that high Celery task concurrency (30-60 concurrent tasks) combined with LangGraph threads quickly exhausts my system resources.

Is my assumption that LangGraph spawns a new thread per node execution correct?

How do other developers deploy LangGraph in production and prevent threading errors?
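As a general illustration of capping thread creation: an event loop's default executor can be replaced with a bounded `ThreadPoolExecutor`, so executor-based offloading can never request unbounded OS threads. Note the hedge: in this traceback the thread is started by anyio's internal worker pool (used by the openai client), which is not the loop's default executor, so this sketch shows the principle rather than a fix for that exact code path. All names here (`blocking_work`, `main`) are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_work(x: int) -> int:
    # Stand-in for synchronous work offloaded to a thread.
    return x * 2

async def main() -> list:
    loop = asyncio.get_running_loop()
    # Cap the loop's default executor at 8 worker threads; 20 jobs are
    # queued onto those 8 threads instead of spawning 20 threads.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=8))
    futures = [loop.run_in_executor(None, blocking_work, i) for i in range(20)]
    return await asyncio.gather(*futures)

results = asyncio.run(main())
print(results[:3])  # [0, 2, 4]
```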
