We are using Triton to serve a BLS model. Inside this model's model.py, one function uses the Triton gRPC client to query another model hosted on the same server. That call works correctly; the issue arises in the execute function. After the final output tensor is extracted, I wrap it in a pb_utils.Tensor object and append it to an InferenceResponse as documented, but during the pb_utils.Tensor construction a segmentation fault occurs.
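To illustrate the flow, here is a minimal sketch of the pattern in my execute function; the tensor names, the downstream model name, the gRPC port, and the dtype string are placeholders rather than my real values:

```python
import triton_python_backend_utils as pb_utils
import tritonclient.grpc as grpcclient


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the request input (tensor name is a placeholder).
            input_np = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()

            # Query the other model on the same server through the gRPC client
            # (model name, tensor names, port, and dtype are placeholders).
            client = grpcclient.InferenceServerClient(url="localhost:8001")
            infer_input = grpcclient.InferInput("INPUT", list(input_np.shape), "UINT8")
            infer_input.set_data_from_numpy(input_np)
            result = client.infer(model_name="downstream_model", inputs=[infer_input])

            # In my case this comes back as a (1536, 1536) uint8 numpy array.
            final_output = result.as_numpy("OUTPUT")

            # The segmentation fault happens around here, while wrapping the
            # numpy array in a pb_utils.Tensor and building the response.
            out_tensor = pb_utils.Tensor("OUTPUT", final_output)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```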
My Triton Inference Server Docker image is 24.07-py3, and CUDA is 12.5.
Error Stack
Output tensor has been extracted (1536, 1536)
Final output type: <class 'numpy.ndarray'>
Final output shape: (1536, 1536), dtype: uint8
Signal (11) received.
0# 0x00005C1039DE580D in tritonserver
1# 0x0000758CBE932520 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# 0x0000758CB535CBD5 in /opt/tritonserver/backends/python/libtriton_python.so
3# 0x0000758CB53604F2 in /opt/tritonserver/backends/python/libtriton_python.so
4# 0x0000758CB5360943 in /opt/tritonserver/backends/python/libtriton_python.so
5# 0x0000758CB533DFF7 in /opt/tritonserver/backends/python/libtriton_python.so
6# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/python/libtriton_python.so
7# 0x0000758CBD311944 in /opt/tritonserver/bin/../lib/libtritonserver.so
8# 0x0000758CBD311CBB in /opt/tritonserver/bin/../lib/libtritonserver.so
9# 0x0000758CBD42D23D in /opt/tritonserver/bin/../lib/libtritonserver.so
10# 0x0000758CBD3160F4 in /opt/tritonserver/bin/../lib/libtritonserver.so
11# 0x0000758CBF01A253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
12# 0x0000758CBE984AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
13# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
I've confirmed that the dtype and tensor shape declared in the BLS model's config.pbtxt match what the execute function is returning.
I reference my custom execution environment (the tarball) in config.pbtxt, and I use a custom triton_python_backend_stub. Could you help me track down where this error is coming from?
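For reference, a minimal sketch of the relevant parts of the BLS model's config.pbtxt as I have it set up (the output name and tarball filename below are placeholders, not my real values):

```
backend: "python"

output [
  {
    name: "OUTPUT"
    data_type: TYPE_UINT8
    dims: [ 1536, 1536 ]
  }
]

parameters: {
  key: "EXECUTION_ENV_PATH"
  value: { string_value: "$$TRITON_MODEL_DIRECTORY/custom_env.tar.gz" }
}
```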
After debugging with GDB, I obtained a backtrace that shows where the crash occurs:
#0 0x000076cf6215cbd5 in boost::intrusive::bstree_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, (boost::intrusive::algo_types)5, void>::erase(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>) ()
from /opt/tritonserver/backends/python/libtriton_python.so
#1 0x000076cf621604f2 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) () from /opt/tritonserver/backends/python/libtriton_python.so
#2 0x000076cf62160943 in std::_Function_handler<void (triton::backend::python::ResponseBatch*), triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr<triton::backend::python::ResponseBatch>(triton::backend::python::ResponseBatch*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(triton::backend::python::ResponseBatch*)#1}>::_M_invoke(std::_Any_data const&, triton::backend::python::ResponseBatch*&&) () from /opt/tritonserver/backends/python/libtriton_python.so
#3 0x000076cf6213dff7 in triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, std::vector<std::unique_ptr<triton::backend::python::InferRequest, std::default_delete<triton::backend::python::InferRequest> >, std::allocator<std::unique_ptr<triton::backend::python::InferRequest, std::default_delete<triton::backend::python::InferRequest> > > >&, triton::backend::python::PbMetricReporter&) () from /opt/tritonserver/backends/python/libtriton_python.so
#4 0x000076cf6213e34a in TRITONBACKEND_ModelInstanceExecute ()
from /opt/tritonserver/backends/python/libtriton_python.so
From my understanding, the segmentation fault occurs post-inference. I also noticed via ps aux that one triton_python_backend_stub process is created when tritonserver starts, and another is created when the server is queried.
Does this extra context assist with debugging the issue?