One of the failure modes I'm seeing looks like either a bug in the Fossilize Vulkan layer, or a bug in some other component that's made worse by how Fossilize's VK_LAYER_fossilize_GetDeviceProcAddr behaves.
Steps to reproduce
Install libstrangle. I used a Debian 11 machine with NVIDIA proprietary graphics, and installed libstrangle into /usr/local with the upstream makefile (make clean && make && sudo make install).
In Steam, set the launch options for a Vulkan game to DISABLE_VK_LAYER_VALVE_steam_overlay_1=1 stranglevk -f 3 %command% (I'm disabling the Steam overlay to reduce the number of moving parts; I get a different segfault if I leave both that and Fossilize enabled).
Set the game to launch in the "Steam Linux Runtime" compatibility tool.
Select the client_beta branches of both "Steam Linux Runtime" and "Steam Linux Runtime - Soldier" (they should be installed automatically, you just need to switch branch). steamapps/common/SteamLinuxRuntime_soldier/VERSIONS.txt needs to say pressure-vessel 0.20210809.1 or later, which is currently only in the beta; otherwise libstrangle accidentally gets disabled and you won't see the crash.
Launch the game.
Expected result
The game runs, throttled to approximately 3fps.
Actual result
Segmentation fault.
Steps to reproduce, with debugging
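As above, but run Steam with PRESSURE_VESSEL_SHELL=instead in the environment. Now, when you launch a game in the container runtime, instead of the actual game you'll get an xterm, in which you can run "$@" to get the actual game.
Instead of "$@", run the reproducer command, which wraps "$@" with prlimit -s100000 and a debug server listening on localhost:12345.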
Ideally, before running gdb, set the DEBUGINFOD_URLS environment variable to a space-separated list of debug symbol sources: one providing detached debug symbols for your host OS (e.g. https://debuginfod.debian.net for Debian), and a debuginfod instance reading from com.valvesoftware.SteamRuntime.Sdk-amd64,i386-soldier-debug.tar.gz from soldier.
On the host system, run gdb -ex 'target remote localhost:12345' to connect a remote debugger to the process in the container, and type cont to continue.
Stack trace
The crash seems to be infinite recursion in VK_LAYER_fossilize_GetDeviceProcAddr(), leading to a segfault when stack space runs out. The prlimit -s100000 in my reproducer is to make that happen sooner, so that there are "only" 1750 or so stack frames, rather than tens of thousands.
#0 0x00007ffff7b9829a in vkGetDeviceProcAddr (pName=0x7ffff7baa747 "vkQueueSubmit", device=0x5555558d8e48) at ./loader/trampoline.c:91
#1 vkGetDeviceProcAddr (device=0x5555558d8e48, pName=<optimized out>) at ./vulkan-headers/include/vulkan/vulkan_core.h:3330
#2 0x00007fffeb54e69e in VK_LAYER_fossilize_GetDeviceProcAddr ()
from target:/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/vulkan_imp_layer/libVkLayer_steam_fossilize.so
#3 0x00007fffeb54e69e in VK_LAYER_fossilize_GetDeviceProcAddr ()
from target:/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/vulkan_imp_layer/libVkLayer_steam_fossilize.so
...
#1755 0x00007fffeb54e69e in VK_LAYER_fossilize_GetDeviceProcAddr () from target:/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/vulkan_imp_layer/libVkLayer_steam_fossilize.so
#1756 0x00007fffeb54e69e in VK_LAYER_fossilize_GetDeviceProcAddr () from target:/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/vulkan_imp_layer/libVkLayer_steam_fossilize.so
#1757 0x00007ffff7b7f363 in loader_init_device_dispatch_table (dev_table=dev_table@entry=0x555555814cb0, gpa=gpa@entry=0x7fffeb54e600 <VK_LAYER_fossilize_GetDeviceProcAddr>, dev=0x5555558d8e48) at ./loader/generated/vk_loader_extensions.c:318
#1758 0x00007ffff7b9272e in loader_create_device_chain (pd=pd@entry=0x5555557f5c40, pCreateInfo=pCreateInfo@entry=0x7fffffffbdd0, pAllocator=pAllocator@entry=0x0, inst=inst@entry=0x55555558e450, dev=dev@entry=0x555555814cb0, callingLayer=callingLayer@entry=0x0, layerNextGDPA=0x0) at ./loader/loader.c:6291
#1759 0x00007ffff7b93559 in loader_layer_create_device (instance=instance@entry=0x0, physicalDevice=physicalDevice@entry=0x5555557f5de0, pCreateInfo=pCreateInfo@entry=0x7fffffffbdd0, pAllocator=pAllocator@entry=0x0, pDevice=pDevice@entry=0x7fffffffbe40, layerGIPA=layerGIPA@entry=0x0, nextGDPA=0x0) at ./loader/loader.c:5838
#1760 0x00007ffff7b9671f in vkCreateDevice (physicalDevice=0x5555557f5de0, pCreateInfo=0x7fffffffbdd0, pAllocator=0x0, pDevice=0x7fffffffbe40) at ./loader/trampoline.c:779
#1761 0x0000555555558949 in ?? ()
#1762 0x0000555555557902 in ?? ()
#1763 0x00007ffff7999d0a in __libc_start_main (main=0x555555557700, argc=1, argv=0x7fffffffc288, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffc278) at ../csu/libc-start.c:308
I think what is happening here is that as a result of 6916486, if VK_LAYER_fossilize_GetDeviceProcAddr() somehow sees layer->getTable()->GetDeviceProcAddr == VK_LAYER_fossilize_GetDeviceProcAddr, it will call into itself until it runs out of stack space.
Perhaps GetDeviceProcAddr needs to be exempt from the check added in 6916486, so that if pName is "vkGetDeviceProcAddr", it short-circuits to returning VK_LAYER_fossilize_GetDeviceProcAddr immediately? (As though interceptCoreDeviceCommand had been called first, as it was before 6916486 - but it would likely be easier done as a special-case.)
It's entirely possible that one of the other layers involved is doing something wrong, and libstrangle certainly has other issues - but its GetDeviceProcAddr implementation seems to be heavily based on Mesa's overlay, which I would hope is doing the fallback dance correctly. It seems to me that infinite recursion is never going to be the correct answer, so it would probably be more robust if Fossilize avoids the recursion for GetDeviceProcAddr, even if the crash is not actually Fossilize's fault.
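To make the suggestion concrete, here is a minimal sketch of the kind of guard I have in mind. It is not Fossilize's actual code: the types are stand-ins so it compiles without the Vulkan headers, and DispatchTable, g_table, layerGetDeviceProcAddr, nextLayerGetDeviceProcAddr and exerciseSketch are all hypothetical names. Since the stack trace shows the recursion happening for "vkQueueSubmit", the sketch also assumes a second check that refuses to forward when the next-link pointer is the layer's own entry point, which goes beyond the special case proposed above.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for PFN_vkVoidFunction and PFN_vkGetDeviceProcAddr, so that the
// sketch compiles without the Vulkan headers.
using VoidFn = void (*)();
using GetDeviceProcAddrFn = VoidFn (*)(void *device, const char *name);

// Hypothetical, simplified dispatch table: only the next link's resolver.
struct DispatchTable {
    GetDeviceProcAddrFn GetDeviceProcAddr = nullptr;
};

DispatchTable g_table;

// Dummy command exported by a hypothetical, well-behaved next layer.
void nextQueueSubmit() {}

VoidFn nextLayerGetDeviceProcAddr(void *, const char *name) {
    return std::strcmp(name, "vkQueueSubmit") == 0 ? &nextQueueSubmit : nullptr;
}

// Hypothetical layer entry point with both guards.
VoidFn layerGetDeviceProcAddr(void *device, const char *name) {
    // Special case: answer for our own resolver without consulting the chain.
    if (std::strcmp(name, "vkGetDeviceProcAddr") == 0)
        return reinterpret_cast<VoidFn>(&layerGetDeviceProcAddr);

    // Assumed extra guard: if the table points back at us, forwarding would
    // recurse until the stack runs out, so fail the lookup instead.
    GetDeviceProcAddrFn next = g_table.GetDeviceProcAddr;
    if (next == nullptr || next == &layerGetDeviceProcAddr)
        return nullptr;

    return next(device, name);
}

// Exercises the broken (self-referential) and the healthy configuration.
void exerciseSketch() {
    g_table.GetDeviceProcAddr = &layerGetDeviceProcAddr;
    assert(layerGetDeviceProcAddr(nullptr, "vkQueueSubmit") == nullptr);

    g_table.GetDeviceProcAddr = &nextLayerGetDeviceProcAddr;
    assert(layerGetDeviceProcAddr(nullptr, "vkQueueSubmit") == &nextQueueSubmit);
}
```

Returning a null pointer in the self-referential case only turns the stack overflow into a failed lookup; it does not explain how the table ended up pointing at the layer itself, which would still need to be tracked down.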
If layer->getTable()->GetDeviceProcAddr is fossilize's own vkGetDeviceProcAddr, you have a bug somewhere, and I'm pretty sure it's not in fossilize, since all it does to get to that point is call down the layer chain. Your hypothesis that this is caused by pName being "vkGetDeviceProcAddr" also makes little sense. The last function before VK_LAYER_fossilize_GetDeviceProcAddr in your stack trace is loader_init_device_dispatch_table, which never calls vkGetDeviceProcAddr with "vkGetDeviceProcAddr".
While investigating ValveSoftware/steam-runtime#443, I tried running Artifact (a Vulkan game) with the libstrangle frame-rate-limiter module.