wayland backend: high chance of aborting on subsequent attempts to create the gamescope window #1456

Open
layercak3 opened this issue Aug 5, 2024 · 7 comments · May be fixed by #1611


@layercak3

layercak3 commented Aug 5, 2024

Gamescope: f1963e9 (-Dforce_fallback_for=glm,stb,libdisplay-info,libliftoff,vkroots,wlroots)
Host compositor: sway (7e74a4914) wlroots (f9199bb), also tested sway 1.9/wlroots from Arch extra repository

Reproduce:

for count in {1..10}; do echo cycle $count; timeout 1 eglgears_x11; sleep 1; done; echo done

Run this with gamescope --backend wayland.

The loop can be run either outside of gamescope, with DISPLAY set to gamescope's X server, or passed as the command to gamescope, e.g. gamescope --backend wayland -- sh -c 'for count in {1..10}; do echo cycle $count; timeout 1 eglgears_x11; sleep 1; done; echo done'

Expected behaviour: the gamescope window closes after one second, reappears after another second, and so on.

Actual behaviour: gamescope has a high chance of aborting when it tries to open its window for the second time (after 'cycle 2' is printed). After a few attempts I can get through cycle 2 without a crash, but it then crashes after 'cycle 3' is printed.

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff73b6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff735d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff73444c3 in __GI_abort () at abort.c:79
#4  0x000055555558c9a5 in gamescope::CWaylandInputThread::ThreadFunc (this=0x5555558d94d0) at ../gamescope/src/Backends/WaylandBackend.cpp:2353
#5  operator() (__closure=<optimized out>) at ../gamescope/src/Backends/WaylandBackend.cpp:2283
#6  std::__invoke_impl<void, gamescope::CWaylandInputThread::CWaylandInputThread()::<lambda()> > (__f=...) at /usr/include/c++/14.1.1/bits/invoke.h:61
#7  std::__invoke<gamescope::CWaylandInputThread::CWaylandInputThread()::<lambda()> > (__fn=...) at /usr/include/c++/14.1.1/bits/invoke.h:96
#8  std::thread::_Invoker<std::tuple<gamescope::CWaylandInputThread::CWaylandInputThread()::<lambda()> > >::_M_invoke<0> (this=<optimized out>) at /usr/include/c++/14.1.1/bits/std_thread.h:301
#9  std::thread::_Invoker<std::tuple<gamescope::CWaylandInputThread::CWaylandInputThread()::<lambda()> > >::operator() (this=<optimized out>) at /usr/include/c++/14.1.1/bits/std_thread.h:308
#10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<gamescope::CWaylandInputThread::CWaylandInputThread()::<lambda()> > > >::_M_run(void) (this=<optimized out>) at /usr/include/c++/14.1.1/bits/std_thread.h:253
#11 0x00007ffff76e1cf4 in std::execute_native_thread_routine (__p=0x5555558d4030) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104
#12 0x00007ffff73b439d in start_thread (arg=<optimized out>) at pthread_create.c:447
#13 0x00007ffff743949c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Specifically, it aborted here inside CWaylandInputThread::ThreadFunc():

        int nFD = wl_display_get_fd( m_pBackend->GetDisplay() );
        if ( nFD < 0 )
        {
                abort();
        }

I have also hit this abort in real-world cases, such as during wineboot's series of dialogs. (Proton can run wineboot without spawning windows, which is nice.)

In games/game engines that make gamescope open and close a window very quickly before opening the main window (I could not replicate this scenario with the eglgears_x11 example above), there is a random chance that gamescope aborts as described above. If it does launch, it appears to be usable, except that the xdg-shell app_id is not set, and when the window is switched from tiled to floating ('floating toggle' in sway), gamescope does not resize itself to the user-specified dimensions (-W -H) but to whatever the window size was when gamescope started (i.e. roughly half the screen in a tiling window manager when launched from a workspace containing a single terminal). A package built from an older commit (e.g. 0b492ad) avoided this incomplete state, but the random aborts still occurred. Using --synchronous-x11 also sometimes (but not always) avoids the incomplete state on launch, but it does not prevent the random aborts. The behaviour in this paragraph probably stems from the wider issue.

Running gamescope with --debug-layers --debug-focus --debug-hud --debug-events --composite-debug slightly increases the chance of avoiding both the abort and the incomplete launch state in the scenario described in the previous paragraph.

@sharkautarch

@layercak3
When doing the same thing but with --backend sdl, does everything work?
Just checking whether the issue is specific to the wayland backend.

@layercak3
Author

layercak3 commented Aug 5, 2024

The SDL backend doesn't show the same symptoms as in the OP, but after more testing it has its own related issues. I wouldn't be surprised if the behaviour in both backends emerges from the same synchronization issues.

In the eglgears_x11 example, instead of randomly aborting, gamescope randomly fails to open a window and then never opens one again. Once the window stops opening, these log messages stop appearing:

[gamescope] [Warn]  xwm: got the same buffer committed twice, ignoring.
[gamescope] [Info]  vulkan: Creating Gamescope nested swapchain with format 64 and colorspace 0

But when using --expose-wayland and eglgears_wayland, it runs all 10 cycles successfully. Note that with the wayland backend instead of SDL, using --expose-wayland and eglgears_wayland does not prevent the aborts.

In the "game that rapidly opens and closes a window before opening the main window" example (using Proton's winex11.drv), gamescope randomly either opens properly or does not open at all, similar to the eglgears_x11 example. (There is no sound either, so it is not simply rendering successfully offscreen.)

Edit: this is with SDL_VIDEODRIVER=wayland. With SDL_VIDEODRIVER=x11, this behaviour doesn't occur in the SDL backend. It's also possible that I just got lucky. The SDL backend + SDL_VIDEODRIVER=x11 also has graphical corruption issues which are unrelated to this issue.

@MithicSpirit

I also have this issue. This is particularly problematic, as some games (specifically, Warframe) have a launcher that closes right before opening the game, which triggers this bug (confirmed by analyzing the coredump).

@layercak3
Author

layercak3 commented Oct 15, 2024

A workaround for games that do this is to start something like eglgears_x11 first and then your application. When the application briefly goes from displaying a window, to displaying no window, to displaying a window again, gamescope falls back to displaying eglgears_x11 instead of removing its window.

@layercak3
Author

The fd wasn't available because the host compositor (sway/wlroots) had sent a protocol error (xdg_surface unconfigured_buffer, message="xdg_surface has never been configured"), so the connection died.
The log is attached, annotated with @@@@. Unfortunately, it mixes messages from gamescope as a Wayland client with messages from gamescope as a Wayland server.
gamescope-wayland-debug.txt

Sometimes the gamescope window reopens without the app_id, and sometimes without both the app_id and the title.
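To make the protocol error concrete, here is a toy C++ model of the xdg-shell rule the host compositor enforced: after an initial commit without a buffer, the compositor sends xdg_surface.configure, and the client must ack it before committing a buffer; committing a buffer earlier is the unconfigured_buffer error. `ModelXdgSurface`, `CommitResult`, `MapWithHandshake` and `MapWithoutHandshake` are hypothetical names; this is a simplified sketch, not wlroots code, and it calls no libwayland functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the compositor-side xdg_surface rule
// behind the "unconfigured_buffer" protocol error. NOT wlroots code and
// no libwayland API is used; all names are invented for illustration.
enum class CommitResult { Ok, UnconfiguredBuffer };

struct ModelXdgSurface {
    bool configured = false;        // true once the client has acked a configure
    bool configurePending = false;  // compositor sent a configure, awaiting ack
    uint32_t pendingSerial = 0;     // serial of the outstanding configure
    uint32_t nextSerial = 1;

    // Models wl_surface.commit; hasBuffer says whether a buffer is attached.
    CommitResult Commit(bool hasBuffer) {
        if (hasBuffer && !configured) {
            // Corresponds to "xdg_surface has never been configured".
            return CommitResult::UnconfiguredBuffer;
        }
        if (!hasBuffer && !configured && !configurePending) {
            // Initial buffer-less commit: the compositor replies with a configure.
            pendingSerial = nextSerial++;
            configurePending = true;
        }
        return CommitResult::Ok;
    }

    // Models xdg_surface.ack_configure(serial) sent by the client.
    void AckConfigure(uint32_t serial) {
        if (configurePending && serial == pendingSerial) {
            configured = true;
            configurePending = false;
        }
    }
};

// Correct order: commit without a buffer, ack the configure, then attach.
bool MapWithHandshake() {
    ModelXdgSurface s;
    CommitResult first = s.Commit(false);
    assert(first == CommitResult::Ok);
    s.AckConfigure(s.pendingSerial);
    return s.Commit(true) == CommitResult::Ok;
}

// Wrong order: commit a buffer before any configure has been acked.
bool MapWithoutHandshake() {
    ModelXdgSurface s;
    return s.Commit(true) == CommitResult::Ok;
}
```

In this model `MapWithHandshake()` returns true and `MapWithoutHandshake()` returns false, mirroring the difference between a normal first map and the failing commit in the log.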

@layercak3
Author

layercak3 commented Oct 28, 2024

Here is the log again, but with WAYLAND_DEBUG=client and an Xwayland wrapper that unsets WAYLAND_DEBUG for Xwayland, so only the messages between the wayland backend and the host compositor are shown.
gamescope-wayland-debug-2.txt

In count 2, the compositor sends wm_capabilities/configure and the gamescope window opens, but without app_id/title. Is the host compositor forgetting them for some reason? Then in count 3 it doesn't send wm_capabilities/configure, so when gamescope tries to commit on the corresponding wl_surface, it's a protocol error.

@layercak3
Copy link
Author

layercak3 commented Oct 28, 2024

So after "@@@@ killall xev count 1", gamescope attaches a NULL buffer to xdg_surface#50's wl_surface#47 and then commits.
According to https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/5c98d1a04a1439bf40c6e516086cfaff2d67f135/types/xdg_shell/wlr_xdg_surface.c#L18, wlroots resets its wlr_xdg_surface structure when that happens: "1) a surface is unmapped due to a commit with NULL buffer".
When I remove all references to reset_xdg_surface in wlr_xdg_surface.c, the protocol error goes away and everything works, except that the problem from the OP still applies: "when turning the window from tiled into floating ('floating toggle' in sway), gamescope does not resize itself to the user specified dimensions (-W -H) but instead to whatever the window size was initially when starting gamescope".
Alternatively, I can remove wl_surface_attach( m_pSurface, nullptr, 0, 0 ); from WaylandBackend.cpp, but that leaves behind a frame that never disappears and gets in the way when the window is resized, and after the application closes, the gamescope window remains visible (it is still mapped).

Maybe gamescope needs to either never unmap the surface (keeping a dummy window up instead of attaching a NULL buffer to unmap it), or properly remap it and set the title/app_id again.

layercak3 added a commit to layercak3/gamescope-ghfork that referenced this issue Nov 3, 2024
When becoming invisible, a NULL buffer is attached to the toplevel's
surface which unmaps it. The compositor resets their state and the
surface may also be considered unconfigured. Upon becoming visible, the
surface must be re-mapped using the same process during initialization
(commit without a buffer and wait for configure) before we begin
attaching an actual buffer. The default properties should also be
recovered.

Fixes: ValveSoftware#1456
Fixes: ValveSoftware#1451 (probably)
Fixes: ValveSoftware#1488 (probably)
Fixes: ValveSoftware#1533
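The remap procedure in the commit message can be sketched as a toy C++ model. It assumes, as the xdg-shell specification describes for unmapping an xdg_toplevel, that committing a NULL buffer unmaps the toplevel and discards attributes such as title and app_id, and that remapping requires a buffer-less commit followed by acknowledging the resulting configure. `ModelCompositorSurface`, `ModelClient`, `ShowNaive`, `ShowWithRemap` and the property strings are hypothetical; this is neither gamescope's nor wlroots' code and it calls no libwayland functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical toy model; NOT wlroots or gamescope code, no libwayland calls.
enum class CommitResult { Ok, UnconfiguredBuffer };

// Compositor side: unmapping discards toplevel attributes and requires a
// fresh configure before the surface may be mapped again.
struct ModelCompositorSurface {
    bool configured = false;        // client has acked a configure
    bool configurePending = false;  // configure sent, awaiting ack
    bool mapped = false;
    uint32_t pendingSerial = 0;
    uint32_t nextSerial = 1;
    std::string title, appId;       // toplevel attributes held by the compositor

    // hasBuffer == false models a commit with no (or a NULL) buffer attached.
    CommitResult Commit(bool hasBuffer) {
        if (hasBuffer && !configured)
            return CommitResult::UnconfiguredBuffer;  // "never been configured"
        if (hasBuffer) {
            mapped = true;
        } else if (mapped) {
            // NULL-buffer commit: unmap and reset, dropping title/app_id.
            mapped = false;
            configured = false;
            configurePending = false;
            title.clear();
            appId.clear();
        } else if (!configured && !configurePending) {
            // Buffer-less commit on an unmapped surface: send a configure.
            pendingSerial = nextSerial++;
            configurePending = true;
        }
        return CommitResult::Ok;
    }

    void AckConfigure(uint32_t serial) {
        if (configurePending && serial == pendingSerial) {
            configured = true;
            configurePending = false;
        }
    }
};

// Client side: hypothetical wrapper, not gamescope's actual backend class.
struct ModelClient {
    ModelCompositorSurface &surface;
    std::string title, appId;  // client's own copy, re-sent on every map

    // Becoming invisible: attach a NULL buffer and commit.
    void Hide() { surface.Commit(false); }

    // Behaviour before the fix: attach a buffer right away.
    CommitResult ShowNaive() { return surface.Commit(true); }

    // Remap as the commit message describes: restore the default properties,
    // commit without a buffer, ack the configure, then attach a real buffer.
    CommitResult ShowWithRemap() {
        surface.title = title;  // stands in for xdg_toplevel.set_title
        surface.appId = appId;  // stands in for xdg_toplevel.set_app_id
        surface.Commit(false);
        surface.AckConfigure(surface.pendingSerial);
        assert(surface.configured);
        return surface.Commit(true);
    }
};

// Reported sequence: map, hide, then show again without remapping.
bool NaiveSecondShowFails() {
    ModelCompositorSurface s;
    ModelClient c{s, "hypothetical-title", "hypothetical.app_id"};
    c.ShowWithRemap();
    c.Hide();
    return c.ShowNaive() == CommitResult::UnconfiguredBuffer;
}

// With remapping, every cycle maps successfully and keeps its app_id.
bool RemapSurvivesCycles(int cycles) {
    ModelCompositorSurface s;
    ModelClient c{s, "hypothetical-title", "hypothetical.app_id"};
    for (int i = 0; i < cycles; ++i) {
        if (c.ShowWithRemap() != CommitResult::Ok || s.appId != c.appId)
            return false;
        c.Hide();
    }
    return true;
}
```

In this model the naive path reproduces the unconfigured_buffer failure on the second show, while `ShowWithRemap()` survives the ten-cycle loop from the reproduction steps with the app_id intact. Re-sending title/app_id on every map is the design choice that addresses the missing app_id/title symptom as well as the abort.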
layercak3 added a commit to layercak3/gamescope-ghfork that referenced this issue Dec 26, 2024
When becoming invisible, a NULL buffer is attached to the toplevel's
surface which unmaps it. The compositor resets their state and the
surface may also be considered unconfigured. Upon becoming visible, the
surface must be re-mapped using the same process during initialization
(commit without a buffer and wait for configure) before we begin
attaching an actual buffer. The default properties should also be
recovered.

Fixes: ValveSoftware#1456

3 participants