
[wgpu branch] Vulkan Validation errors across all examples when run with --release #458

Closed
2 tasks
mitchmindtree opened this issue Feb 27, 2020 · 3 comments

Comments

@mitchmindtree
Member

This issue is related to the wgpu branch at #452.

I've been investigating a bug that @MacTuitui reported when attempting to run any of the examples on the #452 wgpu branch. See their error log and vulkaninfo output here.

Before the panic! occurs, we can see that the validation layers emit 5 separate errors:

VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength(ERROR / SPEC): msgNum: 0 - vkCreateDescriptorPool: parameter pCreateInfo->poolSizeCount must be greater than 0. The Vulkan spec states: poolSizeCount must be greater than 0 (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength)
    Objects: 1
        [0] 0, type: 0, name: NULL
UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x556044c91630[] is waiting on VkSemaphore 0x260000000026[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL
VUID-VkPresentInfoKHR-pImageIndices-01296(ERROR / SPEC): msgNum: 0 - Images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)
    Objects: 1
        [0] 0x556044c91630, type: 4, name: NULL
VUID-VkPresentInfoKHR-pImageIndices-01296(ERROR / SPEC): msgNum: 0 - Images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)
    Objects: 1
        [0] 0x556044c91630, type: 4, name: NULL
VUID-VkPresentInfoKHR-pImageIndices-01296(ERROR / SPEC): msgNum: 0 - Images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)
    Objects: 1
        [0] 0x556044c91630, type: 4, name: NULL

Curiously, I can recreate these errors locally by enabling VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation, however:

  1. These validation layer errors only appear if I build and run the examples with --release
  2. I don't run into a panic! whether or not the errors appear - my system seems to be able to keep running. That said, I have an integrated intel GPU (here is my vulkaninfo output) whereas @MacTuitui has a dedicated nvidia GPU, so their panic could very well still be related to these validation layer errors.

@MacTuitui also mentioned that they were able to successfully run the hello-triangle.rs example on the wgpu-rs master branch, which suggests the bug lies in the way nannou is using wgpu. I can also confirm that 4 of the 5 validation layer errors above do not show up when running the hello-triangle.rs example locally (with or without --release).

The nannou example that behaves most closely to the wgpu hello-triangle.rs example is the wgpu_triangle_raw_frame example. Like the wgpu example, it draws a single triangle directly to the swap chain image each time RedrawRequested occurs. While nannou takes care of adapter requesting, device requesting, swap chain creation and surface creation during window creation, I have yet to find any obvious differences between this process and the wgpu hello-triangle example that could be causing these validation errors to show up.

Validation Errors

The first error reports an attempt to create a descriptor pool with no pool sizes, i.e. an empty descriptor pool.

VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength(ERROR / SPEC): msgNum: 0 - vkCreateDescriptorPool: parameter pCreateInfo->poolSizeCount must be greater than 0. The Vulkan spec states: poolSizeCount must be greater than 0 (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength)
    Objects: 1
        [0] 0, type: 0, name: NULL

The bug is already known and tracked at gfx-rs/wgpu-rs#149. While possibly related, I doubt this is the cause of the panic! as it occurs in both the nannou examples and the wgpu-rs hello-triangle example.
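To make the rule in this first error concrete, here is a minimal sketch, assuming pool sizes are derived by summing a layout's bindings per descriptor type. `DescriptorType`, `PoolSize`, `pool_sizes` and `check_pool_create_info` are hypothetical names, not the gfx-hal, ash or wgpu API. It shows how a layout with no bindings ends up with zero pool sizes, which is exactly what `poolSizeCount must be greater than 0` rejects.

```rust
use std::collections::BTreeMap;

/// Hypothetical descriptor kinds; not the ash/gfx-hal enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DescriptorType {
    UniformBuffer,
    SampledImage,
    Sampler,
}

/// One entry of a pool's size list (in the spirit of VkDescriptorPoolSize).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PoolSize {
    ty: DescriptorType,
    count: u32,
}

/// Sums descriptor counts per type for `sets` copies of a layout's bindings.
fn pool_sizes(bindings: &[(DescriptorType, u32)], sets: u32) -> Vec<PoolSize> {
    let mut totals: BTreeMap<DescriptorType, u32> = BTreeMap::new();
    for &(ty, count) in bindings {
        *totals.entry(ty).or_insert(0) += count * sets;
    }
    totals
        .into_iter()
        .map(|(ty, count)| PoolSize { ty, count })
        .collect()
}

/// Mirrors VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength.
fn check_pool_create_info(sizes: &[PoolSize]) -> Result<(), &'static str> {
    if sizes.is_empty() {
        Err("poolSizeCount must be greater than 0")
    } else {
        Ok(())
    }
}

fn main() {
    // A layout with no bindings yields zero pool sizes: the case the layer
    // rejects. A caller could simply skip creating a pool here instead.
    let empty = pool_sizes(&[], 64);
    assert!(check_pool_create_info(&empty).is_err());

    let bindings: [(DescriptorType, u32); 3] = [
        (DescriptorType::UniformBuffer, 1),
        (DescriptorType::SampledImage, 1),
        (DescriptorType::Sampler, 1),
    ];
    let sizes = pool_sizes(&bindings, 4);
    assert!(check_pool_create_info(&sizes).is_ok());
    println!("{:?}", sizes);
}
```

Skipping pool creation when the size list is empty is one plausible workaround; how wgpu-rs#149 was actually resolved is not covered here.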

The second error reports a queue waiting on a semaphore that can never be signaled.

UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress(ERROR / SPEC): msgNum: 0 - VkQueue 0x556044c91630[] is waiting on VkSemaphore 0x260000000026[] that has no way to be signaled.
    Objects: 1
        [0] 0, type: 6, name: NULL

I'm not sure how we could be causing this just yet, particularly as wgpu itself does not expose any of the device synchronisation primitive APIs directly. In the wgpu_triangle_raw_frame example we create a single device and queue during window creation and use this single queue to submit the command buffer built during the user's view function after it returns.
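The second error is easier to reason about with a toy model of a binary semaphore. In this model a semaphore carries at most one pending signal, and each wait consumes it. This is a hypothetical illustration of the kind of bookkeeping behind the "QueueForwardProgress" message, not a Vulkan or wgpu API. The failure order in `main` is an assumption: if something else consumes the signal from image acquisition first, the frame's later submit is left waiting on a semaphore that nothing will ever signal.

```rust
/// Toy binary semaphore tracking whether a signal is pending.
/// Hypothetical model only; not a Vulkan or wgpu type.
#[derive(Debug, Default)]
struct BinarySemaphore {
    pending_signal: bool,
}

impl BinarySemaphore {
    /// Signalling is only valid when no signal is already pending.
    fn signal(&mut self) -> Result<(), &'static str> {
        if self.pending_signal {
            return Err("semaphore already has a pending signal");
        }
        self.pending_signal = true;
        Ok(())
    }

    /// A wait consumes the pending signal. Waiting with nothing pending
    /// (and nothing queued to signal it) can never complete.
    fn wait(&mut self) -> Result<(), &'static str> {
        if !self.pending_signal {
            return Err("waiting on a semaphore that has no way to be signaled");
        }
        self.pending_signal = false;
        Ok(())
    }
}

fn main() {
    // Expected pairing per frame: acquiring the image signals, one submit waits.
    let mut image_available = BinarySemaphore::default();
    image_available.signal().unwrap(); // acquire next image
    image_available.wait().unwrap(); // the frame's submit waits on it

    // Assumed failure order: an earlier wait consumes the acquire signal,
    // so the frame's submit has nothing left to wait for.
    let mut sem = BinarySemaphore::default();
    sem.signal().unwrap(); // acquire
    sem.wait().unwrap(); // first wait consumes the signal
    let err = sem.wait().unwrap_err(); // the submit's wait can never complete
    println!("{}", err);
}
```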

Finally, the last three errors (presumably one for each swap chain image) are the same and report that the swap chain image is not in the correct layout for presentation:

VUID-VkPresentInfoKHR-pImageIndices-01296(ERROR / SPEC): msgNum: 0 - Images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)
    Objects: 1
        [0] 0x556044c91630, type: 4, name: NULL
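The layout rule behind these three errors can be sketched as a small state machine. In this sketch a swapchain image starts in UNDEFINED and only reaches PRESENT_SRC_KHR once a render pass with that final layout has run on it. `ImageLayout` and `SwapchainImage` are hypothetical stand-ins for a layout tracker, not wgpu or Vulkan types. Presenting an image that was never rendered to leaves it in UNDEFINED, which is what the validation message complains about.

```rust
/// The layouts relevant here (hypothetical mirror of VkImageLayout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImageLayout {
    Undefined,
    ColorAttachmentOptimal,
    PresentSrcKhr,
}

/// A swapchain image as seen by a layout tracker; not a wgpu or Vulkan type.
struct SwapchainImage {
    layout: ImageLayout,
}

impl SwapchainImage {
    /// Freshly created swapchain images start out in UNDEFINED.
    fn new() -> Self {
        SwapchainImage { layout: ImageLayout::Undefined }
    }

    /// Stand-in for a render pass whose color attachment uses
    /// initialLayout = UNDEFINED and finalLayout = PRESENT_SRC_KHR.
    fn render_pass(&mut self) {
        self.layout = ImageLayout::ColorAttachmentOptimal; // while drawing
        self.layout = ImageLayout::PresentSrcKhr; // transition at end of pass
    }

    /// Mirrors VUID-VkPresentInfoKHR-pImageIndices-01296.
    fn present(&self) -> Result<(), String> {
        match self.layout {
            ImageLayout::PresentSrcKhr => Ok(()),
            other => Err(format!("expected PresentSrcKhr, found {:?}", other)),
        }
    }
}

fn main() {
    // Assuming a three-image swapchain, presenting each image without ever
    // rendering to it produces one error per image.
    let images: Vec<SwapchainImage> = (0..3).map(|_| SwapchainImage::new()).collect();
    let errors = images.iter().filter(|img| img.present().is_err()).count();
    assert_eq!(errors, 3);

    // Rendering first moves the image into a presentable layout.
    let mut image = SwapchainImage::new();
    image.render_pass();
    assert!(image.present().is_ok());
}
```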

Things to try

  • Check wgpu-rs and other wgpu repos for mention of the above validation errors.
  • Try stripping out parts of the example and nannou until we can isolate the source of the validation errors.

Recreation

You can recreate these errors by cloning my wgpu branch from #452 and running the following:

VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation cargo run --release --example wgpu_triangle_raw_frame

@kvark does anything obvious come to mind that we might be doing wrong here? Also, are you able to recreate the case where the validation errors show up only when built and run with --release?

@MacTuitui a couple more tests that might be useful in the meantime if you get a chance:

  • Run VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation cargo run --example wgpu_triangle_raw_frame. I'm curious if you still see these validation layer errors when run in debug mode or if they only show up with --release enabled for you too?
  • Run the other examples at the wgpu-rs repo. Is hello-triangle the only one that works? Or do all of the others work too? If any of those fail it might give us another hint as to what's going on.
@MacTuitui
Contributor

Thanks for the time put into this!!!

Tested on 18e07dd of your wgpu branch and 814480 of wgpu-rs, with the latest NVIDIA drivers on Arch (440.59):

  • When I run the wgpu_triangle_raw_frame example in debug mode, the validation layer errors do not show up.
  • All the examples in wgpu-rs run fine, without any problems (I haven't checked what capture is supposed to do, but it runs). I do get one validation error when running hello-triangle, and it only shows up in release mode:
VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength(ERROR / SPEC): msgNum: 0 - vkCreateDescriptorPool: parameter pCreateInfo->poolSizeCount must be greater than 0. The Vulkan spec states: poolSizeCount must be greater than 0 (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkDescriptorPoolCreateInfo-poolSizeCount-arraylength)
    Objects: 1
        [0] 0, type: 0, name: NULL

Running the other examples in release mode does not raise any validation layer errors though.

If you need more tests, please let me know!

@kvark

kvark commented Feb 28, 2020

Validation errors related to the semaphores are a sign that something is wrong with the swapchain. We've seen this done before: someone would get the next texture from the swapchain, but then drop it before submitting any work. The trick is that dropping a swapchain texture is a command to present it... but in this (early dropping) case no work has been done on it, and after presentation any submitted work shouldn't be using this any more. So it confuses wgpu and ends up confusing the driver. Some drivers let it pass and still work, others rightfully fail.
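The early-drop pattern described here can be modelled with Rust's `Drop`. The sketch below uses a hypothetical `Frame` type whose destructor "presents" it, mirroring the behaviour described for wgpu 0.4, and a hypothetical `Queue` that only records submissions. Neither is the real wgpu type. The event log shows the problematic order: the frame is presented before the work that targets it is submitted.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared event log so the order of operations can be inspected.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical swapchain frame: dropping it presents it. Not the wgpu type.
struct Frame {
    log: Log,
}

impl Drop for Frame {
    fn drop(&mut self) {
        self.log.borrow_mut().push("present");
    }
}

/// Hypothetical queue that just records submissions.
struct Queue {
    log: Log,
}

impl Queue {
    fn submit(&self) {
        self.log.borrow_mut().push("submit");
    }
}

/// The problematic shape: the frame goes out of scope before the work
/// targeting it is submitted.
fn early_drop() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let queue = Queue { log: Rc::clone(&log) };
    {
        log.borrow_mut().push("acquire");
        let _frame = Frame { log: Rc::clone(&log) };
        // ... commands targeting the frame would be recorded here ...
    } // `_frame` is dropped here: presented with no work submitted for it
    queue.submit(); // this work now targets an already-presented image
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", early_drop());
}
```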

We've actually patched this with an explicit error in gfx-rs/wgpu#497, but it didn't make it into the 0.4 release, which by now is quite far behind. If you ask, we'll be happy to back-port the check. Fixing this is also trivial: just make sure the swapchain texture is still alive when the relevant work is submit()-ed.
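One way such an explicit check could work is to track which frames have been presented and reject any submission that renders into one of them. The sketch below is hypothetical: `SwapChainTracker`, `CommandBuffer` and `SubmitError` are invented for illustration and do not describe how gfx-rs/wgpu#497 is implemented. It also simplifies by assuming frame IDs are never reused.

```rust
use std::collections::HashSet;

/// Hypothetical error for work that targets an already-presented frame.
#[derive(Debug, PartialEq, Eq)]
enum SubmitError {
    FrameAlreadyPresented(u32),
}

/// Hypothetical tracker of presented swapchain frames (IDs assumed unique).
#[derive(Default)]
struct SwapChainTracker {
    presented: HashSet<u32>,
}

/// A recorded command buffer, reduced to the frame IDs it renders into.
struct CommandBuffer {
    target_frames: Vec<u32>,
}

impl SwapChainTracker {
    /// Records that a frame has been presented (e.g. because it was dropped).
    fn present(&mut self, frame_id: u32) {
        self.presented.insert(frame_id);
    }

    /// Rejects a submission that would render into a presented frame.
    fn validate_submit(&self, cmd: &CommandBuffer) -> Result<(), SubmitError> {
        for &id in &cmd.target_frames {
            if self.presented.contains(&id) {
                return Err(SubmitError::FrameAlreadyPresented(id));
            }
        }
        Ok(())
    }
}

fn main() {
    let mut tracker = SwapChainTracker::default();
    let cmd = CommandBuffer { target_frames: vec![0] };
    assert!(tracker.validate_submit(&cmd).is_ok()); // frame 0 still alive
    tracker.present(0); // frame dropped too early
    assert_eq!(
        tracker.validate_submit(&cmd),
        Err(SubmitError::FrameAlreadyPresented(0))
    );
}
```

Turning the silent misuse into an error at submit time moves the failure from the driver (where some drivers tolerate it and others fail) to a clear message at the call site.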

I made a fix in mitchmindtree#1. Two things are important there:

  1. we drop the swapchain frame after submission, not before it
  2. if we take the swapchain frame, we render at least something into it
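Both points can be expressed through scoping alone. The sketch below reuses the idea of a hypothetical present-on-drop `Frame` (not the wgpu type) that also notes whether anything was rendered into it. Because the frame is a local that lives until the end of `draw_frame`, the submission happens first and presentation happens when the scope closes. The `clear` call stands in for "render at least something".

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical present-on-drop frame (not the wgpu type) that also
/// records whether anything was rendered into it before presentation.
struct Frame {
    log: Log,
    rendered: bool,
}

impl Frame {
    /// Stand-in for recording a clear or draw that targets this frame.
    fn clear(&mut self) {
        self.rendered = true;
        self.log.borrow_mut().push("clear");
    }
}

impl Drop for Frame {
    fn drop(&mut self) {
        let event = if self.rendered { "present" } else { "present (nothing rendered)" };
        self.log.borrow_mut().push(event);
    }
}

/// The fixed shape: render something, submit while the frame is alive,
/// and only then let it drop.
fn draw_frame(log: &Log) {
    let mut frame = Frame { log: Rc::clone(log), rendered: false };
    frame.clear(); // point 2: render at least something into the frame
    log.borrow_mut().push("submit"); // point 1: submit before the frame drops
} // `frame` is dropped (presented) here, after submission

fn run_fixed() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    draw_frame(&log);
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", run_fixed());
}
```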

Please let me know if you have any further issues :)

@mitchmindtree
Member Author

Ahhh this seems obvious in retrospect! I think I still had the old vulkano model in mind where all images were reference counted and so I assumed they would live as long as they need to - this fix makes sense though. Thanks a lot kvark, I can confirm this clears those validation errors locally too :)
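For contrast, here is a generic sketch of the reference-counted model mentioned here, using `std::rc::Rc` with a hypothetical `Image` and `CommandBuffer` (not vulkano's actual types). When a command buffer holds its own clone of the image, dropping the user's handle does not end the image's life. Under the present-on-drop model described above, dropping the frame handle presents it immediately, whether or not work has been submitted.

```rust
use std::rc::Rc;

/// Hypothetical image type standing in for a reference-counted GPU image.
struct Image;

/// A command buffer that keeps its render targets alive by holding clones.
struct CommandBuffer {
    targets: Vec<Rc<Image>>,
}

/// Returns how many owners the image has after the user's handle is dropped.
fn owners_after_user_drop() -> usize {
    let image = Rc::new(Image);
    let cmd = CommandBuffer { targets: vec![Rc::clone(&image)] };
    drop(image); // the user's handle goes away...
    Rc::strong_count(&cmd.targets[0]) // ...but the command buffer still owns it
}

fn main() {
    println!("{}", owners_after_user_drop());
}
```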

@MacTuitui thanks for doing those tests! I'll tentatively close this for now and consider it solved - feel free to re-open if you're still running into that panic!

Closed by mitchmindtree#1
