mirror of
https://github.com/shadps4-emu/shadPS4.git
synced 2026-04-27 00:36:00 +03:00
[PR #434] [MERGED] control_flow_graph: Initial divergence handling #1560
📋 Pull Request Information
Original PR: https://github.com/shadps4-emu/shadPS4/pull/434
Author: @raphaelthegreat
Created: 8/14/2024
Status: ✅ Merged
Merged: 8/16/2024
Merged by: @raphaelthegreat
Base: main ← Head: diver
📝 Commits (6)
ca674b4 control_flow_graph: Initial divergence handling
1ca8a5c cfg: Handle additional case
e47a61d spirv: Handle tgid enable bits
07169e5 clang format
7fb811a spirv: Use proper format
0a31dd6 translator: Add more instructions
📊 Changes
14 files changed (+154 additions, -36 deletions)
📝 src/core/libraries/network/net.cpp (+1 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+5 -5)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+3 -1)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+3 -1)
📝 src/shader_recompiler/frontend/control_flow_graph.cpp (+97 -23)
📝 src/shader_recompiler/frontend/control_flow_graph.h (+19 -0)
📝 src/shader_recompiler/frontend/translate/scalar_alu.cpp (+2 -0)
📝 src/shader_recompiler/frontend/translate/translate.cpp (+9 -3)
📝 src/shader_recompiler/frontend/translate/vector_memory.cpp (+5 -0)
📝 src/shader_recompiler/runtime_info.h (+1 -0)
📝 src/video_core/amdgpu/liverpool.h (+5 -0)
📝 src/video_core/buffer_cache/buffer_cache.cpp (+1 -1)
📝 src/video_core/buffer_cache/buffer_cache.h (+0 -1)
📝 src/video_core/renderer_vulkan/vk_pipeline_cache.cpp (+3 -0)
📄 Description
AMD hardware is inherently incapable of "diverging" (having some threads in a warp, or wavefront in AMD terminology, take a different code path than others). Instead, the GPU maintains a separate register named EXEC, which is a bit-mask of all currently active threads. If the EXEC bit corresponding to a thread is zero, vector operations are suppressed for that thread.
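To make the EXEC semantics concrete, here is a minimal C++ sketch of a vector operation executed under a mask. The 64-lane width, the `VectorReg` type and the `MaskedOr` helper are assumptions made for illustration only; they are not taken from the shadPS4 source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 64-lane wavefront; all names are illustrative.
constexpr std::size_t kWaveSize = 64;
using VectorReg = std::array<std::uint32_t, kWaveSize>;

// Emulates a vector OR (dst = a | b) under an EXEC mask.
// Lanes whose EXEC bit is zero keep their previous dst value.
void MaskedOr(VectorReg& dst, const VectorReg& a, const VectorReg& b,
              std::uint64_t exec) {
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        if ((exec >> lane) & 1u) {
            dst[lane] = a[lane] | b[lane];
        }
    }
}
```

Every lane walks through the same instruction; the mask only decides which lanes commit a result.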
Normally this would be quite a pain to handle, but the compiler helps us quite a bit here. Since masking threads means that the GPU still executes the instruction but discards its result, the compiler will often add a check for EXEC = 0 (i.e. all threads agree on a certain condition, so the condition is uniform) and use S_CBRANCH_EXECZ to skip over the code block entirely. Our recompiler uses these branches as hints for reconstructing control flow in SPIR-V.
However, that is often not enough. If the code that needs to be conditionally executed is small enough that the branch instruction itself might be more costly, the compiler will instead mask EXEC with the condition, perform the necessary operations and then restore it. This is currently not handled in the recompiler, which causes a variety of bugs.
A simple example of this is the following code: a bitwise OR needs to be conditionally executed, so it is wrapped in EXEC saving and restoring instructions.
This can happen with more than one instruction as well, as shown here.
Sometimes the compiler can also insert an instruction between an EXEC saving instruction and a branch. This is not strictly necessary, but the compiler might choose it for more optimal instruction pipelining.
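The save, mask and restore pattern described above can be modelled roughly as follows. This is a sketch, not code from the PR: `WaveState`, `AndSaveExec` and `RestoreExec` are hypothetical names, and `AndSaveExec` follows the usual description of S_AND_SAVEEXEC_B64 (save the old EXEC, AND it with the condition, set SCC if any lane remains active).

```cpp
#include <cstdint>

// Hypothetical scalar state of one wavefront; names are illustrative.
struct WaveState {
    std::uint64_t exec; // active-lane mask
    bool scc;           // scalar condition code
};

// Opens an EXEC scope: returns the previous mask and narrows EXEC to the
// lanes that are both active and satisfy the condition.
std::uint64_t AndSaveExec(WaveState& st, std::uint64_t cond) {
    const std::uint64_t saved = st.exec;
    st.exec = cond & st.exec;
    st.scc = st.exec != 0;
    return saved;
}

// Closes the scope by writing the saved mask back to EXEC.
void RestoreExec(WaveState& st, std::uint64_t saved) {
    st.exec = saved;
}
```

Any vector instruction issued between `AndSaveExec` and `RestoreExec` only affects the lanes for which the condition holds, which is why a recompiler targeting SPIR-V has to turn that region into a conditional.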
Instructions inside these EXEC "scopes" need to be wrapped in conditionals to be emulated properly. This is the purpose of this PR. The handling is done at the CFG level, where we can easily insert new basic blocks and have later stages auto-magically convert them into valid SPIR-V scopes. Between label emission from branches and block linking, we insert a new stage which attempts to annotate these scopes with additional labels so they get dedicated basic blocks. Most cases should be handled, but not all: there are various combinations of open and close scope instructions (some, like S_ANDN2_B64, can even be used for both!).
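As a rough illustration of what such an annotation stage might look like, the sketch below scans a linear instruction list and reports where extra labels would go so that each scope body gets its own basic block. It is a deliberately simplified assumption, not the implementation in control_flow_graph.cpp: the `Kind` classification, the `ScopeLabels` struct and `FindExecScopes` are all hypothetical, and the real pass has to inspect opcodes and operands to decide whether an instruction opens or closes a scope.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, pre-classified instruction kinds (illustrative only).
enum class Kind { Other, OpenScope, CloseScope, Branch };

// Positions where additional labels would be emitted for one scope.
struct ScopeLabels {
    std::size_t body_begin; // first instruction inside the scope
    std::size_t merge;      // closing instruction, where all lanes rejoin
};

std::vector<ScopeLabels> FindExecScopes(const std::vector<Kind>& insts) {
    std::vector<ScopeLabels> scopes;
    std::size_t i = 0;
    while (i < insts.size()) {
        if (insts[i] != Kind::OpenScope) {
            ++i;
            continue;
        }
        // Look for the matching close. Stop at a branch (that scope is
        // already delimited by branch labels) or at another open.
        std::size_t j = i + 1;
        bool found = false;
        for (; j < insts.size(); ++j) {
            if (insts[j] == Kind::CloseScope) {
                found = true;
                break;
            }
            if (insts[j] == Kind::Branch || insts[j] == Kind::OpenScope) {
                break;
            }
        }
        if (found) {
            scopes.push_back({i + 1, j});
            i = j + 1;
        } else {
            ++i;
        }
    }
    return scopes;
}
```

The sketch gives up on anything it cannot match cleanly, which mirrors the "most cases, but not all" caveat: an instruction that can act as both open and close would need context that a simple tag like `Kind` cannot express.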
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.