[PR #434] [MERGED] control_flow_graph: Initial divergence handling #1560

Closed
opened 2026-02-27 21:13:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/434
Author: @raphaelthegreat
Created: 8/14/2024
Status: Merged
Merged: 8/16/2024
Merged by: @raphaelthegreat

Base: main ← Head: diver


📝 Commits (6)

  • ca674b4 control_flow_graph: Initial divergence handling
  • 1ca8a5c cfg: Handle additional case
  • e47a61d spirv: Handle tgid enable bits
  • 07169e5 clang format
  • 7fb811a spirv: Use proper format
  • 0a31dd6 translator: Add more instructions

📊 Changes

14 files changed (+154 additions, -36 deletions)


📝 src/core/libraries/network/net.cpp (+1 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+5 -5)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+3 -1)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+3 -1)
📝 src/shader_recompiler/frontend/control_flow_graph.cpp (+97 -23)
📝 src/shader_recompiler/frontend/control_flow_graph.h (+19 -0)
📝 src/shader_recompiler/frontend/translate/scalar_alu.cpp (+2 -0)
📝 src/shader_recompiler/frontend/translate/translate.cpp (+9 -3)
📝 src/shader_recompiler/frontend/translate/vector_memory.cpp (+5 -0)
📝 src/shader_recompiler/runtime_info.h (+1 -0)
📝 src/video_core/amdgpu/liverpool.h (+5 -0)
📝 src/video_core/buffer_cache/buffer_cache.cpp (+1 -1)
📝 src/video_core/buffer_cache/buffer_cache.h (+0 -1)
📝 src/video_core/renderer_vulkan/vk_pipeline_cache.cpp (+3 -0)

📄 Description

AMD hardware is inherently incapable of "diverging" (that is, having some threads in a warp, or wavefront in AMD terminology, take a different code path than others). The GPU maintains a separate register named EXEC, which is a bit-mask of all currently active threads. If a thread's EXEC bit is zero, vector operations are suppressed for that thread.
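As a rough illustration of what EXEC masking means for a single vector instruction, here is a minimal Python model. The function name `v_or_b32` and the list-of-lanes register representation are assumptions made for this sketch; they are not taken from the recompiler.

```python
WAVE_SIZE = 64  # a GCN wavefront has 64 lanes


def v_or_b32(vgpr_dst, src0, vgpr_src1, exec_mask):
    """Toy model of a vector OR (hypothetical helper, not recompiler code).

    Lanes whose EXEC bit is 0 keep their previous destination value.
    """
    return [
        (src0 | vgpr_src1[lane]) if (exec_mask >> lane) & 1 else vgpr_dst[lane]
        for lane in range(WAVE_SIZE)
    ]


v3 = [0] * WAVE_SIZE
exec_mask = 0b1010  # only lanes 1 and 3 are active
v3 = v_or_b32(v3, 4, v3, exec_mask)
```

Every lane "runs" the instruction, but only lanes 1 and 3 observe a changed `v3`.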

Normally this would be quite a pain to handle, but the compiler helps us quite a bit here. Since masking threads means that the GPU still executes the instruction but discards its result, the compiler will often check whether EXEC = 0 (i.e. all threads agree on a certain condition, so the condition is uniform) and use S_CBRANCH_EXECZ to skip over whole code blocks rather than execute them with every result discarded. Our recompiler uses these branches as hints for reconstructing control flow in SPIR-V.

However, that is often not enough. If the code that needs to be conditionally executed is small enough that the branch instruction itself might be more costly, the compiler will instead mask EXEC with the condition, perform the necessary operations and then restore it. This is currently not handled in the recompiler, which causes a variety of bugs.

A simple example of this is the following code: a bitwise OR needs to be conditionally executed, so it is wrapped in EXEC saving and restoring instructions.

/*0000000001a8*/ s_and_saveexec_b64 s[4:5], vcc
/*0000000001ac*/ v_or_b32        v3, 4, v3
/*0000000001b0*/ s_mov_b64       exec, s[4:5]
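To make the intended equivalence concrete, the sketch below models the three instructions above literally and compares them with the per-thread conditional that a recompiled shader would need. The semantics used for `s_and_saveexec_b64` (save EXEC, then EXEC = source AND EXEC, SCC = EXEC != 0) follow the GCN ISA description; the function names and the 8-lane test data are assumptions for illustration only.

```python
MASK64 = (1 << 64) - 1


def s_and_saveexec_b64(exec_mask, src):
    """Return (saved, new_exec, scc): saved = EXEC; EXEC = src & EXEC; SCC = EXEC != 0."""
    new_exec = src & exec_mask & MASK64
    return exec_mask, new_exec, int(new_exec != 0)


def run_masked(v3, exec_mask, vcc):
    """Literal model of the three-instruction sequence."""
    # s_and_saveexec_b64 s[4:5], vcc
    saved, exec_mask, _scc = s_and_saveexec_b64(exec_mask, vcc)
    # v_or_b32 v3, 4, v3 (inactive lanes keep their value)
    v3 = [(x | 4) if (exec_mask >> i) & 1 else x for i, x in enumerate(v3)]
    # s_mov_b64 exec, s[4:5]
    exec_mask = saved
    return v3, exec_mask


def run_conditional(v3, exec_mask, vcc):
    """Per-thread view: each active lane evaluates `if (cond) v3 |= 4;`."""
    out = list(v3)
    for lane in range(len(v3)):
        if (exec_mask >> lane) & 1 and (vcc >> lane) & 1:
            out[lane] |= 4
    return out, exec_mask
```

For example, `run_masked(list(range(8)), 0xFF, 0b00110101)` and `run_conditional(list(range(8)), 0xFF, 0b00110101)` produce the same registers and leave EXEC unchanged, which is why wrapping the scope in a conditional is a faithful emulation.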

This can happen with more than one instruction as well, as shown here:

/*0000000004ec*/ s_and_saveexec_b64 vcc, vcc
/*0000000004f0*/ v_rcp_f32       v0, v17
/*0000000004f4*/ v_mad_f32       v2, v0, -v16, 1.0 clamp
/*0000000004fc*/ s_load_dwordx8  s[20:27], s[8:9], 0x0
/*000000000500*/ s_mov_b64       exec, vcc

Sometimes the compiler can also insert an instruction between an EXEC saving instruction and a branch. This is not strictly necessary, but the compiler might choose to do so for more optimal instruction pipelining:

/*00000000028c*/ s_and_saveexec_b64 s[2:3], vcc
/*000000000290*/ s_cbranch_execz .L700_0
...
.L700_0:
/*0000000002bc*/ s_andn2_b64     exec, s[2:3], exec
/*0000000002c0*/ v_madak_f32     v17, 2.0, v15, 0x3f800000
/*0000000002c8*/ s_cbranch_execz .L860_0
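The `s_andn2_b64 exec, s[2:3], exec` at `.L700_0` is what turns the "then" mask into the "else" mask. A small sketch of the arithmetic, assuming the ISA semantics D = S0 AND NOT S1 and using made-up 4-lane mask values:

```python
MASK64 = (1 << 64) - 1


def s_andn2_b64(s0, s1):
    """Return (result, scc) with result = s0 & ~s1 and scc = result != 0."""
    result = s0 & ~s1 & MASK64
    return result, int(result != 0)


saved_exec = 0b1111             # EXEC before the if, kept in s[2:3]
vcc = 0b0101                    # lanes where the condition holds
then_exec = saved_exec & vcc    # EXEC after s_and_saveexec_b64
else_exec, scc = s_andn2_b64(saved_exec, then_exec)
```

The two masks are disjoint and together cover the saved mask, so the same opcode closes the "then" scope and opens the "else" scope.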

Instructions inside these EXEC "scopes" need to be wrapped in conditionals to be emulated properly. This is the purpose of this PR. The handling is done at the CFG level, where we can easily insert new basic blocks and have later stages auto-magically convert them into valid SPIR-V scopes. Between label emission from branches and block linking, we insert a new stage which attempts to annotate these scopes with additional labels so they get dedicated basic blocks. Most cases should be handled, but not all: there are various combinations of open and close scope instructions (some, like S_ANDN2_B64, can even be used for both!).
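The annotation stage can be pictured roughly as follows. This is a simplified Python sketch, not the C++ code in control_flow_graph.cpp: the `Inst` type, the two predicates and `find_exec_scopes` are hypothetical names, and only the simplest opener and closer pair (`s_and_saveexec_b64` ... `s_mov_b64 exec, <saved>`) is recognised.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    pc: int
    opcode: str
    operands: tuple = ()


def is_open_scope(inst):
    # Hypothetical predicate: only the simplest opener is recognised here.
    return inst.opcode == "s_and_saveexec_b64"


def is_close_scope(inst, saved_reg):
    # Hypothetical predicate: EXEC is restored from the register the opener saved to.
    return inst.opcode == "s_mov_b64" and inst.operands == ("exec", saved_reg)


def find_exec_scopes(insts):
    """Return (begin_pc, end_pc) label pairs for branch-free EXEC scopes.

    begin_pc is the first instruction inside the scope and end_pc is the
    restoring instruction, so labels at both start new basic blocks.
    """
    scopes = []
    for i, inst in enumerate(insts):
        if not is_open_scope(inst) or i + 1 >= len(insts):
            continue
        # An opener followed directly by a branch is assumed to be covered
        # by the existing branch-based handling.
        if insts[i + 1].opcode.startswith("s_cbranch"):
            continue
        saved_reg = inst.operands[0]
        for later in insts[i + 1:]:
            if later.opcode.startswith("s_cbranch"):
                break  # not a simple scope; give up in this sketch
            if is_close_scope(later, saved_reg):
                scopes.append((insts[i + 1].pc, later.pc))
                break
    return scopes
```

Fed the first listing from this description (instructions at 0x1a8, 0x1ac and 0x1b0), this sketch yields one scope with labels at 0x1ac and 0x1b0; the branch-based listing yields none because the opener is followed directly by `s_cbranch_execz`.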


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
