[PR #1667] [MERGED] shader_recompilers: Improvements to SSA phi generation and lane instruction elimination #2232

Closed
opened 2026-02-27 21:15:41 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/1667
Author: @raphaelthegreat
Created: 12/5/2024
Status: Merged
Merged: 12/5/2024
Merged by: @raphaelthegreat

Base: mainHead: ssa-fix


📝 Commits (7)

  • 38ac002 shader_recompiler: Add use tracking for Insts
  • b5ac4c5 ssa_rewrite: Recursively remove phis
  • a86b855 ssa_rewrite: Correct recursive trivial phi elimination
  • 6244d00 ir: Improve read lane folding pass
  • 29b5156 control_flow: Avoid adding unnecessary divergant blocks
  • 3c0ec51 clang format
  • 7a82365 externals: Update ext-boost

📊 Changes

10 files changed (+174 additions, -75 deletions)

View changed files

📝 externals/ext-boost (+1 -1)
📝 src/shader_recompiler/frontend/control_flow_graph.cpp (+29 -14)
📝 src/shader_recompiler/ir/basic_block.cpp (+2 -0)
📝 src/shader_recompiler/ir/microinstruction.cpp (+30 -18)
📝 src/shader_recompiler/ir/passes/constant_propagation_pass.cpp (+59 -22)
📝 src/shader_recompiler/ir/passes/lower_shared_mem_to_registers.cpp (+1 -1)
📝 src/shader_recompiler/ir/passes/resource_tracking_pass.cpp (+1 -1)
📝 src/shader_recompiler/ir/passes/ssa_rewrite_pass.cpp (+10 -9)
📝 src/shader_recompiler/ir/value.h (+40 -8)
📝 src/video_core/amdgpu/liverpool.cpp (+1 -1)

📄 Description

This PR is mostly aimed at solving issues with fur simulation and lightvolume shaders in The Last Guardian. No bugs are expected to be fixed in other games but some regression testing would be appreciated, especially in that game since the latter pass was originally made to fix it.

Both of these shaders had issues with lingering ReadLane instructions that couldn't be eliminated by the simple pass because of phi nodes being in the way. Lane instructions on AMD hw allow the broadcasting of an SGPR to a specific VGPR or the opposite, the scalarization of a specific VGPR. The compiler seeks to eliminate them for two main reasons, firstly is that they are most often the byproduct of compiler (possibly trying to conserve SGPR space) and secondly is that their proper emulation on NVIDIA requires expensive shared memory setup which we would rather avoid.

Lane instructions however present an interesting challenge, as they expose the dimensionality of the GPU registers (a single VGPR can have up to warp size distinct copies). Because we don't want to have to represent that in our IR, we perform the same trick AMD did in their SPIRV WriteInvocationAMD where the input SSA value is passed to the instruction itself. This means that WriteLane instructions form chains where one follows another. When such a chain leads directly to a ReadLane chain, our work is simple and this has been handled fine for a while. This time however the WriteLane instructions were at the start of the shader and the ReadLane ones inside a nested loop, thus separated by a Phi.

But in fact those phi nodes were all trivial. A phi node is named trivial when it references itself or the same value some number of times, which means it can be substituted in all its users with the value it references. However ssa_rewrite pass was missing the required recursive elimination to perform this (See Algorithm 3, page 106 of the original SSA paper)

So the first commit of this PR adds a subset of the instruction use tracking implementation from @baggins183 we need and implements this elimination. In addition it solves a bug in TryRemoveTrivialPhi which prevented elimination of many trivial phis (missing Resolve call when comparing phi arg to itself). Shaders with many loops are now noticeably reduced in size/bloat (I've even seen some with 1000 less lines of GLSL code) and lane elimination pass now works with more cases than before.

However there are still cases where a phi cannot be eliminated, at the moment we only consider 2 of them. First case is the simplest which involves a phi node with distinct arguments, but following either one, leads to the same matching WriteLane

[000000012dff2948] %186   = WriteLane %170, %185, #30 (uses: 1)
[000000012dff2bc8] %187   = WriteLane %186, %184, #29 (uses: 2)
[000000012dff2e48] %188   = UGreaterThan %182, %181 (uses: 35)
[000000012dff3348] %189   = ConditionRef %188 (uses: 0)

Block $1
[000000012dff35c8] %190   = WriteLane %187, %183, #28 (uses: 1)
[000000012dff3848] %191   = WriteLane %190, %182, #27 (uses: 1)

Block $2
[000000012e242388] %192   = Phi [ %191, {Block $1} ], [ %187, {Block $0} ] (uses: 2)
[000000012dff3c08] %193   = ConditionRef %188 (uses: 0)

Block $3
[000000012dff3de8] %194   = ReadLane %192, #30 (uses: 1)

In this case we can directly replace the read lane with the value of found write lane and ignore the phi.
The second case is more involved but seen in the lightvolume shader. This shader maintains a loop counter inside a VGPR which is initialized at the start, and then loaded with read lane, incremented and stored back with write lane as seen in below pseudocode

v26[27] = 0;
for (;;) {
    s14 = readlane(v26, 27);
    s15 = s14 + 1;
    writelane(v26, 27, s15);
    if (!(s15 < 0x80)) {
      break;
    }  
}

In this case we have to search the chains of each phi argument, and insert a new phi just for this write and read lane pair, replacing the read lane instruction with the phi value. This must be done to preserve the original control flow of the shader. If multiple read write pairs were inside the loop, each would get a separate phi as each dimension of VGPR must be considered a separate value,


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/shadps4-emu/shadPS4/pull/1667 **Author:** [@raphaelthegreat](https://github.com/raphaelthegreat) **Created:** 12/5/2024 **Status:** ✅ Merged **Merged:** 12/5/2024 **Merged by:** [@raphaelthegreat](https://github.com/raphaelthegreat) **Base:** `main` ← **Head:** `ssa-fix` --- ### 📝 Commits (7) - [`38ac002`](https://github.com/shadps4-emu/shadPS4/commit/38ac0024bfdf6812e949c416117e36877247abc8) shader_recompiler: Add use tracking for Insts - [`b5ac4c5`](https://github.com/shadps4-emu/shadPS4/commit/b5ac4c5569008776427fb7a5d9ca8c37bd83325a) ssa_rewrite: Recursively remove phis - [`a86b855`](https://github.com/shadps4-emu/shadPS4/commit/a86b855b3852cae090ec0b8e84d4880ab4e327f0) ssa_rewrite: Correct recursive trivial phi elimination - [`6244d00`](https://github.com/shadps4-emu/shadPS4/commit/6244d00795fa9833eb54a6dd9657c358ef6c1004) ir: Improve read lane folding pass - [`29b5156`](https://github.com/shadps4-emu/shadPS4/commit/29b51567ea013466d0405b270953ced1bf502721) control_flow: Avoid adding unnecessary divergant blocks - [`3c0ec51`](https://github.com/shadps4-emu/shadPS4/commit/3c0ec51fa84e68d5a111d3428964a6d631e14bd3) clang format - [`7a82365`](https://github.com/shadps4-emu/shadPS4/commit/7a82365325192c18919a8cade7e0eb6739af2af5) externals: Update ext-boost ### 📊 Changes **10 files changed** (+174 additions, -75 deletions) <details> <summary>View changed files</summary> 📝 `externals/ext-boost` (+1 -1) 📝 `src/shader_recompiler/frontend/control_flow_graph.cpp` (+29 -14) 📝 `src/shader_recompiler/ir/basic_block.cpp` (+2 -0) 📝 `src/shader_recompiler/ir/microinstruction.cpp` (+30 -18) 📝 `src/shader_recompiler/ir/passes/constant_propagation_pass.cpp` (+59 -22) 📝 `src/shader_recompiler/ir/passes/lower_shared_mem_to_registers.cpp` (+1 -1) 📝 `src/shader_recompiler/ir/passes/resource_tracking_pass.cpp` (+1 -1) 📝 `src/shader_recompiler/ir/passes/ssa_rewrite_pass.cpp` (+10 -9) 📝 `src/shader_recompiler/ir/value.h` (+40 -8) 📝 `src/video_core/amdgpu/liverpool.cpp` (+1 -1) </details> ### 📄 Description This PR is mostly aimed at solving issues with fur simulation and lightvolume shaders in The Last Guardian. No bugs are expected to be fixed in other games but some regression testing would be appreciated, especially in _that game_ since the latter pass was originally made to fix it. Both of these shaders had issues with lingering ReadLane instructions that couldn't be eliminated by the simple pass because of phi nodes being in the way. Lane instructions on AMD hw allow the broadcasting of an SGPR to a specific VGPR or the opposite, the scalarization of a specific VGPR. The compiler seeks to eliminate them for two main reasons, firstly is that they are most often the byproduct of compiler (possibly trying to conserve SGPR space) and secondly is that their proper emulation on NVIDIA requires expensive shared memory setup which we would rather avoid. Lane instructions however present an interesting challenge, as they expose the dimensionality of the GPU registers (a single VGPR can have up to warp size distinct copies). Because we don't want to have to represent that in our IR, we perform the same trick AMD did in their SPIRV [WriteInvocationAMD](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/AMD/SPV_AMD_shader_ballot.asciidoc#writeinvocationamd) where the input SSA value is passed to the instruction itself. This means that WriteLane instructions form chains where one follows another. When such a chain leads directly to a ReadLane chain, our work is simple and this has been handled fine for a while. This time however the WriteLane instructions were at the start of the shader and the ReadLane ones inside a nested loop, thus separated by a Phi. But in fact those phi nodes were all _trivial_. A phi node is named trivial when it references itself or the same value some number of times, which means it can be substituted in all its users with the value it references. However ssa_rewrite pass was missing the required recursive elimination to perform this (See Algorithm 3, page 106 of the original SSA [paper](https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6)) So the first commit of this PR adds a subset of the instruction use tracking implementation from @baggins183 we need and implements this elimination. In addition it solves a bug in TryRemoveTrivialPhi which prevented elimination of many trivial phis (missing Resolve call when comparing phi arg to itself). Shaders with many loops are now noticeably reduced in size/bloat (I've even seen some with 1000 less lines of GLSL code) and lane elimination pass now works with more cases than before. However there are still cases where a phi cannot be eliminated, at the moment we only consider 2 of them. First case is the simplest which involves a phi node with distinct arguments, but following either one, leads to the same matching WriteLane ``` [000000012dff2948] %186 = WriteLane %170, %185, #30 (uses: 1) [000000012dff2bc8] %187 = WriteLane %186, %184, #29 (uses: 2) [000000012dff2e48] %188 = UGreaterThan %182, %181 (uses: 35) [000000012dff3348] %189 = ConditionRef %188 (uses: 0) Block $1 [000000012dff35c8] %190 = WriteLane %187, %183, #28 (uses: 1) [000000012dff3848] %191 = WriteLane %190, %182, #27 (uses: 1) Block $2 [000000012e242388] %192 = Phi [ %191, {Block $1} ], [ %187, {Block $0} ] (uses: 2) [000000012dff3c08] %193 = ConditionRef %188 (uses: 0) Block $3 [000000012dff3de8] %194 = ReadLane %192, #30 (uses: 1) ``` In this case we can directly replace the read lane with the value of found write lane and ignore the phi. The second case is more involved but seen in the lightvolume shader. This shader maintains a loop counter inside a VGPR which is initialized at the start, and then loaded with read lane, incremented and stored back with write lane as seen in below pseudocode ```glsl v26[27] = 0; for (;;) { s14 = readlane(v26, 27); s15 = s14 + 1; writelane(v26, 27, s15); if (!(s15 < 0x80)) { break; } } ``` In this case we have to search the chains of each phi argument, and insert a new phi just for this write and read lane pair, replacing the read lane instruction with the phi value. This must be done to preserve the original control flow of the shader. If multiple read write pairs were inside the loop, each would get a separate phi as each dimension of VGPR must be considered a separate value, --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-02-27 21:15:41 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/shadPS4#2232
No description provided.