mirror of
https://github.com/shadps4-emu/shadPS4.git
synced 2026-04-25 07:46:01 +03:00
[PR #1667] [MERGED] shader_recompilers: Improvements to SSA phi generation and lane instruction elimination #2232
📋 Pull Request Information
Original PR: https://github.com/shadps4-emu/shadPS4/pull/1667
Author: @raphaelthegreat
Created: 12/5/2024
Status: ✅ Merged
Merged: 12/5/2024
Merged by: @raphaelthegreat
Base: main ← Head: ssa-fix
📝 Commits (7)
38ac002 shader_recompiler: Add use tracking for Insts
b5ac4c5 ssa_rewrite: Recursively remove phis
a86b855 ssa_rewrite: Correct recursive trivial phi elimination
6244d00 ir: Improve read lane folding pass
29b5156 control_flow: Avoid adding unnecessary divergant blocks
3c0ec51 clang format
7a82365 externals: Update ext-boost
📊 Changes
10 files changed (+174 additions, -75 deletions)
📝 externals/ext-boost (+1 -1)
📝 src/shader_recompiler/frontend/control_flow_graph.cpp (+29 -14)
📝 src/shader_recompiler/ir/basic_block.cpp (+2 -0)
📝 src/shader_recompiler/ir/microinstruction.cpp (+30 -18)
📝 src/shader_recompiler/ir/passes/constant_propagation_pass.cpp (+59 -22)
📝 src/shader_recompiler/ir/passes/lower_shared_mem_to_registers.cpp (+1 -1)
📝 src/shader_recompiler/ir/passes/resource_tracking_pass.cpp (+1 -1)
📝 src/shader_recompiler/ir/passes/ssa_rewrite_pass.cpp (+10 -9)
📝 src/shader_recompiler/ir/value.h (+40 -8)
📝 src/video_core/amdgpu/liverpool.cpp (+1 -1)
📄 Description
This PR is mostly aimed at solving issues with the fur simulation and lightvolume shaders in The Last Guardian. No bugs are expected to be fixed in other games, but some regression testing would be appreciated, especially in that game since the latter pass was originally made to fix it.
Both of these shaders had issues with lingering ReadLane instructions that the simple pass couldn't eliminate because phi nodes were in the way. Lane instructions on AMD hw allow writing an SGPR value into a specific lane of a VGPR or, in the opposite direction, the scalarization of a specific VGPR lane. The recompiler seeks to eliminate them for two main reasons: first, they are most often a byproduct of the original compiler (possibly trying to conserve SGPR space), and second, their proper emulation on NVIDIA requires an expensive shared memory setup, which we would rather avoid.
Lane instructions, however, present an interesting challenge, as they expose the dimensionality of the GPU registers (a single VGPR can hold up to warp size distinct values, one per lane). Because we don't want to have to represent that in our IR, we use the same trick AMD did in their SPIR-V WriteInvocationAMD, where the input SSA value is passed to the instruction itself. This means that WriteLane instructions form chains where one follows another. When such a chain leads directly to a ReadLane chain, our work is simple, and this has been handled fine for a while. This time, however, the WriteLane instructions were at the start of the shader and the ReadLane ones inside a nested loop, and thus separated by a Phi.
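The simple case described above, a ReadLane fed directly by a WriteLane chain, can be sketched roughly as follows. This is a minimal illustration on a made-up IR: `Inst`, `Op` and `FoldReadLane` are hypothetical names for this sketch and are not the types or functions used in the shadPS4 recompiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified IR for illustration only.
enum class Op { Const, WriteLane, ReadLane };

struct Inst {
    Op op;
    int prev = -1;      // WriteLane/ReadLane: index of the VGPR value operated on
    int value = -1;     // WriteLane: index of the scalar value being written
    uint32_t lane = 0;  // WriteLane/ReadLane: lane index
};

// Walks the WriteLane chain feeding a ReadLane. Returns the index of the value
// written to the same lane, or -1 if the chain ends without a matching write.
int FoldReadLane(const std::vector<Inst>& insts, int read_idx) {
    const Inst& read = insts[read_idx];
    assert(read.op == Op::ReadLane);
    int cur = read.prev;
    while (cur >= 0 && insts[cur].op == Op::WriteLane) {
        if (insts[cur].lane == read.lane) {
            return insts[cur].value;  // the ReadLane can be replaced by this value
        }
        cur = insts[cur].prev;  // keep following the chain backwards
    }
    return -1;  // hit something that is not a WriteLane (e.g. a phi)
}
```

Because each WriteLane carries the previous VGPR value as an operand, the walk never needs to model all lanes at once; it only follows one operand per step. The walk stops as soon as it meets anything other than a WriteLane, which is exactly where a phi gets in the way.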
But in fact those phi nodes were all trivial. A phi node is called trivial when its operands reference only itself or one other value, any number of times, which means it can be substituted in all its users with that value. However, the ssa_rewrite pass was missing the recursive elimination required to perform this: removing one trivial phi can make its phi users trivial too, so they must be rechecked (see Algorithm 3, page 106 of the original SSA paper).
So the first commit of this PR adds the subset we need of the instruction use tracking implementation from @baggins183 and implements this elimination. In addition, it fixes a bug in TryRemoveTrivialPhi which prevented the elimination of many trivial phis (a missing Resolve call when comparing a phi arg to the phi itself). Shaders with many loops are now noticeably reduced in size/bloat (I've even seen some with 1000 fewer lines of GLSL code), and the lane elimination pass now works in more cases than before.
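For readers unfamiliar with the technique, here is a rough, self-contained sketch of recursive trivial phi removal with use tracking. It is not the code from this PR: the `Value` struct, its `users` set, the `replaced_by` link and the `AddPhiArg` helper are assumptions made for the example, and only the names `Resolve` and `TryRemoveTrivialPhi` are borrowed from the description.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical value type for illustration only.
struct Value {
    bool is_phi = false;
    std::vector<Value*> args;          // phi operands
    std::unordered_set<Value*> users;  // use tracking: values that reference this one
    Value* replaced_by = nullptr;      // set once a trivial phi has been eliminated
};

// Follows replacement links so stale references to removed phis compare correctly.
Value* Resolve(Value* v) {
    while (v->replaced_by != nullptr) {
        v = v->replaced_by;
    }
    return v;
}

void AddPhiArg(Value* phi, Value* arg) {
    phi->args.push_back(arg);
    arg->users.insert(phi);
}

Value* TryRemoveTrivialPhi(Value* phi) {
    Value* same = nullptr;
    for (Value* raw : phi->args) {
        Value* arg = Resolve(raw);  // resolve before comparing
        if (arg == same || arg == phi) {
            continue;  // duplicate operand or self reference
        }
        if (same != nullptr) {
            return phi;  // two distinct operands: not trivial
        }
        same = arg;
    }
    if (same == nullptr) {
        return phi;  // only self references; left alone in this sketch
    }
    // Snapshot the users, then reroute every use of the phi to `same`.
    const std::vector<Value*> users(phi->users.begin(), phi->users.end());
    phi->replaced_by = same;
    for (Value* user : users) {
        if (user == phi) {
            continue;
        }
        for (Value*& a : user->args) {
            if (a == phi) {
                a = same;
            }
        }
        same->users.insert(user);
    }
    phi->users.clear();
    same->users.erase(phi);
    // Removing this phi may have made phi users trivial as well: recheck them.
    for (Value* user : users) {
        if (user != phi && user->is_phi && user->replaced_by == nullptr) {
            TryRemoveTrivialPhi(user);
        }
    }
    return Resolve(same);
}
```

With a nested loop shape such as `phi1 = phi(v, phi2)` and `phi2 = phi(phi1, phi2)`, removing `phi2` turns `phi1` into `phi(v, phi1)`, which only the recursive recheck notices. Without use tracking there would be no cheap way to find which phis need that recheck.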
However, there are still cases where a phi cannot be eliminated; at the moment we only consider 2 of them. The first case is the simplest: a phi node has distinct arguments, but following either one leads to the same matching WriteLane.
In this case we can directly replace the read lane with the value of the write lane that was found and ignore the phi.
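A rough sketch of this first case, again on a made-up IR rather than the real one: every argument of the phi is followed back through its WriteLane chain, and the read is folded only if all of them arrive at the same WriteLane instruction. The names `Inst`, `FindWriteLane` and `FoldReadLaneThroughPhi` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified IR for illustration only.
enum class Op { Other, WriteLane, ReadLane, Phi };

struct Inst {
    Op op = Op::Other;
    std::vector<int> args;  // WriteLane: {prev, value}; ReadLane: {src}; Phi: incoming values
    uint32_t lane = 0;      // WriteLane/ReadLane: lane index
};

// Returns the index of the first WriteLane reachable from `start` that targets
// `lane`, or -1 if the chain ends before one is found.
int FindWriteLane(const std::vector<Inst>& insts, int start, uint32_t lane) {
    int cur = start;
    while (cur >= 0 && insts[cur].op == Op::WriteLane) {
        if (insts[cur].lane == lane) {
            return cur;
        }
        cur = insts[cur].args[0];
    }
    return -1;
}

// Returns the index of the value the ReadLane can be replaced with, or -1.
int FoldReadLaneThroughPhi(const std::vector<Inst>& insts, int read_idx) {
    const Inst& read = insts[read_idx];
    assert(read.op == Op::ReadLane);
    const int src = read.args[0];
    if (insts[src].op != Op::Phi) {
        const int write = FindWriteLane(insts, src, read.lane);
        return write < 0 ? -1 : insts[write].args[1];
    }
    int common = -1;
    for (const int arg : insts[src].args) {
        const int write = FindWriteLane(insts, arg, read.lane);
        if (write < 0 || (common >= 0 && write != common)) {
            return -1;  // some path misses the write or reaches a different one
        }
        common = write;
    }
    return common < 0 ? -1 : insts[common].args[1];
}
```

The comparison is on the WriteLane instruction itself, not on the written value, so the fold is only taken when both paths provably share the same write.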
The second case is more involved but is seen in the lightvolume shader. This shader maintains a loop counter inside a VGPR, which is initialized at the start and then loaded with read lane, incremented and stored back with write lane, as seen in the pseudocode below.
In this case we have to search the chains of each phi argument and insert a new phi just for this write and read lane pair, replacing the read lane instruction with the phi value. This must be done to preserve the original control flow of the shader. If multiple read/write pairs were inside the loop, each would get a separate phi, as each dimension of the VGPR must be considered a separate value.
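To see why a dedicated phi per lane preserves behaviour, here is a small hypothetical model written as ordinary C++ rather than IR. It is not taken from the lightvolume shader: `kWaveSize`, `Vgpr` and both counting functions are assumptions for this sketch. The first function keeps the counter in one lane of a VGPR carried around the loop, the second keeps it in a scalar loop-carried variable, which is what the inserted phi amounts to.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 64;  // assumed wave width for this sketch
using Vgpr = std::array<uint32_t, kWaveSize>;

// Value-passing style: takes the old VGPR value and returns the updated one.
Vgpr WriteLane(Vgpr old, uint32_t value, int lane) {
    old[lane] = value;
    return old;
}

uint32_t ReadLane(const Vgpr& v, int lane) {
    return v[lane];
}

// Original shape: the counter lives in one lane of a loop-carried VGPR.
uint32_t CountWithLanes(uint32_t iterations, int lane) {
    Vgpr v{};
    v = WriteLane(v, 0, lane);  // initialised before the loop
    for (uint32_t i = 0; i < iterations; ++i) {
        const uint32_t c = ReadLane(v, lane);  // reads phi(v_init, v_next)
        v = WriteLane(v, c + 1, lane);
    }
    return ReadLane(v, lane);
}

// Rewritten shape: the lane gets its own loop-carried scalar, phi(0, c + 1).
uint32_t CountWithScalarPhi(uint32_t iterations) {
    uint32_t c = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        c = c + 1;
    }
    return c;
}
```

Both functions return the same count for any iteration number, since no other lane of the VGPR is ever observed. A second counter kept in another lane would need its own scalar in the rewritten form, matching the one phi per read/write pair rule.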
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.