[PR #3159] [MERGED] shader_recompiler: Optimize general case of buffer addressing #3271

Closed

opened 2026-02-27 22:03:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/3159
Author: @raphaelthegreat
Created: 6/25/2025
Status: ✅ Merged
Merged: 6/26/2025
Merged by: @georgemoralis

Base: main ← Head: buffer-opts

📝 Commits (5)

a0c1542 shader_recompiler: Simplify dma types
6fa5f51 shader_recompiler: Perform address shift on IR level
e741c3c shader_recompiler: Optimize common buffer access pattern
002aeba emit_spirv: Use 32-bit integer ops for fault buffer
808fe6a resource_tracking_pass: Fix texel buffer shift

📊 Changes

12 files changed (+272 additions, -234 deletions)

📝 src/shader_recompiler/backend/spirv/emit_spirv.cpp (+1 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv_atomic.cpp (+26 -26)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+53 -58)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+81 -76)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+34 -56)
📝 src/shader_recompiler/frontend/translate/scalar_alu.cpp (+0 -1)
📝 src/shader_recompiler/info.h (+1 -1)
📝 src/shader_recompiler/ir/passes/resource_tracking_pass.cpp (+63 -1)
📝 src/shader_recompiler/ir/passes/shader_info_collection_pass.cpp (+8 -7)
📝 src/shader_recompiler/profile.h (+1 -1)
📝 src/video_core/renderer_vulkan/vk_pipeline_cache.cpp (+1 -0)
📝 src/video_core/renderer_vulkan/vk_rasterizer.cpp (+3 -6)

📄 Description

Buffer instructions have always had an awkward API. Regardless of their element size, they received a byte address from the IR, added the offset in bytes, and then shifted the result to get the array index. This makes the generated buffer reads harder to follow and adds overhead in the form of an extra shift operation. For example:

```glsl
uint _118 = (((_113 * 64u) + 32u) >> 2u) + buf0_dword_off;
uint _120 = ssbo_1_1.data[_118];
```
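For reference, the old contract can be modeled as a tiny C++ function. This is a hypothetical sketch that mirrors the GLSL above for dword-sized elements. The function and parameter names are illustrative and do not come from the shadPS4 sources.

```cpp
#include <cstdint>

// Hypothetical model of the old backend contract (illustrative, not shadPS4 code).
// The IR hands over a byte address. The backend adds the instruction's byte
// offset and only then shifts down to a dword index. That shift is emitted at
// runtime on every access, before the per-binding dword offset
// (buf0_dword_off in the GLSL above) is added.
std::uint32_t OldDwordIndex(std::uint32_t byte_addr, std::uint32_t byte_offset,
                            std::uint32_t buf_dword_off) {
    return ((byte_addr + byte_offset) >> 2u) + buf_dword_off;
}
```

With `byte_addr = x * 64` and `byte_offset = 32`, this reproduces `(((_113 * 64u) + 32u) >> 2u) + buf0_dword_off`.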

With this PR, buffer instructions directly accept an array index into the buffer and add an offset that is already scaled to the element size. Because the shift is now performed at the IR level, the most common buffer addressing pattern can be detected. The shift can then be optimized away by shifting the constants directly instead:

```glsl
uint _116 = ((_113 * 16u) + 8u) + buf0_dword_off;
uint _118 = ssbo_1_1.data[_116];
```
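The constant-shifting idea can be sketched as a peephole rewrite over a simplified `(x * stride) + offset` pattern. This minimal sketch makes two assumptions: both operands are compile-time constants, and the byte address never overflows 32 bits. `MulAddPattern`, `FoldShift` and `Eval` are hypothetical names. The actual pass in `resource_tracking_pass.cpp` may match and rewrite the IR differently.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified view of the address pattern (x * stride) + offset,
// where stride and offset are constants in bytes.
struct MulAddPattern {
    std::uint32_t stride;
    std::uint32_t offset;
};

// Try to turn ((x * stride) + offset) >> shift into (x * stride') + offset' by
// shifting the constants instead of the runtime value. The check conservatively
// requires both constants to be multiples of (1 << shift), so the rewrite is
// exact as long as x * stride + offset does not overflow 32 bits.
// shift must be < 32.
std::optional<MulAddPattern> FoldShift(MulAddPattern p, std::uint32_t shift) {
    const std::uint32_t mask = (1u << shift) - 1u;
    if ((p.stride & mask) != 0u || (p.offset & mask) != 0u) {
        return std::nullopt;  // Not foldable: keep the explicit runtime shift.
    }
    return MulAddPattern{p.stride >> shift, p.offset >> shift};
}

// Evaluate the pattern for a concrete x, e.g. to compare both forms.
std::uint32_t Eval(MulAddPattern p, std::uint32_t x) {
    return x * p.stride + p.offset;
}
```

`FoldShift({64, 32}, 2)` yields `{16, 8}`, which matches `(_113 * 16u) + 8u` in the GLSL above. When the constants are not multiples of the element size, the sketch keeps the runtime shift rather than risk a wrong index.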

On platforms where minStorageBufferOffsetAlignment is 4 (AMD, Intel), we can go a step further and eliminate the buffer offset addition, saving another ALU operation per access:

```glsl
uint _87 = (_83 * 16u) + 8u;
uint _89 = ssbo_1_1.data[_87];
```
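One plausible reading of why the addition can disappear rests on two facts. The guest buffer base is at least dword-aligned, but the host can only bind storage buffers at multiples of minStorageBufferOffsetAlignment. The shader must therefore add the leftover distance in dwords. The sketch below assumes that model. `Binding` and `MakeBinding` are hypothetical names, not the rasterizer's actual code.

```cpp
#include <cstdint>

// Hypothetical model of binding a guest buffer whose base address is only
// guaranteed to be 4-byte aligned (illustrative, not shadPS4 code).
struct Binding {
    std::uint64_t bind_offset;   // offset used for the host descriptor binding
    std::uint32_t dword_offset;  // value the shader adds to every index (buf0_dword_off)
};

// min_align stands in for minStorageBufferOffsetAlignment, which Vulkan
// requires to be a power of two, so masking rounds the base down to it.
Binding MakeBinding(std::uint64_t base_addr, std::uint64_t min_align) {
    const std::uint64_t aligned = base_addr & ~(min_align - 1u);
    // The remainder is a multiple of 4 for dword-aligned bases.
    return Binding{aligned, static_cast<std::uint32_t>((base_addr - aligned) >> 2u)};
}
```

Take a base address of 0x1008. With an alignment of 64, the buffer is bound at 0x1000 and needs a dword offset of 2. With an alignment of 4, the same buffer binds at 0x1008 and the offset is always 0, so under this model the shader can skip the addition.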

This may improve GPU performance somewhat, especially for shaders that perform many buffer accesses, since the saved ALU operations add up.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
