[PR #3159] [MERGED] shader_recompiler: Optimize general case of buffer addressing #3271

Closed
opened 2026-02-27 22:03:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/3159
Author: @raphaelthegreat
Created: 6/25/2025
Status: Merged
Merged: 6/26/2025
Merged by: @georgemoralis

Base: main ← Head: buffer-opts


📝 Commits (5)

  • a0c1542 shader_recompiler: Simplify dma types
  • 6fa5f51 shader_recompiler: Perform address shift on IR level
  • e741c3c shader_recompiler: Optimize common buffer access pattern
  • 002aeba emit_spirv: Use 32-bit integer ops for fault buffer
  • 808fe6a resource_tracking_pass: Fix texel buffer shift

📊 Changes

12 files changed (+272 additions, -234 deletions)

View changed files

📝 src/shader_recompiler/backend/spirv/emit_spirv.cpp (+1 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv_atomic.cpp (+26 -26)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+53 -58)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+81 -76)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+34 -56)
📝 src/shader_recompiler/frontend/translate/scalar_alu.cpp (+0 -1)
📝 src/shader_recompiler/info.h (+1 -1)
📝 src/shader_recompiler/ir/passes/resource_tracking_pass.cpp (+63 -1)
📝 src/shader_recompiler/ir/passes/shader_info_collection_pass.cpp (+8 -7)
📝 src/shader_recompiler/profile.h (+1 -1)
📝 src/video_core/renderer_vulkan/vk_pipeline_cache.cpp (+1 -0)
📝 src/video_core/renderer_vulkan/vk_rasterizer.cpp (+3 -6)

📄 Description

Buffer instructions have always had an awkward interface: regardless of their element size, they received a byte address from IR, added the offset in bytes, and then shifted the result to get the array index. This makes the generated buffer reads harder to follow and costs an extra shift operation per access. For example:

uint _118 = (((_113 * 64u) + 32u) >> 2u) + buf0_dword_off;
uint _120 = ssbo_1_1.data[_118];

With this PR, buffer instructions directly accept the array index of the buffer and add an offset expressed in the same element units. Because the shift is now done at the IR level, the most common buffer addressing pattern can be detected and the shift optimized away by shifting the constants instead:

uint _116 = ((_113 * 16u) + 8u) + buf0_dword_off;
uint _118 = ssbo_1_1.data[_116];
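
The constant folding described above can be sketched in C++ as follows. This is a minimal illustration, not code from the PR: the types `ByteAddress` and `ElementAddress` and the functions `FoldShift`, `IndexWithShift`, and `IndexFolded` are hypothetical names, and the real pass operates on IR instructions rather than plain structs. The sketch assumes the address has the form `index * stride + offset` and that folding is only attempted when both constants are multiples of the element size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of an address of the form (index * stride + offset),
// measured in bytes. Not a type from the shadPS4 codebase.
struct ByteAddress {
    uint32_t stride; // bytes per index step, e.g. 64
    uint32_t offset; // constant byte offset, e.g. 32
};

// Same shape, but measured in elements of (1 << shift) bytes.
struct ElementAddress {
    uint32_t stride;
    uint32_t offset;
};

// Fold ">> shift" into the constants. This is only valid when both constants
// are multiples of the element size; otherwise the shift has to stay.
std::optional<ElementAddress> FoldShift(ByteAddress addr, uint32_t shift) {
    assert(shift < 32);
    const uint32_t mask = (1u << shift) - 1u;
    if ((addr.stride & mask) != 0 || (addr.offset & mask) != 0) {
        return std::nullopt;
    }
    return ElementAddress{addr.stride >> shift, addr.offset >> shift};
}

// Old form: compute the byte address, then shift to get the array index.
uint32_t IndexWithShift(ByteAddress addr, uint32_t index, uint32_t shift) {
    return (index * addr.stride + addr.offset) >> shift;
}

// New form: the constants are already in element units, so no shift is needed.
uint32_t IndexFolded(ElementAddress addr, uint32_t index) {
    return index * addr.stride + addr.offset;
}
```

With a stride of 64, an offset of 32, and a shift of 2, `FoldShift` yields a stride of 16 and an offset of 8, matching the constants in the snippet above. The two forms agree as long as the byte address does not overflow 32 bits.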

On platforms where minStorageBufferOffsetAlignment = 4 (AMD, Intel) we can go a step further and eliminate the buffer offset addition, saving another ALU operation per access:

uint _87 = (_83 * 16u) + 8u;
uint _89 = ssbo_1_1.data[_87];
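
One plausible way to see why the addition disappears is sketched below. The PR description does not spell out the mechanism, so this is an assumption: the buffer's byte offset is split into a part that satisfies the device's binding alignment and a residual that the shader must add itself (the `buf0_dword_off` term above). The struct `BufferBinding` and the function `SplitOffset` are hypothetical names, and the sketch assumes a dword-aligned buffer and a power-of-two alignment, which Vulkan requires for minStorageBufferOffsetAlignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of splitting a buffer's byte offset into the part
// handled by the descriptor binding and the part left for the shader.
struct BufferBinding {
    uint64_t bind_offset;  // offset used when binding; a multiple of the alignment
    uint32_t dword_offset; // residual the shader must add to every index
};

BufferBinding SplitOffset(uint64_t byte_offset, uint64_t min_alignment) {
    // The alignment is assumed to be a nonzero power of two.
    assert(min_alignment != 0 && (min_alignment & (min_alignment - 1)) == 0);
    // The buffer is assumed to start on a dword boundary.
    assert(byte_offset % 4 == 0);
    // Round down to the nearest offset the device can bind at.
    const uint64_t bind_offset = byte_offset & ~(min_alignment - 1);
    // Whatever is left over has to be applied in the shader, in dwords.
    const uint32_t dword_offset =
        static_cast<uint32_t>((byte_offset - bind_offset) >> 2);
    return {bind_offset, dword_offset};
}
```

Under these assumptions, an alignment of 4 always leaves a residual of zero for a dword-aligned buffer, so the per-access addition can be dropped. A larger alignment such as 16 or 64 can leave a nonzero residual that still has to be added in the shader.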

This may improve GPU performance, especially when many shaders perform many buffer accesses, since the saved ALU ops add up.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
