[PR #3374] [MERGED] video_core: Rework tile manager #3424

Closed
opened 2026-02-27 22:03:38 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/3374
Author: @raphaelthegreat
Created: 8/3/2025
Status: Merged
Merged: 8/8/2025
Merged by: @squidbus

Base: main ← Head: tiling-rework


📝 Commits (10+)

  • a8b5b06 video_core: Rework detiling
  • a710487 video_core: Support tiling and macrotile detiling
  • 1be60e7 clang format
  • 6e55635 image_info: Cleanups
  • 336e0b9 resource: Revert some changes
  • 74484dc texture_cache: Fix small error
  • 52058c9 image_info: Set depth flag on depth promote
  • 6e0a894 buffer_cache: Remove level check
  • 7324929 tile_manager: Handle case of staging buffer causing flush
  • 3a14843 image_info: Add 2D thick array mode

📊 Changes

30 files changed (+2001 additions, -799 deletions)


📝 CMakeLists.txt (+2 -0)
📝 src/core/devtools/widget/reg_popup.cpp (+1 -1)
📝 src/shader_recompiler/info.h (+2 -2)
📝 src/shader_recompiler/specialization.h (+1 -1)
📝 src/video_core/amdgpu/liverpool.h (+14 -6)
📝 src/video_core/amdgpu/pixel_format.cpp (+61 -10)
📝 src/video_core/amdgpu/pixel_format.h (+7 -2)
📝 src/video_core/amdgpu/resource.h (+39 -64)
➕ src/video_core/amdgpu/tiling.cpp (+554 -0)
➕ src/video_core/amdgpu/tiling.h (+149 -0)
📝 src/video_core/buffer_cache/buffer_cache.cpp (+102 -134)
📝 src/video_core/buffer_cache/buffer_cache.h (+6 -7)
📝 src/video_core/host_shaders/CMakeLists.txt (+1 -0)
➕ src/video_core/host_shaders/tiling.comp (+444 -0)
📝 src/video_core/renderer_vulkan/vk_instance.cpp (+7 -0)
📝 src/video_core/renderer_vulkan/vk_instance.h (+6 -0)
📝 src/video_core/renderer_vulkan/vk_presenter.h (+1 -0)
📝 src/video_core/renderer_vulkan/vk_rasterizer.cpp (+75 -17)
📝 src/video_core/renderer_vulkan/vk_rasterizer.h (+1 -0)
📝 src/video_core/texture_cache/image.cpp (+70 -55)

...and 10 more files

📄 Description

This started with the goal of supporting Texture_Macrotiled detiling and turned into a rework of the tiling system.

General changes

  • Fleshed out the tiling mode enum and added more tiling properties from GB_TILE_MODE, such as pipe config, array mode, and micro tile mode.
  • The MipInfo structure is now consistent in terms of padding. On main, the ImageSize functions padded the pitch but not the height; they now return the padded height as well.
  • Replaced some Vulkan types in ImageInfo with the appropriate AmdGpu types, which helps decouple it from the Vulkan backend.
  • The tile mode is now properly queried for depth buffers as well; on main, all depth buffer image infos were left with linear tiling.
  • Property queries from the image info now consistently use the props member instead of functions that decide based on the pixel format.
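The padding rule for MipInfo can be illustrated with a short C++ sketch. This is a minimal sketch, not shadPS4's code. The `MipInfo` fields, `AlignUp`, and `ComputeMips` are hypothetical. The 8x8 alignment assumes a 1D thin (micro-tiled) surface. Real addrlib alignment also depends on bpp, sample count, and pipe config, and this sketch does not align mip offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MipInfo; field names are illustrative only.
struct MipInfo {
    uint32_t pitch;   // width padded to the micro tile, in texels
    uint32_t height;  // height padded to the micro tile, in texels
    uint64_t size;    // bytes occupied by this level (one slice)
    uint64_t offset;  // byte offset of this level from the image base
};

constexpr uint32_t AlignUp(uint32_t value, uint32_t align) {
    return (value + align - 1) / align * align;
}

// Assumes a 1D thin surface whose dimensions are padded to 8x8 micro tiles.
// Both pitch and height are padded, so every level covers whole tiles.
std::vector<MipInfo> ComputeMips(uint32_t width, uint32_t height, uint32_t levels,
                                 uint32_t bytes_per_texel) {
    assert(levels > 0 && bytes_per_texel > 0);
    std::vector<MipInfo> mips;
    uint64_t offset = 0;
    for (uint32_t level = 0; level < levels; ++level) {
        const uint32_t w = std::max(width >> level, 1u);
        const uint32_t h = std::max(height >> level, 1u);
        MipInfo mip{};
        mip.pitch = AlignUp(w, 8);
        mip.height = AlignUp(h, 8);  // padded like the pitch, not left raw
        mip.size = uint64_t(mip.pitch) * mip.height * bytes_per_texel;
        mip.offset = offset;  // levels are packed back to back in this sketch
        offset += mip.size;
        mips.push_back(mip);
    }
    return mips;
}
```

Both dimensions are padded, so each level covers whole 8x8 tiles. A detile pass over a level therefore always processes a multiple of 64 texels, and a level's linear copy can be located from `offset` alone.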

Tile manager changes

  • Replaced the numerous detile shaders with a single unified tiling shader based on r800 addrlib, which supports all tile modes of the guest hardware. Most of the shader's parameters are preprocessor macros to avoid overhead. In the most common case, 1D thin tiling, it produces the following code. This code essentially generates the LUT values at runtime and should not be much slower:
```glsl
_260.out_data[_402] = _265.in_data[((((_402 / (_404 * _405)) * ((((_240 * _405) * 32u) + 7u) / 8u)) + ((((_229 / 8u) * (_240 / 8u)) + (_223 / 8u)) * 256u)) + (((((((bitfieldExtract(_223, 0, 1) | (bitfieldExtract(_229, 0, 1) << uint(1))) | (bitfieldExtract(_223, 1, 1) << uint(2))) | (bitfieldExtract(_229, 1, 1) << uint(3))) | (bitfieldExtract(_223, 2, 1) << uint(4))) | (bitfieldExtract(_229, 2, 1) << uint(5))) * 32u) / 8u)) / 4u];
```
  • Detilers are now consistent in terms of padding. On main, 1D thin detilers produced padded output while volume detilers produced packed output, because the latter indexed the output buffer instead of the input. (As a side note, on main the volume detiler is named "macro". From my research this seems incorrect, as volume textures are 1D thick, not 2D tiled.) Padded output was chosen because it is more convenient. It allows the linear surface to be offset using the mip info structures generated during image size calculation. It also removes the need for address bounds checking in the shader, since the number of texels processed is always a multiple of 64.

  • Support for tiling (the reverse of detiling) has been added as well, selected with a simple macro parameter. This is used for macrotile handling (see below).

  • Textures with bpp < 32 now get better GPU occupancy during detiling, as the shader can use the full subgroup length on AMD GPUs.
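A CPU-side C++ sketch of the 1D thin address computation may help when reading the generated shader. It follows the same structure as that expression:

- a slice offset,
- then the micro tile index times the tile size in bytes,
- then the bit-interleaved texel index (x0, y0, x1, y1, x2, y2) inside the 8x8 tile.

The sketch generalizes from 32 bpp to any byte-multiple bpp for a single-sample surface. The names `MicroTileIndex`, `TiledOffset1DThin`, and `Convert1DThin` are hypothetical. The `kTile` template flag plays the role that the shader's tiling macro plays; it is not the shader's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Index of texel (x, y) inside an 8x8 micro tile, using the bit order seen in
// the generated shader: x0, y0, x1, y1, x2, y2.
constexpr uint32_t MicroTileIndex(uint32_t x, uint32_t y) {
    return (x & 1) | ((y & 1) << 1) | (((x >> 1) & 1) << 2) | (((y >> 1) & 1) << 3) |
           (((x >> 2) & 1) << 4) | (((y >> 2) & 1) << 5);
}

// Byte offset of texel (x, y, slice) in a padded, single-sample 1D thin surface:
// slice offset + micro tile index * tile bytes + texel index * bytes per texel.
constexpr uint64_t TiledOffset1DThin(uint32_t x, uint32_t y, uint32_t slice,
                                     uint32_t pitch, uint32_t height, uint32_t bpp) {
    const uint64_t slice_bytes = (uint64_t(pitch) * height * bpp + 7) / 8;
    const uint64_t tile_bytes = 64ull * bpp / 8;
    const uint64_t tile_index = uint64_t(y / 8) * (pitch / 8) + x / 8;
    const uint64_t elem_bytes = uint64_t(MicroTileIndex(x, y)) * bpp / 8;
    return slice * slice_bytes + tile_index * tile_bytes + elem_bytes;
}

// kTile == false detiles (tiled -> linear); kTile == true tiles (linear -> tiled).
// Both directions share one address function; only source and destination swap.
template <bool kTile>
void Convert1DThin(const uint8_t* src, uint8_t* dst, uint32_t pitch, uint32_t height,
                   uint32_t slices, uint32_t bpp) {
    // Padded dimensions mean every texel maps inside the buffer: no bounds checks.
    assert(pitch % 8 == 0 && height % 8 == 0 && bpp % 8 == 0);
    const uint32_t bytes = bpp / 8;
    for (uint32_t s = 0; s < slices; ++s)
        for (uint32_t y = 0; y < height; ++y)
            for (uint32_t x = 0; x < pitch; ++x) {
                const uint64_t linear = ((uint64_t(s) * height + y) * pitch + x) * bytes;
                const uint64_t tiled = TiledOffset1DThin(x, y, s, pitch, height, bpp);
                if constexpr (kTile)
                    std::memcpy(dst + tiled, src + linear, bytes);
                else
                    std::memcpy(dst + linear, src + tiled, bytes);
            }
}
```

Using one address function for both directions mirrors the idea of selecting tiling with a single macro parameter. The mapping is a bijection on a padded surface, so a tile pass followed by a detile pass restores the original data.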

Macrotile support changes

Since this was the original goal, support for detiling macrotile images has also been added. I've verified that it works in Knack, which uses a Texture_Macrotile image for its fonts. However, this isn't as simple as it sounds. This path is also used by compute-based image copies, which implicitly relied on detiling failing (SynchronizeBufferFromImage copies the linear image data to buffers without any tiling). This is where the tiling shader comes in. The buffer cache now tiles the image data before copying it to the buffer, so the texture cache can allow detiling of macrotile images. This adds some overhead, so before merging I will add a fast path that detects image copies and uses vkCmdCopyImage. The new approach is also more accurate to real hardware.
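A small C++ sketch can illustrate why the buffer cache now tiles before copying. It assumes, as the paragraph describes, that the host keeps images linear while guest memory expects the tiled layout. `SyncBufferFromLinearImage`, `GuestRead`, and `TiledAddressFn` are hypothetical names. The address function is left pluggable because real macrotile addressing (pipes, banks, bank swizzle) is far more involved than anything shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Maps a texel coordinate to its byte offset in the guest's tiled layout.
// Any tile mode (micro or macro) could be plugged in here.
using TiledAddressFn = std::function<uint64_t(uint32_t x, uint32_t y)>;

// Writes a linear 32-bit host image back to guest memory by re-tiling each
// texel, instead of copying the linear bytes verbatim.
void SyncBufferFromLinearImage(const std::vector<uint32_t>& linear, uint32_t pitch,
                               uint32_t height, const TiledAddressFn& tiled_addr,
                               std::vector<uint8_t>& guest) {
    assert(linear.size() >= size_t(pitch) * height);
    for (uint32_t y = 0; y < height; ++y)
        for (uint32_t x = 0; x < pitch; ++x) {
            const uint32_t texel = linear[size_t(y) * pitch + x];
            const uint64_t addr = tiled_addr(x, y);
            assert(addr + sizeof(texel) <= guest.size());
            std::memcpy(guest.data() + addr, &texel, sizeof(texel));
        }
}

// Reads a texel the way the guest would: through the tiled address.
uint32_t GuestRead(const std::vector<uint8_t>& guest, const TiledAddressFn& tiled_addr,
                   uint32_t x, uint32_t y) {
    uint32_t texel = 0;
    std::memcpy(&texel, guest.data() + tiled_addr(x, y), sizeof(texel));
    return texel;
}
```

Suppose a toy column-major layout stands in for a real tile mode. After `SyncBufferFromLinearImage`, a guest read of texel (1, 0) returns the right value. A verbatim copy of the linear bytes would instead return whichever texel happens to sit at that tiled offset.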

Note: this still doesn't handle degrading the macro tile mode to micro tiling for mipmaps, but that should be easier to implement now.

This fixes the font in Knack:

| main | PR |
| ------------- | ------------- |
| knack_bad | knack_good |

TODO

  • [x] Add fast path for image copies
  • [x] Hook up bank swizzle to the tile manager

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: starred/shadPS4#3424