[PR #2819] [MERGED] video_core: Implement DMA. #3006

Closed
opened 2026-02-27 22:02:05 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/2819
Author: @LNDF
Created: 4/20/2025
Status: Merged
Merged: 5/22/2025
Merged by: @georgemoralis

Base: main ← Head: hybrid


📝 Commits (10+)

  • 52253b4 Import memory
  • d5e45fb 64K pages and fix memory mapping
  • 83255ee Queue coverage
  • 94a0782 Buffer syncing, faulted readback adn BDA in Buffer
  • c077fb9 Base DMA implementation
  • 9356779 Preparations for implementing SPV DMA access
  • 68a33cd Base impl (pending 16K pages and getbuffersize)
  • 31df795 16K pages and stack overflow fix
  • d89d937 clang-format
  • 20aacec clang-format but for real this time

📊 Changes

40 files changed (+1641 additions, -311 deletions)

View changed files

📝 CMakeLists.txt (+3 -0)
📝 externals/sirit (+1 -1)
➕ src/common/recursive_lock.cpp (+37 -0)
➕ src/common/recursive_lock.h (+67 -0)
📝 src/common/slot_vector.h (+109 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv.cpp (+9 -2)
📝 src/shader_recompiler/backend/spirv/emit_spirv_atomic.cpp (+3 -3)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+40 -41)
📝 src/shader_recompiler/backend/spirv/emit_spirv_instructions.h (+1 -1)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+185 -13)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+106 -11)
📝 src/shader_recompiler/frontend/translate/scalar_memory.cpp (+7 -6)
📝 src/shader_recompiler/info.h (+12 -2)
➕ src/shader_recompiler/ir/abstract_syntax_list.cpp (+44 -0)
📝 src/shader_recompiler/ir/abstract_syntax_list.h (+5 -0)
📝 src/shader_recompiler/ir/passes/shader_info_collection_pass.cpp (+27 -4)
📝 src/shader_recompiler/ir/program.cpp (+30 -4)
📝 src/shader_recompiler/ir/program.h (+1 -1)
📝 src/shader_recompiler/recompiler.cpp (+2 -0)
📝 src/video_core/amdgpu/liverpool.cpp (+1 -0)

...and 20 more files

📄 Description

This implements arbitrary memory access from the GPU to CPU memory.

When mapping memory, it tries to import host memory with VK_KHR_external_memory. When a buffer is created in the same address range as an imported region, device local memory is used.

Buffer Device Addresses are used to get pointers to the data in shaders. A BDA pagetable buffer is created on the GPU that holds the BDA for every 16K page.
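As a rough illustration of the lookup described above, here is a minimal CPU-side sketch of a page table that stores one device address per 16K page. The class name `BdaPageTable`, its methods, and the convention that 0 means "unmapped" are assumptions made for this example, not the PR's actual code; in the PR the table lives in a GPU buffer and is read from the shader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 16 KiB pages, as stated in the description.
constexpr std::uint64_t kPageBits = 14;
constexpr std::uint64_t kPageSize = 1ULL << kPageBits;
constexpr std::uint64_t kPageMask = kPageSize - 1;

// Hypothetical CPU-side model of the BDA page table (not the PR's class).
class BdaPageTable {
public:
    explicit BdaPageTable(std::uint64_t address_space_size)
        : entries_(address_space_size >> kPageBits, 0) {}

    // Records the device address of every page in [guest_addr, guest_addr + size).
    // Both guest_addr and size must be page aligned in this sketch.
    void Map(std::uint64_t guest_addr, std::uint64_t size, std::uint64_t base_bda) {
        assert((guest_addr & kPageMask) == 0 && (size & kPageMask) == 0);
        assert(((guest_addr + size) >> kPageBits) <= entries_.size());
        for (std::uint64_t offset = 0; offset < size; offset += kPageSize) {
            entries_[(guest_addr + offset) >> kPageBits] = base_bda + offset;
        }
    }

    // Returns the device address for guest_addr, or 0 if the page is unmapped.
    std::uint64_t Translate(std::uint64_t guest_addr) const {
        const std::uint64_t page = guest_addr >> kPageBits;
        if (page >= entries_.size() || entries_[page] == 0) {
            return 0;
        }
        return entries_[page] + (guest_addr & kPageMask);
    }

private:
    std::vector<std::uint64_t> entries_; // one BDA per 16 KiB page, 0 = unmapped
};
```

The page index is the address shifted right by 14 bits, and the low 14 bits are added back as the offset within the page.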

Additionally, a "fault" buffer is created, which holds a bitfield that tells the CPU whether a non-GPU-local address has been accessed from a shader. This is done by embedding in the LSB of the BDA whether a page is cached by the buffer cache. If a page has been accessed by the GPU and is not cached by the buffer cache, it will be cached before the next frame.
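The fault tracking can be modelled on the CPU as follows. This is only a sketch under stated assumptions: the polarity of the LSB (set means "cached"), the 32-bit word layout of the bitfield, and the names `MakeEntry`, `FaultBitfield` and `Access` are all hypothetical. In the real shader the bit would be set with an atomic operation on a GPU buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: LSB set means "page is cached by the buffer cache".
constexpr std::uint64_t kCachedBit = 1;

// Packs the cached flag into the low bit of a (at least 2-byte aligned) BDA.
std::uint64_t MakeEntry(std::uint64_t bda, bool cached) {
    assert((bda & kCachedBit) == 0);
    return bda | (cached ? kCachedBit : 0);
}

// Hypothetical model of the "fault" buffer: one bit per page.
class FaultBitfield {
public:
    explicit FaultBitfield(std::size_t num_pages) : words_((num_pages + 31) / 32, 0) {}

    void Mark(std::size_t page) {
        words_[page / 32] |= 1u << (page % 32);
    }

    // Returns all faulted pages in ascending order and clears the bitfield.
    std::vector<std::size_t> Drain() {
        std::vector<std::size_t> pages;
        for (std::size_t w = 0; w < words_.size(); ++w) {
            for (std::size_t bit = 0; bit < 32; ++bit) {
                if (words_[w] & (1u << bit)) {
                    pages.push_back(w * 32 + bit);
                }
            }
            words_[w] = 0;
        }
        return pages;
    }

private:
    std::vector<std::uint32_t> words_;
};

// Models a shader access: flags uncached pages, then strips the tag bit.
std::uint64_t Access(std::uint64_t entry, std::size_t page, FaultBitfield& faults) {
    if ((entry & kCachedBit) == 0) {
        faults.Mark(page);
    }
    return entry & ~kCachedBit;
}
```

After a frame, the CPU side would call something like `Drain()` and cache each returned page before the next frame.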

The host memory import may fail (this sometimes happens on AMD GPUs on Linux). In those cases, a fallback value is used and the page bit is set in the "fault" buffer.

Problems:

  • I am not able to make the "fault" buffer work. Either the shader is not writing or the CPU side is not reading.
  • It seems wrong to call scheduler.Finish(); every time I download the "fault" buffer in BufferCache::CreateFaultBuffers. If there is a better way to sync, I can implement it.
  • @raphaelthegreat, you told me in the other PR that I need 64K pages to do the BDA LSB optimization. I can't use 64K pages because we could accidentally try to track non-GPU memory. I hope 16K is fine.
  • Can't be merged until https://github.com/shadps4-emu/sirit/pull/11 is merged. Currently Sirit is pointing to my fork.
  • I am not able to do a lot of testing right now, so this needs testing.
  • Probably something I forgot to mention.

This is currently used for ReadConst and works like this:

  1. First, we try to access the memory if it exists in device local memory.
  2. If that fails (it will only fail on the first frame), we try to access host memory.
  3. If that fails, we fall back to the old flatbuf method.
  4. If that fails (the offset is dynamic), we return zero.
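
The fallback order in the steps above can be sketched as a simple chain. This is an illustration only: `ReadConstInputs`, `ReadSource` and the use of `std::optional` to represent a failed path are assumptions for the example, whereas the PR emits the equivalent logic as SPIR-V.

```cpp
#include <cstdint>
#include <optional>

// Which path produced the value (hypothetical, for illustration).
enum class ReadSource { DeviceLocal, HostMemory, Flatbuf, Zero };

struct ReadResult {
    std::uint32_t value;
    ReadSource source;
};

// Hypothetical inputs: each optional is empty when that path fails.
struct ReadConstInputs {
    std::optional<std::uint32_t> device_local; // page cached in device local memory
    std::optional<std::uint32_t> host_memory;  // imported host memory
    std::optional<std::uint32_t> flatbuf;      // old flatbuf method (static offsets)
};

// Tries each path in the order listed in the description.
ReadResult ReadConst(const ReadConstInputs& in) {
    if (in.device_local) {
        return {*in.device_local, ReadSource::DeviceLocal};
    }
    if (in.host_memory) {
        return {*in.host_memory, ReadSource::HostMemory};
    }
    if (in.flatbuf) {
        return {*in.flatbuf, ReadSource::Flatbuf};
    }
    return {0, ReadSource::Zero}; // every path failed, e.g. dynamic offset
}
```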

This could also potentially be used in GetBufferSize.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
